0

I am trying to implement the minimal finite state transducer described by Mihov and Maurel (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698) in Python3. My program is working but needs a lot of time, so I decided to use cProfile to determine the time-consuming parts. This was the result:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  103.844  103.844 {built-in method builtins.exec}
        1    0.000    0.000  103.844  103.844 <string>:1(<module>)
        1    0.095    0.095  103.843  103.843 fst.py:16(__init__)
     3049    0.007    0.000  103.639    0.034 fst.py:154(find_minimized)
     3049    3.388    0.001  103.427    0.034 fst.py:178(member)
  2765927    5.173    0.000   99.582    0.000 {built-in method builtins.all}
 52885795   54.989    0.000   94.861    0.000 fst.py:184(<genexpr>)
100430746   21.104    0.000   21.104    0.000 fst.py:217(transition)
 94725124   18.815    0.000   18.815    0.000 fst.py:224(output)
     2248    0.179    0.000    0.205    0.000 fst.py:190(copy_state)
     3160    0.026    0.000    0.030    0.000 fst.py:251(clear_state)
     2248    0.014    0.000    0.015    0.000 fst.py:140(new_state)
    11888    0.011    0.000    0.013    0.000 fst.py:231(set_output)
     7494    0.007    0.000    0.013    0.000 fst.py:265(left_string_division)
    11593    0.007    0.000    0.009    0.000 fst.py:124(longest_common_prefix)
     8498    0.007    0.000    0.009    0.000 fst.py:148(set_transition)
    39099    0.006    0.000    0.006    0.000 {method 'add' of 'set' objects}
     7118    0.003    0.000    0.004    0.000 fst.py:205(state_output)
    50320    0.004    0.000    0.004    0.000 {built-in method builtins.len}
    13660    0.003    0.000    0.003    0.000 fst.py:245(final)
     9256    0.002    0.000    0.002    0.000 {method 'pop' of 'dict' objects}
     3442    0.001    0.000    0.002    0.000 fst.py:238(set_final)
     2248    0.001    0.000    0.001    0.000 {method 'keys' of 'dict' objects}
      704    0.001    0.000    0.001    0.000 {method 'split' of 'str' objects}
     3160    0.001    0.000    0.001    0.000 {method 'discard' of 'set' objects}
      547    0.000    0.000    0.000    0.000 fst.py:211(set_state_output)
      112    0.000    0.000    0.000    0.000 fst.py:131(new_temp_state)
      112    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So the time-consuming part seems to be the following function (which represents Lemma 3 in Mihov & Maurel) :

def member(self,state) :
    for test_state in self.states - set(self.temp_states) :
        if (((state in self.final_states and test_state in self.final_states) or \
            (state not in self.final_states and test_state not in self.final_states)) and \
            (not(state in self.final_states) or (self.state_output(state) == self.state_output(test_state))) and \
            (all((self.transition(state,sym) == self.transition(test_state,sym) and \
            self.output(state,sym) == self.output(test_state,sym)) for sym in self.sigma))) :
            return test_state
    return False

The following part of the if-condition seems to be responsible for the inefficiency:

(all((self.transition(state,sym) == self.transition(test_state,sym) and \
self.output(state,sym) == self.output(test_state,sym)) for sym in self.sigma))

So my question now is how I can look more efficiently for an equivalent state...

Bob96
  • 47
  • 6
  • a general principle: fail early', logic tests 'shortcircuit' - so the cheapest tests that eliminate the most cases should be up front, also looks like the code repeats some `in` tests with same args - refactor logic to avoid or assign vars with single test result, use var's values in repeated tests – f5r5e5d Mar 11 '18 at 22:31
  • for fast membership tests use `set` or `dictionary` - I assume a state machine has unique states? sometimes the overhead of a single conversion to a `set` is worthwhile for multiple membership searches even if the `set` type doesn't fit the larger use of the data object – f5r5e5d Mar 11 '18 at 22:50

0 Answers0