Python, How to organize a big function that relies on many external data to work

Question

Question: How should I organize a big function that relies on a lot of external data to work? Should I declare a class to hold that external data, or should I keep the big function and its data in one file? Or are there better ways of doing it? What's the most computationally efficient way? What's the most Pythonic, recommended way?

I have a log file to parse, and the log file contains many formats of strings. I wrote a parseLine(inputStr) function to deal with all possible formats. The parseLine() function requires many precompiled regexes, and a quite big dictionary for lookups. I kept the parseLine() function in a file parseLineFile.py

My parseLineFile.py looks like:

import re

regex0 = re.compile('foo')
regex1 = re.compile('bar')
# and many more regexes

set0 = {'f', '0'}
set1 = {'b', 'a'}  # could be a big set containing tens of strings
# and many more sets

def parseLine(inputString, inputDictionary, inputTimeCriteria):
    # pseudo code:
    #   use regex0 to extract date info in inputString
    #   check if date within inputTimeCriteria
    #   use more of previous declared regexes and sets to extract more info, 
    #       branch out to different routines to use more regexes and sets to extract more info
    #   finally use inputDictionary to look up the meaning of extracted info    
    #   return results in some data structure

In my main code, I import parseLineFile.py, build myDictionary, decide on mytimeCriteria, and then use parseLine() to parse a file line by line.

I feel that my question is ... not stack-overflow-ic, but if you leave a comment on how I should ask a narrower, more specific question, that would be great! But please also at least mention how you would approach my problem.

I don't think anyone can usefully discuss the computational efficiency of your code when you've not actually shown the relevant bits of it. – Blckknght, Apr 09 '16 at 02:01
Keeping it all in its own .py file is a reasonable choice. It's contained in its own module namespace. No need to move it to a class unless you need to keep multiple instances with their own private data. Stick with what you have now. – tdelaney, Apr 09 '16 at 02:27
@tdelaney, thanks, that's the impression I got from what I read in other places as well. Now I can confirm that. But some external data like the "inputDictionary" has to be decided at run time, so it cannot be packaged in the .py file. I am not sure how big an impact it will have if I pass a big data structure by reference to my function every time I run it. I have an uneducated guess that if I can access "inputDictionary" as a self.myDictionary internally, it would be faster. – YunliuStorage, Apr 09 '16 at 02:39
Not to worry. A big data structure passed by reference is very fast. A simple string and a `dict` with millions of entries have the same reference overhead. – tdelaney, Apr 09 '16 at 02:47
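
As a rough illustration of the point about passing by reference, here is a minimal sketch. The `lookup` function and both dictionaries are made up for this example. It only shows that the function receives the caller's own object rather than a copy, whatever its size; it does not measure speed.

```python
def lookup(key, table):
    # `table` is the very object the caller holds; nothing is copied
    # when it is passed in, whether it has one entry or a million.
    return table.get(key), id(table)

small = {"a": 1}                             # hypothetical tiny dictionary
big = {str(i): i for i in range(100_000)}    # hypothetical large dictionary

value_small, id_small = lookup("a", small)
value_big, id_big = lookup("42", big)

# The identity seen inside the function matches the caller's object.
same_small = id_small == id(small)
same_big = id_big == id(big)
```

Both `same_small` and `same_big` come out true, so passing the dictionary as an argument or reaching it through `self.myDictionary` hands the function the same object either way.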

score 0 · Accepted Answer · answered Apr 09 '16 at 02:05

It's hard to say exactly what you should do for this specific function, but here are some tips on organizing big functions:

First, identify what conditionals can be moved to their own function. For example, let's say you have this code:

if 'foo' in inputString:
    line = regex()
    line = do_something_else()
elif 'bar' in inputString:
    line = regex()
    line = do_something_a_little_different()

You can easily see one abstraction you could make here: move the functionality in each branch of the if statement into its own function. You would create parseFoo and parseBar functions, each of which takes a line and returns an expected value.

The main benefit of this is that you now have extremely simple functions to unit test!
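
A minimal sketch of that split, assuming two made-up line formats (`foo=<number>` and `bar=<word>`). The patterns, the return values, and the simplified one-argument `parseLine` are hypothetical, since the question does not show the real log formats.

```python
import re

# Hypothetical patterns, compiled once at module level as in the question.
FOO_RE = re.compile(r"foo=(\d+)")
BAR_RE = re.compile(r"bar=(\w+)")

def parseFoo(line):
    # Handles only the 'foo' format; returns None when the pattern is absent.
    match = FOO_RE.search(line)
    return int(match.group(1)) if match else None

def parseBar(line):
    # Handles only the 'bar' format; returns None when the pattern is absent.
    match = BAR_RE.search(line)
    return match.group(1).upper() if match else None

def parseLine(inputString):
    # The big function shrinks to a dispatcher over the small parsers.
    if 'foo' in inputString:
        return parseFoo(inputString)
    elif 'bar' in inputString:
        return parseBar(inputString)
    return None
```

Each small parser can then be checked on its own, for example with `assert parseFoo("foo=12") == 12`, without building a whole log file.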

Other things I watch out for are:

Are you nesting many conditionals? Extract them into a function and return early to reduce nesting.
If you're repeating yourself with different inputs, extract the repeated code into a function.
Mentally scan the function a day later and see if you still understand it easily. If not, break it into smaller pieces.
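
The "return early" tip could look roughly like the sketch below. The `classify` function and the `LEVEL message` line format are invented for illustration and are not taken from the question.

```python
def classify(line, allowed_levels):
    # Each guard clause rejects one bad case and returns immediately,
    # so the main path at the bottom stays at one indentation level.
    if not line:
        return None
    parts = line.split(" ", 1)
    if len(parts) != 2:
        return None
    level, message = parts
    if level not in allowed_levels:
        return None
    return level, message.strip()
```

The same checks written as nested `if` blocks would push the final `return` three levels deep.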

Anyways, more input from you would be ideal but I hope that helps to get you started!

answered Apr 09 '16 at 02:05

Bartek


I see my folly here... I didn't ask the question the right way. Thank you for your advice to break up big functions, I think it is the right thing to do. My question is more like: should I make this function a class rather than a file? Should I put the regexes, sets, and dictionary in the class as internal variables, or should I just pass them as arguments? Or are there other possibilities for organizing this big function? – YunliuStorage Apr 09 '16 at 02:12
@YunliuStorage: Your comment here shows exactly why this isn't a good question – tom10 Apr 09 '16 at 02:20
I am aware of that, the most "help" I get from this community are about how bad my questions are. Good questions are usually google-able. – YunliuStorage Apr 09 '16 at 02:32
@YunliuStorage: if you know this type of question isn't a good match for SO, then please stop asking them here. *SO isn't intended to match every programming related educational need you might have*, and these questions diminish the value of the site. For example, it takes a long time to read a question and determine that it can't be answered without a long discussion, and you wasted many peoples' time with this. – tom10 Apr 09 '16 at 12:38
I got my answer, I cleared my misconceptions. My questions diminish the site, and your kind diminish my estimation for this site. – YunliuStorage Apr 10 '16 at 04:38
