Junior python programmer here and I've been beating my head against a brick wall on unexpected for loop and dictionary behavior. I'm looping through a CSV file of log entries and parsing the data into a categories dict. When I initialize the categories dict each time through the loop, it works as expected..
Like so:
log_entries = AutoVivification()
# http://stackoverflow.com/questions/635483/what-is-the-best-way-to-implement-nested-dictionaries-in-python
def scrublooper(log_file):
for ll in log_file:
# Initialize categories dict every round through the loop
categories = {'requests': {'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 0, 'Pages': 0, 'Content_Files': 0}, 'filter_action': {'re': 0, 'pl': 0, 'bs': 0}}
lld = LogDomain(ll)
domain, hostname, lan_host = lld.domain, lld.hostname, lld.lan_host
mimetypes = url_searcher(Settings.mimetypes, lld.mime_type)
if mimetypes:
category = mimetypes[2]
if not log_entries[lan_host].has_key(domain):
log_entries[lan_host][domain]= categories
log_entries[lan_host][domain]['requests'][category] += 1
print log_entries['192.168.5.210']['google.com']['requests']
print log_entries['192.168.5.210']['webtrendslive.com']['requests']
print log_entries['192.168.5.210']['osnews.com']['requests']
print log_entries['192.168.5.210']['question-defense.com']['requests']
print log_entries['192.168.5.210']['optimost.com']['requests']
Output from this look is what I would expect:
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 95, 'Pages': 0, 'Content_Files': 0}
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 1, 'Pages': 0, 'Content_Files': 0}
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 2, 'Pages': 0, 'Content_Files': 0}
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 18, 'Pages': 0, 'Content_Files': 0}
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 3, 'Pages': 0, 'Content_Files': 0}
HOWEVER! Here is my problem. I don't want to initialize the categories dict every time through the loop. In this simplified example case it doesn't matter, but down the road for this program, it'll cause significant performance degradation (30%).
I need to initialize the categories dict ONCE:
log_entries = AutoVivification()
categories = {'requests': {'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 0, 'Pages': 0, 'Content_Files': 0}, 'filter_action': {'re': 0, 'pl': 0, 'bs': 0}}
def scrublooper(log_file):
for ll in log_file:
lld = LogDomain(ll)
# etc, etc, etc
However, when I initialize the categories dict ANYWHERE outside the for loop (whether in the scrublooper function or simply right after the log_entries variable), the output is:
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 685, 'Pages': 0, 'Content_Files': 0}
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 685, 'Pages': 0, 'Content_Files': 0}
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 685, 'Pages': 0, 'Content_Files': 0}
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 685, 'Pages': 0, 'Content_Files': 0}
{'Content_Visual': 0, 'Content_ProgramsUpdates': 0, 'Content_Text': 685, 'Pages': 0, 'Content_Files': 0}
All 'Conent_Text' values have incremented equally! What is happening here? I'm sure I've violating some python principle but don't know what or how to find out. It took me hours simply to figure out the problem was connected to the categories dict.
Much obliged for any explanation.