I'm a Python newbie trying to parse a file to make a table of memory allocations. My input file is in the following format:

48 bytes allocated at 0x8bb970a0
24 bytes allocated at 0x8bb950c0
48 bytes allocated at 0x958bd0e0
48 bytes allocated at 0x8bb9b060
96 bytes allocated at 0x8bb9afe0
24 bytes allocated at 0x8bb9af60    

My first objective is to make a table that counts the instances of a particular number of byte allocations. In other words, my desired output for the above input would be something like:

48 bytes -> 3 times
96 bytes -> 1 times
24 bytes -> 2 times

(for now, I'm not concerned about the memory addresses)

Since I'm using Python, I thought doing this using a dictionary would be the right way to go (based on about 3 hours' worth of reading Python tutorials). Is that a good idea?

In trying to do this using a dictionary, I decided to make the number of bytes the 'key', and a counter as the 'value'. My plan was to increment the counter on every occurrence of the key. As of now, my code snippet is as follows:

# Create an empty dictionary
allocationList = {}

# Open file for reading
with open("allocFile.txt") as fp: 
    for line in fp: 
        # Split the line into a list (using space as delimiter)
        lineList = line.split(" ")

        # Extract the number of bytes
        numBytes = lineList[0];

        # Store in a dictionary
        if allocationList.has_key('numBytes')
            currentCount = allocationList['numBytes']
            currentCount += 1
            allocationList['numBytes'] = currentCount
        else
            allocationList['numBytes'] = 1 

for bytes, count in allocationList.iteritems()
    print bytes, "bytes -> ", count, " times"

With this, I get a syntax error in the 'has_key' call, which leads me to question whether it is even possible to use variables as dictionary keys. All examples I have seen so far assume that keys are available upfront. In my case, I can get my keys only when I'm parsing the input file.

(Note that my input file can run into thousands of lines, with hundreds of different keys)

Thank you for any help you can provide.

Gautam
  • As I see it, you quoted 'numBytes', so you are always referring to a constant – dmitry Nov 28 '11 at 09:15
  • And you omitted the colon at the end of the `if allocationList.has_key('numBytes')` and `else` lines; that is what causes the syntax error – dmitry Nov 28 '11 at 09:17

4 Answers

Learning a language is as much about the syntax and basic types as it is about the standard library. Python already has a class that makes your task very easy: collections.Counter.

from collections import Counter

with open("allocFile.txt") as fp:
    counter = Counter(line.split()[0] for line in fp)

for bytes, count in counter.most_common():
    print bytes, "bytes -> ", count, " times"
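
The snippet above uses Python 2 print statements. A Python 3 sketch of the same idea follows. The sample lines stand in for allocFile.txt, the helper name count_sizes is hypothetical, and skipping blank lines (so that line.split()[0] cannot raise IndexError on an empty trailing line) is an addition that is not part of the original answer.

```python
from collections import Counter

# Sample lines standing in for the contents of allocFile.txt (assumed format).
sample_lines = [
    "48 bytes allocated at 0x8bb970a0",
    "24 bytes allocated at 0x8bb950c0",
    "48 bytes allocated at 0x958bd0e0",
    "48 bytes allocated at 0x8bb9b060",
    "96 bytes allocated at 0x8bb9afe0",
    "24 bytes allocated at 0x8bb9af60",
]

def count_sizes(lines):
    """Count how often each allocation size (first field) appears, skipping blank lines."""
    return Counter(line.split()[0] for line in lines if line.strip())

counter = count_sizes(sample_lines)

# most_common() yields (size, count) pairs, highest count first.
for size, count in counter.most_common():
    print(size, "bytes ->", count, "times")
```

With the sample lines this prints 48 (3 times), then 24 (2 times), then 96 (1 time). To read the real file, pass the open file object instead: with open("allocFile.txt") as fp: counter = count_sizes(fp).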
Petr Viktorin
  • I feel your answer is more correct than anyone else's here – Jakob Bowyer Nov 28 '11 at 09:39
  • +1: If you are only interested in the count, `Counter` is the way to go. On the other hand, the OP wrote: *for now, I'm not concerned about the memory addresses* --- I suppose he might sooner or later need a custom solution that goes beyond `Counter`. – Ferdinand Beyer Nov 28 '11 at 10:19
  • Thank you very much for this solution. I tried it, but it didn't work, because Counter is only available in Python 2.7 and later, and I'm using 2.6.4. But it led me to http://stackoverflow.com/questions/3594514/how-to-find-most-common-elements-of-a-list, where I found a way to solve my problem. I'm still marking this answer as the solution, because it is probably the best way of solving the problem. – Gautam Nov 28 '11 at 10:37
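
On Python 2.6, where Counter is missing, collections.defaultdict(int) (available since Python 2.5) is one possible stand-in: a missing key starts at 0, so each line only needs counts[size] += 1. This is a sketch rather than the approach Gautam ended up using; the helper name count_sizes and the sample lines are hypothetical, and the code is written so that it should run on both Python 2.6 and Python 3.

```python
from collections import defaultdict

def count_sizes(lines):
    """Count allocation sizes with defaultdict(int); a missing key starts at 0."""
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts

# Hypothetical sample input in the question's format.
sample_lines = ["48 bytes allocated at 0x8bb970a0",
                "24 bytes allocated at 0x8bb950c0",
                "48 bytes allocated at 0x958bd0e0"]
counts = count_sizes(sample_lines)

# Sort by descending count to mimic Counter.most_common().
for size, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print("%s bytes -> %d times" % (size, count))
```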

The dict.has_key() method has been removed in Python 3. Use the in keyword instead (it works in Python 2 as well):

if numBytes in allocationList:    # do not use numBytes as a string, use the variable directly
    #do the stuff
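
Put together, a corrected counting loop built on the in keyword might look like this minimal sketch (Python 3). The function name count_sizes and the sample input are hypothetical, and skipping blank lines is an extra precaution not mentioned in the answer.

```python
def count_sizes(lines):
    """Count allocation sizes using an explicit membership test with `in`."""
    allocation_counts = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        num_bytes = fields[0]  # use the variable as the key, not the string 'numBytes'
        if num_bytes in allocation_counts:
            allocation_counts[num_bytes] += 1
        else:
            allocation_counts[num_bytes] = 1
    return allocation_counts

print(count_sizes(["48 bytes allocated at 0x8bb970a0",
                   "24 bytes allocated at 0x8bb950c0",
                   "48 bytes allocated at 0x958bd0e0"]))
```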

But in your case, you can also replace the whole block

if allocationList.has_key('numBytes')
    currentCount = allocationList['numBytes']
    currentCount += 1
    allocationList['numBytes'] = currentCount
else
    allocationList['numBytes'] = 1

with a single line that uses get():

allocationList[numBytes] = allocationList.get(numBytes, 0) + 1
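
dict.get(key, default) returns the stored value when the key exists and the default otherwise, without inserting anything into the dictionary. A small sketch, where the list of sizes is a hypothetical stand-in for the values parsed from the file:

```python
counts = {}

# Missing key: get() returns the default and leaves the dict unchanged.
print(counts.get("48", 0))  # 0
print("48" in counts)       # False

# Counting loop: the first occurrence of a size starts from the default 0.
for num_bytes in ["48", "24", "48"]:
    counts[num_bytes] = counts.get(num_bytes, 0) + 1

print(counts)  # {'48': 2, '24': 1}
```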
Cédric Julien

You get a syntax error because you are missing the colon at the end of this line:

if allocationList.has_key('numBytes')
                                     ^
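
One way to confirm that the missing colon alone triggers the error is to compile both variants without running them. The snippet strings and the helper is_valid_syntax are illustrative; compile() only parses the source, so the undefined allocationList (and has_key, which Python 3 no longer has) does not matter here.

```python
# The question's `if` line without and with the trailing colon.
missing_colon = "if allocationList.has_key('numBytes')\n    pass\n"
with_colon = "if allocationList.has_key('numBytes'):\n    pass\n"

def is_valid_syntax(source):
    """Return True if `source` parses; compile() does not execute the code."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

print(is_valid_syntax(missing_colon))  # False
print(is_valid_syntax(with_colon))     # True
```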

Your approach is fine, but it might be easier to use dict.get() with a default value:

allocationList[numBytes] = allocationList.get(numBytes, 0) + 1

Since your allocationList is a dictionary and not a list, you might want to choose a different name for the variable.

Ferdinand Beyer
  • Thanks. I had no clue about the ":". Just figured out that I also need one at the end of my 'for' statement. – Gautam Nov 28 '11 at 10:41

You most definitely can use variables as dict keys. However, although you have a variable called numBytes, you are using the string "numBytes" as the key: a string constant, not the variable. That doesn't cause the syntax error, but it is still a bug, because every line updates the same key. Instead, try:

if numBytes in allocationList:
    # do stuff
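
The difference between the quoted literal and the variable is easy to see side by side. In this sketch the sizes list is a hypothetical stand-in for the parsed file, and the dictionary names wrong and right are just for illustration:

```python
sizes = ["48", "24", "48"]

wrong = {}
right = {}
for numBytes in sizes:
    # Bug: the quoted literal 'numBytes' is the same key on every iteration.
    wrong['numBytes'] = wrong.get('numBytes', 0) + 1
    # Fix: use the variable, so each distinct size gets its own key.
    right[numBytes] = right.get(numBytes, 0) + 1

print(wrong)  # {'numBytes': 3}
print(right)  # {'48': 2, '24': 1}
```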

Additionally, consider collections.Counter, a convenient class for exactly the kind of counting you're doing.

Michael J. Barber