I am trying to write a function in Python that opens a file and parses it into a dictionary. I want the first item in the list block to be the key for each entry in the dictionary data, and the value to be the rest of block without that first item. When I run the following function, however, it parses the file incorrectly; the output is shown below. How can I parse it as described above? Any help would be greatly appreciated.

Function:

def parseData() :
    filename="testdata.txt"
    file=open(filename,"r+")

    block=[]
    for line in file:
        block.append(line)
        if line in ('\n', '\r\n'):
            album=block.pop(1)
            data[block[1]]=album
            block=[]
    print data

Input:

Bob Dylan
1966 Blonde on Blonde
-Rainy Day Women #12 & 35
-Pledging My Time
-Visions of Johanna
-One of Us Must Know (Sooner or Later)
-I Want You
-Stuck Inside of Mobile with the Memphis Blues Again
-Leopard-Skin Pill-Box Hat
-Just Like a Woman
-Most Likely You Go Your Way (And I'll Go Mine)
-Temporary Like Achilles
-Absolutely Sweet Marie
-4th Time Around
-Obviously 5 Believers
-Sad Eyed Lady of the Lowlands

Output:

{'-Rainy Day Women #12 & 35\n': '1966 Blonde on Blonde\n',
 '-Whole Lotta Love\n': '1969 II\n', '-In the Evening\n': '1979 In Through the Outdoor\n'}

2 Answers

You can use groupby to group the lines, using the empty lines as delimiters. Take the first line of each group as the key, and use a defaultdict(list) so that a repeated key extends its existing list with the remaining lines of the group.

from itertools import groupby
from collections import defaultdict
from pprint import pprint as pp

d = defaultdict(list)
with open("file.txt") as f:
    for k, val in groupby(f, lambda x: x.strip() != ""):
        # if k is True we have a section (a run of non-blank lines)
        if k:
            # the key "k" is the first line of the section (it keeps
            # its trailing newline); v holds the remaining lines
            k, *v = val
            # create the key or extend its existing list
            d[k].extend(map(str.rstrip, v))
# convert to a plain dict so pprint shows only the contents
pp(dict(d))

Output:

{'Bob Dylan\n': ['1966 Blonde on Blonde',
                 '-Rainy Day Women #12 & 35',
                 '-Pledging My Time',
                 '-Visions of Johanna',
                 '-One of Us Must Know (Sooner or Later)',
                 '-I Want You',
                 '-Stuck Inside of Mobile with the Memphis Blues Again',
                 '-Leopard-Skin Pill-Box Hat',
                 '-Just Like a Woman',
                 "-Most Likely You Go Your Way (And I'll Go Mine)",
                 '-Temporary Like Achilles',
                 '-Absolutely Sweet Marie',
                 '-4th Time Around',
                 '-Obviously 5 Believers',
                 '-Sad Eyed Lady of the Lowlands'],
 'Led Zeppelin\n': ['1979 In Through the Outdoor',
                    '-In the Evening',
                    '-South Bound Saurez',
                    '-Fool in the Rain',
                    '-Hot Dog',
                    '-Carouselambra',
                    '-All My Love',
                    "-I'm Gonna Crawl",
                    '1969 II',
                    '-Whole Lotta Love',
                    '-What Is and What Should Never Be',
                    '-The Lemon Song',
                    '-Thank You',
                    '-Heartbreaker',
                    "-Living Loving Maid (She's Just a Woman)",
                    '-Ramble On',
                    '-Moby Dick',
                    '-Bring It on Home']}
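
To see how groupby splits the input, here is a minimal sketch that runs on an in-memory list instead of a file. The sample lines and the helper name is_content are made up for illustration; only the grouping idea comes from the answer above.

```python
from itertools import groupby

# Hypothetical sample standing in for the lines of the file.
lines = [
    "Artist A\n",
    "2001 First Album\n",
    "-Song One\n",
    "\n",
    "Artist B\n",
    "1999 Other Album\n",
    "-Song Two\n",
    "-Song Three\n",
]


def is_content(line):
    """Return True for non-blank lines, False for blank separator lines."""
    return line.strip() != ""


# groupby yields one (key, group) pair per run of consecutive lines
# that share the same key value, so blank lines end up in their own run.
runs = [(key, [line.rstrip() for line in group])
        for key, group in groupby(lines, is_content)]

for key, group in runs:
    print(key, group)
```

Each run with key True is one section of the file; the runs with key False are the blank separators, which is why the loop above only handles the case where k is true.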

Python 2 does not support the starred unpacking (k, *v = val), so use next to take the first line instead:

with open("file.txt") as f:
    for k, val in groupby(f, lambda x: x.strip() != ""):
        if k:
            k, v = next(val), val
            d[k].extend(map(str.rstrip, v))

If you want to keep the newlines, remove the map(str.rstrip, ...) call and extend with v directly.

If you want the album and songs separately for each artist:

from itertools import groupby
from collections import defaultdict
from pprint import pprint as pp

# artist -> {album: [songs]}
d = defaultdict(dict)
with open("file.txt") as f:
    for k, val in groupby(f, lambda x: x.strip() != ""):
        if k:
            # first line is the artist, second the album, the rest are songs
            k, alb, songs = next(val), next(val), val
            d[k.rstrip()][alb.rstrip()] = list(map(str.rstrip, songs))

# convert to a plain dict so pprint shows only the contents
pp(dict(d))

Output:

{'Bob Dylan': {'1966 Blonde on Blonde': ['-Rainy Day Women #12 & 35',
                                         '-Pledging My Time',
                                         '-Visions of Johanna',
                                         '-One of Us Must Know (Sooner or '
                                         'Later)',
                                         '-I Want You',
                                         '-Stuck Inside of Mobile with the '
                                         'Memphis Blues Again',
                                         '-Leopard-Skin Pill-Box Hat',
                                         '-Just Like a Woman',
                                         '-Most Likely You Go Your Way '
                                         "(And I'll Go Mine)",
                                         '-Temporary Like Achilles',
                                         '-Absolutely Sweet Marie',
                                         '-4th Time Around',
                                         '-Obviously 5 Believers',
                                         '-Sad Eyed Lady of the Lowlands']},
 'Led Zeppelin': {'1969 II': ['-Whole Lotta Love',
                              '-What Is and What Should Never Be',
                              '-The Lemon Song',
                              '-Thank You',
                              '-Heartbreaker',
                              "-Living Loving Maid (She's Just a Woman)",
                              '-Ramble On',
                              '-Moby Dick',
                              '-Bring It on Home'],
                  '1979 In Through the Outdoor': ['-In the Evening',
                                                  '-South Bound Saurez',
                                                  '-Fool in the Rain',
                                                  '-Hot Dog',
                                                  '-Carouselambra',
                                                  '-All My Love',
                                                  "-I'm Gonna Crawl"]}}
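
If the file may contain incomplete records, a variant along these lines would skip them instead of failing. This is only a sketch: the function name parse_blocks and the sample lines are assumptions, and treating a block with fewer than two lines as incomplete is one possible rule, not the only one.

```python
from collections import defaultdict
from itertools import groupby


def parse_blocks(lines):
    """Group lines into {artist: {album: [songs]}}, skipping short blocks."""
    result = defaultdict(dict)
    for has_text, group in groupby(lines, lambda line: line.strip() != ""):
        if not has_text:
            continue  # run of blank separator lines
        block = [line.rstrip() for line in group]
        if len(block) < 2:
            continue  # no album line, so treat the record as incomplete
        artist, album, songs = block[0], block[1], block[2:]
        result[artist][album] = songs
    return dict(result)  # return a plain dict rather than a defaultdict


# Hypothetical sample data; any iterable of lines (such as an open file) works.
sample = [
    "Artist A\n", "2001 First Album\n", "-Song One\n", "\n",
    "Artist A\n", "2003 Second Album\n", "-Song Two\n", "\n",
    "Artist B\n",  # incomplete record: artist only
]
parsed = parse_blocks(sample)
print(parsed)
```

Building the whole block as a list first costs a little memory, but it lets you strip every line and check the length in one place.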
Padraic Cunningham
  • Surely it should be: `d[k].append(list(map(str.rstrip, v)))`? Otherwise, you have to reparse the lists to find all the albums. And why do the keys have trailing newlines? A `defaultdict(dict)` would probably be better, and then the for loop block becomes simply: `i = list(map(str.rstrip, val)); if len(i) > 1: d[i[0]][i[1]] = i[2:]`. – ekhumoro Jun 07 '15 at 16:37
  • @ekhumoro, the second part of the answer adds the albums and songs as individual elements. I used extend in the first part because I did not initially see the relationship between the data. The newline is there because I forgot to rstrip, which the OP can add if required. I also don't think `if len(i) > 1: d[i[0]][i[1]] = i[2:]` is more readable than what I used, and I don't see why I would call list(map(str.rstrip, val)) before knowing there is something to add, which is what `if k` checks. – Padraic Cunningham Jun 07 '15 at 16:49
  • Creating a single list means you can both strip all the parts and check its length (which provides protection against incomplete records). I don't want to make too much of all this, though - I was just making some suggestions for improvements that I thought the OP might want. – ekhumoro Jun 07 '15 at 17:45
  • Would there be a way to remove the trailing newlines in the artist and album names? I am sorry, I forgot to specify that. – user4959809 Jun 07 '15 at 17:46
  • @ekhumoro, I am not even sure what the output format should be as there was no expected output provided, there should be enough in the answer for the OP to piece together in whatever format they desire – Padraic Cunningham Jun 07 '15 at 18:30
  • @user4959809, I updated the second part of the answer to strip – Padraic Cunningham Jun 07 '15 at 18:35

I guess this is what you want?

Even if this is not the format you wanted, there are a few things you might learn from the answer:

And SE does not like a list being continued by code...

#!/usr/bin/env python

"""Parse text files with songs, grouped by album and artist."""


def add_to_data(data, block):
    """
    Parameters
    ----------
    data : dict
    block : list

    Returns
    -------
    dict
    """
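    # Skip empty or incomplete blocks (e.g. repeated or trailing blank lines),
    # which would otherwise raise an IndexError below.
    if len(block) < 2:
        return data
    artist = block[0]
    album = block[1]
    songs = block[2:]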
    if artist in data:
        data[artist][album] = songs
    else:
        data[artist] = {album: songs}
    return data


def parseData(filename='testdata.txt'):
    """
    Parameters
    ----------
    filename : string
        Path to a text file.

    Returns
    -------
    dict
    """
    data = {}
    with open(filename) as f:
        block = []
        for line in f:
            line = line.strip()
            if line == '':
                data = add_to_data(data, block)
                block = []
            else:
                block.append(line)
        data = add_to_data(data, block)
    return data

if __name__ == '__main__':
    data = parseData()
    import pprint
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(data)

which gives:

{   'Bob Dylan': {   '1966 Blonde on Blonde': [   '-Rainy Day Women #12 & 35',
                                                  '-Pledging My Time',
                                                  '-Visions of Johanna',
                                                  '-One of Us Must Know (Sooner or Later)',
                                                  '-I Want You',
                                                  '-Stuck Inside of Mobile with the Memphis Blues Again',
                                                  '-Leopard-Skin Pill-Box Hat',
                                                  '-Just Like a Woman',
                                                  "-Most Likely You Go Your Way (And I'll Go Mine)",
                                                  '-Temporary Like Achilles',
                                                  '-Absolutely Sweet Marie',
                                                  '-4th Time Around',
                                                  '-Obviously 5 Believers',
                                                  '-Sad Eyed Lady of the Lowlands']},
    'Led Zeppelin': {   '1969 II': [   '-Whole Lotta Love',
                                       '-What Is and What Should Never Be',
                                       '-The Lemon Song',
                                       '-Thank You',
                                       '-Heartbreaker',
                                       "-Living Loving Maid (She's Just a Woman)",
                                       '-Ramble On',
                                       '-Moby Dick',
                                       '-Bring It on Home'],
                        '1979 In Through the Outdoor': [   '-In the Evening',
                                                           '-South Bound Saurez',
                                                           '-Fool in the Rain',
                                                           '-Hot Dog',
                                                           '-Carouselambra',
                                                           '-All My Love',
                                                           "-I'm Gonna Crawl"]}}
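
Once the data is in this nested shape, it is easy to walk. The sketch below is only an illustration: the helper list_albums and the sample dictionary are made up, and it assumes that every album key starts with a numeric year followed by a space, as in the input shown in the question.

```python
# Hypothetical result in the same shape as the output above.
data = {
    "Artist A": {
        "2003 Second Album": ["-Song Two", "-Song Three"],
        "2001 First Album": ["-Song One"],
    },
}


def list_albums(data, artist):
    """Return (year, title, songs) tuples for one artist, oldest first."""
    albums = []
    for album, songs in data.get(artist, {}).items():
        # Assumes each album key looks like "<year> <title>".
        year, _, title = album.partition(" ")
        # Drop the leading "-" marker from each song line.
        albums.append((int(year), title, [s.lstrip("-") for s in songs]))
    return sorted(albums)


for year, title, songs in list_albums(data, "Artist A"):
    print(year, title, songs)
```

An unknown artist simply yields an empty list, because data.get falls back to an empty dict.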
Martin Thoma