
While making persistent API calls, I am looping over a large list in order to reorganize my data and save it to a file, like so:

import json
from collections import defaultdict

for item in music:
    # initialize data container
    data = defaultdict(list)
    genre = item[0]
    artist = item[1]
    track = item[2]
    # in actual code, api calls happen here, processing genre, artist and track
    data['genre'] = genre
    data['artist'] = artist
    data['track'] = track
    # use 'a' - append mode
    with open('data.json', mode='a') as f:
        f.write(json.dumps([data], indent=4))

NOTE: Since I have a window of one hour to make api calls (after which the token expires), I must save data to disk on the fly, inside the for loop.

The method above appends data to the data.json file, but the dumped lists are not comma-separated, and the file ends up populated like so:

[
  {
    "genre": "Alternative",
    "artist": "Radiohead",
    "track": "Ok computer"
  }
]
[
  {
    "genre": "Electronic",
    "artist": "Kraftwerk",
    "track": "Computer World"
  }
]

So, how can I dump my data so that I end up with the lists separated by commas, inside one valid JSON array?

8-Bit Borges
  • Your representation doesn't make sense. Either you want `{[...], [...]}`, or {...}\n{...}... so which is it? – cs95 May 25 '18 at 04:48
  • why would you do `[data]` – Prakash Palnati May 25 '18 at 04:49
  • to avoid errors like `ValueError: Extra data: line 21452 column 2 - line 95339735 column 2 (char 649677 - 2869023268)` when an api call is made and returns no dictionary at all. – 8-Bit Borges May 25 '18 at 04:49
  • @coldspeed any representation which can be indexed and retrieved later. – 8-Bit Borges May 25 '18 at 04:52
  • Rule of thumb: If you're opening the same file in every iteration of your loop, you're doing something wrong. Start by building your result, then dump it to the file. Don't do both at the same time. – Aran-Fey May 25 '18 at 04:53
  • data must be saved to file on a regular basis because I'm looping thru API results, whose connection breaks after some time and must be resumed from file's last entry. – 8-Bit Borges May 25 '18 at 04:56
  • Don't indent your dumped json. Each line will then contain a valid json document that can be parsed independently. This is called the jsonl format. If you must indent then this answer will help you load your data https://stackoverflow.com/a/50384432/529630 – Dunes May 25 '18 at 07:28
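A minimal sketch of the JSON Lines idea from the last comment (the `music` list and file name here are hypothetical): each iteration appends one compact JSON object per line, so every line is independently parseable and the loop can stop at any point without corrupting the file.

```python
import json
import os

music = [('Alternative', 'Radiohead', 'OK Computer'),
         ('Electronic', 'Kraftwerk', 'Computer World')]

# start fresh for this demo; the real loop would just keep appending
if os.path.exists('data.jsonl'):
    os.remove('data.jsonl')

for genre, artist, track in music:
    record = {'genre': genre, 'artist': artist, 'track': track}
    # no indent: one compact JSON document per line (the "jsonl" format)
    with open('data.jsonl', mode='a') as f:
        f.write(json.dumps(record) + '\n')

# reading back: each line parses on its own, so resuming is easy
with open('data.jsonl') as f:
    records = [json.loads(line) for line in f]
```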

3 Answers


One approach is to read the JSON file back in before each write, append the new record, and rewrite the whole list.

Ex:

import json
from collections import defaultdict

for item in music:
    # initialize data container
    data = defaultdict(list)
    genre = item[0]
    artist = item[1]
    track = item[2]
    data['genre'] = genre
    data['artist'] = artist
    data['track'] = track

    # Read the existing JSON list (start fresh if the file is missing)
    try:
        with open('data.json', mode='r') as f:
            fileData = json.load(f)
    except FileNotFoundError:
        fileData = []
    fileData.append(data)

    # Rewrite the whole list so the file is always valid JSON
    with open('data.json', mode='w') as f:
        f.write(json.dumps(fileData, indent=4))
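As a quick check of this read-modify-write pattern, it can be exercised on a hypothetical two-item `music` list; because the whole list is rewritten on every pass, the file is valid JSON after each iteration, not just at the end.

```python
import json
import os

music = [['Alternative', 'Radiohead', 'OK Computer'],
         ['Electronic', 'Kraftwerk', 'Computer World']]

# start from a clean slate for this demo
if os.path.exists('data.json'):
    os.remove('data.json')

for genre, artist, track in music:
    data = {'genre': genre, 'artist': artist, 'track': track}
    # load whatever has been saved so far (empty list on the first pass)
    try:
        with open('data.json') as f:
            saved = json.load(f)
    except FileNotFoundError:
        saved = []
    saved.append(data)
    # rewrite the full list so the file is always well-formed JSON
    with open('data.json', mode='w') as f:
        json.dump(saved, f, indent=4)

with open('data.json') as f:
    result = json.load(f)
```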
Rakesh

Something like this would work:

import json

music = [['Alternative', 'Radiohead', 'Ok computer'], ['Electronic', 'Kraftwerk', 'Computer World']]


output = list()

for item in music:
    data = dict()
    genre = item[0]
    artist = item[1]
    track = item[2]
    data['genre'] = genre
    data['artist'] = artist
    data['track'] = track
    output.append(data)


# use 'w' so reruns overwrite the file instead of appending a second list
with open('data.json', mode='w') as f:
    f.write(json.dumps(output, indent=4))

My data.json contains:

[
    {
        "genre": "Alternative", 
        "track": "Ok computer", 
        "artist": "Radiohead"
    }, 
    {
        "genre": "Electronic",
        "track": "Computer World", 
        "artist": "Kraftwerk"
    }
]
user1596115
  • my problem here is that I cannot wait for the loop to end in order to save to file; it must be done inside the for loop, like in the example. 'music' is just a simplification: in the actual code there is an api call that processes music data, and I have a window of one hour until my token expires, so I must save data to disk while the api calls persist. I cannot append to output AFTER one hour... I hope I made myself clear. – 8-Bit Borges May 25 '18 at 05:58

For large datasets, pandas (for serializing) and pickle (for saving) work together like a charm.

import pandas as pd
from collections import defaultdict

df = pd.DataFrame()

for item in music:
    # initialize data container
    data = defaultdict(list)
    genre = item[0]
    artist = item[1]
    track = item[2]
    # in actual code, api calls happen here, processing genre, artist and track
    data['genre'] = genre
    data['artist'] = artist
    data['track'] = track
    # DataFrame.append was removed in pandas 2.0; concat a one-row frame instead
    df = pd.concat([df, pd.DataFrame([data])], ignore_index=True)
    # save on every iteration so progress survives a dropped connection
    df.to_pickle('data.pkl')
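To pick up where a broken connection left off, the pickle can be reloaded with `pd.read_pickle`, the standard counterpart to `to_pickle` (the file name and sample row below are illustrative):

```python
import pandas as pd

# build and save a one-row frame the same way the loop above does
df = pd.DataFrame([{'genre': 'Alternative', 'artist': 'Radiohead',
                    'track': 'OK Computer'}])
df.to_pickle('data.pkl')

# after a dropped connection, reload and see which entry was saved last
resumed = pd.read_pickle('data.pkl')
last_track = resumed['track'].iloc[-1]
```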
8-Bit Borges