I have a script that collects JSON data from Twitter's API every minute and parses it with jq.
The output accumulates in a single file, which ends up looking like this:
[
{"text": "Tweet 01",
"id": "001"
},
{"text": "Tweet 02",
"id": "002"
},
{"text": "Tweet 03",
"id": "003"
}
]
[
{"text": "Tweet 04",
"id": "004"
},
{"text": "Tweet 05",
"id": "005"
},
{"text": "Tweet 06",
"id": "006"
},
{"text": "Tweet 07",
"id": "007"
},
{"text": "Tweet 08",
"id": "008"
}
]
[
{"text": "Tweet 09",
"id": "009"
},
{"text": "Tweet 10",
"id": "010"
}
]
Previously I had a single list of JSON data per file, and pandas can easily work with one list in a file. But how can I efficiently iterate over these multiple lists, which are NOT comma-separated and NOT necessarily the same length?
My ultimate goal is to aggregate ALL the JSON data from this one file and convert it to a CSV file, where each column is a key in the JSON data. It should end up looking like:
text, id
Tweet 01, 001
Tweet 02, 002
Tweet 03, 003
Tweet 04, 004
Tweet 05, 005
Tweet 06, 006
Tweet 07, 007
Tweet 08, 008
Tweet 09, 009
Tweet 10, 010
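One possible way to get from the file above to that CSV, offered as a minimal sketch rather than a definitive answer: the standard library's `json.JSONDecoder.raw_decode` parses one JSON value from a string and reports where it stopped, so it can be called in a loop to walk through back-to-back lists. The helper names `iter_json_documents` and `concatenated_lists_to_csv`, and the shortened `sample` string, are hypothetical and only for illustration.

```python
import csv
import io
import json


def iter_json_documents(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Returns the parsed value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value


def concatenated_lists_to_csv(text, out):
    """Flatten all lists into one sequence of records and write them as CSV."""
    records = [rec for doc in iter_json_documents(text) for rec in doc]
    # Columns are the JSON keys, in order of first appearance.
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return records


# Shortened stand-in for the file contents (assumption: same shape as above).
sample = (
    '[{"text": "Tweet 01", "id": "001"}]\n'
    '[{"text": "Tweet 02", "id": "002"}, {"text": "Tweet 03", "id": "003"}]\n'
)
buf = io.StringIO()
records = concatenated_lists_to_csv(sample, buf)
csv_lines = buf.getvalue().splitlines()
```

For a real file, `text` would come from reading the file into a string, which assumes it fits in memory. The flattened `records` list could also be passed to `pd.DataFrame(records)` and written with `to_csv` if pandas is preferred for the final step.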
If I try to read the file directly with pandas.read_json, I get the following:
>>> import pandas as pd
>>> df = pd.read_json("sample.json")
>>> df.head()
Traceback (most recent call last):
  File "lists.py", line 3, in <module>
    df = pd.read_json("sample.json")
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/util/_decorators.py", line 214, in wrapper
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 608, in read_json
    result = json_reader.read()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 731, in read
    obj = self._get_object_parser(self.data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 753, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 857, in parse
    self._parse_no_numpy()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 1089, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Trailing data
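The "Trailing data" message appears to mean that the parser read one complete JSON value and then found more input after it. As a hedged illustration, the same condition can be reproduced with the standard library's `json` module, which is not the parser pandas uses here and words the error as "Extra data" instead. The short `text` string is a made-up stand-in for the file.

```python
import json

# Two complete JSON arrays back to back, like the file above (shortened).
text = '[{"id": "001"}]\n[{"id": "002"}]'

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    # The standard library reports the leftover input as "Extra data".
    error_message = exc.msg
    # exc.pos is the offset where the unexpected second document begins.
    leftover = text[exc.pos:]
```

Here `leftover` holds the second array, which suggests the file is a sequence of individually valid JSON documents rather than one malformed document.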