I am trying to split a column with an array of a list into multiple columns and create multiple rows. Below is the sample data:
signalid monthyear readings
5135 201901 [{"v":"90","q":1,"t":1546444800000,"g":0}]
5135 201901 [{"v":"50","q":1,"t":1546444900000,"g":0}]
5135 201901 [{"v":"40","q":1,"t":1546445800000,"g":0}]
5135 201901 [{"v":"30","q":1,"t":1546446800000,"g":0},{"v":"35","q":1,"t":1546446900000,"g":0}]
Details about data:
Data is fetched from the Cassandra database.
The column data type in Cassandra is text.
dataframe.readings[0]
'[{"v":"9817","q":1,"t":1548979150085,"g":0},{"v":"9821","q":1,"t":1548979151475,"g":0}]'
dataframe.readings.dtype
dtype('O')
Normalize code:
normalizeddataframe = json_normalize(data=dataframe, record_path='readings',
meta=["signalid", "monthyear"])
Expected output:
signalid monthyear v q t g
5135 201901 90 1 1546444800000 0
5135 201901 50 1 1546444900000 0
5135 201901 40 1 1546445800000 0
5135 201901 30 1 1546446800000 0
5135 201901 35 1 1546446900000 0
Result:
TypeError Traceback (most recent call last)
<ipython-input-13-486775fb3431> in <module>
60 #normalizeddataframe = pandas.read_json(dataframe.readings)
61 normalizeddataframe = json_normalize(data=dataframe, record_path='readings',
---> 62 meta=["signalid", "monthyear", "fromtime", "totime", "avg", "insertdate", "max", "min"])
63 #normalizeddataframe = json_normalize(pandas.Series.to_json(dataframe.readings))
64 print("Processig completed");
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\normalize.py in json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep)
260 records.extend(recs)
261
--> 262 _recursive_extract(data, record_path, {}, level=0)
263
264 result = DataFrame(records)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\normalize.py in _recursive_extract(data, path, seen_meta, level)
236 else:
237 for obj in data:
--> 238 recs = _pull_field(obj, path[0])
239
240 # For repeating the metadata later
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\json\normalize.py in _pull_field(js, spec)
183 result = result[field]
184 else:
--> 185 result = result[spec]
186
187 return result
TypeError: string indices must be integers