4

I'm new to pandas and python and am having trouble working through this. I've got a complex nexted json file I want to load into a pandas dataframe.

I'm using the following code:

import json
import urllib.request
import pandas as pd
import numpy as np
from pandas.io.json import json_normalize

file_str = 'C:\\file.json'

with open(file_str, 'r', encoding="utf-8") as json_file:
   json_work = pd.read_json(json_file, typ='series', orient='columns')

for k, v in json_work.items():
    if v is None:
        json_work[k] = "N/A"

##df = pd.DataFrame.from_dict(json_work)

df = pd.io.json.json_normalize(json_work)

print(df)

As it's written I get this error:

Traceback (most recent call last):
  File "C:/.....hack.py", line 18, in <module>
    df = pd.io.json.json_normalize(json_work)
  File "C:\Users\scoe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\json.py", line 708, in json_normalize
    if any([isinstance(x, dict) for x in compat.itervalues(data[0])]):
  File "C:\Users\scoe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\compat\__init__.py", line 175, in itervalues
    return iter(obj.values(**kw))
AttributeError: 'str' object has no attribute 'values'

if I swap these two lines to read

df = pd.DataFrame.from_dict(json_work)

##df = pd.io.json.json_normalize(json_work)

the process runs successfully but the result does not look like a dataframe. The output displays like this:

---- more lines above this, its a sample of the middle of the output ----
hrCenterName                                          KW App Development & Maint
hrSignatureLevel                                                              1H
hrSignatureLevelTitle                             Level 1 HR Signature Authority
imName                                                                         @
imProvider                                                                   N/A
...                                                                          ...
primaryOfficePhoneExtension                                                  N/A
---- more lines after this ----

What am I doing wrong?

Boud
  • 26,823
  • 8
  • 58
  • 72
S Coe
  • 73
  • 1
  • 6
  • Can you post an output of: `print(type(json_work))`? – MaxU Jan 23 '17 at 20:27
  • You can replace the loop `for k, v in json_work.items(): if v is None: json_work[k] = "N/A"` with simple `json_work.fillna("N/A")`. **But you don't even want to do that if all you want is to format it so that it renders nicely when printed, just [use the `df.to_string(... formatters` to define custom string-formatting](https://stackoverflow.com/a/50978365/202229), without needlessly wasting memory.** – smci May 24 '19 at 23:27

2 Answers2

6

json_normalize() expects dict or list of dicts...

try this:

with open(file_str, 'r', encoding="utf-8") as json_file:
   json_work = json.load(json_file)

...

df = pd.io.json.json_normalize(json_work)
MaxU
  • 173,524
  • 24
  • 290
  • 329
  • Thank you for that - I'm going to play around with that. It may have worked. The pd data is reference-able now. It does display a little weird when I print the whole dataframe but maybe that's an PyCharm setting SAMPLE alternativeContactName assignedToOrgCode assignedToOrgName \ 0 N/A R501 KW App Development & Maint badgeLabel buildingDescription buildingName \ 0 MITRE K Building K – S Coe Jan 23 '17 at 21:11
  • it was actually the utf-8 encoding that did the trick for me, thanks! – rimkashox May 20 '21 at 13:38
1

I usually use pandas to read JSON and it works just fine, give it a try:

pd.read_json(path)
epattaro
  • 1,942
  • 1
  • 14
  • 25