0

I'm very, very new to python and am looking for the best way to lookup and change the column header of a json api (that I have normalized via pandas) with the values from a second api from the same source. For brevity I've not posted the whole of the code, just the sections to do with the data.

name_df (lookup table)

name = response.json()
df = json_normalize(name['fields'],sep="_")
name_df = df[df.columns[1:3]]
print(name_df.head()) 

        key                    label
0  field_18                Client_ID
1  field_16              Client Name
2  field_19              Client Code
3  field_26              Client Type
4  field_70              Third Party
etc...

data_df (data table)

data = response.json()
json_normalize(data)
data_df = json_normalize(data['records'],sep="_")
print(data_df.head())

field_16   field_16_raw   field_18   field_18_raw   field_19  
Amazon     Amazon         123        123            AMZ       
WallMart   WallMart       888        888            WLM       
etc...

data_df.columns.values

['field_16' 'field_16_raw' 'field_18' 'field_18_raw' 'field_19'
 'field_19_raw' 'field_26' 'field_26_raw' 'field_422' 'field_70'
 'field_70_raw' 'field_71' 'field_71_raw']

My required output is

Client Name   Client Name_raw   Client_ID_18   Client_ID_18_raw   Client Code
Amazon        Amazon             123           123                AMZ       
WallMart      WallMart           888           888                WLM       
etc...

I have been researching methods, however I'm not quite at the stage of being able to adapt the answers I've found to my use case - especially with the complexity of the _raw value appearing in the data table but not the lookup table. I can see the shape of the solution here, however it's using lookups to rows rather than columns so when I try to adapt it's not working.

Elegant way to replace values in pandas.DataFrame from another DataFrame

I'm also struggling to find a way to replace the variables of field_X rather than have to name the columns specifically. And I'm at the point now of overcomplicating where I am positive there must be a simple answer.

Thanks in advance for any advice

CatParky
  • 189
  • 4
  • 18

1 Answers1

0

This is mainly for me in the future (waves at me)

So I used a combination of the knackpy client from https://github.com/cityofaustin/knackpy and pandas to join the field headings to the title then to reorder the columns format the dates and output to csv. Here is my redacted python code:

## https://github.com/cityofaustin/knackpy
from knackpy import Knack
import json
import pandas as pd
from pandas.io.json import json_normalize

#  download data from Knack
#  will fetch records in chunks of 1000 until all records have been downloaded
#  optionally pass a rows_per_page and/or page_limit parameter to limit record count
kn = Knack( 
        obj='object_XX',
        app_id='REDACTED',
        api_key='REDACTED',
        tzinfo='Europe/London',
        page_limit=10,  #  this is the default
        rows_per_page=1000  #  this is the default
    )

#Set file paths
offline_filepath = 'C:/Directory Name/Folder Name/Save Location/'
filename = 'KNACK_filename here with no ext'

#Save the raw JSON file to Drive Backup
with open(offline_filepath+filename+'.json', 'w') as outfile:
    json.dump(kn.data, outfile)

#Flatten the data
data_df = json_normalize(kn.data,sep="_")

#Order the columns
#data_df = data_df[['Field ID','Employee','Department','Date of Birth','id']]
#data_df.sort_values(by=['Field ID'], inplace=True)

#Convert the date formats from UNIX to Date
#data_df['Date of Birth'] = pd.to_datetime(data_df['Date of Birth'],unit='ms')

#Convert and export to CSV
#Save to Drive Backup
data_df.to_csv(offline_filepath+filename+'.csv',index=False,date_format="%Y-%m-%d")

I set this script up in a folder for each of the objects in my knack database and use a batch file to run all the scripts in the folder in order.

How to create a batch file to run all python files in the same directory

cd /D C:\Directory\Folder Name\Script Folder

@echo off

for %%i in (*.py) do start "" /b /wait python "%%i"

Hope this helps anyone !

CatParky
  • 189
  • 4
  • 18