Merge dataframes with loops

Question

Here is the code for a loop I'm using to build a master data frame.

import requests
import pandas as pd

"""
from: http://stats.nba.com/league/player/#!/advanced/
"""

u_a = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36"

advanced = "http://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
passing = "http://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=Possessions&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
scoring = "http://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Scoring&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="

url_list = [advanced,passing,scoring]

master_df = []
for i in url_list:
    r = requests.get(i, headers={"USER-AGENT":u_a})
    r.raise_for_status()

    headers = []
    for item in r.json()['resultSets']:
        for val in item['headers']:
            headers.append(val)
    df = []
    for item in r.json()['resultSets']:
        for row in item['rowSet']:
            row_df = []
            for val in row:
                row_df.append(val)
            df.append(row_df)

    master_df.append(df)

The loop works, but it stacks each set of data on top of the previous one. I want the data merged so that identical columns are not duplicated and the new data from each JSON response is added as additional columns. I also want a column name added to the header only if it is new.

score 0 · Answer 1 · answered Nov 07 '16 at 23:48

You collect the headers but never use them, and you never create any dataframes: master_df ends up as a nested list.

Here's something that might be close to what you want. Since the URLs don't seem to return the same data, you may instead want to build a list of dataframes for each URL and pd.concat those into a single dataframe before appending it to master_df_list.

# Keeping your import statements etc as per your question
[...]

master_df_list = []

for i in url_list:

    # Option: Maybe here you may want to create a list
    # to concat before adding to master_df
    # url_df_list = []

    r = requests.get(i, headers={"USER-AGENT":u_a})
    r.raise_for_status()
    data = r.json()

    # Get the headers
    headers = data['resultSets'][0]['headers']
    # And the rowSet (whatever that is...)
    shot_data = data['resultSets'][0]['rowSet']

    # Create a beautiful df from that
    df = pd.DataFrame(shot_data,columns=headers)

    master_df_list.append(df)

    # Option:
    # url_df_list.append(df)
    # In which case you would concat too
    # df_concat = pd.concat(url_df_list)
    # master_df_list.append(df_concat)



# Concat
master_df = pd.concat(master_df_list)
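
To make the effect of that final pd.concat concrete, here is a minimal sketch on toy data. The two frames and their values are made up for illustration and only loosely mimic the API responses. With the default axis=0, rows are stacked and any column a frame lacks is filled with NaN, which is the stacking behaviour described in the question.

```python
import pandas as pd

# Toy stand-ins for two of the per-URL frames; all values are invented.
advanced_df = pd.DataFrame(
    {"PLAYER_ID": [1, 2], "AGE": [25, 30], "PACE": [98.5, 101.2]}
)
scoring_df = pd.DataFrame(
    {"PLAYER_ID": [1, 2], "AGE": [25, 30], "PCT_FGA_2PT": [0.61, 0.48]}
)

# Default pd.concat works along axis=0: rows are stacked, and columns
# missing from one frame are filled with NaN in that frame's rows.
stacked = pd.concat([advanced_df, scoring_df], ignore_index=True)

print(stacked.shape)                 # (4, 4)
print(stacked["PACE"].isna().sum())  # 2
```

Each player appears once per source frame, so a key-based merge is probably closer to what the question asks for.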

score 0 · Answer 2 · answered Nov 08 '16 at 01:05

Consider reduce with a lambda that calls pd.merge to merge across a list of dataframes:

from functools import reduce
...
url_list = [advanced,passing,scoring]

dfList = []
for i in url_list:    
    r = requests.get(i, headers={"USER-AGENT":u_a})
    r.raise_for_status()

    data = r.json()
    df = pd.DataFrame(data["resultSets"][0]["rowSet"],
                      columns=data["resultSets"][0]["headers"])    
    dfList.append(df)

finaldf = reduce(lambda left,right: pd.merge(left, right, 
                 on=['PLAYER_ID', 'PLAYER_NAME', 'TEAM_ID', 'TEAM_ABBREVIATION']), dfList)

Note that any repeated fields, such as Age, W, and L (which do not appear in every dataframe), will be suffixed with _x and _y.
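
If the suffixed duplicates are unwanted, one possible variation is to merge in only the columns the accumulated frame does not already have. The sketch below uses toy frames with invented values, and merge_new_columns is a hypothetical helper name, not part of pandas. It assumes a repeated column holds the same values in every frame. That may not be true here, since the URLs mix PerMode=Totals and PerMode=PerGame, so check before dropping anything.

```python
from functools import reduce

import pandas as pd

KEYS = ["PLAYER_ID", "PLAYER_NAME", "TEAM_ID", "TEAM_ABBREVIATION"]


def merge_new_columns(left, right):
    """Merge right into left on KEYS, keeping only columns left lacks."""
    new_cols = [c for c in right.columns if c not in left.columns]
    return pd.merge(left, right[KEYS + new_cols], on=KEYS)


def toy_frame(extra_name, extra_values):
    """Build a made-up frame with the key columns, AGE, and one extra column."""
    return pd.DataFrame({
        "PLAYER_ID": [1, 2],
        "PLAYER_NAME": ["A", "B"],
        "TEAM_ID": [10, 20],
        "TEAM_ABBREVIATION": ["AAA", "BBB"],
        "AGE": [25, 30],
        extra_name: extra_values,
    })


dfList = [
    toy_frame("PACE", [98.5, 101.2]),
    toy_frame("TOUCHES", [60.1, 45.3]),
    toy_frame("PCT_FGA_2PT", [0.61, 0.48]),
]

finaldf = reduce(merge_new_columns, dfList)
print(list(finaldf.columns))
```

The repeated AGE column is kept once, from the first frame in the list, and no _x or _y suffixes appear because no overlapping non-key columns reach pd.merge.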
