I've trying to join 3 dataframes, however I'm having problems to do it. Let me show the scenario.
I have 3 dataframes:
First: Country
just two columns: Country_ID, Country_Name Primary Key: Country_ID
Country_ID | Country_Name
BR | Brazil
Second: Sports
just three columns: Country ID, Sport_ID, Sport_Name Primary Key: Country_ID, Sport_ID
Country_ID | Sport_ID| Sport_Name
BR | 1234 | Football
Third: University
just four columns: Country_ID, University_ID, University_Name Primary Key: Country_ID, University_ID
Country_ID | University_ID| University_Name
BR | UFCABC | Federal University of ABC
Final Result: just these columns: Country_Name, Sport_Name, University_Name
Country_Name | Sport_Name | University_Name
Brazil | Football | Federal university of Brazil
I tried to join ContryXSport and after that with university DataFrame, however I wasn't able to do it.
Here is the code for creation and join the dataframes:
country_raw_data = {
'country_id': [country.id for country in countries],
'country_name': [country.name for country in countries] }
sport_raw_data = {
'country_id': [sport.country.id for sport in sports],
'sport_id': [sport.id for sport in sports],
'sport_name': [sport.name for sport in sports]
}
university_raw_data = {
'country_id': [university.country.id for university in universities],
'university_id': [university.state.id for university in universities],
'university_name': [university.name for university in universities]
}
Now, the dataframe instances:
I tried to create df like this:
country_df = pd.DataFrame(country_raw_data, columns: ['country_id', 'country_name'])
I don't know why, the country_df was create wrong, some columns didn't appear correct with the right values. Then I had to create like this, it is working.
country_df = pd.DataFrame(country_raw_data)
sport_df = pd.DataFrame(sport_raw_data)
university_df = pd.DataFrame(university_raw_data)
Here, is the joins declarations:
I tried to do like this, however the result didn't join correct. Some columns didn't include the data frame correctly.
country_state_df = pd.merge(country_df, state_df, on='country_id', how='inner')
Another code tht I did it, however I had the same problem before:
country_sport_df = pd.merge(country_df, sport_df,
left_on='country_id',
right_on='sport_id',
how='inner')
So, after the first join, I did the next join between country_state with city
country_sport_university.df = pd.merge(country_sport_df, university_df,
on=['country_id', 'country_id'],
how='inner')
I'd like the final result must be like these columns:
country_name | Sport_Name | University_Name
Brazil | Football | Federal University of ABC
it is possible to do it using dataframe, or I need to use another libraries?
So, there are a lot of data, around millions of data, for example.
Can anyone help my or give me a suggestion to solve the problem?
Thank you very much!