0

I have two data frames with data, now I want to merge the fields of the second data frame to first. If I call the index of the first data frame needs to get all the subject and topic names like below. Can anyone help me with this?

import pandas as pd
sub_data = {'Subjectid':['10','11'],'Author':['Author1','Author2'],'SubjectName':['Maths', 'English']}
df1 = pd.DataFrame(sub_data)
print(df1)

topic_data = {'Topicid':['100','101','102'],'Subjectid':['10','10','11'],'TopicName':['Geometry','Trignometry', 'Tenses']}
df2 = pd.DataFrame(topic_data)
print(df2)

subtopic_data = {'Subtopicid':['1000','1001','1002'],'Topicid':['100','101','102'],'Subjectid':['10','10','11'],'SubtopicTopicName':['Lines','Angles', 'PresentTenses']}
df3 = pd.DataFrame(subtopic_data)
print(df3)

Desired Output:

    Author    SubjectName   topicid   TopicName       Subopicid SubtopicName
10  Author1       Maths     100       Geometry        1000       Lines
10  Author1       Maths     100       Trignometry     1001      Angles

2 Answers2

1

Use DataFrame.merge with convert index to column by DataFrame.reset_index and column to index by DataFrame.set_index:

df = df1.merge(df2.reset_index().set_index('Subjectid'), left_index=True, right_index=True)
print (df)
     Author SubjectName index    TopicName
10  Author1       Maths   100     Geometry
10  Author1       Maths   101  Trignometry
11  Author2     English   102       Tenses

Solution with changed data in question:

df = df1.merge(df2, on='Subjectid').set_index('Subjectid').rename_axis(None)
print (df)
     Author SubjectName Topicid    TopicName
10  Author1       Maths     100     Geometry
10  Author1       Maths     101  Trignometry
11  Author2     English     102       Tenses

For merge DataFrame df3:

df = (df1.merge(df2, on='Subjectid').set_index('Subjectid')
         .merge(df3, on=['Topicid','Subjectid']))
print (df)
    Author Subjectid SubjectName Topicid    TopicName Subtopicid  \
0  Author1        10       Maths     100     Geometry       1000   
1  Author1        10       Maths     101  Trignometry       1001   
2  Author2        11     English     102       Tenses       1002   

  SubtopicTopicName  
0             Lines  
1            Angles  
2     PresentTenses  

Last for filtr only Math rows use boolean indexing:

df4 = df[df['SubjectName'] == 'Maths']
print (df4)
    Author Subjectid SubjectName Topicid    TopicName Subtopicid  \
0  Author1        10       Maths     100     Geometry       1000   
1  Author1        10       Maths     101  Trignometry       1001   

  SubtopicTopicName  
0             Lines  
1            Angles  
jezrael
  • 629,482
  • 62
  • 918
  • 895
1

You can use merge:

pd.merge(df1, df2, on ='Subjectid').set_index('Subjectid')

            Author  SubjectName Topicid TopicName
Subjectid               
10          Author1 Maths       100     Geometry
10          Author1 Maths       101     Trignometry
11          Author2 English     102     Tenses
Allen
  • 16,421
  • 4
  • 42
  • 54