219

I have the following dataframes:

> df1
  id begin conditional confidence discoveryTechnique  
0 278    56       false        0.0                  1   
1 421    18       false        0.0                  1 

> df2
   concept 
0  A  
1  B
   

How do I merge on the indices to get:

  id begin conditional confidence discoveryTechnique   concept 
0 278    56       false        0.0                  1  A 
1 421    18       false        0.0                  1  B

I ask because it is my understanding that merge() i.e. df1.merge(df2) uses columns to do the matching. In fact, doing this I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4618, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 58, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 491, in __init__
    self._validate_specification()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 812, in _validate_specification
    raise MergeError('No common columns to perform merge on')
pandas.tools.merge.MergeError: No common columns to perform merge on

Is it bad practice to merge on index? Is it impossible? If so, how can I shift the index into a new column called "index"?

David Buck
  • 3,439
  • 29
  • 24
  • 31
brucezepplin
  • 7,402
  • 18
  • 68
  • 115
  • 3
    try this: `df1.join(df2)` – MaxU Nov 07 '16 at 14:53
  • What if you want to join by the index of one dataframe and a column of the second dataframe. (My second dataframe has a column that matches up with the indeces in the first df.) – mikey Jun 15 '20 at 17:32

6 Answers6

435

Use merge, which is inner join by default:

pd.merge(df1, df2, left_index=True, right_index=True)

Or join, which is left join by default:

df1.join(df2)

Or concat, which is outer join by default:

pd.concat([df1, df2], axis=1)

Samples:

df1 = pd.DataFrame({'a':range(6),
                    'b':[5,3,6,9,2,4]}, index=list('abcdef'))

print (df1)
   a  b
a  0  5
b  1  3
c  2  6
d  3  9
e  4  2
f  5  4

df2 = pd.DataFrame({'c':range(4),
                    'd':[10,20,30, 40]}, index=list('abhi'))

print (df2)
   c   d
a  0  10
b  1  20
h  2  30
i  3  40

#default inner join
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print (df3)
   a  b  c   d
a  0  5  0  10
b  1  3  1  20

#default left join
df4 = df1.join(df2)
print (df4)
   a  b    c     d
a  0  5  0.0  10.0
b  1  3  1.0  20.0
c  2  6  NaN   NaN
d  3  9  NaN   NaN
e  4  2  NaN   NaN
f  5  4  NaN   NaN

#default outer join
df5 = pd.concat([df1, df2], axis=1)
print (df5)
     a    b    c     d
a  0.0  5.0  0.0  10.0
b  1.0  3.0  1.0  20.0
c  2.0  6.0  NaN   NaN
d  3.0  9.0  NaN   NaN
e  4.0  2.0  NaN   NaN
f  5.0  4.0  NaN   NaN
h  NaN  NaN  2.0  30.0
i  NaN  NaN  3.0  40.0
iacob
  • 7,935
  • 4
  • 26
  • 52
jezrael
  • 629,482
  • 62
  • 918
  • 895
  • 4
    nice. for others reading this, if it's not working, see if you need to `.transpose()` one of your dfs to sync up the indexes - that was my issue – Jona Oct 02 '17 at 11:34
  • 2
    Thanks a lot. Great answer. But why does `concat` have to put df in brackets while `join` and `merge` don't? – Bowen Liu Sep 07 '18 at 13:50
  • 1
    @Bowen Liu In my opinion for possible concat multiple DataFrames in list like `dfs = [df1, df2, df3,... dfn]` and then `df = pd. concat(dfs)` – jezrael Sep 07 '18 at 14:29
  • No worries, let me know if you ever want to chat. Good luck and keep up the great work. Who knows Christmas may come early for you ;) – cs95 Dec 23 '20 at 12:09
  • LOL, or Slovakia ;) in that case it will be early for us both. – cs95 Dec 23 '20 at 12:15
34

you can use concat([df1, df2, ...], axis=1) in order to concatenate two or more DFs aligned by indexes:

pd.concat([df1, df2, df3, ...], axis=1)

or merge for concatenating by custom fields / indexes:

# join by _common_ columns: `col1`, `col3`
pd.merge(df1, df2, on=['col1','col3'])

# join by: `df1.col1 == df2.index`
pd.merge(df1, df2, left_on='col1' right_index=True)

or join for joining by index:

 df1.join(df2)
MaxU
  • 173,524
  • 24
  • 290
  • 329
10

By default:
join is a column-wise left join
pd.merge is a column-wise inner join
pd.concat is a row-wise outer join

pd.concat:
takes Iterable arguments. Thus, it cannot take DataFrames directly (use [df,df2])
Dimensions of DataFrame should match along axis

Join and pd.merge:
can take DataFrame arguments

nucsit026
  • 572
  • 4
  • 13
vicpal
  • 101
  • 1
  • 3
7

A silly bug that got me: the joins failed because index dtypes differed. This was not obvious as both tables were pivot tables of the same original table. After reset_index, the indices looked identical in Jupyter. It only came to light when saving to Excel...

Fixed with: df1[['key']] = df1[['key']].apply(pd.to_numeric)

Hopefully this saves somebody an hour!

Dilettant
  • 2,962
  • 3
  • 25
  • 27
Stephen Morrell
  • 831
  • 7
  • 4
4

If u want to join two dataframes in pandas you can simply use available attributes like merge or concatenate. For example if I have two dataframes df1 and df2 I can join them by:

newdataframe=merge(df1,df2,left_index=True,right_index=True)
vignesh babu
  • 117
  • 1
  • 7
4

This answer has been resolved for a while and all the available options are already out there. However in this answer I'll attempt to shed a bit more light on these options to help you understand when to use what.

This post will go through the following topics:

  • Merging with index under different conditions
    • options for index-based joins: merge, join, concat
    • merging on indexes
    • merging on index of one, column of other
  • effectively using named indexes to simplify merging syntax


Index-based joins

TL;DR

There are a few options, some simpler than others depending on the use case.

  1. DataFrame.merge with left_index and right_index (or left_on and right_on using named indexes)
  2. DataFrame.join (joins on index)
  3. pd.concat (joins on index)
PROS CONS
merge

• supports inner/left/right/full
• supports column-column, index-column, index-index joins

• can only join two frames at a time

join

• supports inner/left (default)/right/full
• can join multiple DataFrames at a time

• only supports index-index joins

concat

• specializes in joining multiple DataFrames at a time
• very fast (concatenation is linear time)

• only supports inner/full (default) joins
• only supports index-index joins


Index to index joins

Typically, an inner join on index would look like this:

left.merge(right, left_index=True, right_index=True)

Other types of joins (left, right, outer) follow similar syntax (and can be controlled using how=...).

Notable Alternatives

  1. DataFrame.join defaults to a left outer join on the index.

     left.join(right, how='inner',)
    

    If you happen to get ValueError: columns overlap but no suffix specified, you will need to specify lsuffix and rsuffix= arguments to resolve this. Since the column names are same, a differentiating suffix is required.

  2. pd.concat joins on the index and can join two or more DataFrames at once. It does a full outer join by default.

     pd.concat([left, right], axis=1, sort=False)
    

    For more information on concat, see this post.


Index to Column joins

To perform an inner join using index of left, column of right, you will use DataFrame.merge a combination of left_index=True and right_on=....

left.merge(right, left_index=True, right_on='key')

Other joins follow a similar structure. Note that only merge can perform index to column joins. You can join on multiple levels/columns, provided the number of index levels on the left equals the number of columns on the right.

join and concat are not capable of mixed merges. You will need to set the index as a pre-step using DataFrame.set_index.


This post is an abridged version of my work in Pandas Merging 101. Please follow this link for more examples and other topics on merging.

cs95
  • 274,032
  • 76
  • 480
  • 537