3

I don't know if this could be consider a join/merge/concatenate or something else but this is what is happening: I have two pandas dataframes (df1, df2) and I am trying to put them together so I could have something similar to the result below:

df1 = pd.DataFrame({'index': [0, 1], 'column A': ['item_a', 'item_b']})
index column A
0 item_a
1 item_b
df2 = pd.DataFrame({'index': [0, 1, 2], 'column B': [11, 22, 34]})
index column B
0 11
1 22
2 34

Desired Output:

index column A column B
0 item_a 11
1 item_a 22
2 item_a 34
3 item_b 11
4 item_b 22
5 item_b 34

I tried most of the regular (merge, concat, etc.) and I still can't figure out what I was doing wrong?

Does anyone have any idea of what I should do?

Henry Ecker
  • 8,775
  • 3
  • 8
  • 28

2 Answers2

5

You can do this with a cross merge:

df1.merge(df2, how='cross').reset_index()

Output:

   index  column A  column B
0      0    item_a        11
1      1    item_a        22
2      2    item_a        34
3      3    item_b        11
4      4    item_b        22
5      5    item_b        34

Note that how='cross' requires pandas 1.2.0+. For earlier versions, create a dummy column to merge on:

(df1.assign(key=0)
    .merge(df2.assign(key=0), on='key')
    .drop(columns='key')
    .reset_index())
tdy
  • 11,122
  • 1
  • 8
  • 24
1

Your expected output is a classic cartesian product of two lists. In general you can create with list compression

data = [[a, b] for a in df1['column A'] for b in df2['column B']]
df3 = pd.DataFrame(data, columns=['column A', 'column B'])
print(df3)

  column A  column B
0   item_a        11
1   item_a        22
2   item_a        34
3   item_b        11
4   item_b        22
5   item_b        34

With itertools.product()

import itertools

data = [[a, b] for a, b in itertools.product(df1['column A'], df2['column B'])]
df3 = pd.DataFrame(data, columns=['column A', 'column B'])
print(df3)

  column A  column B
0   item_a        11
1   item_a        22
2   item_a        34
3   item_b        11
4   item_b        22
5   item_b        34

With pd.MultiIndex.from_product()

index = pd.MultiIndex.from_product([df1['column A'], df2['column B']], names = ["column A", "column B"])
df3 = pd.DataFrame(index = index).reset_index()
print(df3)

  column A  column B
0   item_a        11
1   item_a        22
2   item_a        34
3   item_b        11
4   item_b        22
5   item_b        34
Ynjxsjmh
  • 7,277
  • 2
  • 8
  • 31