How to merge two dataframes based on file name?

Question

I have a dictionary of dataframes with keys in the form "Sample_X_####celsius". Each dataframe is structured like this:

displacement	force
values	values
values	values
values	values

I also have an additional dataframe with three columns structured like this:

label	width	thickness
sample_1_200celsius	11	222
sample_1_300celsius	12	223

How can I combine each row of the additional dataframe with the dataframe in the dictionary whose key corresponds to the "label" entry? My goal looks like this:

key: sample_1_200celsius

displacement	force	label	width	thickness
values	values	sample_1_200celsius	11	222
values	values
values	values

`pd.concat([df1, df2])`. But are you sure you want a `concat` (so `label, width, thickness` are only non-empty on the first row), rather than a join with broadcasting to all rows? What is your intent? What do you want to do with the output df? — smci, Apr 11 '21 at 02:02
Possible duplicate, see [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — smci, Apr 11 '21 at 02:03
I don't quite understand what is *a dictionary of dataframes with keys in the form "Sample_X_####celsius"*. — Ynjxsjmh, Apr 11 '21 at 02:17
@smci I can't concatenate because I have a ton of different files (300+). I need a way for only one row from the additional dataframe to be "matched" to each file in the dictionary whose key is the same as the label of that row. — aria39948, Apr 11 '21 at 02:25
@Ynjxsjmh The keys match up with the "label" row of the additional dataframe is what I was trying to say there. — aria39948, Apr 11 '21 at 02:26
@aria39948 In the last dataframe you posted, what are the values of the `label`, `width`, and `thickness` columns in the last two rows? — Ynjxsjmh, Apr 11 '21 at 02:41
@Ynjxsjmh No value there-- I want to take only one row from the 2nd dataframe and add it to each instance of the first dataframe. So for the sample_1_200celsius first dataframe, it only gets the row of the second dataframe that contains "sample_1_200celsius". — aria39948, Apr 11 '21 at 04:16
You can do the concatenate/join if you do them at read-time. Or you could do it in batches of say 10. Really you need to go back and show us the read or load commands that imported these dfs. — smci, Apr 11 '21 at 20:06

tdy · Accepted Answer · 2021-04-11T05:17:53.387

If the dictionary and the additional dataframe are named dictionary and df, you can concat() each dictionary dataframe with the corresponding df row:

for key in dictionary.keys():
    dfs = [dictionary[key], df.loc[df.label == key].reset_index(drop=True)]
    dictionary[key] = pd.concat(dfs, axis=1)

Full example

Given dictionary with dataframes for sample_1_200celsius and sample_1_300celsius:

import pandas as pd

# 'values' stands in for the question's 'force' column
df200 = pd.DataFrame({'displacement': [1,2,3], 'values': [2,4,6]})
df300 = pd.DataFrame({'displacement': [30,20,10], 'values': [60,40,20]})
dictionary = {'sample_1_200celsius': df200, 'sample_1_300celsius': df300}

And additional dataframe df:

df = pd.DataFrame({'label': ['sample_1_200celsius','sample_1_300celsius'], 'width': [11,12], 'thickness': [222,223]})

#                  label  width  thickness
# 0  sample_1_200celsius     11        222
# 1  sample_1_300celsius     12        223

You can concat() each dictionary dataframe with the corresponding row from df using loc indexing:

for key in dictionary.keys():
    dfs = [dictionary[key], df.loc[df.label == key].reset_index(drop=True)]
    dictionary[key] = pd.concat(dfs, axis=1)

So then the dictionary values will be updated as desired:

dictionary['sample_1_200celsius']

#    displacement  values                label  width  thickness
# 0             1       2  sample_1_200celsius   11.0      222.0
# 1             2       4                  NaN    NaN        NaN
# 2             3       6                  NaN    NaN        NaN

dictionary['sample_1_300celsius']

#    displacement  values                label  width  thickness
# 0            30      60  sample_1_300celsius   12.0      223.0
# 1            20      40                  NaN    NaN        NaN
# 2            10      20                  NaN    NaN        NaN
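
As raised in the comments, the concat() approach fills label, width and thickness only on the first row, and the NaN padding turns the integer columns into floats. If the metadata should instead be repeated on every row, a minimal sketch could look like the following. It assumes the labels in df are unique and that every dictionary key has a matching label; the names meta and broadcast are hypothetical and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the example above
df200 = pd.DataFrame({'displacement': [1, 2, 3], 'values': [2, 4, 6]})
df300 = pd.DataFrame({'displacement': [30, 20, 10], 'values': [60, 40, 20]})
dictionary = {'sample_1_200celsius': df200, 'sample_1_300celsius': df300}
df = pd.DataFrame({'label': ['sample_1_200celsius', 'sample_1_300celsius'],
                   'width': [11, 12], 'thickness': [222, 223]})

# Index the metadata by label (assumed unique) so each key is a direct lookup
meta = df.set_index('label')

broadcast = {}  # hypothetical name: new dict, the originals are left untouched
for key, frame in dictionary.items():
    row = meta.loc[key]  # raises KeyError if the key has no matching label
    # assign() repeats each scalar on every row of the frame
    broadcast[key] = frame.assign(label=key, **row.to_dict())
```

Because no NaN padding is introduced, width and thickness keep their integer values here. Whether the first-row-only layout or the repeated layout is preferable depends on what the combined dataframes are used for afterwards.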
