3

I have a table with duplicate columns that I want to join into one singular column. They share the same column names, but I want to merge any column with the same title to become one.

I have tried to to use merge, concat, among other things, but no luck.

data = [['a','a','c'],['a','b','d'],['a','c','c']]
df = pd.DataFrame(data,columns=['col1','col2','col1'])
df
co1 col2 col1
a     a    c
a     b    d
a     c    c 

I expect to have two columns from this point "col1 with a,a,a,c,d,c " and "col2 a,b,c,nan,nan,nan"

airbud
  • 49
  • 6

2 Answers2

2

First stack, then unstack. We will need to do a little bit more before we can unstack the data.

u = df.stack()
(u.to_frame()
  .set_index(u.groupby(u.index).cumcount(), append=True)
  .unstack(1)
  .sort_index(level=1)[0]
  .reset_index(drop=True))

  col1 col2
0    a    a
1    a    b
2    a    c
3    c  NaN
4    d  NaN
5    c  NaN

Another option is groupby, to_dict, and reconstruction.

dct = (df.groupby(df.columns, axis=1)
                          # x.values.ravel().tolist()
         .apply(lambda x: [z for y in x.values for z in y])
         .to_dict())
pd.DataFrame.from_dict(dct, orient='index').T


  col1  col2
0    a     a
1    c     b
2    a     c
3    d  None
4    a  None
5    c  None
cs95
  • 274,032
  • 76
  • 480
  • 537
  • This worked and fix my issue. I do have a follow question. I want to understand this and not just copy, so if you have a moment how does the lambda work in this situation ? – airbud Mar 26 '19 at 15:46
  • @airbud for each group, the lambda is called, passing each group df to x. The list comprehension flattens the data and returns it. – cs95 Mar 26 '19 at 17:28
1

melt groupby with concat

d={x : y['value'].reset_index(drop=True) for x,y in df.melt().groupby('variable')}
df=pd.concat(d,1)
df
Out[39]: 
  col1 col2
0    a    a
1    a    b
2    a    c
3    c  NaN
4    d  NaN
5    c  NaN
BENY
  • 258,262
  • 17
  • 121
  • 165
  • Hey! Trying to advertise [this answer](https://stackoverflow.com/a/55335207/4909087), please take a look and let me know what you think. – cs95 Mar 25 '19 at 19:51
  • @coldspeed that is nice answer detail and succinct, just one suggestion , maybe you can add `str.match`, since it almost dose the same thing like `str.contains` – BENY Mar 25 '19 at 19:58
  • Great! Will do. – cs95 Mar 25 '19 at 20:00
  • 1
    @coldspeed and for numpy part, maybe can add `np.char.find` :-) – BENY Mar 25 '19 at 20:01