3

I have a dataframe looking as follows

id                          value       index
5eb3cbcc434474213e58b49a    [1,2,3,4,6] [0,1,2,3,4]
5eb3f335434474213e58b49d    [1,2,3,4]   [0,2,3,4]
5eb3f853434474213e58b49f    [1,2,3,4]   [0,2,3,4]
5eb40395434474213e58b4a2    [1,2,3,4]   [0,1,2,3]
5eb40425434474213e58b4a5    [1,2]       [0,2]

I try to convert this dataframe in the folllowing matter, as that the index aims as a title over each individual value, looking something like this:

id                          0   1   2   3   4
5eb3cbcc434474213e58b49a    1   2   3   4   6
5eb3f335434474213e58b49d    1   Nan 2   3   4
5eb3f853434474213e58b49f    1   Nan 2   3   4
5eb40395434474213e58b4a2    1   2   3   4   Nan
5eb40425434474213e58b4a5    1   Nan 2   Nan Nan

I tried to firstly split the List of Lists:

new_df = pd.DataFrame(df.Value.str.split(',').tolist(), index=df.Index).stack()
new_df = new_df.reset_index([0, 'Index'])
new_df.columns = ['Value', 'Index']

However I recieve the Error

TypeError: unhashable type: 'list'

What causes this error?

EQT1999
  • 33
  • 3
  • Does any answer below fit your requirement ? If yes, please review the answers and pick one to [accept the answer](https://stackoverflow.com/help/someone-answers). This is the way StackOverflow facilitates other users with similar problem to get a solution. Thanks! :-) – SeaBean Apr 27 '21 at 14:07

2 Answers2

5

You can use .apply() together with pd.Series(), as follows:

df = df.set_index('id').apply(lambda x: pd.Series(x['value'], index=x['index']), axis=1).reset_index()


print(df)

                         id    0    1    2    3    4
0  5eb3cbcc434474213e58b49a  1.0  2.0  3.0  4.0  6.0
1  5eb3f335434474213e58b49d  1.0  NaN  2.0  3.0  4.0
2  5eb3f853434474213e58b49f  1.0  NaN  2.0  3.0  4.0
3  5eb40395434474213e58b4a2  1.0  2.0  3.0  4.0  NaN
4  5eb40425434474213e58b4a5  1.0  NaN  2.0  NaN  NaN

This makes use of the .apply() function feature that:

The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.

This feature is handy in helping us give a simple solution to problems requiring expanding data onto columns while at the same time incorporating new columns into existing data by retaining and prorogating existing row indexes to these new columns. I used it to provide a simple answer to a classic question: How to merge a Series and DataFrame.

SeaBean
  • 6,349
  • 1
  • 3
  • 18
3

Create list of dictionaries in list comrehension and pass to DataFrame constructor, last append to original if dont need slow solution:

L = [dict(zip(x, y)) for x, y in zip(df.pop('index'), df.pop('value'))]

df = df.join(pd.DataFrame(L, index=df.index))
print (df)
                         id  0    1  2    3    4
0  5eb3cbcc434474213e58b49a  1  2.0  3  4.0  6.0
1  5eb3f335434474213e58b49d  1  NaN  2  3.0  4.0
2  5eb3f853434474213e58b49f  1  NaN  2  3.0  4.0
3  5eb40395434474213e58b4a2  1  2.0  3  4.0  NaN
4  5eb40425434474213e58b4a5  1  NaN  2  NaN  NaN

Performance:

#5k rows
df = pd.concat([df] * 1000, ignore_index=True)


In [123]: %timeit df.set_index('id').apply(lambda x: pd.Series(x['value'], index=x['index']), axis=1).reset_index()
2.14 s ± 7.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#similar code, because pop failed for test performance
In [124]: %timeit df.drop(['index','value'], axis=1).join(pd.DataFrame([dict(zip(x, y)) for x, y in zip(df['index'], df['value'])], index=df.index))
15.2 ms ± 87.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


#50k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [126]: %timeit df.set_index('id').apply(lambda x: pd.Series(x['value'], index=x['index']), axis=1).reset_index()
24.2 s ± 1.14 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [127]: %timeit df.drop(['index','value'], axis=1).join(pd.DataFrame([dict(zip(x, y)) for x, y in zip(df['index'], df['value'])], index=df.index))
128 ms ± 821 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
jezrael
  • 629,482
  • 62
  • 918
  • 895