
I have what I think is a fairly general problem: recasting a bipartite adjacency matrix as a list of lists of nodes. In pandas, that means transforming a specific pd.DataFrame format into a specific pd.Series format.

For non-discrete-math people, the transformation looks like this:

From

import pandas as pd

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])

which looks like

    item1   item2   item3
foo     1       1       0
bar     0       1       1
qux     0       0       0

To

srs = pd.Series([['item1','item2'],['item2','item3'],[]],index=['foo','bar','qux'])

that looks like

foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object

I have partially achieved this goal with the following code:

df_1 = df.stack().reset_index()

srs = df_1.loc[df_1[0]==1].groupby('level_0')['level_1'].apply(list)

which, besides being slightly unreadable, has the issue of dropping poor qux along the way.
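
A minimal sketch of why qux disappears and one way it could be restored, assuming the same df as above: filtering on the value 1 removes every row belonging to qux before the groupby, so no group is created for it. Reindexing against df.index brings the label back, and the resulting missing value can be swapped for an empty list. The names stacked, pairs, grouped and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])

# Long format: one entry per (row label, column label) pair.
stacked = df.stack()

# Keep only the pairs flagged with 1; qux has none, so it vanishes here.
pairs = stacked[stacked == 1].reset_index()
grouped = pairs.groupby('level_0')['level_1'].apply(list)

# Restore the original row order and labels, then replace the
# missing entry (NaN) for qux with an empty list.
result = grouped.reindex(df.index).apply(
    lambda v: v if isinstance(v, list) else [])

print(result)
```

This keeps the stack/groupby route but adds two more steps, so it does not make the code any shorter.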

Is there any shorter path to the desired result?

HerrIvan

2 Answers


If you want to avoid reshaping with stack and groupby, you can use a list comprehension: convert the 0/1 values to booleans with DataFrame.astype, use each boolean row to filter the column names, and pass the resulting lists to the Series constructor:

print([list(df.columns[x]) for x in df.astype(bool).to_numpy()])
[['item1', 'item2'], ['item2', 'item3'], []]

s = pd.Series([list(df.columns[x]) for x in df.astype(bool).to_numpy()], index=df.index)
print(s)
foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object

If performance also matters, convert the columns to a NumPy array once, outside the comprehension:

c = df.columns.to_numpy()
s = pd.Series([list(c[x]) for x in df.astype(bool).to_numpy()], index=df.index)
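
As a self-contained sketch, the boolean-mask approach above could be wrapped in a small helper. The function name adjacency_to_lists is hypothetical and not part of pandas; the example assumes every nonzero entry should count as an edge.

```python
import pandas as pd


def adjacency_to_lists(df):
    """Hypothetical helper: map each row label to the list of
    column labels whose entry in that row is nonzero."""
    cols = df.columns.to_numpy()        # column labels as a NumPy array
    mask = df.astype(bool).to_numpy()   # True wherever the entry is nonzero
    # Boolean indexing picks the labels for each row; an all-False
    # row yields an empty list, so no row label is lost.
    return pd.Series([list(cols[row]) for row in mask], index=df.index)


df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])

s = adjacency_to_lists(df)
print(s)
```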
jezrael

Applying a straightforward list comprehension to each row (axis=1) also works. A row with no non-zero elements produces an empty list.

df.apply(lambda row: [df.columns[i] for i, el in enumerate(row) if el], axis=1)

Result

foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object
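
A possible variant of the row-wise idea, sketched under the same df: each row passed by apply(axis=1) is a Series whose index holds the column labels, so the labels can be filtered with a boolean mask instead of enumerate. The name row_lists is illustrative only.

```python
import pandas as pd

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])

# row.index contains the column labels; the boolean mask keeps
# those whose entry is nonzero. An all-zero row gives [].
row_lists = df.apply(lambda row: row.index[row.to_numpy() != 0].tolist(),
                     axis=1)
print(row_lists)
```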
Bill Huang