I have what I think is a fairly general problem: recasting a bipartite adjacency matrix as a list of lists of nodes. In pandas, that means transforming a specific pd.DataFrame format into a specific pd.Series format.
For non discrete-math people, this looks like the following transformation:
From
import pandas as pd

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])
which looks like
     item1  item2  item3
foo      1      1      0
bar      0      1      1
qux      0      0      0
To
srs = pd.Series([['item1','item2'],['item2','item3'],[]],index=['foo','bar','qux'])
that looks like
foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object
I have partially achieved this goal with the following code:
df_1 = df.stack().reset_index()
srs = df_1.loc[df_1[0]==1].groupby('level_0')['level_1'].apply(list)
which, besides being slightly unreadable, has the problem of dropping poor qux along the way, since the filter removes every row that has no 1s before the groupby.
Is there any shorter path to the desired result?
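For reference, here is a minimal sketch of one possible shorter route, not necessarily the best one. It assumes the matrix only holds 0s and 1s and rebuilds the example frame so that it runs on its own. The idea is to walk over the rows as NumPy arrays and use each row as a boolean mask on the column labels, so a row without any 1s simply yields an empty list instead of disappearing.

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])

# For each row, keep the column labels whose entry equals 1.
# An all-zero row produces an empty mask selection, i.e. [].
srs = pd.Series(
    [df.columns[row == 1].tolist() for row in df.to_numpy()],
    index=df.index,
)

print(srs)
```

Because the result is built with the original index, qux is kept with an empty list rather than being filtered out.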