I have a data frame with some columns with empty lists and others with lists of strings:

       donation_orgs                              donation_context
0            []                                           []
1   [the research of Dr. ...]   [In lieu of flowers , memorial donations ...]

I'm trying to return a data set without any of the rows where there are empty lists.

I've tried just checking for null values:

dfnotnull = df[df.donation_orgs != []]
dfnotnull

and

dfnotnull = df[df.notnull().any(axis=1)]
pd.options.display.max_rows=500
dfnotnull

And I've tried looping through and checking for values that exist, but I think the lists aren't returning Null or None like I thought they would:

dfnotnull = pd.DataFrame(columns=('donation_orgs', 'donation_context'))
for i in range(0,len(df)):
    if df['donation_orgs'].iloc(i):
        dfnotnull.loc[i] = df.iloc[i]

All three of the above methods simply return every row in the original data frame.

Ben Price
    In my experience it is quite perilous to keep data in lists within data frames. It can make grouping and aggregation functions go wrong. If you must do it, consider tuples instead; they seem to work better. – Woody Pride Dec 08 '15 at 18:50

4 Answers

To avoid converting to str and actually use the lists, you can do this:

df[df['donation_orgs'].map(lambda d: len(d)) > 0]

It maps each row's donation_orgs entry to the length of its list and keeps only the rows with at least one element, filtering out the empty lists.

It returns

Out[1]: 
                            donation_context          donation_orgs
1  [In lieu of flowers , memorial donations]  [the research of Dr.]

as expected.
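
As a minimal self-contained sketch (assuming the two-row sample frame from the question), the same filter can be run end to end like this:

import pandas as pd

# two-row sample frame from the question: one row with empty lists, one with values
df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']]})

# keep only the rows whose donation_orgs list is non-empty
dfnotnull = df[df['donation_orgs'].map(len) > 0]
print(dfnotnull)

Passing len directly to map is equivalent to the lambda above.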

Victor

You could try filtering as though the data frame held strings instead of lists:

import pandas as pd

df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']]})

df[df.astype(str)['donation_orgs'] != '[]']

Out[9]: 
                            donation_context          donation_orgs
1  [In lieu of flowers , memorial donations]  [the research of Dr.]
Woody Pride

You can use the following one-liner:

df[(df['donation_orgs'].str.len() != 0) | (df['donation_context'].str.len() != 0)]
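
As a rough sketch of how this behaves (assuming the same sample frame as in the question; the .str.len() accessor reports the length of list elements as well as strings):

import pandas as pd

df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']]})

# .str.len() yields 0 for the empty lists, so the mask keeps rows where either column is non-empty
mask = (df['donation_orgs'].str.len() != 0) | (df['donation_context'].str.len() != 0)
print(df[mask])
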
Amir Imani

Assuming you read the data from a CSV, another possible solution is this:

import pandas as pd

df = pd.read_csv('data.csv', na_filter=True, na_values='[]')
df = df.dropna()

na_values defines additional strings to recognize as NaN, and na_filter=True (the default) enables that detection. I tested this on pandas 0.24.2.
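
A small self-contained check of that behavior (the CSV content below is made up for illustration) could look like this:

import io

import pandas as pd

# made-up CSV where the empty-list cells are literally the text "[]"
csv_data = io.StringIO(
    'donation_orgs,donation_context\n'
    '[],[]\n'
    'the research of Dr.,In lieu of flowers\n')

# treat the literal string "[]" as NaN while parsing, then drop those rows
df = pd.read_csv(csv_data, na_values='[]')
df = df.dropna()
print(df)

Note that columns read from a CSV this way hold plain strings rather than lists, which is fine if the goal is only to drop the rows that were empty.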

Mark