Why does using df.drop to delete a column drop the whole dataframe?

Question

Following this post I tried this to delete two columns from a dataframe:

import pandas as pd
from io import StringIO

A_csv = """cases,population,country,year,type,count
745,19987071,Afghanistan,1999,population,19987071
2666,20595360,Afghanistan,2000,population,20595360
37737,172006362,Brazil,1999,population,172006362
80488,174504898,Brazil,2000,population,174504898
212258,1272915272,China,1999,population,1272915272
213766,1280428583,China,2000,population,1280428583"""

with StringIO(A_csv) as fp:
    A = pd.read_csv(fp)
print(A)
print()
dropcols = ["type", "count"]

A = A.drop(dropcols, axis = 1, inplace = True)
print(A)

result

      cases  population      country  year        type       count
0     745    19987071  Afghanistan  1999  population    19987071
1    2666    20595360  Afghanistan  2000  population    20595360
2   37737   172006362       Brazil  1999  population   172006362
3   80488   174504898       Brazil  2000  population   174504898
4  212258  1272915272        China  1999  population  1272915272
5  213766  1280428583        China  2000  population  1280428583

None

Is there something obvious that is escaping me?

use either A.drop(dropcols, axis = 1, inplace = True) or A = A.drop(dropcols, axis = 1) — Vaishali, Sep 27 '17 at 16:52
replaced "A = A.drop(....inplace=True)" with just "A.drop(.... inplace=True)", and now it is fine; can someone explain that? — cumin, Sep 27 '17 at 16:53
Using 'inplace' means a 'None' object is returned. So, assigning it to A and printing it yields a 'None' object printed as was the case with your code snippet. — ShreyasG, Sep 27 '17 at 17:08

cs95 · Accepted Answer · 2017-09-28T08:01:52.773

(These solutions were mentioned in the comments. I'm just fleshing them out in this post.)

When using drop, be wary of the two options you have.

One of them is to drop inplace. When this is done, the dataframe itself is modified, so the call on its own is sufficient and nothing needs to be assigned:

A.drop(dropcols, axis=1, inplace=True)

A
    cases  population      country  year
0     745    19987071  Afghanistan  1999
1    2666    20595360  Afghanistan  2000
2   37737   172006362       Brazil  1999
3   80488   174504898       Brazil  2000
4  212258  1272915272        China  1999
5  213766  1280428583        China  2000

As the df.drop documentation specifies:

inplace : bool, default False

If True, do operation inplace and return None.

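Note that when drop is called inplace, it returns None (that is the default value of any function that does not return a value), and A will have already been updated. This is why A = A.drop(..., inplace=True) leaves A as None: the columns are dropped from the dataframe, and then the name A is rebound to the None that the call returns.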
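As a minimal sketch of that behaviour (the small dataframe and the names df and result are made up for illustration, not taken from the question), the return value of an inplace drop can be inspected directly:

```python
import pandas as pd

# Hypothetical toy data, standing in for the dataframe in the question.
df = pd.DataFrame({
    "cases": [745, 2666],
    "type": ["population", "population"],
    "count": [19987071, 20595360],
})

# With inplace=True the call mutates df and returns None.
result = df.drop(["type", "count"], axis=1, inplace=True)

print(result)            # None
print(list(df.columns))  # ['cases']
```

Assigning result back to df would discard the modified dataframe, which is the situation described in the question.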

The other option is to drop, but return a copy. This means that the original is not modified. So, you can now do:

B = A.drop(dropcols, axis=1)

B
    cases  population      country  year
0     745    19987071  Afghanistan  1999
1    2666    20595360  Afghanistan  2000
2   37737   172006362       Brazil  1999
3   80488   174504898       Brazil  2000
4  212258  1272915272        China  1999
5  213766  1280428583        China  2000

A
    cases  population      country  year        type       count
0     745    19987071  Afghanistan  1999  population    19987071
1    2666    20595360  Afghanistan  2000  population    20595360
2   37737   172006362       Brazil  1999  population   172006362
3   80488   174504898       Brazil  2000  population   174504898
4  212258  1272915272        China  1999  population  1272915272
5  213766  1280428583        China  2000  population  1280428583

Where B and A exist separately.
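A short sketch of the copy-returning form, again with hypothetical toy data and names (frame, trimmed), shows that the original is untouched until the result is assigned back to it:

```python
import pandas as pd

# Hypothetical toy data for illustration only.
frame = pd.DataFrame({
    "cases": [745, 2666],
    "country": ["Afghanistan", "Afghanistan"],
    "type": ["population", "population"],
    "count": [19987071, 20595360],
})
dropcols = ["type", "count"]

# Without inplace, drop returns a new dataframe and leaves frame alone.
trimmed = frame.drop(dropcols, axis=1)
print(list(trimmed.columns))  # ['cases', 'country']
print(list(frame.columns))    # ['cases', 'country', 'type', 'count']

# Reassigning is the safe way to "update" the original name.
frame = frame.drop(dropcols, axis=1)
print(list(frame.columns))    # ['cases', 'country']
```

This is the A = A.drop(dropcols, axis = 1) form suggested in the comments: the assignment works here because the call returns a dataframe rather than None.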

Note that you are not saving any memory by working with inplace: both methods create a copy. In the inplace case, however, the copy is made behind the scenes and the result is written back into the original object.
