Delete column from pandas DataFrame

Question

When deleting a column in a DataFrame I use:

del df['column_name']

And this works great. Why can't I use the following?

del df.column_name

Since it is possible to access the column/Series as df.column_name, I expected this to work.

Note this question is being discussed on [Meta](https://meta.stackoverflow.com/q/385291/3022952). — R.M., May 22 '19 at 16:37

LondonRob · Answer 1 · 2020-07-27T15:42:01.650

2621

The best way to do this in pandas is to use drop:

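df = df.drop('column_name', axis=1)

where axis is the axis number (0 for rows and 1 for columns). Recent pandas versions no longer accept the axis as a second positional argument, so pass it by keyword.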

To delete the column without having to reassign df you can do:

df.drop('column_name', axis=1, inplace=True)
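The difference between the two forms can be sketched as follows (the frame and its column names are made up for illustration). With inplace=True the call modifies df and returns None, so its result should not be assigned back to df:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Without inplace, drop returns a new DataFrame and leaves df untouched.
smaller = df.drop("b", axis=1)
remaining_after_copy = list(df.columns)  # df still has all three columns

# With inplace=True, df itself is modified and the call returns None.
result = df.drop("b", axis=1, inplace=True)
remaining_after_inplace = list(df.columns)
```

Writing df = df.drop(..., inplace=True) would therefore replace df with None.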

Finally, to drop by column number instead of by column label, try this to delete, e.g. the 1st, 2nd and 4th columns:

df = df.drop(df.columns[[0, 1, 3]], axis=1)  # df.columns is zero-based pd.Index

This also works with a list of column labels:

df.drop(['column_nameA', 'column_nameB'], axis=1, inplace=True)

Note: Introduced in v0.21.0 (October 27, 2017), the drop() method accepts index/columns keywords as an alternative to specifying the axis.

So we can now just do:

df.drop(columns=['B', 'C'])
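As a quick check of that equivalence, here is a minimal sketch on a made-up three-column frame. Both spellings return a new DataFrame and leave df unchanged:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

by_axis = df.drop(["B", "C"], axis=1)      # labels plus axis
by_keyword = df.drop(columns=["B", "C"])   # columns keyword (pandas >= 0.21)

same = by_axis.equals(by_keyword)
```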

edited Jul 27 '20 at 15:42

answered Aug 09 '13 at 11:12

LondonRob


94

Is this recommended over `del` for some reason? – beardc Dec 10 '13 at 20:13
1

@BirdJaguarIV I don't know of any performance improvement, but readability-wise, `drop` is a more SQL-like description of the operation in question. Couldn't `del` potentially be interpreted as setting all the values in that column to `NaN`? – LondonRob Dec 11 '13 at 12:20
4

I hadn't thought of reading it that way, but I guess I'm more used to pythonisms than I am to SQL. Maybe depends on who's going to be reading it? I'm also a fan of saving keystrokes when possible, all else equal :) – beardc Dec 11 '13 at 19:58
Does this make a copy though? – Andy Hayden Jan 18 '14 at 06:52
25

Though this method of deletion has its merits, this answer does not really answer the question being asked. – Paul May 28 '14 at 12:59
135

True @Paul, but due to the title of the question, most people arriving here will do so via trying to work out how to delete a column. – LondonRob May 28 '14 at 16:43
33

@beardc another advantage of `drop` over `del` is that `drop` allows you to drop multiple columns at once, perform the operation inplace or not, and also delete records along any axis (especially useful for a 3-D matrix or `Panel`) – hobs Apr 14 '16 at 20:17
16

Another advantage of `drop` over `del` is that [drop](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html) is part of the pandas API and contains documentation. – modulitos Aug 12 '16 at 08:53
What does the axis parameter refer to? – user402516 Aug 30 '16 at 17:09
1

@user402516 the `axis` parameter specifies whether to look at row labels or column labels for deleting. `0` refers to rows, `1` being for columns – Clock Slave Mar 18 '17 at 11:18
3

Starting from version 0.21, drop accepts `columns` or `rows` parameters inline with the rename method: http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#drop-now-also-accepts-index-columns-keywords if you want to update – ayhan Oct 02 '17 at 12:42
1

@ayhan As far as I can see this is not released yet. But thanks for the heads-up. (Also: what a brilliant improvement in syntax! Thumbs up from me.) – LondonRob Oct 02 '17 at 13:27
How to delete only if the column exists?, because if doesn't exist I get an error – Guilherme Felipe Reis Feb 26 '19 at 23:16
1

what does 'inplace = true', means here while deleting a column. – Deepak Jun 26 '19 at 13:23
This doesn't answer the question, which is why `del df.column_name` doesn't work. – pppery Sep 08 '19 at 02:16
What if I wanted to drop columns 6-10? What does that look like? – Arthur D. Howland Sep 26 '19 at 16:07
@beardc "fan of saving keystrokes" --> try Perl :) – Nov 08 '19 at 14:19
How make the drop conditional? – error2007s Apr 29 '20 at 22:12
1

I believe that if you use the columns kwarg, e.g. `df.drop(columns=['A', 'B'])`, then you don't have to specify `axis=1`. – billiam Jul 24 '20 at 15:52
@billiam if you double-check this to make sure, you can propose an edit to this answer! – LondonRob Jul 27 '20 at 10:28
@LondonRob Thank you. I did but it was rejected. Regards. – billiam Jul 27 '20 at 14:39
1

_**Note:**_ [Introduced in v0.21.0 (October 27, 2017), the drop() method accepts index/columns keywords as an alternative to specifying the axis.](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.21.0.html#new-features) `df.drop(['B', 'C'], axis=1)` is now equivalent to `df.drop(columns=['B', 'C'])` – billiam Jul 27 '20 at 14:43
1

@billiam fixed. Thanks for the edit. It's clearly useful to know in this context. – LondonRob Jul 27 '20 at 15:42

score 1131 · Accepted Answer · edited Apr 07 '19 at 22:01

1131

As you've guessed, the right syntax is

del df['column_name']

It's difficult to make del df.column_name work simply as the result of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python.
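The dispatch described above can be sketched in plain Python. Table below is a hypothetical stand-in class, not pandas code; it only shows which special method each form of del ends up calling. In Python's data model the item form goes to __delitem__ and the attribute form goes to a separate hook, __delattr__:

```python
class Table:
    """Hypothetical stand-in for a DataFrame-like container (not pandas code)."""

    def __init__(self, **columns):
        self._columns = dict(columns)

    def __delitem__(self, name):
        # Called for: del table[name]
        del self._columns[name]

    def __delattr__(self, name):
        # Called for: del table.name
        if name in self._columns:
            del self._columns[name]
        else:
            object.__delattr__(self, name)


table = Table(a=[1, 2], b=[3, 4], c=[5, 6])
del table["a"]   # dispatches to Table.__delitem__("a")
del table.b      # dispatches to Table.__delattr__("b")
remaining = sorted(table._columns)
```

Whether a real DataFrame should implement the attribute hook is a design question rather than something this toy class settles.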

edited Apr 07 '19 at 22:01

cs95


answered Nov 21 '12 at 03:12

Wes McKinney


30

I realize this is a super old "answer", but my curiosity is piqued - *why* is that a syntactic limitation of Python? `class A(object): def __init__(self): self.var = 1` sets up a class, then `a = A(); del a.var` works just fine... – dwanderson Oct 04 '16 at 14:24
19

@dwanderson the difference is that when a column is to be removed, the DataFrame needs to have its own handling for "how to do it". In the case of `del df[name]`, it gets translated to `df.__delitem__(name)` which is a method that DataFrame can implement and modify to its needs. In the case of `del df.name`, the member variable gets removed without a chance for any custom-code running. Consider your own example - can you get `del a.var` to result in a print of "deleting variable"? If you can, please tell me how. I can't :) – Yonatan Dec 22 '16 at 08:27
8

@Yonatan You can use either https://docs.python.org/3/reference/datamodel.html#object.__delattr__ or descriptors for that: https://docs.python.org/3/howto/descriptor.html – Eugene Pakhomov Jan 19 '17 at 16:06
@EugenePakhomov good point. I was answering in python 2, indeed python 3 gives more flexibility in such matters. Thanks for clarifying. – Yonatan Jan 22 '17 at 19:03
5

@Yonatan Eugene's comment applies to Python 2 also; descriptors have been in Python 2 since 2.2 and it is trivial to satisfy your requirement ;) – C S Jun 20 '17 at 12:38
3

This answer isn't really correct - the `pandas` developers _didn't_, but that doesn't mean it is hard to do. – wizzwizz4 Sep 30 '17 at 09:42
Still, it is correct given that panda works / used to work on Py2.7, too - where you can't – pedjjj Mar 30 '20 at 17:04
2

using this answer may cause tokenizing problems when you save the csv and want to read it again. using "df.drop()" as @LondonRob described is the correct way. – Zeynab Rostami Sep 13 '20 at 12:17

score 273 · Answer 3 · edited May 23 '18 at 19:44

273

Use:

columns = ['Col1', 'Col2', ...]
df.drop(columns, inplace=True, axis=1)

This will delete one or more columns in-place. Note that inplace=True was added in pandas v0.13 and won't work on older versions. You'd have to assign the result back in that case:

df = df.drop(columns, axis=1)

edited May 23 '18 at 19:44

Peter Mortensen


answered Mar 23 '14 at 20:57

Krishna Sankar


3

A note about this answer: if a 'list' is used, the square brackets should be dropped: `df.drop(list,inplace=True,axis=1)` – edesz Jun 14 '17 at 23:31
2

this should really be the accepted answer, because it makes clear the superiority of this method over `del` -- can drop more than one column at once. – dbliss Jul 04 '17 at 21:27
I believe that if you use the columns kwarg, e.g. `df.drop(columns=['A', 'B'])`, then you don't have to specify `axis=1`. – billiam Jul 24 '20 at 15:53
Latecomers also look below to @eiTanLaVi.solution for pandas 0.16.1+ who recommends add errors='ignore' – micstr Sep 10 '20 at 09:34

score 132 · Answer 4 · edited May 23 '18 at 19:44

132

Drop by index

Delete first, second and fourth columns:

df.drop(df.columns[[0,1,3]], axis=1, inplace=True)

Delete first column:

df.drop(df.columns[[0]], axis=1, inplace=True)

There is an optional parameter inplace so that the original data can be modified without creating a copy.

Popped

Column selection, addition, deletion

Delete column column-name:

df.pop('column-name')

Examples:

df = pd.DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, orient='index', columns=['one', 'two', 'three'])

print(df)

   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9

df.drop(df.columns[[0]], axis=1, inplace=True)
print(df)

   two  three
A    2      3
B    5      6
C    8      9

three = df.pop('three')
print(df)

   two
A    2
B    5
C    8

edited May 23 '18 at 19:44

Peter Mortensen


answered Jul 15 '15 at 13:37

jezrael


1

How can I pop a row in pandas? – Kennet Celeste Feb 09 '17 at 16:10
2

@Yugi You can use a transposed dataframe for that. ex - `df.T.pop('A')` – Clock Slave Mar 18 '17 at 11:21
@ClockSlave That doesn't modify the original `df`. You _could_ do `df = df.T; df.pop(index); df = df.T` but this seems excessive. – cs95 May 23 '19 at 17:28
Instead of `df.drop(df.columns[[0]], axis=1, inplace=True)` wouldn't it be enough to use `df.drop([0], axis=1)` ? – Anirban Mukherjee Dec 04 '19 at 20:02
1

@Anirban Mukherjee It depends. If want delete column name `0`, then `df.drop(0, axis=1)` working well. But if dont know column name and need remove first column then need `df.drop(df.columns[[0]], axis=1, inplace=True)`, it select first column by position and drop it. – jezrael Dec 04 '19 at 20:42
@KennetCeleste to pop rows: `df.drop(df.index[[1,2]], axis=0, inplace=True)` will pop rows 1 and 2. – marsipan Apr 30 '20 at 19:56
In my case had to remove `inplace=True` for it to work. – Tiago Martins Peres 李大仁 Jun 14 '20 at 08:02

score 77 · Answer 5 · edited Jun 20 '20 at 09:12

The actual question posed, missed by most answers here is:

Why can't I use `del df.column_name`?

First we need to understand the problem, which requires us to dive into Python magic methods.

As Wes points out in his answer, del df['column'] maps to the Python magic method df.__delitem__('column'), which is implemented in pandas to drop the column.

However, as pointed out in the link above about python magic methods:

In fact, __del__ should almost never be used because of the precarious circumstances under which it is called; use it with caution!

You could argue that del df['column_name'] should not be used or encouraged, and thereby del df.column_name should not even be considered.

However, in theory, del df.column_name could be implemented to work in pandas using the magic method __delattr__. This does, however, introduce certain problems, which the del df['column_name'] implementation already has, but to a lesser degree.

Example Problem

What if I define a column in a dataframe called "dtypes" or "columns"?

Then assume I want to delete these columns.

With del df.dtypes, the __delattr__ method could not tell whether it should delete the "dtypes" attribute or the "dtypes" column.
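The same name clash already shows up when reading. A small sketch, assuming a made-up frame with a column literally named dtypes: attribute access returns the DataFrame property, while item access returns the column, and item deletion is unambiguous.

```python
import pandas as pd

df = pd.DataFrame({"dtypes": [1, 2], "x": [3, 4]})

attr = df.dtypes        # the DataFrame property, not the column
column = df["dtypes"]   # the column itself

del df["dtypes"]        # unambiguous: removes the column
still_has_property = hasattr(df, "dtypes")
```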

Architectural questions behind this problem

1. Is a dataframe a collection of columns?
2. Is a dataframe a collection of rows?
3. Is a column an attribute of a dataframe?

Pandas answers:

1. Yes, in all ways.
2. No, but if you want it to be, you can use the .loc or .iloc indexers (or .ix in older pandas versions).
3. Maybe. Do you want to read data? Then yes, unless the name of the attribute is already taken by another attribute belonging to the dataframe. Do you want to modify data? Then no.

TLDR;

You cannot do del df.column_name because pandas has a quite wildly grown architecture that needs to be reconsidered in order for this kind of cognitive dissonance not to occur to its users.

Protip:

Don't use df.column_name. It may be pretty, but it causes cognitive dissonance.

Zen of Python quotes that fit in here:

There are multiple ways of deleting a column.

There should be one-- and preferably only one --obvious way to do it.

Columns are sometimes attributes but sometimes not.

Special cases aren't special enough to break the rules.

Does del df.dtypes delete the dtypes attribute or the dtypes column?

In the face of ambiguity, refuse the temptation to guess.

"In fact, `__del__` should almost never be used because of the precarious circumstances under which it is called; use it with caution!" is completely irrelevant here, as the method being used here is `__delattr__`. — pppery, Feb 22 '18 at 19:27
@ppperry you're misquoting. It's the `del` builtin that is meant, not the `.__del__` instance method. The `del` builtin is mapping to `__delattr__` and `__delitem__` which is what I am building my argument on. So maybe you want to re-read what I wrote. — firelynx, Feb 23 '18 at 10:01
`__` ... `__` gets interpreted as bold markup by StackExchange — pppery, Feb 25 '18 at 20:20
"Don't use df.column_name, It may be pretty, but it causes cognitive dissonance" What does this mean? I am not a psychologist so I have to look this up to understand what you mean. Also, quoting The Zen is meaningless because there are hundreds of valid ways to do the same thing in pandas. — cs95, May 23 '19 at 17:26

score 65 · Answer 6 · edited May 23 '18 at 19:48

65

A nice addition is the ability to drop columns only if they exist. This way you can cover more use cases, and it will only drop the existing columns from the labels passed to it:

Simply add errors='ignore', for example:

df.drop(['col_name_1', 'col_name_2', ..., 'col_name_N'], inplace=True, axis=1, errors='ignore')

This is new from pandas 0.16.1 onward. Documentation is here.
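A minimal sketch of the behaviour, using a made-up frame and a label ("missing") that is not one of its columns:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# "missing" is not a column; errors="ignore" skips it instead of raising.
trimmed = df.drop(columns=["b", "missing"], errors="ignore")

# Without errors="ignore" the same call raises KeyError.
try:
    df.drop(columns=["b", "missing"])
    raised = False
except KeyError:
    raised = True
```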

edited May 23 '18 at 19:48

Peter Mortensen


answered Jan 03 '16 at 12:29

eiTan LaVi


score 45 · Answer 7 · edited Oct 21 '16 at 21:20

45

From version 0.16.1 you can do:

df.drop(['column_name'], axis = 1, inplace = True, errors = 'ignore')

edited Oct 21 '16 at 21:20

Emile Bergeron


answered Apr 30 '16 at 18:57

sushmit


3

And this also supports dropping multiple columns, some of which need not exist (i.e. without raising error ``errors= 'ignore'``) ``df.drop(['column_1','column_2'], axis=1 , inplace=True,errors= 'ignore')``, if such an application desired! – muon Oct 21 '16 at 19:57

score 35 · Answer 8 · edited May 23 '18 at 19:43

35

It's good practice to always use the [] notation. One reason is that attribute notation (df.column_name) does not work for numbered indices:

In [1]: df = DataFrame([[1, 2, 3], [4, 5, 6]])

In [2]: df[1]
Out[2]:
0    2
1    5
Name: 1

In [3]: df.1
  File "<ipython-input-3-e4803c0d1066>", line 1
    df.1
       ^
SyntaxError: invalid syntax

edited May 23 '18 at 19:43

Peter Mortensen


answered Nov 16 '12 at 11:33

Andy Hayden


score 29 · Answer 9 · edited Sep 20 '18 at 18:48

29

Pandas 0.21+ answer

Pandas version 0.21 has changed the drop method slightly to include both the index and columns parameters to match the signature of the rename and reindex methods.

df.drop(columns=['column_a', 'column_c'])

Personally, I prefer using the axis parameter to denote columns or index because it is the predominant keyword parameter used in nearly all pandas methods. But, now you have some added choices in version 0.21.
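One thing the keyword form allows is dropping row labels and column labels in a single call. A minimal sketch on a made-up frame with row labels x, y and z:

```python
import pandas as pd

df = pd.DataFrame(
    {"column_a": [1, 2, 3], "column_b": [4, 5, 6], "column_c": [7, 8, 9]},
    index=["x", "y", "z"],
)

# Drop one row label and two column labels in a single call.
result = df.drop(index=["y"], columns=["column_a", "column_c"])
```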

edited Sep 20 '18 at 18:48

Acumenus


answered Oct 24 '17 at 14:31

Ted Petrou


1

df.drop(['column_a', 'column_c'], axis=1) | it is working for me for now – Indrajeet Gour Apr 22 '18 at 05:03

Alexander · Answer 10 · 2016-11-22T15:15:42.863

22

In pandas 0.16.1+ you can drop columns only if they exist per the solution posted by @eiTanLaVi. Prior to that version, you can achieve the same result via a conditional list comprehension:

df.drop([col for col in ['col_name_1','col_name_2',...,'col_name_N'] if col in df], 
        axis=1, inplace=True)
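The same idea can be wrapped in a small helper. drop_if_present is a hypothetical name chosen for this sketch, not a pandas function; the comparison at the end shows it giving the same result as errors='ignore' on a made-up frame:

```python
import pandas as pd


def drop_if_present(df, names):
    """Return a copy of df without the columns in names that actually exist."""
    present = [col for col in names if col in df.columns]
    return df.drop(present, axis=1)


df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
via_helper = drop_if_present(df, ["b", "zzz"])
via_errors = df.drop(["b", "zzz"], axis=1, errors="ignore")
```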

edited Nov 22 '16 at 15:15

answered Feb 13 '16 at 21:58

Alexander


piRSquared · Answer 11 · 2017-09-20T14:28:55.477

TL;DR

A lot of effort to find a marginally more efficient solution. Difficult to justify the added complexity while sacrificing the simplicity of df.drop(dlst, axis=1, errors='ignore')

df.reindex_axis(np.setdiff1d(df.columns.values, dlst), 1)

Preamble
Deleting a column is semantically the same as selecting the other columns. I'll show a few additional methods to consider.

I'll also focus on the general solution of deleting multiple columns at once and allowing for the attempt to delete columns not present.

These solutions are general and will work for the simple case as well.

Setup
Consider the pd.DataFrame df and list to delete dlst

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(zip('ABCDEFGHIJ', range(1, 11))), range(3))
dlst = list('HIJKLM')

df

   A  B  C  D  E  F  G  H  I   J
0  1  2  3  4  5  6  7  8  9  10
1  1  2  3  4  5  6  7  8  9  10
2  1  2  3  4  5  6  7  8  9  10

dlst

['H', 'I', 'J', 'K', 'L', 'M']

The result should look like:

df.drop(dlst, axis=1, errors='ignore')

   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7

Since I'm equating deleting a column to selecting the other columns, I'll break it into two types:

Label selection
Boolean selection

Label Selection

We start by manufacturing the list/array of labels that represent the columns we want to keep and without the columns we want to delete.

df.columns.difference(dlst)

Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')

np.setdiff1d(df.columns.values, dlst)

array(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype=object)

df.columns.drop(dlst, errors='ignore')

Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')

list(set(df.columns.values.tolist()).difference(dlst))

# does not preserve order
['E', 'D', 'B', 'F', 'G', 'A', 'C']

[x for x in df.columns.values.tolist() if x not in dlst]

['A', 'B', 'C', 'D', 'E', 'F', 'G']

Columns from Labels
For the sake of comparing the selection process, assume:

cols = [x for x in df.columns.values.tolist() if x not in dlst]

Then we can evaluate

df.loc[:, cols]
df[cols]
df.reindex(columns=cols)
df.reindex_axis(cols, 1)

Which all evaluate to:

   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7

Boolean Slice

We can construct an array/list of booleans for slicing

~df.columns.isin(dlst)
~np.in1d(df.columns.values, dlst)
[x not in dlst for x in df.columns.values.tolist()]
(df.columns.values[:, None] != dlst).all(1)

Columns from Boolean
For the sake of comparison

bools = [x not in dlst for x in df.columns.values.tolist()]

df.loc[:, bools]

Which evaluates to:

   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7
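For readers on a recent pandas release, where reindex_axis and the positional axis argument of drop are no longer available, the two selection styles can be sketched with calls that exist in both older and current versions. This reuses the df and dlst from the setup and is only a correctness check, not a benchmark:

```python
import pandas as pd

df = pd.DataFrame(dict(zip("ABCDEFGHIJ", range(1, 11))), index=range(3))
dlst = list("HIJKLM")

expected = df.drop(columns=dlst, errors="ignore")

# Boolean selection: keep the columns whose label is not in dlst.
keep_mask = ~df.columns.isin(dlst)
by_mask = df.loc[:, keep_mask]

# Label selection: build the list of labels to keep, then reindex.
keep_labels = [c for c in df.columns if c not in dlst]
by_labels = df.reindex(columns=keep_labels)
```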

Robust Timing

Functions

setdiff1d = lambda df, dlst: np.setdiff1d(df.columns.values, dlst)
difference = lambda df, dlst: df.columns.difference(dlst)
columndrop = lambda df, dlst: df.columns.drop(dlst, errors='ignore')
setdifflst = lambda df, dlst: list(set(df.columns.values.tolist()).difference(dlst))
comprehension = lambda df, dlst: [x for x in df.columns.values.tolist() if x not in dlst]

loc = lambda df, cols: df.loc[:, cols]
slc = lambda df, cols: df[cols]
ridx = lambda df, cols: df.reindex(columns=cols)
ridxa = lambda df, cols: df.reindex_axis(cols, 1)

isin = lambda df, dlst: ~df.columns.isin(dlst)
in1d = lambda df, dlst: ~np.in1d(df.columns.values, dlst)
comp = lambda df, dlst: [x not in dlst for x in df.columns.values.tolist()]
brod = lambda df, dlst: (df.columns.values[:, None] != dlst).all(1)

Testing

from timeit import timeit

res1 = pd.DataFrame(
    index=pd.MultiIndex.from_product([
        'loc slc ridx ridxa'.split(),
        'setdiff1d difference columndrop setdifflst comprehension'.split(),
    ], names=['Select', 'Label']),
    columns=[10, 30, 100, 300, 1000],
    dtype=float
)

res2 = pd.DataFrame(
    index=pd.MultiIndex.from_product([
        'loc'.split(),
        'isin in1d comp brod'.split(),
    ], names=['Select', 'Label']),
    columns=[10, 30, 100, 300, 1000],
    dtype=float
)

res = pd.concat([res1, res2]).sort_index()

dres = pd.Series(index=res.columns, name='drop')

for j in res.columns:
    dlst = list(range(j))
    cols = list(range(j // 2, j + j // 2))
    d = pd.DataFrame(1, range(10), cols)
    dres.at[j] = timeit('d.drop(dlst, axis=1, errors="ignore")', 'from __main__ import d, dlst', number=100)
    for s, l in res.index:
        stmt = '{}(d, {}(d, dlst))'.format(s, l)
        setp = 'from __main__ import d, dlst, {}, {}'.format(s, l)
        res.at[(s, l), j] = timeit(stmt, setp, number=100)

rs = res / dres

rs

                          10        30        100       300        1000
Select Label                                                           
loc    brod           0.747373  0.861979  0.891144  1.284235   3.872157
       columndrop     1.193983  1.292843  1.396841  1.484429   1.335733
       comp           0.802036  0.732326  1.149397  3.473283  25.565922
       comprehension  1.463503  1.568395  1.866441  4.421639  26.552276
       difference     1.413010  1.460863  1.587594  1.568571   1.569735
       in1d           0.818502  0.844374  0.994093  1.042360   1.076255
       isin           1.008874  0.879706  1.021712  1.001119   0.964327
       setdiff1d      1.352828  1.274061  1.483380  1.459986   1.466575
       setdifflst     1.233332  1.444521  1.714199  1.797241   1.876425
ridx   columndrop     0.903013  0.832814  0.949234  0.976366   0.982888
       comprehension  0.777445  0.827151  1.108028  3.473164  25.528879
       difference     1.086859  1.081396  1.293132  1.173044   1.237613
       setdiff1d      0.946009  0.873169  0.900185  0.908194   1.036124
       setdifflst     0.732964  0.823218  0.819748  0.990315   1.050910
ridxa  columndrop     0.835254  0.774701  0.907105  0.908006   0.932754
       comprehension  0.697749  0.762556  1.215225  3.510226  25.041832
       difference     1.055099  1.010208  1.122005  1.119575   1.383065
       setdiff1d      0.760716  0.725386  0.849949  0.879425   0.946460
       setdifflst     0.710008  0.668108  0.778060  0.871766   0.939537
slc    columndrop     1.268191  1.521264  2.646687  1.919423   1.981091
       comprehension  0.856893  0.870365  1.290730  3.564219  26.208937
       difference     1.470095  1.747211  2.886581  2.254690   2.050536
       setdiff1d      1.098427  1.133476  1.466029  2.045965   3.123452
       setdifflst     0.833700  0.846652  1.013061  1.110352   1.287831

import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharey=True)
for i, (n, g) in enumerate([(n, g.xs(n)) for n, g in rs.groupby('Select')]):
    ax = axes[i // 2, i % 2]
    g.plot.bar(ax=ax, title=n)
    ax.legend_.remove()
fig.tight_layout()

This is relative to the time it takes to run df.drop(dlst, axis=1, errors='ignore'). It seems like after all that effort, we only improve performance modestly.

In fact, the best solutions use reindex or reindex_axis on the hack list(set(df.columns.values.tolist()).difference(dlst)). A close second, and still very marginally better than drop, is np.setdiff1d.

rs.idxmin().pipe(
    lambda x: pd.DataFrame(
        dict(idx=x.values, val=rs.lookup(x.values, x.index)),
        x.index
    )
)

                      idx       val
10     (ridx, setdifflst)  0.653431
30    (ridxa, setdifflst)  0.746143
100   (ridxa, setdifflst)  0.816207
300    (ridx, setdifflst)  0.780157
1000  (ridxa, setdifflst)  0.861622

score 14 · Answer 12 · answered Oct 15 '20 at 17:14

14

df.drop('columnname', axis =1, inplace = True)

or else you can go with

del df['colname']

To delete multiple columns based on column numbers

df.drop(df.columns[1:3], axis = 1, inplace = True)

To delete multiple columns based on columns names

df.drop(['col1', 'col2', ..., 'coln'], axis = 1, inplace = True)
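Dropping by position comes down to translating positions into labels through df.columns. A short sketch on a made-up five-column frame, covering both a contiguous range and arbitrary positions:

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 3, 4]], columns=list("abcde"))

# Positions 1 and 2 (the slice end is exclusive), translated to labels first.
by_slice = df.drop(columns=df.columns[1:3])

# Arbitrary positions work the same way with a list of integers.
by_positions = df.drop(columns=df.columns[[0, 4]])
```

Because the drop itself is done by label, a frame with duplicate column names loses every column that shares a dropped label.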

answered Oct 15 '20 at 17:14

Praveen Bushipaka


I tried "" del df['colname'] "" , and i got this error "'DataFrameGroupBy' object does not support item deletion" – N.Elsayed Jan 07 '21 at 13:40

score 12 · Answer 13 · answered Apr 19 '20 at 13:58

12

We can remove one or more specified columns with the drop() method.

Suppose df is a dataframe.

Column to be removed = column0

Code:

df = df.drop('column0', axis=1)

To remove multiple columns col1, col2, . . . , coln, put the names of all the columns to be removed in a list, then pass that list to drop().

Code:

df = df.drop(['col1', 'col2', ..., 'coln'], axis=1)

I hope it would be helpful.

answered Apr 19 '20 at 13:58

Littin Rajan


`df = df.drop([col1, col2, . . . , coln], axis=1)` this does not work if i specify a variable name in place of col1, col2 etc. I get error column not in axis when its definitely present. @Littin Could you help? – RSM May 20 '20 at 05:54

ccpizza · Answer 14 · 2020-06-21T11:15:31.247

5

If your original dataframe df is not too big, you have no memory constraints, and you only need to keep a few columns, or, if you don't know beforehand the names of all the extra columns that you do not need, then you might as well create a new dataframe with only the columns you need:

new_df = df[['spam', 'sausage']]
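If the new dataframe will be modified afterwards, adding .copy() makes it explicitly independent of the original. A minimal sketch, with an extra made-up column eggs standing in for the columns that are not needed:

```python
import pandas as pd

df = pd.DataFrame({"spam": [1, 2], "eggs": [3, 4], "sausage": [5, 6]})

# Select the columns to keep; .copy() makes the new frame independent of df.
new_df = df[["spam", "sausage"]].copy()
new_df["spam"] = new_df["spam"] * 10
```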

edited Jun 21 '20 at 11:15

answered Mar 15 '20 at 17:57

ccpizza


score 3 · Answer 15 · edited Jul 04 '18 at 10:14

3

The dot syntax for deletion works on object properties in JavaScript, but it does not work on DataFrame columns in Python.

Python: del df['column_name']
JavaScript: delete df['column_name'] or delete df.column_name

edited Jul 04 '18 at 10:14

anothernode


answered Apr 20 '16 at 15:55

Doctor


score 2 · Answer 16 · answered Sep 09 '18 at 06:59

2

Another way of Deleting a Column in Pandas DataFrame

If you're not looking for in-place deletion, you can create a new DataFrame by specifying the columns to keep in the DataFrame(...) constructor:

my_dict = { 'name' : ['a','b','c','d'], 'age' : [10,20,25,22], 'designation' : ['CEO', 'VP', 'MD', 'CEO']}

df = pd.DataFrame(my_dict)

Create a new DataFrame as

newdf = pd.DataFrame(df, columns=['name', 'age'])

You get a result as good as what you get with del / drop

answered Sep 09 '18 at 06:59

Deepak K Gupta


1

This is technically correct but it seems silly to have to list every column to keep instead of just the one (or few) columns you want to delete. – cs95 May 23 '19 at 17:24

score 1 · Answer 17 · answered Nov 15 '20 at 01:19

A column can also be deleted by position with the iloc indexer and slicing, for example when the data contains an unwanted, unnamed index column.

df = df.iloc[:,1:] # removing an unnamed index column

The first slice, :, keeps every row with the default start, stop and step. The second slice, 1:, keeps the columns from position 1 onward, so the column at position 0 (the first column) is removed.
