1117

I have the following DataFrame (df):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))

I add more column(s) by assignment:

df['mean'] = df.mean(1)

How can I move the column mean to the front, i.e. set it as first column leaving the order of the other columns untouched?

nick
  • 940
  • 1
  • 9
  • 23
Timmie
  • 11,359
  • 3
  • 12
  • 7
  • 2
    possible duplicate of [Python Pandas - Re-ordering columns in a dataframe based on column name](http://stackoverflow.com/questions/11067027/python-pandas-re-ordering-columns-in-a-dataframe-based-on-column-name) – Laurence Jan 04 '13 at 11:31
  • 1
    For a generalized NumPy-based solution see [How to move a column in a pandas dataframe](https://stackoverflow.com/questions/52616829/how-to-move-a-column-in-a-pandas-dataframe), assumes one column level only, i.e. no `MultiIndex`. – jpp Oct 03 '18 at 08:31
  • After searching enough, I got this best link for columns re-arranging multiple logics in pretty simple terms [columns re-arrange logic for pandas] [http://www.datasciencemadesimple.com/re-arrange-or-re-order-the-column-of-dataframe-in-pandas-python-2] – ravibeli Jun 06 '20 at 13:49

38 Answers38

1039

One easy way would be to reassign the dataframe with a list of the columns, rearranged as needed.

This is what you have now:

In [6]: df
Out[6]:
          0         1         2         3         4      mean
0  0.445598  0.173835  0.343415  0.682252  0.582616  0.445543
1  0.881592  0.696942  0.702232  0.696724  0.373551  0.670208
2  0.662527  0.955193  0.131016  0.609548  0.804694  0.632596
3  0.260919  0.783467  0.593433  0.033426  0.512019  0.436653
4  0.131842  0.799367  0.182828  0.683330  0.019485  0.363371
5  0.498784  0.873495  0.383811  0.699289  0.480447  0.587165
6  0.388771  0.395757  0.745237  0.628406  0.784473  0.588529
7  0.147986  0.459451  0.310961  0.706435  0.100914  0.345149
8  0.394947  0.863494  0.585030  0.565944  0.356561  0.553195
9  0.689260  0.865243  0.136481  0.386582  0.730399  0.561593

In [7]: cols = df.columns.tolist()

In [8]: cols
Out[8]: [0L, 1L, 2L, 3L, 4L, 'mean']

Rearrange cols in any way you want. This is how I moved the last element to the first position:

In [12]: cols = cols[-1:] + cols[:-1]

In [13]: cols
Out[13]: ['mean', 0L, 1L, 2L, 3L, 4L]

Then reorder the dataframe like this:

In [16]: df = df[cols]  #    OR    df = df.ix[:, cols]

In [17]: df
Out[17]:
       mean         0         1         2         3         4
0  0.445543  0.445598  0.173835  0.343415  0.682252  0.582616
1  0.670208  0.881592  0.696942  0.702232  0.696724  0.373551
2  0.632596  0.662527  0.955193  0.131016  0.609548  0.804694
3  0.436653  0.260919  0.783467  0.593433  0.033426  0.512019
4  0.363371  0.131842  0.799367  0.182828  0.683330  0.019485
5  0.587165  0.498784  0.873495  0.383811  0.699289  0.480447
6  0.588529  0.388771  0.395757  0.745237  0.628406  0.784473
7  0.345149  0.147986  0.459451  0.310961  0.706435  0.100914
8  0.553195  0.394947  0.863494  0.585030  0.565944  0.356561
9  0.561593  0.689260  0.865243  0.136481  0.386582  0.730399
Aman
  • 38,643
  • 7
  • 32
  • 37
  • 25
    incase you get "cannot concatenate 'str' and 'list' objects" make sure you [] the str value in cols: cols = [cols[7]] + cols[:7] + cols[8:] – moeabdol Jan 09 '15 at 16:59
  • 1
    Is there any reason to use `df.ix[:, cols]` over `df[cols]`? – Snorfalorpagus Nov 05 '15 at 12:59
  • 1
    `FutureWarning: using '+' to provide set union with Indexes is deprecated, use '|' or .union() '''`. I suggest replacing the `cols` line with `cols[-1:].union(cols[:-1])`, which yields the same result. – FooBar Oct 06 '16 at 15:54
  • 3
    @FooBar That's not a set union it's a concatenation of two ordered lists. – Aman Oct 06 '16 at 22:08
  • 3
    @Aman I'm just pointing out that your code is deprecated. Your handling of your post is at your discretion. – FooBar Oct 07 '16 at 06:13
  • 2
    @FooBar, the type of `cols` is `list`; it even allows duplicates (which will be discarded when used on the dataframe). You are thinking of `Index` objects. – alexis Feb 28 '17 at 15:19
  • 13
    This implies copying ALL the data, which is highly inefficient. I wished pandas had a way to do that without creating a copy. – Konstantin Nov 27 '17 at 08:48
597

You could also do something like this:

df = df[['mean', '0', '1', '2', '3']]

You can get the list of columns with:

cols = list(df.columns.values)

The output will produce:

['0', '1', '2', '3', 'mean']

...which is then easy to rearrange manually before dropping it into the first function

freddygv
  • 6,669
  • 1
  • 12
  • 9
  • 8
    You could also get the list of columns with list(df.columns) – Jim Oct 09 '15 at 22:14
  • 15
    or `df.columns.tolist()` – Jim Oct 09 '15 at 22:22
  • For newbies like me, re-arrange the list you get from cols. Then df=df[cols] i.e. the re-arranged list gets dropped into the first expression without only one set of brackets. – Sid Mar 20 '18 at 15:18
  • Column names will be integers in 3.x `df = df[['mean1', 0, 1, 2, 3]]` ✓ – prosti May 05 '19 at 01:10
  • 4
    I don't think this is a good answer as it does not provide code how to change column order of any dataframe. Say i import a csv file as pandas pd as ```pd.read_csv()``` . How can your answer be used to change the column order? – Robvh Jul 25 '19 at 08:22
  • 1
    @Robvh, the second line of code explains how to get the existing column names. From there, you can copy the output into the first line of code, and re-arrange as desired. The only other piece of information to know is that without a header, the default column names are integers, not strings. – daniel brandstetter Sep 09 '19 at 23:27
  • I would agree. This is a simple, straightforward solution to the problem. Though if you are dealing with large numbers of columns it's not ideal. Helped me out tho – Uralan Jul 09 '20 at 12:01
353

Just assign the column names in the order you want them:

In [39]: df
Out[39]: 
          0         1         2         3         4  mean
0  0.172742  0.915661  0.043387  0.712833  0.190717     1
1  0.128186  0.424771  0.590779  0.771080  0.617472     1
2  0.125709  0.085894  0.989798  0.829491  0.155563     1
3  0.742578  0.104061  0.299708  0.616751  0.951802     1
4  0.721118  0.528156  0.421360  0.105886  0.322311     1
5  0.900878  0.082047  0.224656  0.195162  0.736652     1
6  0.897832  0.558108  0.318016  0.586563  0.507564     1
7  0.027178  0.375183  0.930248  0.921786  0.337060     1
8  0.763028  0.182905  0.931756  0.110675  0.423398     1
9  0.848996  0.310562  0.140873  0.304561  0.417808     1

In [40]: df = df[['mean', 4,3,2,1]]

Now, 'mean' column comes out in the front:

In [41]: df
Out[41]: 
   mean         4         3         2         1
0     1  0.190717  0.712833  0.043387  0.915661
1     1  0.617472  0.771080  0.590779  0.424771
2     1  0.155563  0.829491  0.989798  0.085894
3     1  0.951802  0.616751  0.299708  0.104061
4     1  0.322311  0.105886  0.421360  0.528156
5     1  0.736652  0.195162  0.224656  0.082047
6     1  0.507564  0.586563  0.318016  0.558108
7     1  0.337060  0.921786  0.930248  0.375183
8     1  0.423398  0.110675  0.931756  0.182905
9     1  0.417808  0.304561  0.140873  0.310562
fixxxer
  • 13,562
  • 14
  • 54
  • 74
170

How about:

df.insert(0, 'mean', df.mean(1))

http://pandas.pydata.org/pandas-docs/stable/dsintro.html#column-selection-addition-deletion

Wes McKinney
  • 83,626
  • 27
  • 133
  • 107
  • 52
    Could this be a future feature add to `pandas`? something like `df.move(0,df.mean)`? – jason May 27 '14 at 01:23
  • Oh man, it even works like this `df_metadata.insert(0,'Db_name',"raw_data")` (Code not relevant to this thread) – Imad Jun 12 '18 at 09:10
  • 5
    Beautiful. And it happens in place, too. – cucu8 Aug 02 '18 at 10:29
  • 2
    This is a scalable solution since other solutions are manually typing column names. – CKM Aug 19 '19 at 17:00
  • 2
    This works for the OP's question, when creating a new column, but it doesn't for moving a column; attempt to move results in `*** ValueError: cannot insert mean, already exists` – spinup Feb 24 '20 at 21:07
  • 3
    This is a clean solution. The modern API method is: [`df.insert(0, 'mean', df['mean'])`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html?highlight=pandas%20insert#pandas.DataFrame.insert) – Sumanth Lazarus Jul 29 '20 at 05:22
  • New documentation link: https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#column-selection-addition-deletion – mins Dec 15 '20 at 09:31
  • @spinup if you already have the column, you can insert under different temporary name, drop the original column and then return temp name to the original: `df.insert(0, 'mean_temp', df['mean']) df.drop('mean', axis=1, inplace=True) df.rename(columns={'mean_temp': 'mean'}, inplace=True)` – Victor Di Mar 31 '21 at 10:58
155

In your case,

df = df.reindex(columns=['mean',0,1,2,3,4])

will do exactly what you want.

In my case (general form):

df = df.reindex(columns=sorted(df.columns))
df = df.reindex(columns=(['opened'] + list([a for a in df.columns if a != 'opened']) ))
Mr_and_Mrs_D
  • 27,070
  • 30
  • 156
  • 325
Alvaro Joao
  • 7,787
  • 9
  • 42
  • 70
78

You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.

cols = ['mean']  + [col for col in df if col != 'mean']
df = df[cols]

You can also use a more general approach. In this example, the last column (indicated by -1) is inserted as the first column.

cols = [df.columns[-1]] + [col for col in df if col != df.columns[-1]]
df = df[cols]

You can also use this approach for reordering columns in a desired order if they are present in the DataFrame.

inserted_cols = ['a', 'b', 'c']
cols = ([col for col in inserted_cols if col in df] 
        + [col for col in df if col not in inserted_cols])
df = df[cols]
Kevin Markham
  • 4,396
  • 1
  • 23
  • 33
Alexander
  • 87,529
  • 23
  • 162
  • 169
74
import numpy as np
import pandas as pd
df = pd.DataFrame()
column_names = ['x','y','z','mean']
for col in column_names: 
    df[col] = np.random.randint(0,100, size=10000)

You can try out the following solutions :

Solution 1:

df = df[ ['mean'] + [ col for col in df.columns if col != 'mean' ] ]

Solution 2:


df = df[['mean', 'x', 'y', 'z']]

Solution 3:

col = df.pop("mean")
df = df.insert(0, col.name, col)

Solution 4:

df.set_index(df.columns[-1], inplace=True)
df.reset_index(inplace=True)

Solution 5:

cols = list(df)
cols = [cols[-1]] + cols[:-1]
df = df[cols]

solution 6:

order = [1,2,3,0] # setting column's order
df = df[[df.columns[i] for i in order]]

Time Comparison:

Solution 1:

CPU times: user 1.05 ms, sys: 35 µs, total: 1.08 ms Wall time: 995 µs

Solution 2:

CPU times: user 933 µs, sys: 0 ns, total: 933 µs Wall time: 800 µs

Solution 3:

CPU times: user 0 ns, sys: 1.35 ms, total: 1.35 ms Wall time: 1.08 ms

Solution 4:

CPU times: user 1.23 ms, sys: 45 µs, total: 1.27 ms Wall time: 986 µs

Solution 5:

CPU times: user 1.09 ms, sys: 19 µs, total: 1.11 ms Wall time: 949 µs

Solution 6:

CPU times: user 955 µs, sys: 34 µs, total: 989 µs Wall time: 859 µs

Pygirl
  • 10,115
  • 3
  • 20
  • 33
52

Suppose you have df with columns A B C.

The most simple way is:

df = df.reindex(['B','C','A'], axis=1)
Acumenus
  • 41,481
  • 14
  • 116
  • 107
liangli
  • 811
  • 6
  • 10
50

If your column names are too-long-to-type then you could specify the new order through a list of integers with the positions:

Data:

          0         1         2         3         4      mean
0  0.397312  0.361846  0.719802  0.575223  0.449205  0.500678
1  0.287256  0.522337  0.992154  0.584221  0.042739  0.485741
2  0.884812  0.464172  0.149296  0.167698  0.793634  0.491923
3  0.656891  0.500179  0.046006  0.862769  0.651065  0.543382
4  0.673702  0.223489  0.438760  0.468954  0.308509  0.422683
5  0.764020  0.093050  0.100932  0.572475  0.416471  0.389390
6  0.259181  0.248186  0.626101  0.556980  0.559413  0.449972
7  0.400591  0.075461  0.096072  0.308755  0.157078  0.207592
8  0.639745  0.368987  0.340573  0.997547  0.011892  0.471749
9  0.050582  0.714160  0.168839  0.899230  0.359690  0.438500

Generic example:

new_order = [3,2,1,4,5,0]
print(df[df.columns[new_order]])  

          3         2         1         4      mean         0
0  0.575223  0.719802  0.361846  0.449205  0.500678  0.397312
1  0.584221  0.992154  0.522337  0.042739  0.485741  0.287256
2  0.167698  0.149296  0.464172  0.793634  0.491923  0.884812
3  0.862769  0.046006  0.500179  0.651065  0.543382  0.656891
4  0.468954  0.438760  0.223489  0.308509  0.422683  0.673702
5  0.572475  0.100932  0.093050  0.416471  0.389390  0.764020
6  0.556980  0.626101  0.248186  0.559413  0.449972  0.259181
7  0.308755  0.096072  0.075461  0.157078  0.207592  0.400591
8  0.997547  0.340573  0.368987  0.011892  0.471749  0.639745
9  0.899230  0.168839  0.714160  0.359690  0.438500  0.050582

Although it might seem like I'm just explicitly typing the column names in a different order, the fact that there's a column 'mean' should make it clear that new_order relates to actual positions and not column names.

For the specific case of OP's question:

new_order = [-1,0,1,2,3,4]
df = df[df.columns[new_order]]
print(df)

       mean         0         1         2         3         4
0  0.500678  0.397312  0.361846  0.719802  0.575223  0.449205
1  0.485741  0.287256  0.522337  0.992154  0.584221  0.042739
2  0.491923  0.884812  0.464172  0.149296  0.167698  0.793634
3  0.543382  0.656891  0.500179  0.046006  0.862769  0.651065
4  0.422683  0.673702  0.223489  0.438760  0.468954  0.308509
5  0.389390  0.764020  0.093050  0.100932  0.572475  0.416471
6  0.449972  0.259181  0.248186  0.626101  0.556980  0.559413
7  0.207592  0.400591  0.075461  0.096072  0.308755  0.157078
8  0.471749  0.639745  0.368987  0.340573  0.997547  0.011892
9  0.438500  0.050582  0.714160  0.168839  0.899230  0.359690

The main problem with this approach is that calling the same code multiple times will create different results each time, so one needs to be careful :)

Yuca
  • 5,048
  • 2
  • 17
  • 38
24

This question has been answered before but reindex_axis is deprecated now so I would suggest to use:

df = df.reindex(sorted(df.columns), axis=1)

For those who want to specify the order they want instead of just sorting them, here's the solution spelled out:

df = df.reindex(['the','order','you','want'], axis=1)

Now, how you want to sort the list of column names is really not a pandas question, that's a Python list manipulation question. There are many ways of doing that, and I think this answer has a very neat way of doing it.

Acumenus
  • 41,481
  • 14
  • 116
  • 107
dmvianna
  • 12,370
  • 17
  • 69
  • 101
19

I ran into a similar question myself, and just wanted to add what I settled on. I liked the reindex_axis() method for changing column order. This worked:

df = df.reindex_axis(['mean'] + list(df.columns[:-1]), axis=1)

An alternate method based on the comment from @Jorge:

df = df.reindex(columns=['mean'] + list(df.columns[:-1]))

Although reindex_axis seems to be slightly faster in micro benchmarks than reindex, I think I prefer the latter for its directness.

clocker
  • 1,246
  • 9
  • 15
  • 7
    This was a nice solution, but reindex_axis will be deprecated. I used reindex, and it worked just fine. – Jorge Aug 08 '18 at 21:32
  • I may miss something but 1/ you likely forgot to include `axis=1` in this second solution to use the columns, not the rows. 2/ In 2020, the `reindex` solution changes the rows/columns order, but also clears data (`NaN` everywhere). – mins Dec 15 '20 at 10:30
18

This function avoids you having to list out every variable in your dataset just to order a few of them.

def order(frame,var):
    if type(var) is str:
        var = [var] #let the command take a string or list
    varlist =[w for w in frame.columns if w not in var]
    frame = frame[var+varlist]
    return frame 

It takes two arguments, the first is the dataset, the second are the columns in the data set that you want to bring to the front.

So in my case I have a data set called Frame with variables A1, A2, B1, B2, Total and Date. If I want to bring Total to the front then all I have to do is:

frame = order(frame,['Total'])

If I want to bring Total and Date to the front then I do:

frame = order(frame,['Total','Date'])

EDIT:

Another useful way to use this is, if you have an unfamiliar table and you're looking with variables with a particular term in them, like VAR1, VAR2,... you may execute something like:

frame = order(frame,[v for v in frame.columns if "VAR" in v])
seeiespi
  • 2,448
  • 1
  • 28
  • 33
18

I think this is a slightly neater solution:

df.insert(0, 'mean', df.pop("mean"))

This solution is somewhat similar to @JoeHeffer 's solution but this is one liner.

Here we remove the column "mean" from the dataframe and attach it to index 0 with the same column name.

Acumenus
  • 41,481
  • 14
  • 116
  • 107
erncyp
  • 1,164
  • 12
  • 19
17

Simply do,

df = df[['mean'] + df.columns[:-1].tolist()]
Napitupulu Jon
  • 6,589
  • 2
  • 20
  • 22
  • TypeError: Can't convert 'int' object to str implicitly – parvij Jan 05 '16 at 11:10
  • could be API has changed, you can also do this... `order = df.columns.tolist()` `df['mean'] = df.mean(1)` `df.columns = ['mean'] + order` – Napitupulu Jon Jan 08 '16 at 10:28
  • 1
    A variation of this worked well for me. With an existing list, `headers`, that was used to create a dict that was then used to create the DataFrame, I called `df.reindex(columns=headers)`. The only problem I ran into was I had already called `df.set_index('some header name', inplace=True)`, so when the reindex was done, it added another column named `some header name` since the original column was now the index. As for the syntax specified above, `['mean'] + df.columns` in the python interpreter gives me `Index(u'meanAddress', u'meanCity', u'meanFirst Name'...` – hlongmore Jun 20 '17 at 19:41
  • 1
    @hlongmore: I don't know your prior code is, but the edit should work (using 0.19.2) – Napitupulu Jon Jun 21 '17 at 00:56
  • The edit does indeed work (I'm on 0.20.2). In my case, I've already got the columns I want, so I think df.reindex() is what I really should use. – hlongmore Jun 22 '17 at 17:49
  • This answer appears to be wrong: it does not move the mean column so that it is the first column. Rather, it renames the first column to "mean" and renames the last column to 4. I was almost misled to doing something very wrong, but luckily checked it myself first. I would suggest doing something like: "df = df[list(reversed(df.columns))]" instead, or see some of the other answers for variations on this. – Moot Jul 12 '17 at 06:21
  • Oops how could I miss that? Prior to my edit before, it's still works though. I've re-edited my answer. – Napitupulu Jon Jul 12 '17 at 08:27
12

You could do the following (borrowing parts from Aman's answer):

cols = df.columns.tolist()
cols.insert(0, cols.pop(-1))

cols
>>>['mean', 0L, 1L, 2L, 3L, 4L]

df = df[cols]
otteheng
  • 551
  • 6
  • 24
10

Just type the column name you want to change, and set the index for the new location.

def change_column_order(df, col_name, index):
    cols = df.columns.tolist()
    cols.remove(col_name)
    cols.insert(index, col_name)
    return df[cols]

For your case, this would be like:

df = change_column_order(df, 'mean', 0)
ccerhan
  • 552
  • 5
  • 7
8

Moving any column to any position:

import pandas as pd
df = pd.DataFrame({"A": [1,2,3], 
                   "B": [2,4,8], 
                   "C": [5,5,5]})

cols = df.columns.tolist()
column_to_move = "C"
new_position = 1

cols.insert(new_position, cols.pop(cols.index(column_to_move)))
df = df[cols]
pomber
  • 19,363
  • 9
  • 71
  • 90
7

Here's a way to move one existing column that will modify the existing data frame in place.

my_column = df.pop('column name')
df.insert(3, my_column.name, my_column)
Joe Heffer
  • 345
  • 3
  • 7
  • This is pretty much the only good approach as it's in-place. Most other approaches are not in-place and are therefore not scalable. – Acumenus Jan 15 '21 at 15:41
6

I wanted to bring two columns in front from a dataframe where I do not know exactly the names of all columns, because they are generated from a pivot statement before. So, if you are in the same situation: To bring columns in front that you know the name of and then let them follow by "all the other columns", I came up with the following general solution:

df = df.reindex_axis(['Col1','Col2'] + list(df.columns.drop(['Col1','Col2'])), axis=1)
Acumenus
  • 41,481
  • 14
  • 116
  • 107
matthhias
  • 191
  • 2
  • 5
6

Here is a very simple answer to this(only one line).

You can do that after you added the 'n' column into your df as follows.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))
df['mean'] = df.mean(1)
df
           0           1           2           3           4        mean
0   0.929616    0.316376    0.183919    0.204560    0.567725    0.440439
1   0.595545    0.964515    0.653177    0.748907    0.653570    0.723143
2   0.747715    0.961307    0.008388    0.106444    0.298704    0.424512
3   0.656411    0.809813    0.872176    0.964648    0.723685    0.805347
4   0.642475    0.717454    0.467599    0.325585    0.439645    0.518551
5   0.729689    0.994015    0.676874    0.790823    0.170914    0.672463
6   0.026849    0.800370    0.903723    0.024676    0.491747    0.449473
7   0.526255    0.596366    0.051958    0.895090    0.728266    0.559587
8   0.818350    0.500223    0.810189    0.095969    0.218950    0.488736
9   0.258719    0.468106    0.459373    0.709510    0.178053    0.414752


### here you can add below line and it should work 
# Don't forget the two (()) 'brackets' around columns names.Otherwise, it'll give you an error.

df = df[list(('mean',0, 1, 2,3,4))]
df

        mean           0           1           2           3           4
0   0.440439    0.929616    0.316376    0.183919    0.204560    0.567725
1   0.723143    0.595545    0.964515    0.653177    0.748907    0.653570
2   0.424512    0.747715    0.961307    0.008388    0.106444    0.298704
3   0.805347    0.656411    0.809813    0.872176    0.964648    0.723685
4   0.518551    0.642475    0.717454    0.467599    0.325585    0.439645
5   0.672463    0.729689    0.994015    0.676874    0.790823    0.170914
6   0.449473    0.026849    0.800370    0.903723    0.024676    0.491747
7   0.559587    0.526255    0.596366    0.051958    0.895090    0.728266
8   0.488736    0.818350    0.500223    0.810189    0.095969    0.218950
9   0.414752    0.258719    0.468106    0.459373    0.709510    0.178053

rra
  • 372
  • 2
  • 10
6

You can use a set which is an unordered collection of unique elements to do keep the "order of the other columns untouched":

other_columns = list(set(df.columns).difference(["mean"])) #[0, 1, 2, 3, 4]

Then, you can use a lambda to move a specific column to the front by:

In [1]: import numpy as np                                                                               

In [2]: import pandas as pd                                                                              

In [3]: df = pd.DataFrame(np.random.rand(10, 5))                                                         

In [4]: df["mean"] = df.mean(1)                                                                          

In [5]: move_col_to_front = lambda df, col: df[[col]+list(set(df.columns).difference([col]))]            

In [6]: move_col_to_front(df, "mean")                                                                    
Out[6]: 
       mean         0         1         2         3         4
0  0.697253  0.600377  0.464852  0.938360  0.945293  0.537384
1  0.609213  0.703387  0.096176  0.971407  0.955666  0.319429
2  0.561261  0.791842  0.302573  0.662365  0.728368  0.321158
3  0.518720  0.710443  0.504060  0.663423  0.208756  0.506916
4  0.616316  0.665932  0.794385  0.163000  0.664265  0.793995
5  0.519757  0.585462  0.653995  0.338893  0.714782  0.305654
6  0.532584  0.434472  0.283501  0.633156  0.317520  0.994271
7  0.640571  0.732680  0.187151  0.937983  0.921097  0.423945
8  0.562447  0.790987  0.200080  0.317812  0.641340  0.862018
9  0.563092  0.811533  0.662709  0.396048  0.596528  0.348642

In [7]: move_col_to_front(df, 2)                                                                         
Out[7]: 
          2         0         1         3         4      mean
0  0.938360  0.600377  0.464852  0.945293  0.537384  0.697253
1  0.971407  0.703387  0.096176  0.955666  0.319429  0.609213
2  0.662365  0.791842  0.302573  0.728368  0.321158  0.561261
3  0.663423  0.710443  0.504060  0.208756  0.506916  0.518720
4  0.163000  0.665932  0.794385  0.664265  0.793995  0.616316
5  0.338893  0.585462  0.653995  0.714782  0.305654  0.519757
6  0.633156  0.434472  0.283501  0.317520  0.994271  0.532584
7  0.937983  0.732680  0.187151  0.921097  0.423945  0.640571
8  0.317812  0.790987  0.200080  0.641340  0.862018  0.562447
9  0.396048  0.811533  0.662709  0.596528  0.348642  0.563092
6

You can reorder the dataframe columns using a list of names with:

df = df.filter(list_of_col_names)

Sam Murphy
  • 435
  • 5
  • 18
5

Just flipping helps often.

df[df.columns[::-1]]

Or just shuffle for a look.

import random
cols = list(df.columns)
random.shuffle(cols)
df[cols]
plhn
  • 4,091
  • 4
  • 32
  • 44
4

A pretty straightforward solution that worked for me is to use .reindex on df.columns:

df = df[df.columns.reindex(['mean', 0, 1, 2, 3, 4])[0]]
Acumenus
  • 41,481
  • 14
  • 116
  • 107
CSQL
  • 106
  • 1
  • 3
3

How about using T?

df = df.T.reindex(['mean', 0, 1, 2, 3, 4]).T
Acumenus
  • 41,481
  • 14
  • 116
  • 107
ZEE
  • 188
  • 1
  • 10
3

A simple approach is using set(), in particular when you have a long list of columns and do not want to handle them manually:

cols = list(set(df.columns.tolist()) - set(['mean']))
cols.insert(0, 'mean')
df = df[cols]
Acumenus
  • 41,481
  • 14
  • 116
  • 107
Shoresh
  • 2,025
  • 12
  • 9
  • 2
    One caution: the order of columns goes away if you put it into set – pnv Mar 06 '18 at 05:31
  • Interesting! @user1930402 I have tried the approach above on several occasions and never had any problem. I will double check again. – Shoresh Mar 06 '18 at 14:50
3

You can use reindex which can be used for both axis:

df
#           0         1         2         3         4      mean
# 0  0.943825  0.202490  0.071908  0.452985  0.678397  0.469921
# 1  0.745569  0.103029  0.268984  0.663710  0.037813  0.363821
# 2  0.693016  0.621525  0.031589  0.956703  0.118434  0.484254
# 3  0.284922  0.527293  0.791596  0.243768  0.629102  0.495336
# 4  0.354870  0.113014  0.326395  0.656415  0.172445  0.324628
# 5  0.815584  0.532382  0.195437  0.829670  0.019001  0.478415
# 6  0.944587  0.068690  0.811771  0.006846  0.698785  0.506136
# 7  0.595077  0.437571  0.023520  0.772187  0.862554  0.538182
# 8  0.700771  0.413958  0.097996  0.355228  0.656919  0.444974
# 9  0.263138  0.906283  0.121386  0.624336  0.859904  0.555009

df.reindex(['mean', *range(5)], axis=1)

#        mean         0         1         2         3         4
# 0  0.469921  0.943825  0.202490  0.071908  0.452985  0.678397
# 1  0.363821  0.745569  0.103029  0.268984  0.663710  0.037813
# 2  0.484254  0.693016  0.621525  0.031589  0.956703  0.118434
# 3  0.495336  0.284922  0.527293  0.791596  0.243768  0.629102
# 4  0.324628  0.354870  0.113014  0.326395  0.656415  0.172445
# 5  0.478415  0.815584  0.532382  0.195437  0.829670  0.019001
# 6  0.506136  0.944587  0.068690  0.811771  0.006846  0.698785
# 7  0.538182  0.595077  0.437571  0.023520  0.772187  0.862554
# 8  0.444974  0.700771  0.413958  0.097996  0.355228  0.656919
# 9  0.555009  0.263138  0.906283  0.121386  0.624336  0.859904
silgon
  • 5,517
  • 4
  • 36
  • 62
3

Hackiest method in the book

df.insert(0, "test", df["mean"])
df = df.drop(columns=["mean"]).rename(columns={"test": "mean"})
Acumenus
  • 41,481
  • 14
  • 116
  • 107
Kaustubh J
  • 552
  • 5
  • 8
2

I believe @Aman's answer is the best if you know the location of the other column.

If you don't know the location of mean, but only have its name, you cannot resort directly to cols = cols[-1:] + cols[:-1]. Following is the next-best thing I could come up with:

meanDf = pd.DataFrame(df.pop('mean'))
# now df doesn't contain "mean" anymore. Order of join will move it to left or right:
meanDf.join(df) # has mean as first column
df.join(meanDf) # has mean as last column
Community
  • 1
  • 1
FooBar
  • 13,456
  • 10
  • 65
  • 140
2

I liked Shoresh's answer to use set functionality to remove columns when you don't know the location, however this didn't work for my purpose as I need to keep the original column order (which has arbitrary column labels).

I got this to work though by using IndexedSet from the boltons package.

I also needed to re-add multiple column labels, so for a more general case I used the following code:

from boltons.setutils import IndexedSet
cols = list(IndexedSet(df.columns.tolist()) - set(['mean', 'std']))
cols[0:0] =['mean', 'std']
df = df[cols]

Hope this is useful to anyone searching this thread for a general solution.

Jamelade
  • 29
  • 1
  • I am a bit surprised! I use `set` for this purpose quite often and never had to deal with ordering. – Shoresh Oct 02 '19 at 22:09
2

Here is a function to do this for any number of columns.

def mean_first(df):
    ncols = df.shape[1]        # Get the number of columns
    index = list(range(ncols)) # Create an index to reorder the columns
    index.insert(0,ncols)      # This puts the last column at the front
    return(df.assign(mean=df.mean(1)).iloc[:,index]) # new df with last column (mean) first
freeB
  • 51
  • 3
2

Most of the answers did not generalize enough and pandas reindex_axis method is a little tedious, hence I offer a simple function to move an arbitrary number of columns to any position using a dictionary where key = column name and value = position to move to. If your dataframe is large pass True to 'big_data' then the function will return the ordered columns list. And you could use this list to slice your data.

def order_column(df, columns, big_data = False):

    """Re-Orders dataFrame column(s)
       Parameters : 
       df      -- dataframe
       columns -- a dictionary:
                  key   = current column position/index or column name
                  value = position to move it to  
       big_data -- boolean 
                  True = returns only the ordered columns as a list
                          the user user can then slice the data using this
                          ordered column
                  False = default - return a copy of the dataframe
    """
    ordered_col = df.columns.tolist()

    for key, value in columns.items():

        ordered_col.remove(key)
        ordered_col.insert(value, key)

    if big_data:

        return ordered_col

    return df[ordered_col]

# e.g.
df = pd.DataFrame({'chicken wings': np.random.rand(10, 1).flatten(), 'taco': np.random.rand(10,1).flatten(),
                          'coffee': np.random.rand(10, 1).flatten()})
df['mean'] = df.mean(1)

df = order_column(df, {'mean': 0, 'coffee':1 })

>>>

output

col = order_column(df, {'mean': 0, 'coffee':1 }, True)

col
>>>
['mean', 'coffee', 'chicken wings', 'taco']

# you could grab it by doing this

df = df[col]

escha
  • 169
  • 1
  • 6
1

I have a very specific use case for re-ordering column names in pandas. Sometimes I am creating a new column in a dataframe that is based on an existing column. By default pandas will insert my new column at the end, but I want the new column to be inserted next to the existing column it's derived from.

enter image description here

def rearrange_list(input_list, input_item_to_move, input_item_insert_here):
    '''
    Helper function to re-arrange the order of items in a list.
    Useful for moving column in pandas dataframe.

    Inputs:
        input_list - list
        input_item_to_move - item in list to move
        input_item_insert_here - item in list, insert before 

    returns:
        output_list
    '''
    # make copy for output, make sure it's a list
    output_list = list(input_list)

    # index of item to move
    idx_move = output_list.index(input_item_to_move)

    # pop off the item to move
    itm_move = output_list.pop(idx_move)

    # index of item to insert here
    idx_insert = output_list.index(input_item_insert_here)

    # insert item to move into here
    output_list.insert(idx_insert, itm_move)

    return output_list


import pandas as pd

# step 1: create sample dataframe
df = pd.DataFrame({
    'motorcycle': ['motorcycle1', 'motorcycle2', 'motorcycle3'],
    'initial_odometer': [101, 500, 322],
    'final_odometer': [201, 515, 463],
    'other_col_1': ['blah', 'blah', 'blah'],
    'other_col_2': ['blah', 'blah', 'blah']
})
print('Step 1: create sample dataframe')
display(df)
print()

# step 2: add new column that is difference between final and initial
df['change_odometer'] = df['final_odometer']-df['initial_odometer']
print('Step 2: add new column')
display(df)
print()

# step 3: rearrange columns
ls_cols = df.columns
ls_cols = rearrange_list(ls_cols, 'change_odometer', 'final_odometer')
df=df[ls_cols]
print('Step 3: rearrange columns')
display(df)
1

I think this function is more straightforward. You Just need to specify a subset of columns at the start or the end or both:

def reorder_df_columns(df, start=None, end=None):
    """
        This function reorder columns of a DataFrame.
        It takes columns given in the list `start` and move them to the left.
        Its also takes columns in `end` and move them to the right.
    """
    if start is None:
        start = []
    if end is None:
        end = []
    assert isinstance(start, list) and isinstance(end, list)
    cols = list(df.columns)
    for c in start:
        if c not in cols:
            start.remove(c)
    for c in end:
        if c not in cols or c in start:
            end.remove(c)
    for c in start + end:
        cols.remove(c)
    cols = start + cols + end
    return df[cols]
hayj
  • 710
  • 8
  • 17
1

Similar to the top answer, there is an alternative using deque() and its rotate() method. The rotate method takes the last element in the list and inserts it to the beginning:

from collections import deque

columns = deque(df.columns.tolist())
columns.rotate()

df = df[columns]
abc
  • 55
  • 5
0

To set an existing column right/left of another, based on their names:

def df_move_column(df, col_to_move, col_left_of_destiny="", right_of_col_bool=True):
    cols = list(df.columns.values)
    index_max = len(cols) - 1

    if not right_of_col_bool:
        # set left of a column "c", is like putting right of column previous to "c"
        # ... except if left of 1st column, then recursive call to set rest right to it
        aux = cols.index(col_left_of_destiny)
        if not aux:
            for g in [x for x in cols[::-1] if x != col_to_move]:
                df = df_move_column(
                        df, 
                        col_to_move=g, 
                        col_left_of_destiny=col_to_move
                        )
            return df
        col_left_of_destiny = cols[aux - 1]

    index_old = cols.index(col_to_move)
    index_new = 0
    if len(col_left_of_destiny):
        index_new = cols.index(col_left_of_destiny) + 1

    if index_old == index_new:
        return df

    if index_new < index_old:
        index_new = np.min([index_new, index_max])
        cols = (
            cols[:index_new]
            + [cols[index_old]]
            + cols[index_new:index_old]
            + cols[index_old + 1 :]
        )
    else:
        cols = (
            cols[:index_old]
            + cols[index_old + 1 : index_new]
            + [cols[index_old]]
            + cols[index_new:]
        )

    df = df[cols]
    return df

E.g.

cols = list("ABCD")
df2 = pd.DataFrame(np.arange(4)[np.newaxis, :], columns=cols)
for k in cols:
    print(30 * "-")
    for g in [x for x in cols if x != k]:
        df_new = df_move_column(df2, k, g)
        print(f"{k} after {g}:  {df_new.columns.values}")
for k in cols:
    print(30 * "-")
    for g in [x for x in cols if x != k]:
        df_new = df_move_column(df2, k, g, right_of_col_bool=False)
        print(f"{k} before {g}:  {df_new.columns.values}")

Output:

enter image description here

Xopi García
  • 111
  • 1
  • 7
0

I thought of the same as Dmitriy Work, clearly easiest answer:

df["mean"] = df.mean(1)
l =  list(np.arange(0,len(df.columns) -1 ))
l.insert(0,-1)
df.iloc[:,l]
S.Baum
  • 117
  • 7
-2

DataFrame.sort_index(axis=1) is quite clean.Check doc here. And then concat

Shihe Zhang
  • 1,977
  • 4
  • 30
  • 49
  • Though the function `sort_index` is clean, I am sorry to say, but I don't think that it addresses the question, which is more about rearranging the columns (which may or may not follow any particular order) than sorting them. – Mitali Cyrus Feb 21 '20 at 05:12