14

The to_csv method of pandas does not preserve the order of columns. It chooses to alphabetically arrange the columns in CSV. This is a bug and has been reported and is supposed to be corrected in version 0.11.0. I have 0.18.0.

import pandas as pd
df = pd.DataFrame({'V_pod_error' : [a],
                   'V_pod_used' : [b],
                   'U_sol_type' : [c]
                                ...
                                ... and so on upto 50 columns }

pd.to_csv(df)

Excel order:

0   U_sol type          V_pod_error      V_pod_used      ...
1

What I want is order in the dictionary:

0   V_pod_error      V_pod_used          U_sol type     ...
1

I have a huge number of columns and names. I cannot do it manually or write out the column order. There has been the exact same question in 2013 here. And it doesnt look like there is an update! I would like to ask the community to help me out! This is really problematic.

Community
  • 1
  • 1
agent18
  • 780
  • 1
  • 9
  • 24
  • Yes, I am constructing the data frame in a loop with the above commands. Will check out ordered dict. Thanks. The thing is I kept updating in a quick and dirty manner as and when I needed things. Now to work with it is really hard. In order to add a column to the end I prefix the column name with 'z'(quick and dirty). Any other suggestions of simple manipulations to my existing code to get the desired as defined output? – agent18 May 06 '16 at 12:27

3 Answers3

19

Try the following solution. Even I faced the same issue. I solved it as follows:

import pandas as pd
df = pd.DataFrame({'V_pod_error' : [a],
                   'V_pod_used' : [b],
                   'U_sol_type' : [c]
                                ...
                                ... and so on upto 50 columns }

column_order = ['V_pod_error', 'V_pod_used', 'U_sol_type',.....# upto 50 column names]

df[column_order].to_csv(file_name)
Shawn
  • 159
  • 1
  • 13
10

I think problem is in DataFrame constructor, because you need add parameter columns for custom ordering of columns. If you dont set parameter columns, columns are ordered alphanumerical.

import pandas as pd
df = pd.DataFrame({'V_pod_error' : [0,2],
                   'V_pod_used' : [6,4],
                   'U_sol_type' : [7,8]})
print df
   U_sol_type  V_pod_error  V_pod_used
0           7            0           6
1           8            2           4

print df.to_csv()
,U_sol_type,V_pod_error,V_pod_used
0,7,0,6
1,8,2,4


df1 = pd.DataFrame({'V_pod_error' : [0,2],
                   'V_pod_used' : [6,4],
                   'U_sol_type' : [7,8]}, 
                    columns=['V_pod_error','V_pod_used','U_sol_type'])

print df1
   V_pod_error  V_pod_used  U_sol_type
0            0           6           7
1            2           4           8

print df1.to_csv()
,V_pod_error,V_pod_used,U_sol_type
0,0,6,7
1,2,4,8

EDIT:

Another solution is set order of column by subset before write to_csv (thanks Mathias711):

import pandas as pd
df = pd.DataFrame({'V_pod_error' : [0,2],
                   'V_pod_used' : [6,4],
                   'U_sol_type' : [7,8]})
print df
   U_sol_type  V_pod_error  V_pod_used
0           7            0           6
1           8            2           4

df = df[['V_pod_error','V_pod_used','U_sol_type']]
print df

   V_pod_error  V_pod_used  U_sol_type
0            0           6           7
1            2           4           8

EDIT1: Maybe help first convert dict to OrderedDict and then create DataFrame:

import collections
import pandas as pd


d = {'V_pod_error' : [0,2],'V_pod_used' : [6,4], 'U_sol_type' : [7,8]}
print d
{'V_pod_error': [0, 2], 'V_pod_used': [6, 4], 'U_sol_type': [7, 8]}

print pd.DataFrame(d)
   U_sol_type  V_pod_error  V_pod_used
0           7            0           6
1           8            2           4

d1 = collections.OrderedDict(d)
print d1
OrderedDict([('V_pod_error', [0, 2]), ('V_pod_used', [6, 4]), ('U_sol_type', [7, 8])])

print pd.DataFrame(d1)
   V_pod_error  V_pod_used  U_sol_type
0            0           6           7
1            2           4           8
Community
  • 1
  • 1
jezrael
  • 629,482
  • 62
  • 918
  • 895
  • Can it also help to call `df = df[[columns]]` right before `df.to_csv()`? Normally this will change the order of your columns – Mathias711 May 06 '16 at 11:10
  • Yes, this is another option. Thanks. I add it to answer. – jezrael May 06 '16 at 11:11
  • @jexrael Thanks for your immediate response. But, I am not sure we are on the same page. I understand that we need to do it manually. But I dont want to do it manually by writing out the column structure as I have too many columns. I want the dataframe column structure to be preserved. I want what is in the data frame to show up in the csv without any fuss. – agent18 May 06 '16 at 12:06
  • @Mathias711 Are you saying that if I do df = "df[[columns]]" (the exact same words as you have written), then it should work? I am just wondering if "df = df[[columns]]" means df[[col1,col2,col3]]. – agent18 May 06 '16 at 12:06
  • @ThejKiran - But I think it is other problem - how you can change sort of columns to custom? Because if creates dataframe without parameter columns, they are alphanumeric sorted. And what is rule for custum sorting? I think help only Mathias solution, see edit in my answer. Btw, for creating DataFrame is used Dictionary, so sorting in default dictionary is impossible. – jezrael May 06 '16 at 12:13
  • @jexrael the most important take away from this is that the problem is wit the df and not to_csv. STUD point! Should I edit the question to match your point? – agent18 May 06 '16 at 12:17
  • @ThejKiran What is order in `print df.columns` ? I think it is `Excel` order. – jezrael May 06 '16 at 12:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/111223/discussion-between-thej-kiran-and-jezrael). – agent18 May 06 '16 at 12:27
3

Try with:

df.to_csv(file_name, sep=',', encoding='utf-8', header=True, columns=["Col1","Col2","Col3","Col4"])

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html

Biranchi
  • 15,173
  • 20
  • 115
  • 154