Automatic leading zero for same column names in python

Question

I read csv with more than 300 columns. Many columns have similar names.
Shortened example from the csv:

index   d     c     a     b     b     a     a     a
0       data  data  data  data  data  data  data  data

Python automatically adds numbers at the end of columns names (if they are similar) as a suffix to ensure each column has a unique name.
Example:

index   d     c     a     b     b.1   a.1   a.2   a.3
0       data  data  data  data  data  data  data  data

My assignment is to sort the columns alphabetically and to add zeros (leading zeros) to the suffix.
Desired output:

index   a     a.01  a.02  a.03  b     b.01  c     d
0       data  data  data  data  data  data  data  data

My code below (taken from stackoverflow) can sort the column. But I have no idea how to make suffix with leading zeros?

import pandas as pd
df= pd.read_csv(r"C:\Users\.....\file.csv", skipinitialspace=False, sep=';', header= 0, index_col = 'DateTime', low_memory=False)

df = df.sort_index(axis=1) # sort the column
pd.set_option('display.max_columns', None) # view all columns without truncated

Any ideas?

Edited question
After sorting the columns, I want,

a.01 a02 a12

rather than

a.01 a.012 a.02

if you had a column `a.12`, would you want to just leave it as `a.12`? Also what about three digit numbers, do you need to consider those? — Dan, Nov 12 '18 at 14:08
@ k.koen, please use `df.columns` for this purpose as its easy to use for your case reference . — Karn Kumar, Nov 12 '18 at 14:43
@Dan, your point is what exactly I wanted to consider. After sorting columns, I want a.01 a02 a12 rather than a.01 a.012 a.02. Question edited. — k.ko3n, Nov 12 '18 at 15:28

Joe · Answer 1 · 2018-11-12T14:15:26.477

3

You could do in this way:

import re
df = df.rename(columns=lambda x: re.sub('\.','.0',x))

or

df = df.rename(columns=lambda x: x.replace('.','.0'))

Output:

      a  a.01  a.02  a.03     b  b.01     c     d
0  data  data  data  data  data  data  data  data

Or another option in which you can decide the number of leading zeros:

df = df.rename(columns=lambda x: x.split('.')[0] + '.' + x.split('.')[1].zfill(2) if '.' in x else x)

With this last option if the name of a column is b.10, it wont be changed.

edited Nov 12 '18 at 14:15

answered Nov 12 '18 at 14:05

Joe

9,817
4
32
43

2

This will turn `a.10` to `a.010`, worth clarifying what the desired behaviour for double digit numbers is – Dan Nov 12 '18 at 14:07
it would be a lot simpler to just use string interpolation with `02d`. So `lambda x: f"{x.split('.')[0]}.{x.split('.')[1]:02d}"` but piRSquared's solution doesn't require calling `split` twice – Dan Nov 12 '18 at 14:26
@Dan, i believe `df.columns.str` will be simple & easy approach. – Karn Kumar Nov 12 '18 at 14:48

score 3 · Answer 2 · answered Nov 12 '18 at 14:11

`rename`

def f(x):
  a, *b = x.rsplit('.', 1)

  return f"{a}.{int(b[0]):02d}" if b else a

df.rename(columns=f).sort_index(1)

          a  a.01  a.02  a.03     b  b.01     c     d
index                                                
0      data  data  data  data  data  data  data  data

Karn Kumar · Accepted Answer · 2018-11-12T14:57:50.183

With df.columns.str

There are answers posted, but would like to add one more easy solution:

Since df.columns is an Index object, we can use the .str accessor.

Here is he Doc refence Doc

Just a short simulated example:

>>> df
    b.1   a.1   a.2
0  data  data  data
1  data  data  data

Replace the desired columns directly to DataFrame as follows:

>>> df.columns = df.columns.str.replace('.','.0')
>>> df
   b.01  a.01  a.02
0  data  data  data
1  data  data  data

OR below already given but better to use inplace=True rather assigning to df.

df.rename(columns=lambda x: x.replace('.','.0'), inplace=True)

Another way around using axis:

df.rename(lambda x: x.replace('.','.0'), axis='columns', inplace=True)
print(df)
   b.01  a.01  a.02
0  data  data  data
1  data  data  data

Geeocode · Answer 4 · 2018-11-12T15:10:40.290

Using a string formatting and mainly a single row code:

import pandas as pd

df = pd.DataFrame( [["data", "data", "data", "data", "data", "data", "data", "data"], 
                    ["data", "data", "data", "data", "data", "data", "data", "data"]], 
                   columns=["a", "a.1", "a.2", "a.3", "b", "b.1", "c", "d"])

df.columns = [c.replace(c[c.find(".")+1], format(int(c[c.find(".")+1]),'02') if "." in c else c) for c in df.columns]

print(df)

Out:

      a  a.01  a.02  a.03     b  b.01     c     d
0  data  data  data  data  data  data  data  data
1  data  data  data  data  data  data  data  data

Automatic leading zero for same column names in python

4 Answers4

rename

`rename`