Regex For Adding Digits Within Alphanumeric In Python

Question

How does one add three digits within an alphanumeric string using regular expressions in Python?

For instance, I want to add three zeroes after the dash (-) but before the last digit in the string, so that A1-1 becomes A1-0001.

My guess was:

df['column'].str.replace('(^C3-\d{1)$)', ???)

If there has to be a last digit, you could try `^([A-Z]\d-(?=\d+$))` and replace with `\1000` – The fourth bird, Jan 29 '20 at 22:00
Not quite sure I understand. It looks like you are looking for C3-#{1, where # is a number, but your example A1-1 doesn't match that. Can you give a real example of your data? – AlwaysData, Jan 29 '20 at 22:06

Wiktor Stribiżew · Accepted Answer · 2020-01-29T22:11:29.590

You may use

df['column'] = df['column'].str.replace(r'^(C3-)(\d)$', r'\g<1>000\2', regex=True)

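If C can be any uppercase ASCII letter, replace it with [A-Z]. Note that regex=True is needed in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.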

Or, a bit more generic for 1-3 digit numbers:

df['column'] = df['column'].str.replace(r'^(C3-)(\d{1,3})$', lambda x: "{}{}".format(x.group(1), x.group(2).zfill(4)), regex=True)

Details

^ - start
(C3-) - Group 1: C3-
(\d) - Group 2: a digit (\d{1,3} matches 1 to 3 digits)
$ - end of string
\g<1> - value of Group 1
000 - three zeros
\2 - value of Group 2
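
The same idea can be tried with the standard `re` module, outside pandas. The sketch below is only an illustration: it assumes identifiers made of one uppercase ASCII letter, one digit, a dash, and 1 to 3 digits, and the helper name `pad_suffix` is hypothetical rather than part of the answer.

```python
import re

# Assumed shape: one uppercase ASCII letter, one digit, a dash, then 1-3 digits.
PATTERN = re.compile(r'^([A-Z]\d-)(\d{1,3})$')

def pad_suffix(value, width=4):
    """Left-pad the digits after the dash with zeros (hypothetical helper)."""
    # Group 1 is the prefix up to and including the dash, Group 2 the digits.
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).zfill(width), value)

for value in ['A1-1', 'C3-12', 'C3-123', 'C3-1234']:
    print(value, '->', pad_suffix(value))
```

Values that do not match the pattern, such as one that already has four digits after the dash, are returned unchanged.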

A Python test:

import pandas as pd
df = pd.DataFrame({'column': ['C3-1', 'C3-12', 'C3-123', 'C3-1234']})
# regex=True is required in pandas 2.0+, and always when the replacement is a callable
df['column'] = df['column'].str.replace(r'^(C3-)(\d{1,3})$', lambda x: "{}{}".format(x.group(1), x.group(2).zfill(4)), regex=True)

Output:

>>> df
    column
0  C3-0001
1  C3-0012
2  C3-0123
3  C3-1234

edited Jan 29 '20 at 22:11

answered Jan 29 '20 at 22:01


I have a question on r'\g<1>000\2' : how come there is only <> for 1 and not for 2 at the end? I am not used to using g<> regex. So can you please elaborate on this? – Seunghoon Jung Jan 30 '20 at 16:08
@SeunghoonJung `\g<>` is an unambiguous version of a `\1` backreference. It is only necessary if there is a number after a backreference. See [this thread](https://stackoverflow.com/questions/5984633/python-re-sub-group-number-after-number) for more details. – Wiktor Stribiżew Jan 30 '20 at 16:10
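
The difference can be checked with plain `re.sub`. This is a small sketch, not taken from the thread: it compares the explicit `\g<1>` form with a bare `\1` followed directly by digits, and catches `re.error` in case the ambiguous template is rejected outright.

```python
import re

pattern = r'^(C3-)(\d)$'

# \g<1> ends the group reference explicitly, so the three zeros stay literal.
explicit = re.sub(pattern, r'\g<1>000\2', 'C3-1')
print(explicit)  # C3-0001

# In r'\1000\2' the digits after the backslash are parsed together,
# so the template no longer means "Group 1 followed by 000".
try:
    ambiguous = re.sub(pattern, r'\1000\2', 'C3-1')
except re.error as exc:
    ambiguous = 'error: {}'.format(exc)
print(repr(ambiguous))
```

Either way, the second result is not `C3-0001`, which is why the `\g<1>` form is used whenever a digit follows the backreference.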

accdias · Answer 2 · 2020-01-29T22:36:34.837

Here is an alternative without regular expressions:

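import pandas as pd

df = pd.DataFrame({'C': ['A2-2', 'A3-001', 'C3-1', 'C3-12', 'C3-123', 'C3-1234']})
df

Output:

         C
0     A2-2
1   A3-001
2     C3-1
3    C3-12
4   C3-123
5  C3-1234

# Keep everything up to and including the dash, then zero-pad the rest to 4 characters
df.C = df.C.apply(lambda s: s[:s.index('-') + 1] + s[s.index('-') + 1:].zfill(4))
df

Output:

         C
0  A2-0002
1  A3-0001
2  C3-0001
3  C3-0012
4  C3-0123
5  C3-1234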
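
A variation on this regex-free approach, sketched in plain Python: `str.index` raises `ValueError` when a value has no dash, so the version below uses `str.partition` instead and leaves such values untouched. The helper name `pad_after_dash` and the `'NODASH'` sample value are hypothetical, added only for illustration.

```python
def pad_after_dash(value, width=4):
    """Zero-pad the part after the first dash (hypothetical helper)."""
    head, dash, tail = value.partition('-')
    if not dash:
        return value  # no dash found: leave the value unchanged
    return head + dash + tail.zfill(width)

values = ['A2-2', 'A3-001', 'C3-1', 'C3-12', 'C3-123', 'C3-1234', 'NODASH']
padded = [pad_after_dash(v) for v in values]
print(padded)
```

With a DataFrame like the one above, the same function could be passed to `df.C.apply(pad_after_dash)`.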

edited Jan 29 '20 at 22:36

answered Jan 29 '20 at 22:29
