
We have six-character strings whose leading substring "00" needs to be replaced with "A".

Using the expression ^[0][0]* on the first string '001234', we get the expected result of A1234.

import re

# 1: Works fine
foo = '001234'
match = re.match(r"^[0][0][0-9]{4}$", foo)
print(match.group(0))       # 001234

bar = re.sub(r"^[0][0]*", 'A', match.group(0))
print(bar)                  # A1234

However, the second string '000123' was changed to A123 instead of A0123.

# 2: Substitutes more than needed
foo = '000123'
match = re.match(r"^[0][0][0-9]{4}$", foo)
print(match.group(0))       # 000123

bar = re.sub(r"^[0][0]*", 'A', match.group(0))
print(bar)                  # A123
                            # Expects: A0123

What went wrong with the regex pattern, and how can we fix it?

Tomerikoo
Athena Wisdom
  • Are you mistaking regexes for globs? `*` says the previous character (or group) repeats 0 or more times, it doesn't mean "allow anything here" like it does in globs. So `^[0][0]*` says to look for something that starts with at least one `0` and match all the leading zeroes (a shorter spelling would be `^0+`). – ShadowRanger Aug 27 '20 at 16:58
  • If you just have one character, e.g. `0` you can just write it as `0` in a regex instead of using character classes, i.e. `[0]` for simplicity – Alex W Aug 27 '20 at 17:03
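To make the point in the comments concrete, here is a small sketch comparing what each pattern consumes at the start of the two strings from the question. The helper name `leading_match` is made up for this illustration and does not come from the original post.

```python
import re

def leading_match(pattern, text):
    """Return the text matched at the start of `text`, or None (hypothetical helper)."""
    m = re.match(pattern, text)
    return m.group(0) if m else None

for s in ('001234', '000123'):
    print(s,
          leading_match(r'^[0][0]*', s),  # one 0, then zero or more further 0s
          leading_match(r'^0+', s),       # equivalent shorter spelling
          leading_match(r'^00', s))       # exactly two leading zeros
```

For '000123' the first two patterns consume all three zeros, which is why the substitution produces A123; only the fixed-length pattern stops after two.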

1 Answer


You just need to specify the number of zeros at the beginning of the string that should be replaced:

import re

foo = '000100'
re.sub(r'^0{2}', r'A', foo)

'A0100'
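The question validates the string with one regex and substitutes with another. As a possible variation (not part of the original answer), the two steps could be folded into a single pattern with a lookahead, so the prefix is only replaced when exactly four digits follow. The names `PREFIX` and `replace_prefix` are assumptions for this sketch.

```python
import re

# Exactly two leading zeros, replaced only if followed by exactly four digits.
# The lookahead (?=...) checks the rest of the string without consuming it.
PREFIX = re.compile(r'^00(?=[0-9]{4}$)')

def replace_prefix(text):
    """Replace a leading '00' with 'A' in a six-digit string; otherwise return text unchanged."""
    return PREFIX.sub('A', text)

print(replace_prefix('001234'))  # A1234
print(replace_prefix('000123'))  # A0123
print(replace_prefix('012345'))  # 012345 (no leading "00", left alone)
```

Whether this is preferable to the two-step version depends on whether a non-matching string should raise an error or pass through untouched; here it passes through.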
Vaishali
  • Why is `r'A'` used instead of `'A'`? – Athena Wisdom Aug 27 '20 at 17:52
  • @AthenaWisdom: It doesn't matter in this case, but it's safest to use raw strings for both regex patterns and substitutions, so backslashes aren't inadvertently ignored or misinterpreted. When there are no backslashes, it's unnecessary, but it builds good habits. – ShadowRanger Aug 27 '20 at 18:08