
I'm looking for help writing a regex that finds numbers that start with 5 and are 7 digits long.

This is what I have so far, based on my searches, but it doesn't work:

import re

string = "234324, 5604020, 45309, 45, 55, 5102903"
re.findall(r'^5[0-9]\d{5}', string)

I'm not sure what I'm missing.

Thanks.

orangecodelife
  • You are only matching from the start of the string, remove the `^` and you want to match 6 digits after the 5, not 5 digits. You probably want to use word boundaries as well – user3483203 Sep 19 '18 at 17:57
  • Use http://regex101.com or similar sites to check/proof/test your regexes. – Patrick Artner Sep 19 '18 at 17:58
  • Possible duplicate of [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – Paolo Sep 19 '18 at 18:09
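A quick check of the original pattern in plain Python shows where the problem lies. Note that `5[0-9]\d{5}` already matches 7 digits in total (the 5, one digit from `[0-9]`, then five from `\d{5}`), so the length is not the issue; the `^` anchor is. The extra input `"45123456"` is a made-up example.

```python
import re

string = "234324, 5604020, 45309, 45, 55, 5102903"

# ^ anchors the match to position 0, and the string starts with "2".
anchored = re.findall(r'^5[0-9]\d{5}', string)
print(anchored)  # []

# Without ^, findall scans the whole string.
unanchored = re.findall(r'5[0-9]\d{5}', string)
print(unanchored)  # ['5604020', '5102903']

# With no boundaries at all, it also matches inside a longer number
# ("45123456" is a made-up example).
inside = re.findall(r'5[0-9]\d{5}', "45123456")
print(inside)  # ['5123456']
```

So dropping `^` alone is not enough; some kind of boundary is still needed on both sides.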

3 Answers


You are using a ^, which asserts position at the start of the string. Use a word boundary instead. Also, you don't need both the [0-9] and the \d.

Use \b5[0-9]{6}\b (or \b5\d{6}\b) instead:

>>> import re
>>> s = "234324, 5604020, 45309, 45, 55, 5102903"
>>> re.findall(r'\b5\d{6}\b', s)
['5604020', '5102903']
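One thing to keep in mind with `\b`: it is a transition between a word character (letter, digit or underscore) and a non-word character, so letters glued to a number also block the match. The sketch below, using a hypothetical input, compares it with lookarounds that only treat digits as neighbours.

```python
import re

# Hypothetical input where letters touch the numbers.
sample = "id5604020, 5102903x, 5123456"

# \b needs a word/non-word transition, so adjacent letters block the match.
by_boundary = re.findall(r'\b5\d{6}\b', sample)
print(by_boundary)  # ['5123456']

# Digit-only lookarounds treat anything except a digit as a separator.
by_lookaround = re.findall(r'(?<!\d)5\d{6}(?!\d)', sample)
print(by_lookaround)  # ['5604020', '5102903', '5123456']
```

For the comma-separated input in the question both approaches give the same result; they only differ when the numbers can touch letters or underscores.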
user3483203

The ^ at the start of the regular expression forbids any match that isn't at the very beginning of the string. Replace it with a negative lookbehind for \d, which succeeds after any non-digit or at the start of the string, and add a negative lookahead to forbid extra trailing digits:

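import re

string = "234324, 5604020, 45309, 45, 55, 5102903"
# (?<!\d): no digit right before the 5; (?!\d): no digit right after the 7th digit
print(re.findall(r'(?<!\d)5\d{6}(?!\d)', string))  # ['5604020', '5102903']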
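The comments on this answer refer to an earlier `\D`-based version. The pattern `\D5\d{6}\D` below is a hypothetical reconstruction of that idea, used only to show why zero-width lookarounds are the better fit: `\D` must consume a real character, while `(?<!\d)` and `(?!\d)` also succeed at the edges of the string.

```python
import re

string = "234324, 5604020, 45309, 45, 55, 5102903"

# Hypothetical consuming version: each \D must match a real character.
consuming = r'\D5\d{6}\D'
# Lookarounds are zero-width, so they also succeed at the string edges.
lookaround = r'(?<!\d)5\d{6}(?!\d)'

edge_consuming = re.findall(consuming, '5123456')
edge_lookaround = re.findall(lookaround, '5123456')
print(edge_consuming)   # []
print(edge_lookaround)  # ['5123456']

# On the question's string, the consuming version also drags the
# separators into the match and misses the number at the very end.
full_consuming = re.findall(consuming, string)
print(full_consuming)  # [' 5604020,']
```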
Ruzihm
  • This won't find a valid match at the beginning of a string. Try it with `s = '5123456'`. `\D` matches any character that isn't a digit, but the beginning of the string is not a character, and `\D` doesn't check for zero-length matches – user3483203 Sep 19 '18 at 18:06
  • You're right. I changed it to be negative lookbehind/ahead – Ruzihm Sep 19 '18 at 18:07
  • This still will miss out if you have `s = 67.5678903` – Onyambu Sep 19 '18 at 18:10
  • @Onyambu A word boundary still matches the position where `.` meets `5` in `67.5678903`. – revo Sep 19 '18 at 18:13
  • It's ambiguous whether the OP wants to treat periods as number boundaries (if they are separators) or as part of the number (if they are decimal points). – Ruzihm Sep 19 '18 at 18:13
  • then you will need a lookbehind with a space – Onyambu Sep 19 '18 at 18:14
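To make the decimal-point discussion concrete: both `\b` and `(?<!\d)` accept the `.` in `67.5678903` as a separator. If periods should instead count as part of the number (an assumption; the question doesn't say), one option is to exclude `.` in the lookarounds as well. This is a character-class lookbehind excluding digits and periods, not the "lookbehind with a space" suggested above.

```python
import re

text = "67.5678903"  # example from the comments

# Both \b and (?<!\d) accept the "." before the 5 as a separator.
by_boundary = re.findall(r'\b5\d{6}\b', text)
by_lookaround = re.findall(r'(?<!\d)5\d{6}(?!\d)', text)
print(by_boundary, by_lookaround)  # ['5678903'] ['5678903']

# Assumption: "." is a decimal point, so it must not touch the number either.
no_decimals = r'(?<![\d.])5\d{6}(?![\d.])'
decimal_result = re.findall(no_decimals, text)
print(decimal_result)  # []

# The question's comma-separated input is unaffected.
plain_result = re.findall(no_decimals, "234324, 5604020, 45309, 45, 55, 5102903")
print(plain_result)  # ['5604020', '5102903']
```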

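Try matching a 5 at a word boundary, followed by 6 digits, followed by a non-digit character or the end of the string in a non-capturing group:

\b5 matches a 5 at the start of a number

\d{6} matches the next 6 digits
(?:\D|$) non-capturing group: matches a non-digit character or the end of the string (a non-digit matched here, such as a comma, is still part of the match)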

\b5\d{6}(?:\D|$)


import re

string = "234324, 5604020, 45309, 45, 55, 5102903"
print(re.findall(r'\b5\d{6}(?:\D|$)', string))  # ['5604020,', '5102903']
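A non-capturing group `(?:...)` only stops the group from being reported as a capture; the text it matches is still part of the overall match, so `findall` returns `'5604020,'` with the comma attached. Turning the group into a lookahead checks the same condition without consuming the character, which is the behaviour this answer appears to intend.

```python
import re

string = "234324, 5604020, 45309, 45, 55, 5102903"

# (?:...) still consumes the character after the number (here the comma).
grouped = re.findall(r'\b5\d{6}(?:\D|$)', string)
print(grouped)  # ['5604020,', '5102903']

# A lookahead checks the same condition without consuming anything.
# (?=\D|$) is equivalent to (?!\d): "not followed by a digit".
checked = re.findall(r'\b5\d{6}(?=\D|$)', string)
print(checked)  # ['5604020', '5102903']
```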
The Scientific Method
  • No, this will match `45123456` too. Use some boundaries. Also, since you are matching `5\d{6}`, which means 7 digits, you don't have to repeat it inside a lookahead. – revo Sep 19 '18 at 18:04
  • You can safely remove the lookahead. – revo Sep 19 '18 at 18:08