
I want to capture numbers and number ranges from a list: ["op.15", "Op.16-17", "Op16,17,18"]

match = re.compile(r"\d+[-]?\d+").findall(text)

This gets the correct result:

op.15 ['15']
Op.16-17 ['16-17']
Op16,17,18 ['16', '17', '18']

but this doesn't work:

match = re.compile(r"\d+(-)?\d+").findall(text)

op.15 ['']
Op.16-17 ['-']
Op16,17,18 ['', '', '']

What's the issue here? I want to add alternatives to `-`, such as "to" (i.e. `-|to`), which doesn't work inside `[]`.

Cătălina Sîrbu
iznog0ud
    The `( )` defines a capturing group; essentially you are saying "capture the value inside these parentheses", whereas `[ ]` just says "match any one of the characters inside these square brackets". – Chris Doyle Mar 31 '20 at 20:08
    When you use a capturing group, `findall()` only returns the content matched by the group. Drop the `()` for `-?`. Edit: Or use a non-capturing group if you want to group stuff: `(?:a|boo)`. – oriberu Mar 31 '20 at 20:12
    You can use a non-capturing group `(?:-)?` or no group at all: `-?` – chepner Mar 31 '20 at 20:15

1 Answer


The documentation for `findall` in the `re` module says:

Return a list of all non-overlapping matches in the string. If one or more capturing groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Your first regex contains no capturing groups, so `findall` returns a list of the non-overlapping matches, i.e. each item is one or more digits, followed by zero or one hyphen, followed by one or more digits.

In your second regex you replaced `[ ]`, a character class that matches any one of the characters listed inside it, with `( )`, a capturing group. The pattern now says: match one or more digits, then match and capture zero or one hyphen, then match one or more digits.

Because the pattern now contains a capturing group, `findall` no longer returns the full non-overlapping match, as the documentation describes. It returns only what the group captured, i.e. whatever matched inside the `( )`: an empty string if there was no hyphen, or `-` if there was one.
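The difference can be reproduced with a small sketch using the inputs from the question. The helper name `compare` is made up for illustration; the two patterns are the ones quoted above.

```python
import re

def compare(text):
    """Run both patterns from the question against the same text."""
    # No capturing group: each item is the whole match.
    whole = re.findall(r"\d+[-]?\d+", text)
    # One capturing group: each item is only what the group matched
    # (an empty string when the optional group did not take part).
    group = re.findall(r"\d+(-)?\d+", text)
    return whole, group

for text in ["op.15", "Op.16-17", "Op16,17,18"]:
    print(text, *compare(text))

# With more than one group, each item becomes a tuple, one entry per group.
print(re.findall(r"(\d+)-(\d+)", "Op.16-17"))
```

The first list printed on each line is the full match and the second is the content of the group, which matches the two sets of output shown in the question.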

To fix the issue, use a non-capturing group: r"\d+(?:-)?\d+".
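Since the question also asks about alternatives such as "to", here is a minimal sketch of the same non-capturing group extended with `|`. The names `PATTERN` and `find_ranges` and the extra input "Op.16to17" are assumptions added for illustration; they are not from the question.

```python
import re

# (?:-|to) groups the alternatives without capturing them,
# so findall still returns the whole match.
PATTERN = re.compile(r"\d+(?:-|to)?\d+")

def find_ranges(text):
    """Return every number or number range found in text."""
    return PATTERN.findall(text)

for text in ["op.15", "Op.16-17", "Op16,17,18", "Op.16to17"]:
    print(text, find_ranges(text))
```

Note that this pattern, like the original one, needs at least two digits to match, because both `\d+` parts must consume at least one digit each.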

Karl Knechtel
Chris Doyle