I've got a regular expression that I'm trying to match against the following types of data, with each token separated by an unknown number of spaces.
Update: "Text" can contain almost any character, which is why I had `.*` initially. Importantly, it can also include spaces.
1. Text
2. Text 01
3. Text 01 of 03
4. Text 01 (of 03)
5. Text 01-03
I'd like to capture "Text", "01", and "03" as separate groups, and all except "Text" are optional. The best I've been able to do so far is:
`\s*(.*)\s+(\d+)\s*(?:\s*\(?\s*(?:of|-)\s*(\d+)\s*\)?\s*)`
This matches #3-#5 and puts the values in the proper capture groups. What I can't figure out is why, when I add an additional `?` to the end to make the part of the expression after `01` optional, my capture groups get all funky.
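To reproduce the first result outside regexr, here's a minimal Python sketch. Python's `re` module is my choice for the example, not what regexr runs, but greedy quantifiers and backtracking should behave the same way for these constructs. It assumes each sample is tested as its own string with single spaces between tokens; `PATTERN_1`, `SAMPLES`, and `captures` are just illustrative names.

```python
import re

# First pattern from the question: the "of 03" / "(of 03)" / "-03" tail is required.
PATTERN_1 = re.compile(r"\s*(.*)\s+(\d+)\s*(?:\s*\(?\s*(?:of|-)\s*(\d+)\s*\)?\s*)")

# Samples #1-#5, assuming a single space between tokens.
SAMPLES = ["Text", "Text 01", "Text 01 of 03", "Text 01 (of 03)", "Text 01-03"]


def captures(pattern, text):
    """Return the groups of the first match in text, or None if nothing matches."""
    m = pattern.search(text)
    return m.groups() if m else None


if __name__ == "__main__":
    # Test each sample as its own string, so \s can't match across line breaks.
    for i, sample in enumerate(SAMPLES, start=1):
        print(f"#{i} {sample!r}: {captures(PATTERN_1, sample)}")
```

Testing the samples one at a time matters here: if they are pasted into a tester as one multi-line block, `\s*` and `\s+` can match newlines and join neighbouring lines into one match.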
`\s*(.*)\s+(\d+)\s*(?:\s*\(?\s*(?:of|-)\s*(\d+)\s*\)?\s*)?`
The RegEx above matches #2-#5, but the capture groups are correct only for #2 and #5.
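The same kind of harness, pointed at the second pattern, makes the bad groups visible. This sketch prints the whole match as well as the groups, because for #4 the match stops short of the closing parenthesis. As before, Python's `re`, single-space samples, and the names `PATTERN_2`, `SAMPLES`, and `describe` are assumptions made for the example.

```python
import re

# Second pattern from the question: identical, except the tail group is optional ("?").
PATTERN_2 = re.compile(r"\s*(.*)\s+(\d+)\s*(?:\s*\(?\s*(?:of|-)\s*(\d+)\s*\)?\s*)?")

# Samples #1-#5, assuming a single space between tokens.
SAMPLES = ["Text", "Text 01", "Text 01 of 03", "Text 01 (of 03)", "Text 01-03"]


def describe(pattern, text):
    """Return (whole match, groups) for the first match in text, or None."""
    m = pattern.search(text)
    return (m.group(0), m.groups()) if m else None


if __name__ == "__main__":
    # Each sample is tested separately, so \s can't match across line breaks.
    for i, sample in enumerate(SAMPLES, start=1):
        print(f"#{i} {sample!r}: {describe(PATTERN_2, sample)}")
```

Tracing the pattern by hand, #3 should come back as `('Text 01 of', '03', None)`, and #4 should match only `'Text 01 (of 03'` with the same kind of groups. That fits the greedy `(.*)`: it first takes the whole line and gives characters back only until `\s+(\d+)` can match at the last number, and once the tail is optional nothing forces it to back up any further.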
This seems like a straightforward regular expression, so I don't know why I'm having so much trouble with it.
Here's a link to the online RegEx evaluator I'm using to debug this: http://regexr.com?2tb64. The link already has the first RegEx and the test data filled in.