
I have to validate strings of the following format:

text-text-id-text

The separator is the character '-', and the third column must always be id. I wrote the following regex (in Python) to validate the string:

import re

s = 'col1-col2-col3-id'  # any additional text at the end
                         # is allowed, e.g. -col4-col5
print(re.match(r'^(.*-){3}id(-.*)?$', s))  # ok
print(re.match(r'^(.*-){1}id(-.*)?$', s))  # still ok, it should not be

I tried making the quantifier non-greedy, but the result is still the same:

^(.*?-){1}id(-.*)?$

What am I missing in my regex? I could just validate the string like this:

>>> import re
>>> print(re.split('-', 'col1-col2-col3-id'))
['col1', 'col2', 'col3', 'id']

Then I would check whether the third element is id, but I am interested in why the regexes above behave as described.
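A minimal sketch of the split-based check described above, assuming the rule is simply "at least three columns, and the third one is exactly id". The function name `third_column_is_id` and the sample strings are hypothetical; for a literal one-character separator, plain `str.split` is enough and `re.split` is not needed.

```python
def third_column_is_id(s: str) -> bool:
    # Split on the literal separator; no regex required
    columns = s.split('-')
    # Require at least three columns, then compare the third one (index 2)
    return len(columns) >= 3 and columns[2] == 'id'


for s in ['text-text-id-text', 'col1-col2-col3-id', 'col1-id', 'id-id-id-id']:
    print(s, third_column_is_id(s))
```

Because the comparison is an exact string equality, a third column such as `idiom` is rejected without any extra lookahead logic.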

asked by broadband, edited by Unihedron

1 Answer


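Your first regex is incorrect because it asserts that id appears after the first three items, i.e. as the fourth column rather than the third.
Your second regex matches the string incorrectly because .* matches hyphens as well, so a single repetition of (.*-) can span several columns. Making it lazy with .*? does not help: the engine still extends it across hyphens until id can follow.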
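A small illustration using the question's own string (not part of the original answer): inspecting what the repeated group captured shows that both the greedy and the lazy pattern put all three leading columns into one repetition.

```python
import re

s = 'col1-col2-col3-id'

greedy = re.match(r'^(.*-){1}id(-.*)?$', s)
lazy = re.match(r'^(.*?-){1}id(-.*)?$', s)

# .* and .*? can both match '-', so the single repetition of the
# group absorbs 'col1-col2-col3-' before 'id' is matched.
print(greedy.group(1))  # col1-col2-col3-
print(lazy.group(1))    # col1-col2-col3-
```

The lazy version only changes the order in which the engine tries lengths; it still backtracks until the rest of the pattern succeeds.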

You should use this regex:

/^(?:[^-]+-){2}id/
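In Python the slashes are dropped and the pattern is passed as a raw string. A quick check with made-up sample strings: since `[^-]+` cannot cross a hyphen, each repetition consumes exactly one column. `(?:...)` is a non-capturing group, so the match has no groups. Without an end assertion, a third column like `idiom` still passes.

```python
import re

# Exactly two leading columns of non-hyphen characters, then 'id'
pattern = re.compile(r'^(?:[^-]+-){2}id')

samples = ['text-text-id-text', 'col1-col2-col3-id', 'col1-id', 'a-b-idiom']
for s in samples:
    print(s, bool(pattern.match(s)))
```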


If you need to anchor the regex to the end of the string, use /^(?:[^-]*-){2}id.*$/.
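A Python version of the anchored pattern, checked against a few illustrative strings. Note that it uses `*` instead of `+`, so empty columns (as in `--id`) are accepted.

```python
import re

# [^-]* allows empty columns; .*$ consumes whatever follows 'id'
anchored = re.compile(r'^(?:[^-]*-){2}id.*$')

for s in ['text-text-id-text', '--id', 'col1-col2-col3-id']:
    print(s, bool(anchored.match(s)))
```

Because `.*` directly follows `id`, a third column such as `idiom` is still accepted by this version.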


As Tim Pietzcker pointed out, consider asserting that id is the whole third column, i.e. that it is not followed by anything other than a hyphen:

/^(?:[^-]+-){2}id(?![^-])/
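A Python check of the lookahead version with made-up strings. `(?![^-])` means "not followed by a non-hyphen character", so it succeeds before a `-` and also at the very end of the string. For strings without newlines it behaves like the `(?=-|$)` lookahead from the comments. In Python's `re`, `text-text-id` on its own is therefore accepted.

```python
import re

# id must be the whole third column: the next character, if any, must be '-'
strict = re.compile(r'^(?:[^-]+-){2}id(?![^-])')

samples = ['text-text-id-text', 'text-text-id', 'a-b-idiom', 'a-b-id.txt', 'col1-id']
for s in samples:
    print(s, bool(strict.match(s)))
```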


answered by Unihedron, edited by Community
  • +1, and perhaps use a lookahead assertion `(?=-|$)` after `id` to make sure that the third column isn't something like `idiom`. – Tim Pietzcker Aug 15 '14 at 11:16
  • Probably, but we don't know which characters are allowed between `-`s; perhaps `id.txt` would be valid and shouldn't be matched. – Tim Pietzcker Aug 15 '14 at 11:19
  • @georg Any other column can also be id; id-id-id-id is a valid string too. I then further validate the rows of the id column, checking them for whitespace. – broadband Aug 15 '14 at 11:51
  • @Unihedron So .* matches hyphens as well. Thanks for this. Is ?: necessary? – broadband Aug 15 '14 at 11:53
  • I understand ?: now. I read http://stackoverflow.com/questions/3512471/non-capturing-group – broadband Aug 15 '14 at 12:03
  • @broadband Excellent! [Check this out, too.](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – Unihedron Aug 15 '14 at 12:04
  • @TimPietzcker The string 'text-text-id' does not pass when the regex /^(?:[^-]+-){2}id(?![^-])/ is used. Therefore I use this: ^(?:[^-]*-){2}id(-[^-]*)?$ [regex demo](http://regex101.com/r/fX5sE0/2) – broadband Aug 16 '14 at 09:08
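For comparison, a sketch that runs the regex from the last comment against a few made-up strings. As written, its optional tail `(-[^-]*)?` allows at most one column after `id`. The `relaxed` pattern is a hypothetical variant that reuses the question's own tail `(-.*)?`, as a non-capturing group, so that any number of trailing columns is allowed. That is an assumption about the intended rules, not something the thread settles.

```python
import re

# Regex from the last comment: at most one optional column after 'id'
final = re.compile(r'^(?:[^-]*-){2}id(-[^-]*)?$')

# Hypothetical variant: any number of trailing columns, e.g. -col4-col5
relaxed = re.compile(r'^(?:[^-]*-){2}id(?:-.*)?$')

for s in ['text-text-id', 'text-text-id-text', 'id-id-id-id', 'a-b-id-c-d', 'a-b-idiom']:
    print(s, bool(final.match(s)), bool(relaxed.match(s)))
```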