
I'm trying to write a regular expression to match a requirement ID that consists of alphanumeric characters followed by a '-', where this combination can occur one or multiple times.

Example strings would be:

NXOS-ABCD-005-053 – Requirement No. 1
F56156-ISSU-1 - Requirement No 1

I tried the regex below on the regex101 website but could not get the result I need.

([A-Za-z0-9]+\-)*

My understanding of the above regex is that it will match the characters A-Za-z0-9 multiple times, followed by a -, and that this combination can occur one or more times.

But it gives me lots of groups that I don't anticipate. Can I get some help fine-tuning the regex? I tried to Google a lot but got confused by the various usages. Any help with this will be much appreciated.

Thank you in advance.

Sai
  • `^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*`? Or, if it is not at the start of string, `\b[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*`? Note `([A-Za-z0-9]+\-)*` can match empty strings, as it matches any zero or more occurrences of a sequence of one or more alphanumeric chars followed with a hyphen. Note there is no hyphen after the last alphanumeric part, so this would not match the whole ID even if it worked as you think it would. – Wiktor Stribiżew Apr 17 '21 at 20:43
  • What in the above samples is your expected results? – JvdV Apr 17 '21 at 20:43
  • * is _zero_ or more times. Also what you've described _wouldn't_ match the IDs. I'd recommend using e.g. https://regex101.com/. – jonrsharpe Apr 17 '21 at 20:44
  • @WiktorStribiżew `?:` it's unnecessary, no? `/^[A-Z0-9]+(-[A-Z0-9]+)*/i` – Eduardo Jiménez Apr 17 '21 at 20:55
  • @Eduardo It depends. In Python, non-capturing groups are very convenient if you use `re.findall`. You are showing a JavaScript/Ruby/PHP regex literal notation now, but your question is tagged with Python, so I suggested just a string pattern in my top comment. – Wiktor Stribiżew Apr 17 '21 at 20:59

1 Answer


Note ([A-Za-z0-9]+\-)* can match empty strings, as it matches any zero or more occurrences of a sequence of one or more alphanumeric chars followed with a hyphen. Note there is no hyphen after the last alphanumeric part, so this would not match the whole ID even if it worked as you think it would.
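To make both points concrete, here is a minimal Python sketch that runs the original pattern against one of the sample strings from the question. The variable names (`original`, `sample`, `prefix`, `empty`) are only illustrative.

```python
import re

# The pattern from the question: zero or more "alphanumerics + hyphen" units.
original = re.compile(r"([A-Za-z0-9]+\-)*")

sample = "F56156-ISSU-1 - Requirement No 1"

# At the start of the string the match stops after the last hyphen,
# because the final "1" is not followed by a hyphen.
prefix = original.match(sample).group(0)
print(repr(prefix))  # 'F56156-ISSU-'

# At a position where no unit fits, "*" allows zero repetitions,
# so the pattern still succeeds with an empty match.
empty = original.match(" - Requirement").group(0)
print(repr(empty))  # ''
```

This is likely why a tester reports many unexpected matches and groups: the pattern succeeds with an empty match at nearly every position.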

I suggest

^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*

Or, if the ID is not at the start of the string:

\b[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\b

See the regex demo.
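As a rough sketch of how the two patterns might be used from Python, assuming the inputs are the sample lines from the question (the names `ID_AT_START`, `ID_ANYWHERE`, `lines` and `ids` are made up for this example):

```python
import re

# Anchored variant: the ID must begin at the start of the string.
ID_AT_START = re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")
# Word-boundary variant: the ID may appear anywhere in the string.
ID_ANYWHERE = re.compile(r"\b[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\b")

lines = [
    "NXOS-ABCD-005-053 – Requirement No. 1",
    "F56156-ISSU-1 - Requirement No 1",
]

# match() tries the pattern at position 0 only, which suits the anchored form.
ids = [ID_AT_START.match(line).group(0) for line in lines]
print(ids)  # ['NXOS-ABCD-005-053', 'F56156-ISSU-1']

# findall() returns every non-overlapping match of the unanchored form.
embedded = ID_ANYWHERE.findall("(F56156-ISSU-1)")
print(embedded)  # ['F56156-ISSU-1']

# The hyphenated part is optional, so plain words match as well.
with_words = ID_ANYWHERE.findall("see F56156-ISSU-1")
print(with_words)  # ['see', 'F56156-ISSU-1']
```

Because `(?:-[A-Za-z0-9]+)*` may repeat zero times, the word-boundary variant also matches ordinary words. If only hyphenated IDs are wanted, changing that `*` to `+` is one possible adjustment.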

Details:

  • ^ - start of string
  • \b - word boundary
  • [A-Za-z0-9]+ - one or more alphanumeric chars
  • (?:-[A-Za-z0-9]+)* - zero or more repetitions of the pattern sequence inside the non-capturing group:
    • - - a hyphen
    • [A-Za-z0-9]+ - one or more alphanumeric chars
  • \b - word boundary
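
The non-capturing group matters mostly with `re.findall`, which returns the group contents instead of the whole match whenever the pattern contains a capturing group. A small sketch of the difference, using one of the sample strings (the variable names are illustrative):

```python
import re

text = "F56156-ISSU-1 - Requirement No 1"

# With a capturing group, findall() returns what the group captured.
# A repeated group keeps only its last repetition.
capturing = re.findall(r"^[A-Za-z0-9]+(-[A-Za-z0-9]+)*", text)
print(capturing)  # ['-1']

# With a non-capturing group, findall() returns the whole match.
non_capturing = re.findall(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
print(non_capturing)  # ['F56156-ISSU-1']
```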
Wiktor Stribiżew