
I have a language where block comments look like this: /* ... */ (possibly spanning multiple lines). However, nested comments like /* /* ... */ */ are illegal. What would be the correct regular expression to match this?

I have:

/\*(.*)?\*\/

But this would also match the second case which I don't want.

jrook
user4979733
  • This will match `/* /* ... */`, and the trailing `*/` will then cause an error. The matching is rudimentary: the first opener, then the first closer, like `/\*(.*?)\*/` with the dot-all flag. –  Nov 08 '19 at 20:02
  • How about `^\/\*[^\/*]+\*\/$` ? (https://regex101.com/r/zy0UgL/1) – jrook Nov 08 '19 at 20:12
  • Thank you. I am using python and more specifically Tatsu as the parser generator. – user4979733 Nov 08 '19 at 20:57
  • @jrook in your example, if there is a star character inside a comment, it doesn't work. For instance in your example regex101 url, if you replace "did" with "di*d", it doesn't match anymore. Doing this using regexp the way it's asked would be extremely complicated if not impossible – Vincent Nov 08 '19 at 21:02
  • or this : https://stackoverflow.com/questions/16160190/regular-expression-to-find-c-style-block-comments – jrook Nov 08 '19 at 21:11

1 Answer


A regular expression (in the true sense of the word) which matches non-nesting C-style comments is:

[/][*]([^*]|[*]*[^*/])*[*]*[*][/]

The / and * characters are literally matched, so they are here boxed in [] character class syntax as an alternative to backslash escaping. (The / is often used as a delimiter for regular expressions.)
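As a quick illustration in Python (the language the asker mentions using), the sketch below checks that the character-class form and a backslash-escaped form find the same text. The names `BRACKETED`, `ESCAPED`, `SAMPLES`, and `first_match` are made up for this example, and the sample strings are not from the question.

```python
import re

# Character-class form, exactly as written in the answer.
BRACKETED = re.compile(r"[/][*]([^*]|[*]*[^*/])*[*]*[*][/]")

# The same expression with backslash escapes instead of [] boxing.
# Python has no regex delimiters, so / needs no escaping at all.
ESCAPED = re.compile(r"/\*([^*]|\**[^*/])*\**\*/")

# Hypothetical sample inputs, chosen to include stars inside comments.
SAMPLES = ["/* a */", "/**/", "/***/", "/* a * b */ c", "no comment", "/* di*d */"]


def first_match(pattern, text):
    """Return the first matched comment in text, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


for sample in SAMPLES:
    # Both spellings should agree on every sample.
    assert first_match(BRACKETED, sample) == first_match(ESCAPED, sample)
```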

The explanation follows.

Match the leading /* sequence:

[/][*]

Then match any mixture, including an empty mixture, of:

  • individual characters that are not *; or

  • sequences of zero or more * terminated by a character that is neither * nor /

:

      ([^*]|[*]*[^*/])*

Then match zero or more * characters:

                       [*]*

Finally, match the trailing sequence:

                            [*][/]
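
The assembled pattern can be tried out with Python's `re` module. This is only a minimal sketch: `COMMENT_RE` and `find_comments` are hypothetical names, and nothing here is specific to TatSu.

```python
import re

# The pattern from the answer, unchanged. [^*] also matches newlines,
# so multi-line comments work without the DOTALL flag.
COMMENT_RE = re.compile(r"[/][*]([^*]|[*]*[^*/])*[*]*[*][/]")


def find_comments(text):
    """Return every non-nesting block comment found in text."""
    return [m.group(0) for m in COMMENT_RE.finditer(text)]


# Stars inside a comment and multi-line comments are both accepted.
print(find_comments("a /* one */ b /* two\nlines ** */ c"))

# For the nested input, the match stops at the first closing */,
# leaving the trailing " */" outside the comment.
print(find_comments("/* /* ... */ */"))
```

One caveat for backtracking engines such as `re`: the two alternatives overlap, because a character that is neither * nor / satisfies both of them, so a long unterminated comment may be slow to reject. Writing the second alternative as `[*]+[^*/]` removes the overlap and should match the same strings.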
Kaz
  • Great answer (upvoted). I'm curious why simulators like regex101 throw "pattern error" for [/], do you have a clue? – Vincent Nov 08 '19 at 22:04
  • This regex will match `/* /* ... */ */` partially – anubhava Nov 08 '19 at 22:08
  • @anubhava That is correct. The comment is this part: `/* /* ... */`, which is followed by the non-comment material `...*/`. OP said that nested comments are "illegal" without explaining what that means: are we supposed to not match this at all, or are nested comments treated as they are in C or Cascading Style Sheets? – Kaz Nov 08 '19 at 22:15
  • Well in that case `/\*.*?\*/` will also work – anubhava Nov 08 '19 at 22:17
  • @anubhava That's a Perl "regular" expression, though, not a regular expression. – Kaz Nov 08 '19 at 22:18
  • No I didn't mean `/` as regex delimiter. I meant `r'/\*.*?\*/'` as regex – anubhava Nov 08 '19 at 22:19