0

I'm a beginner with regular expressions and I can't work out the difference between an optional occurrence and zero or more occurrences. (I'm using JavaCC regular expressions.)

For example,

If I need to match a name like "file", which may also contain a number, I can use

["a"-"z"]*[0-9]?  

but can I use

["a"-"z"]*[0-9]*

to match the name "file" since I'm using "zero or more occurrence" for numbers?

iahsp4
  • Possible duplicate of [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – l'L'l Jun 18 '16 at 05:04
  • Side note: `["a"-"z"]*` does not do what you want it to do. With that regex you are matching `"` or `a` or `" to "` or `z` or `"`. (Note that `"-"` equals `"` and is therefore useless, since `"` is already part of the character group.) You are looking for: `[a-z]*` – dognose Jun 18 '16 at 06:02
  • A side note to that side note, the notation `["a"-"z"]` is the correct way to write it in JavaCC. – Theodore Norvell Jun 18 '16 at 14:48
  • And of course it should be `["0"-"9"]` too. – Theodore Norvell Jun 18 '16 at 15:00

2 Answers

1

An optional occurrence can occur zero times or one time. This uses the ? operator.

With the * operator, the occurrence can appear zero, one, two, ... times.
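
To make the difference concrete, here is a minimal sketch in Python rather than JavaCC. The assumptions are mine: Python's `re` writes the character classes as `[a-z]` and `[0-9]` instead of `["a"-"z"]` and `["0"-"9"]`, and `matches` is just a helper name made up for this example. It checks whole names against both patterns from the question:

```python
import re

# Python `re` stand-ins for the two JavaCC patterns in the question.
# Assumption: ["a"-"z"] and ["0"-"9"] in JavaCC correspond to [a-z] and [0-9] here.
OPTIONAL_DIGIT = re.compile(r"[a-z]*[0-9]?")  # letters, then zero or one digit
ANY_DIGITS = re.compile(r"[a-z]*[0-9]*")      # letters, then zero or more digits


def matches(pattern, text):
    """Return True if the whole of `text` is matched by `pattern`."""
    return pattern.fullmatch(text) is not None


for name in ["file", "file1", "file12"]:
    print(name, matches(OPTIONAL_DIGIT, name), matches(ANY_DIGITS, name))
# file True True
# file1 True True
# file12 False True
```

Both patterns accept "file", because zero digits is allowed in either case. They only differ once a name ends in two or more digits.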

Ed Heal
  • Additional side note: both are greedy by default. – Jan Jun 18 '16 at 07:48
  • In the context of JavaCC, I don't think "greedy" has any meaning -- at least when applied to the `*` and `?` operators. JavaCC looks for the longest match for the entire regular expression. Calling `*` greedy suggests that it is locally greedy, possibly at the expense of the longest match globally. – Theodore Norvell Jun 18 '16 at 15:35
  • To further explain my previous comment. Consider an RE `a*(ab)*` and a string `aaaabababab`. In Python, using the `re.match` function, the RE matches `aaaa` because the first `*` is greedy. However in JavaCC, the whole string is matched, because the whole string is the longest match for the whole RE. – Theodore Norvell Jun 19 '16 at 00:28
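
The Python half of that comparison can be checked directly. A minimal sketch using only the standard `re` module: `re.match` shows the locally greedy result, while `re.fullmatch` is used here only to confirm that the whole string is also in the language of the RE (it is not an emulation of JavaCC's token manager):

```python
import re

text = "aaaabababab"
pattern = r"a*(ab)*"

# re.match: the first * grabs all four a's, then (ab)* matches zero times.
greedy = re.match(pattern, text).group()

# re.fullmatch forces backtracking until the whole string is consumed,
# which shows that the full string is also a valid match for the RE.
whole = re.fullmatch(pattern, text).group()

print(greedy)  # aaaa
print(whole)   # aaaabababab
```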
0

[@EdHeal's answer succinctly answers your question. My answer is focussed on trying to help you achieve what you want.]

What is a file name?

  • Any number (including 0) of lower-case English letters possibly followed by a digit: ["a"-"z"]* ["0"-"9"]?

  • Any number (including 0) of lower-case English letters followed by any number (including 0) of digits: ["a"-"z"]* ["0"-"9"]*

  • Any number (including 0) of lower-case English letters and up to one digit anywhere: ["a"-"z"]* ["0"-"9"]? ["a"-"z"]*

  • Any number (including 0) of lower-case English letters or digits: ["a"-"z","0"-"9"]*

All of the above will also match the empty string, which could lead to an infinite loop in JavaCC, since a token that matches nothing consumes no input.
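
To see how the four definitions differ, here is a rough sketch in Python, not JavaCC. The translation to `[a-z]` / `[0-9]` syntax, the labels in `DEFINITIONS` and the sample names are all my own assumptions for illustration:

```python
import re

# Python `re` translations of the four JavaCC definitions above (assumed equivalents).
DEFINITIONS = {
    "letters, optional digit": r"[a-z]*[0-9]?",
    "letters, any digits": r"[a-z]*[0-9]*",
    "letters, one digit anywhere": r"[a-z]*[0-9]?[a-z]*",
    "letters or digits": r"[a-z0-9]*",
}

SAMPLES = ["file", "file1", "file12", "fi1le", "1file2", ""]


def accepted(pattern):
    """Return the sample names that `pattern` matches in full."""
    return [s for s in SAMPLES if re.fullmatch(pattern, s) is not None]


for label, pattern in DEFINITIONS.items():
    print(label, accepted(pattern))
# letters, optional digit ['file', 'file1', '']
# letters, any digits ['file', 'file1', 'file12', '']
# letters, one digit anywhere ['file', 'file1', 'fi1le', '']
# letters or digits ['file', 'file1', 'file12', 'fi1le', '1file2', '']
```

Note that the empty string shows up in every list.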

If you want to require at least one character in a file name, the above would be, respectively,

  • ["a"-"z"]+ ["0"-"9"]? | ["0"-"9"]
  • ["a"-"z"]* ["0"-"9"]+ | ["a"-"z"]+ ["0"-"9"]*
  • ["a"-"z"]+ ["0"-"9"]? ["a"-"z"]* | ["a"-"z"]* ["0"-"9"] ["a"-"z"]* | ["a"-"z"]* ["0"-"9"]? ["a"-"z"]+
  • ["a"-"z","0"-"9"]+
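
As a sanity check on the four rewrites, here is a small brute-force sketch in Python (again an assumed translation of the JavaCC syntax; `same_except_empty` and the tiny test alphabet are hypothetical choices of mine). It compares each "may be empty" pattern with its "at least one character" counterpart on every string up to a short length:

```python
import re
from itertools import product

# (original pattern, non-empty variant), translated to Python `re` syntax.
PAIRS = [
    (r"[a-z]*[0-9]?", r"[a-z]+[0-9]?|[0-9]"),
    (r"[a-z]*[0-9]*", r"[a-z]*[0-9]+|[a-z]+[0-9]*"),
    (r"[a-z]*[0-9]?[a-z]*",
     r"[a-z]+[0-9]?[a-z]*|[a-z]*[0-9][a-z]*|[a-z]*[0-9]?[a-z]+"),
    (r"[a-z0-9]*", r"[a-z0-9]+"),
]


def same_except_empty(original, non_empty, alphabet="ab01", max_len=4):
    """Check that `non_empty` accepts exactly the non-empty strings `original` accepts."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            expected = bool(s) and re.fullmatch(original, s) is not None
            actual = re.fullmatch(non_empty, s) is not None
            if expected != actual:
                return False
    return True


for original, non_empty in PAIRS:
    print(original, same_except_empty(original, non_empty))
```

This only exercises short strings over four characters, so it is evidence rather than proof, but it does confirm that each rewrite rejects the empty string and otherwise agrees with its original on those inputs.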
Theodore Norvell
  • Thank you very much for your explanation, Theodore. I really want to clarify: does "zero or more occurrences" mean that the specified pattern is optional, since it says "zero"? Or must I use the optional occurrence notation? – iahsp4 Jun 18 '16 at 17:33
  • Yes, "zero or more" allows for zero, so the pattern is optional. In general `(x)*` is the same as `((x)*)?` for any regular expression `x`. – Theodore Norvell Jun 19 '16 at 00:18