What is the difference between the following regular expressions? To me they look the same.
[a-z][a-z]* vs. [a-z]+
[a-z][a-z]* vs. [a-z]*[a-z]
These regexes are equivalent, as you thought.
[a-zA-Z] # exactly one alphabetic char
[a-zA-Z]* # 0 to infinite alphabetic chars
versus
[a-zA-Z]+ # 1 to infinite alphabetic chars
One matches 1 + [0, ∞) = [1, ∞) characters, the other [1, ∞).
#2 works similarly: in each case, all you're doing is pulling one instance of the repeated character class (in your case, [a-zA-Z]) out of the repetition operator, * or +.
The answer below that points out that the more readable version is preferred is right on target. There is absolutely no reason to write something like [a-zA-Z]*[a-zA-Z] or [a-zA-Z][a-zA-Z]*, since ultimately they're both just [a-zA-Z]+.
All are the same, and anytime you're repeating two identical commands in a row in a regex, you're doing something wrong.
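If you want to convince yourself, here is a minimal sketch in Python that compares the match spans of the three forms on a handful of inputs. The sample strings and the helper names (match_spans, all_agree) are just made up for illustration, and agreeing on a few samples is a sanity check, not a proof.

```python
import re

# The three spellings discussed above.
PATTERNS = [r"[a-zA-Z][a-zA-Z]*", r"[a-zA-Z]+", r"[a-zA-Z]*[a-zA-Z]"]

# Arbitrary sample inputs (an assumption, not from the question).
SAMPLES = ["", "a", "abc", "ABCdef", "abc123", "1abc", "2323hfjfkf 23023493"]


def match_spans(pattern, text):
    """Return the (start, end) span of every non-overlapping match."""
    return [m.span() for m in re.finditer(pattern, text)]


def all_agree(patterns, samples):
    """True if every pattern yields the same spans on every sample."""
    return all(
        len({tuple(match_spans(p, s)) for p in patterns}) == 1
        for s in samples
    )


print(all_agree(PATTERNS, SAMPLES))
```

Each pattern finds exactly the maximal runs of letters, so the spans line up.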
$ python -m timeit -s "import re" "re.search(r'[a-zA-Z]*[a-zA-Z]', '2323hfjfkf 23023493')"
1000000 loops, best of 3: 1.14 usec per loop
$ python -m timeit -s "import re" "re.search(r'[a-zA-Z]+', '2323hfjfkf 23023493')"
1000000 loops, best of 3: 1 usec per loop
$ python -m timeit -s "import re" "re.search(r'[a-zA-Z][a-zA-Z]*', '2323hfjfkf 23023493')"
1000000 loops, best of 3: 0.956 usec per loop
Turns out that [a-zA-Z][a-zA-Z]* is marginally faster than using [a-zA-Z]+. I'm a little surprised, but frankly I don't think the loss in readability is worth the 0.05 microsecond gain in efficiency.
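For anyone who wants to rerun the comparison from a script rather than the shell, here is a rough sketch using the timeit module. Note two assumptions that differ from the commands above: the patterns are precompiled with re.compile, and the loop count is smaller. The numbers will vary by machine and Python version, and differences this small can easily be noise.

```python
import re
import timeit

TEXT = "2323hfjfkf 23023493"
PATTERNS = [r"[a-zA-Z]*[a-zA-Z]", r"[a-zA-Z]+", r"[a-zA-Z][a-zA-Z]*"]


def time_pattern(pattern, number=20_000, repeat=5):
    """Best-of-`repeat` time per search, in microseconds."""
    compiled = re.compile(pattern)
    runs = timeit.repeat(lambda: compiled.search(TEXT),
                         number=number, repeat=repeat)
    return min(runs) / number * 1e6


results = {p: time_pattern(p) for p in PATTERNS}
for pattern, usec in results.items():
    print(f"{pattern:<22} {usec:.3f} usec per search")
```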
Functionally, all these regular expressions are identical.
Using the + quantifier, though, may be problematic in some cases, because depending on the parser and its settings it may or may not need to be escaped (\+) in order to retain its special meaning. That is why some people avoid + and prefer the more explicit XX* form, to keep their regular expressions more portable.
As far as Java is concerned, though, + always retains its special meaning unless escaped.
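Python's re module behaves the same way as the Java case just described: a bare + is the quantifier, and \+ is a literal plus sign. A small sketch (the helper name matches is made up for this example):

```python
import re


def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# Unescaped + means "one or more" in Python's re.
print(matches(r"[a-z]+", "abc"))
# Escaped \+ is a literal plus sign, not a quantifier.
print(matches(r"[a-z]\+", "a+"))
print(matches(r"[a-z]\+", "abc"))
# The XX* spelling needs no + at all, so the question never arises.
print(matches(r"[a-z][a-z]*", "abc"))
```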
Yes, all four are completely equivalent regular expressions. [a-z]+ is the simplest one and should be chosen for readability.
You're right that [a-zA-Z][a-zA-Z]* and [a-zA-Z]+ match all of the same strings, so in that respect there's no difference. The one main advantage [a-zA-Z]+ has over the other is that it's more readable (readability counts!).
Both are the same; check out the reluctant quantifiers section of the Pattern documentation. [a-zA-Z]+ is more readable for yourself and others.
[a-zA-Z][a-zA-Z]* vs. [a-zA-Z]*[a-zA-Z]
I think the main difference between these regular expressions is that the first one will finish earlier than the second, because the tree walk for matching [a-zA-Z][a-zA-Z]* consists of fewer steps than the one for [a-zA-Z]*[a-zA-Z].
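To make the step-count idea concrete, here is a toy backtracking matcher that counts how many character tests it performs. This is only a sketch under stated assumptions: count_steps and its pattern encoding ('1' for exactly one letter, '*' for a greedy run of zero or more letters) are invented for illustration, and a real engine such as Java's or Python's may optimize this differently.

```python
import string

LETTERS = set(string.ascii_letters)


def count_steps(pattern, text):
    """Toy matcher anchored at index 0.

    pattern is a list of '1' (one letter) or '*' (greedy, zero or more
    letters). Returns (match_end_or_None, number_of_character_tests).
    """
    steps = 0

    def is_letter(i):
        nonlocal steps
        steps += 1  # every attempted character test counts as a step
        return i < len(text) and text[i] in LETTERS

    def match(p, i):
        if p == len(pattern):
            return i
        if pattern[p] == "1":
            return match(p + 1, i + 1) if is_letter(i) else None
        # Greedy star: take as many letters as possible, then back off
        # one position at a time until the rest of the pattern matches.
        j = i
        while is_letter(j):
            j += 1
        while j >= i:
            end = match(p + 1, j)
            if end is not None:
                return end
            j -= 1
        return None

    return match(0, 0), steps


print(count_steps(["1", "*"], "abc"))  # [a-zA-Z][a-zA-Z]*
print(count_steps(["*", "1"], "abc"))  # [a-zA-Z]*[a-zA-Z]
```

In this toy model the second form does two extra tests: the star first swallows the whole run, the trailing [a-zA-Z] fails at the end of the input, and the matcher backs off one character and tests it again.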