The second one takes a long time (or at least can, depending on the implementation) because of the *'s in your regex.
Your regex starts off trying to match like this:
[a-zA-Z]+ \s* \w* \s* \w* \s* \w* \( \w+ \) [unmatched]
Asdadasdasd sadsdsad sdasd X ( s ) sdsd
At this point you might expect it to say "okay, doesn't match, we're done".
But this is not what it does.
Instead, it will backtrack in an attempt to find a match that would work (since it's not all that easy for a computer to figure out that backtracking will be a waste of time in this case).
Where it previously matched the second \w* to sdasd, it will now try one character fewer, i.e. sdas, and then add another \s*\w*, which will match 0 characters for \s* and d for \w*.
[a-zA-Z]+ \s* \w* \s* \w* \s* \w* \s* \w* \( \w+ \) [unmatched]
Asdadasdasd sadsdsad sdas X d X ( s ) sdsd
This also won't work, so it will instead try sda and then sd, which won't work either, leading it to split that up further into sda, s and d.
[a-zA-Z]+ \s* \w* \s* \w* \s* \w* \s* \w* \( \w+ \) [unmatched]
Asdadasdasd sadsdsad sda X sd X ( s ) sdsd
[a-zA-Z]+ \s* \w* \s* \w* \s* \w* \s* \w* \s* \w* \( \w+ \) [unmatched]
Asdadasdasd sadsdsad sda X s X d X ( s ) sdsd
And so on, until each \w* is just matching one character.
PS: The above is not necessarily exactly what it does; it's intended to give a basic idea of what happens.
PPS: Used \ instead of \\ for brevity.
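To get a feel for how quickly the number of attempts grows, here is a minimal sketch that counts the ways a run of n word characters can be cut into consecutive non-empty pieces, the way sdasd was cut into sdas + d, sda + sd, sda + s + d and so on above. The class and method names are hypothetical, and this only counts the possible splits; it does not model what any particular regex engine actually does.

```java
// Hypothetical helper, not part of the original regex or of any library.
final class SplitCountSketch {
    // Number of ways to cut a run of n word characters into consecutive
    // non-empty pieces, each piece standing for one more \w* in the trace.
    static long countSplits(int n) {
        if (n == 0) {
            return 1; // nothing left to split: exactly one way (stop here)
        }
        long total = 0;
        // Let the next \w* take `first` characters, then split what remains.
        for (int first = n; first >= 1; first--) {
            total += countSplits(n - first);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println(n + " chars -> " + countSplits(n) + " splits");
        }
    }
}
```

The count doubles with every extra character (it works out to 2^(n-1)), which is why even a fairly short non-matching input can keep the matcher busy for a long time.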
How do you fix it?
There are a few ways to fix it.
The one requiring the fewest changes is perhaps to use (\\s*\\w*)*+ instead: *+ makes the * possessive, which prevents it from backtracking into the group at all (which is in line with what we want here).
^[a-zA-Z]+(\\s*\\w*)*+\\(\\w+\\)
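As a rough sketch of how the possessive version could be tried out in Java: the class name, helper names and test inputs below are made up for illustration, and find() is used on the assumption that trailing text after the closing bracket is allowed.

```java
import java.util.regex.Pattern;

final class PossessiveSketch {
    // The possessive pattern from above, written as a Java string literal.
    static final Pattern FIXED =
            Pattern.compile("^[a-zA-Z]+(\\s*\\w*)*+\\(\\w+\\)");

    // find() rather than matches(): text after ")" is assumed to be allowed.
    static boolean startsWithCall(String input) {
        return FIXED.matcher(input).find();
    }

    // Hypothetical helper: a long input with no "(...)" part at all, the kind
    // of input that triggers heavy backtracking without the possessive *+.
    static String longNonMatch(int words) {
        StringBuilder sb = new StringBuilder("Asdadasdasd");
        for (int i = 0; i < words; i++) {
            sb.append(" sdasd");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(startsWithCall("Asdadasdasd sadsdsad sdasd(s)sdsd"));
        System.out.println(startsWithCall(longNonMatch(50)));
    }
}
```

Because the group never gives back what it consumed, the non-matching input is rejected after a modest amount of work instead of after every possible split has been tried.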
What would also work is to use \\s+ instead of \\s*, although this would lead to some slightly different behaviour (specifically that 0-9 can no longer appear before the first space, which can be fixed by adding \\w* before your brackets).
This fixes it because we can no longer match 0 characters for \\s, which prevents a lot of work we would've otherwise done while backtracking.
^[a-zA-Z]+(\\s+\\w*)*\\(\\w+\\)
OR ^[a-zA-Z]+\\w*(\\s+\\w*)*\\(\\w+\\)
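The behavioural difference between the two \\s+ variants can be checked with a small sketch like the following. The class name, field names and sample inputs are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

final class WhitespacePlusSketch {
    // First variant: \\s+ only, so a digit cannot follow the leading letters
    // before the first space.
    static final Pattern PLAIN =
            Pattern.compile("^[a-zA-Z]+(\\s+\\w*)*\\(\\w+\\)");

    // Second variant: the extra \\w* lets digits (and _) appear there again.
    static final Pattern WITH_WORD =
            Pattern.compile("^[a-zA-Z]+\\w*(\\s+\\w*)*\\(\\w+\\)");

    static boolean plain(String input) {
        return PLAIN.matcher(input).find();
    }

    static boolean withWord(String input) {
        return WITH_WORD.matcher(input).find();
    }

    public static void main(String[] args) {
        String ordinary = "Asd sdasd(s)";
        String digitEarly = "A1 b(s)";
        System.out.println(plain(ordinary) + " " + withWord(ordinary));
        System.out.println(plain(digitEarly) + " " + withWord(digitEarly));
    }
}
```

Both variants accept the ordinary input; only the second accepts the one with a digit before the first space.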
I'd also recommend removing the + from the [a-zA-Z] wherever a \\w* can follow it directly (the possessive version and the second \\s+ version), since the extra letters are already covered by the \\w* (thus it doesn't change what the regex matches) and (in my opinion) it makes the desired behaviour of the regex clearer when looking at it.
PS: [\\s]* is equivalent to \\s*.