They both match the same strings, but in terms of differences in output, (T|E|N)* also returns a capture group containing the last matched character.
For example, given the string TENTEN, (T|E|N)* will match and will have N in the first capture group. [TEN]*, on the other hand, will not have any capture group.
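The difference in output can be checked with a short script. This is a minimal sketch using Python's standard re module; the variable names (subject, grouped, classed) are only illustrative, and other regex flavors may expose groups through a different API.

```python
import re

subject = "TENTEN"

# Alternation inside a capturing group: the group is overwritten on every
# iteration of the star, so it ends up holding the last matched character.
grouped = re.fullmatch(r"(T|E|N)*", subject)

# Character class: same overall match, but no capture group exists.
classed = re.fullmatch(r"[TEN]*", subject)

print(grouped.group(0))  # TENTEN
print(grouped.group(1))  # N
print(classed.group(0))  # TENTEN
print(classed.groups())  # ()
```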
In terms of performance, (T|E|N)* will tend to be slower because most regex engines try the branches of an alternation one at a time, from left to right, testing the first branch before the second.
For instance, with TENTEN, this is what happens (spaces added for clarity):
Attempts to match T
T E N T E N
^
Matches T, moves on
T E N T E N
  ^
Attempts to match T
T E N T E N
  ^
Fails, attempts to match the next, E
T E N T E N
  ^
Matches E, moves on
T E N T E N
    ^
Attempts to match T
T E N T E N
    ^
Fails, attempts to match the next, E
T E N T E N
    ^
Fails, attempts to match the next, N
T E N T E N
    ^
Matches N, moves on
T E N T E N
      ^
And so on, but with the character class, you could say that everything is tested at the same time:
Attempts to match T, E or N
T E N T E N
^
Matches T, moves on
T E N T E N
  ^
Attempts to match T, E or N
T E N T E N
  ^
Matches E, moves on
T E N T E N
    ^
Attempts to match T, E or N
T E N T E N
    ^
Matches N, moves on
T E N T E N
      ^
This means that ( ... | ... ) will always try to match the first branch before attempting to match the next, while [ ... ] does not and just 'mixes everything together'.
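The two traces can be turned into a toy count of how many tests each approach performs. This is only a model of the walkthrough above, not how any real engine is implemented: the function names alternation_attempts and class_attempts are hypothetical, and the final failed attempt at the end of the string is ignored.

```python
def alternation_attempts(text, branches=("T", "E", "N")):
    """Count branch tests when branches are tried left to right per character."""
    attempts = 0
    for ch in text:
        for branch in branches:
            attempts += 1
            if ch == branch:
                break  # this branch matched, move on to the next character
        else:
            break  # no branch matched, so the star stops here
    return attempts


def class_attempts(text, members="TEN"):
    """Count tests when each character is checked against the whole set at once."""
    attempts = 0
    for ch in text:
        attempts += 1
        if ch not in members:
            break
    return attempts


print(alternation_attempts("TENTEN"))  # 12
print(class_attempts("TENTEN"))        # 6
```

In this model T costs one test, E two and N three under alternation, while the character class always costs one test per character.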
So for simple patterns (single characters), it is best to use a character class, i.e. [TEN]* instead of (T|E|N)* (or (?:T|E|N)*).
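To see how the three forms compare in a particular engine, they can be timed directly. The sketch below assumes Python's re and timeit modules; the PATTERNS table, the time_patterns helper and the subject length are illustrative choices. Results depend heavily on the engine and version, and some engines optimize single-character alternations internally, so the gap may be small or absent.

```python
import re
import timeit

# The three spellings discussed above.
PATTERNS = {
    "capturing alternation": r"(T|E|N)*",
    "non-capturing alternation": r"(?:T|E|N)*",
    "character class": r"[TEN]*",
}


def time_patterns(subject="TEN" * 1000, repeats=200):
    """Return the total seconds each pattern needs to match subject `repeats` times."""
    results = {}
    for label, pattern in PATTERNS.items():
        compiled = re.compile(pattern)
        # Sanity check: every variant must match the whole subject.
        assert compiled.fullmatch(subject) is not None
        results[label] = timeit.timeit(
            lambda: compiled.fullmatch(subject), number=repeats
        )
    return results


if __name__ == "__main__":
    for label, seconds in time_patterns().items():
        print(f"{label:28s} {seconds:.4f} s")
```

Measuring on realistic input is more reliable than reasoning from the trace alone, since the trace describes a plain backtracking engine without optimizations.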