The [] in a regular expression denotes a character set. It tells the pattern matcher to match any single character that appears inside the brackets. So, for instance,
/[abc]/
will match any one of 'a', 'b', or 'c'.
Inside the brackets, however, the hyphen ('-') has a special meaning: it denotes the entire range of characters between the character just before and just after the hyphen (inclusive). That is, the above regex could have been written:
/[a-c]/
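As a quick check of that equivalence, here is a minimal sketch that runs both forms against a handful of sample characters. The variable names (listed, ranged, samples, results) are made up for illustration.

```javascript
// Two spellings of the same character set: 'a', 'b', or 'c'.
const listed = /[abc]/;
const ranged = /[a-c]/;

// Test each sample character against both patterns.
const samples = ['a', 'b', 'c', 'd', '-'];
const results = samples.map((ch) => ({
  ch,
  listed: listed.test(ch),
  ranged: ranged.test(ch),
}));

// Each row should show identical answers for the two patterns.
console.log(results);
```

Note that the hyphen itself is not matched by /[a-c]/, because there it only marks the range.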
If you want to include a literal hyphen in the list of characters in the set, you need to escape it. That is:
/[a\-c]/
will match any one of 'a', '-', or 'c' (and not 'b'). You can also suppress the special meaning of the hyphen by making it the first or last character in the set, so:
/[-ac]/
will also match any one of 'a', '-', or 'c'.
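The three ways of writing a literal hyphen can be compared side by side. This is only a sketch; matchedChars is a hypothetical helper written for this example, not part of any library.

```javascript
// Three ways to put a literal hyphen in a character set.
const escaped = /[a\-c]/;  // hyphen escaped with a backslash
const leading = /[-ac]/;   // hyphen as the first character
const trailing = /[ac-]/;  // hyphen as the last character

// Hypothetical helper: returns which of the candidate characters a pattern matches.
function matchedChars(re) {
  return ['a', 'b', 'c', '-'].filter((ch) => re.test(ch)).join('');
}

console.log(matchedChars(escaped));
console.log(matchedChars(leading));
console.log(matchedChars(trailing));
console.log(matchedChars(/[a-c]/)); // for contrast: here the hyphen denotes a range
```

The first three patterns should report the same set ('a', 'c', and '-'), while the last one matches 'b' and not the hyphen.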
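This explains why /[A-Za-z0-9]/ is not the same thing as /[0-z]/: the range of characters between '0' and 'z' simply includes additional characters, as you noted in your question. In ASCII order these are the punctuation characters between '9' and 'A' (: ; < = > ? @) and between 'Z' and 'a' ([ \ ] ^ _ `). That's all there is to it.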
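If you want to see the difference for yourself, one way is to walk the 7-bit ASCII range and collect every character that the wide range accepts but the alphanumeric set rejects. The names wide, narrow, and extras are just illustrative.

```javascript
const wide = /[0-z]/;          // everything from '0' (0x30) through 'z' (0x7A)
const narrow = /[A-Za-z0-9]/;  // only letters and digits

// Collect the ASCII characters matched by the wide range but not the narrow set.
const extras = [];
for (let code = 0; code <= 0x7f; code++) {
  const ch = String.fromCharCode(code);
  if (wide.test(ch) && !narrow.test(ch)) {
    extras.push(ch);
  }
}

console.log(extras.join(' '));
console.log(extras.length);
```

The characters collected this way are exactly the ones that make /[0-z]/ unsuitable as a shorthand for "letters and digits".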
As a technical detail, JavaScript uses Unicode code values (UTF-16 code units, unless the u flag is set) to decide which characters fall within a range. If you're sticking with the 7-bit ASCII character set, you'll get the same results using an ASCII chart. But don't rely on an ASCII chart for character codes above 0x7F; consult the Unicode charts instead.
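To illustrate with characters above 0x7F, here is a small sketch using a range over part of the Latin-1 Supplement block. The range endpoints were picked for this example, and hex is a hypothetical helper that formats a character's code point.

```javascript
// Range from U+00E0 ('à') through U+00FF ('ÿ'), written with escapes
// so the example does not depend on the file's encoding.
const latin1Lower = /[\u00E0-\u00FF]/;

// Hypothetical helper: format a character's code point as U+XXXX.
function hex(ch) {
  return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

// 'é' (U+00E9) and '÷' (U+00F7) fall inside the range; plain 'e' (U+0065) does not.
for (const ch of ['\u00E9', '\u00F7', 'e']) {
  console.log(ch, hex(ch), latin1Lower.test(ch));
}
```

The division sign is a reminder that a range follows code order, not any notion of "letters": whatever sits between the two endpoints in the Unicode charts is included.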