-1

I'm currently using this website to create some regular expressions for a programming language I want to build. At the moment I'm just setting up an expression for identifiers.

In my language, identifiers follow the same rules as in most languages:

  • They cannot begin with a digit or with a special character other than an underscore
  • After the first character they can contain alphanumeric and underscore characters

Given those rules I've come up with the following expression by myself:

^\D\w+$

It doesn't reject special characters, however. The following expression, which I didn't write myself, does:

^(?!\d)\w+$

Why does the second expression account for special characters? Shouldn't the two produce the same results?

Mathew O'Dwyer
  • `\D` contains all that is not a digit, including all that is not a word character. – Casimir et Hippolyte May 05 '18 at 08:16
  • Rather use `[A-Za-z]\w+` or `\p{Alpha}\w+`. As @Casimir said, `\D` also matches special characters. – CoronA May 05 '18 at 08:18
  • @CasimiretHippolyte But in the first expression, this would mean that characters such as #, %, @ etc. wouldn't be acceptable, but they are. – Mathew O'Dwyer May 05 '18 at 08:22
  • `\D` accepts `#` because it is not a digit. Same with the others. – CoronA May 05 '18 at 08:29
  • You cannot expect `\w` to match "special" characters since it matches "word" chars. Read up about [lookaheads](https://www.regular-expressions.info/lookaround.html). `(?!\d)` just means the next char cannot be a digit. – Wiktor Stribiżew May 05 '18 at 08:47

3 Answers

1

I will explain why the second regex works.

The second regex uses a negative lookahead. After matching the start of the string, the engine checks whether the next character is a digit, but it does not consume it. This is important: if the next character is not a digit, the engine then uses \w to match that same character, which fails if the character is a symbol. If the character is a digit, the negative lookahead fails and nothing is matched.

\D, on the other hand, consumes the first character if it is not a digit, and \w+ then has to match everything after it. That means any symbol is accepted as the first character.
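To make the difference between "checking" and "consuming" concrete, here is a minimal sketch using Python's `re` module. Python is an assumption on my part, since the question does not name a regex flavor, and the sample strings are made up for illustration.

```python
import re

# The negative lookahead is zero-width: it succeeds on "abc" without consuming anything.
lookahead_only = re.match(r"(?!\d)", "abc")
consumed_by_lookahead = lookahead_only.end()   # position is still 0

# \D consumes one character, and it accepts any non-digit, including "#".
nondigit = re.match(r"\D", "#abc")
consumed_by_nondigit = nondigit.end()          # position has advanced to 1

# Because the lookahead consumed nothing, \w+ must still match the first character.
with_lookahead = re.compile(r"^(?!\d)\w+$")
with_nondigit = re.compile(r"^\D\w+$")

# Illustrative inputs: (matched by lookahead regex, matched by \D regex)
results = {
    s: (bool(with_lookahead.match(s)), bool(with_nondigit.match(s)))
    for s in ["abc", "_x1", "1abc", "#abc"]
}
for s, (a, b) in results.items():
    print(f"{s!r}: lookahead={a}, nondigit={b}")
```

Only "#abc" separates the two patterns here: the lookahead version rejects it because \w+ has to start at the "#", while the \D version lets \D swallow the "#" first.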

Sweeper
0

^(?!\d)\w+$ matches a string consisting of word characters [a-zA-Z0-9_] that doesn't start with a digit.

^\D\w+$ matches a non-digit character followed by at least one character from the [a-zA-Z0-9_] set.

So @ab01 is matched by ^\D\w+$, while ^(?!\d)\w+$ rejects it.
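A quick way to check this is the sketch below, written in Python as an assumption about the flavor. The `re.ASCII` flag is there because Python's \w otherwise also matches Unicode letters and digits, so it is what makes \w equal to [a-zA-Z0-9_]. The `classify` helper is a hypothetical name used only for this illustration.

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_]; without it, Python's \w also
# matches Unicode letters and digits.
lookahead_re = re.compile(r"^(?!\d)\w+$", re.ASCII)
nondigit_re = re.compile(r"^\D\w+$", re.ASCII)

def classify(s):
    """Return (matched by the lookahead regex, matched by the non-digit regex)."""
    return bool(lookahead_re.match(s)), bool(nondigit_re.match(s))

at_case = classify("@ab01")   # symbol as the first character
single = classify("x")        # one-character identifier

print("@ab01:", at_case)
print("x:", single)
```

Besides the "@ab01" case, the single-character input shows a second difference: ^\D\w+$ needs at least two characters, because \D takes one and \w+ needs at least one more, whereas ^(?!\d)\w+$ accepts a lone letter or underscore.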

revo
0

(?!\d)\w+ means "match a run of word characters that does not begin with a digit". Wrapped in ^ and $, it is ^\w+$ with the extra restriction that the first character is not a digit, which is not the same as ^\D\w+$. ^(?!\d).\w+$ (note the "." after the lookahead) would behave like ^\D\w+$, apart from newlines, which \D matches and "." by default does not.

nutic