
I have this regex: `let nonAlphaNumeric = /[\W_]/gi;`. When I use it on `"32086073S"`, the letter S is removed. This is the code I use to test: `"3208S6073OS".replace(/[\W_]/gi, '');`

Edit: added an S to the test string.

The underscore and the `i` flag in combination seem to match the S and remove it. Why?

test url: regexr.com/4gpit

  • `"32086073S".replace(/[\W_]/gi, '')` > `32086073` – Wiktor Stribiżew Jul 02 '19 at 09:38
  • Where do you get this behavior? I can only see that it is not expected as per [ECMA-262 5.1 Edition](https://www.ecma-international.org/ecma-262/5.1/): *In case-insignificant matches all characters are implicitly converted to upper case immediately before they are compared. However, if converting a character to upper case would expand that character into more than one character (such as converting `"ß"` (`\u00DF`) into `"SS"`), then the character is left as-is instead.* – Wiktor Stribiżew Jul 02 '19 at 09:46
  • *The character is also left as-is if it is not an ASCII character but converting it to upper case would make it into an ASCII character. This prevents Unicode characters such as `\u0131` and `\u017F` from matching regular expressions such as `/[a‑z]/i`, which are only intended to match ASCII letters. Furthermore, if these conversions were allowed, then **`/[^\W]/i` would match each of `a`, `b`, …, `h`, but not `i` or `s`.*** – Wiktor Stribiżew Jul 02 '19 at 09:46
  • Can you reproduce the behaviour in an online regex tester, e.g. [regex101.com](https://regex101.com/r/QE1CrO/1)? I cannot reproduce your problem there, and I copied the string from your example. – Chrᴉz supports Monica Jul 02 '19 at 11:10
  • @WiktorStribiżew, I don't think this is a duplicate. It really seems to be deviating behaviour in Chrome. In FF, Edge and IE11 the "S" is not matched, while in Chrome (v75) it (wrongly) is. The [duplicate reference](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) does not help to explain this. – trincot Jul 02 '19 at 12:22
  • @trincot Ok, looks like it is a bug in Chrome v75, as the ECMA standard specifically describes this pattern and `S` should not be matched with it. – Wiktor Stribiżew Jul 02 '19 at 12:26
  • If it is a bug, it deserves to be treated as a separate Q&A. – trincot Jul 02 '19 at 12:26
  • So, you suggest putting my two comments above as an answer? – Wiktor Stribiżew Jul 02 '19 at 12:27
  • It could be part of an answer, but neither seems to fully explain this particular case. The input string is already in upper case, only contains ASCII and the behaviour is not the same with "I" instead of "S". I think more needs to be investigated here. A reference to a bug report would be ideal -- unless no-one bumped into this before. – trincot Jul 02 '19 at 12:29

1 Answer


It seems you stumbled over a recent bug/regression in Chrome 75 (since build 75.0.3756.0). A bug report can be found at Issue 972850: RegExp /[\W_]/gi matches the letter S:

"RST".replace(/[\W_]/gi, "");

What is the expected behavior?
Output is "RST".

What went wrong?
Output is "RT".

The issue got merged with Issue 971636: regex `/ſ/i` (U+017F) matches a normal s (U+0073). A Chromium project member spotted a similar case in comment #13 of that second thread:

hum...
"S".match(/[\W]/i)

null

"S".match(/[a\W]/i)

["S", index: 0, input: "S", groups: undefined]

Notice that `/[a\W]/i` is just a variant of your case. The bug is not specific to the underscore: it occurs as soon as the class contains, next to `\W`, a character that is in `\w`. For example, `/[x\W]/i` and `/[,.\W#j]/i` also trigger the problem. The `g` flag is unrelated to the problem, and the problem disappears when you add the `u` flag.
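As a rough sketch of a workaround for an affected build: since neither `\W` nor `_` has an upper- or lower-case variant, the `i` flag adds nothing to this particular pattern and can simply be dropped. Adding the `u` flag, as noted above, is the other option. The function names below are hypothetical and only serve the illustration.

```javascript
// Hypothetical helper names; they do not come from the question or the bug report.

// Option 1: drop the i flag. \W and _ have no case variants,
// so case-insensitive matching is not needed for this class.
function stripNonAlphaNumeric(input) {
  return input.replace(/[\W_]/g, "");
}

// Option 2: keep the i flag but add u, which is reported above
// to make the problem disappear in the affected Chrome 75 builds.
function stripNonAlphaNumericUnicode(input) {
  return input.replace(/[\W_]/giu, "");
}

console.log(stripNonAlphaNumeric("3208S6073OS")); // "3208S6073OS"
console.log(stripNonAlphaNumeric("a_b-c S"));     // "abcS"
console.log(stripNonAlphaNumericUnicode("RST"));
```

Dropping `i` is the smaller change, because it does not switch the pattern into Unicode mode, which has stricter syntax rules for escapes.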

Anyway, the important news here is that the bug has been acknowledged and is fixed in Chromium 76, and possibly in a Chrome 75 bug-fix release.

If you don't mind running a beta release, you can download the beta, which at the time of writing is 76.0.3809.46. I installed it just now and can confirm that it fixes the bug.

trincot