I want to find out whether a character appears more than two times in a row in a string. Following MSDN, I ended up with this snippet:

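using System.Text.RegularExpressions;

// Capture one word character, then require the same character twice more.
string tripleRepetitiveCharPtn = @"(\w)\1\1";

if (Regex.IsMatch("45678au---lt", tripleRepetitiveCharPtn, RegexOptions.IgnoreCase))
{
    ...
}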

It works for "aaa", but not "---". What should I do?

Thanks.

UPDATE:

Since the question was marked as a duplicate (and my reputation dropped as a result), I realized that the original question was somewhat vague. Thanks to everyone who left answers and comments, it is now clear why (\w) does not match the hyphen.

But my actual intention was to find a way to specify other characters in addition to (\w). I know that (.*) can match everything, but can I list the extra characters literally?

VincentZHANG
  • Because `-` is not a word character? – J. Steen Feb 04 '15 at 09:24
  • @malkam It doesn't work, but thank you. – VincentZHANG Feb 05 '15 at 01:34
  • Your question is still unclear. Instead of adding an update, please try to reword your question. The main problem with your question, in my opinion, is that there's no clear link between the title and the content. Note also that @malkam's _updated_ comment does appear to meet your criteria. – Simon MᶜKenzie Feb 05 '15 at 02:45
  • @SimonMᶜKenzie, thank you, `@"([\w\-])\1\1"` works in the end; the character class must be enclosed in parentheses to be referenced. I won't revise this post further. – VincentZHANG Feb 05 '15 at 07:19

1 Answer

From the MSDN page on Character Classes in Regular Expressions:

\w matches any word character. A word character is a member of any of the Unicode categories listed in the following table.

  • Ll: Letter, Lowercase
  • Lu: Letter, Uppercase
  • Lt: Letter, Titlecase
  • Lo: Letter, Other
  • Lm: Letter, Modifier
  • Nd: Number, Decimal Digit
  • Pc: Punctuation, Connector. This category includes ten characters, the most commonly used of which is the LOWLINE character (_), U+005F.

The hyphen character - (U+002D) belongs to the Pd (Punctuation, Dash) category, which is not in the list above, so \w does not match -.
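A minimal sketch of the difference, assuming the goal is to detect three identical characters in a row with hyphens also counting. The class and method names (`RepeatCheck`, `HasTripleWordChar`, `HasTripleWordCharOrHyphen`) are hypothetical; the second pattern is the one the asker reports working in the comments on the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original post.
public static class RepeatCheck
{
    // \w alone: letters, digits, and connector punctuation such as '_'.
    // The hyphen is in category Pd, so "---" is never captured.
    public static bool HasTripleWordChar(string input) =>
        Regex.IsMatch(input, @"(\w)\1\1");

    // A character class adds the hyphen literally. The parentheses around
    // the class create group 1, which \1 then refers back to.
    public static bool HasTripleWordCharOrHyphen(string input) =>
        Regex.IsMatch(input, @"([\w\-])\1\1");
}
```

With the string from the question, `RepeatCheck.HasTripleWordChar("45678au---lt")` returns false, while `RepeatCheck.HasTripleWordCharOrHyphen("45678au---lt")` returns true. Other literal characters can be added inside the square brackets in the same way.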

List of characters under Pc category:

U+005F  LOW LINE                                      _
U+203F  UNDERTIE                                      ‿
U+2040  CHARACTER TIE                                 ⁀
U+2054  INVERTED UNDERTIE                             ⁔
U+FE33  PRESENTATION FORM FOR VERTICAL LOW LINE       ︳
U+FE34  PRESENTATION FORM FOR VERTICAL WAVY LOW LINE  ︴
U+FE4D  DASHED LOW LINE                               ﹍
U+FE4E  CENTRELINE LOW LINE                           ﹎
U+FE4F  WAVY LOW LINE                                 ﹏
U+FF3F  FULLWIDTH LOW LINE                            ＿
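The list can be checked against the engine with a small sketch like the one below. The `WordCharCheck` class and its method names are hypothetical. The second method uses `RegexOptions.ECMAScript`, under which, per the documentation quoted in the comments, `\w` is equivalent to `[a-zA-Z_0-9]`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class WordCharCheck
{
    // Default .NET behavior: \w follows the Unicode categories listed above.
    public static bool IsWordChar(string s) =>
        Regex.IsMatch(s, @"^\w$");

    // ECMAScript-compliant behavior: \w is limited to [a-zA-Z_0-9].
    public static bool IsEcmaWordChar(string s) =>
        Regex.IsMatch(s, @"^\w$", RegexOptions.ECMAScript);
}
```

For example, `WordCharCheck.IsWordChar("\u203F")` (UNDERTIE) returns true, while `WordCharCheck.IsEcmaWordChar("\u203F")` returns false. Both return true for `"_"` and false for `"-"`.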
Community
Rawling
  • You missed the **If** part `If ECMAScript-compliant behavior is specified, \w is equivalent to [a-zA-Z_0-9].` – nhahtdh Feb 04 '15 at 09:30
  • @nhahtdh As I said, I'm paraphrasing, and I provided the link. I think `A-Za-z0-9_` is more useful than "this list of Unicode character classes you'll have to research yourself". Still, this is a duplicate so I've CW'd it. – Rawling Feb 04 '15 at 09:31
  • You can say: In xyz category, and a-zA-Z0-9_ are included in it. Don't take the word out of the context (in this case, the clause comes from ECMAScript part). – nhahtdh Feb 04 '15 at 09:32
  • Saying "`a-zA-Z0-9_` are included in it" doesn't say that the hyphen isn't included in it. I've explained what it includes in general regex flavours, and that in C# it contains more but still not the hyphen. – Rawling Feb 04 '15 at 09:35
  • Since you have made it a comm. wiki, I have edited it to clarify it. – nhahtdh Feb 04 '15 at 09:37
  • @nhahtdh That was also my intention :p – Rawling Feb 04 '15 at 09:38