0

I'm following this tutorial.

I was testing my regular expression (Dump is a LINQPad method that displays the value on the console):

Regex.Match("a^7lowah", @"\ba\w*\b").Success.Dump();

It should match a word that starts with an "a" followed by any number of alphanumeric characters up to the end of the word.

But unfortunately the regex above matches.

My understanding of the regex:

  • "\b" (beginning of the word)
  • "a" (just the letter a)
  • "\w" (alphanumeric character)
  • "*" (repeat previous term)
  • "\b" (end of the word)

What am I doing wrong?

Unihedron
Jamie
  • That's really weird, there is nothing wrong with your regexp and it shouldn't match. – Anders Bornholm Oct 02 '14 at 09:42
  • You should specify why you don't want this match to be successful. Is it because the leading `a` is followed by a `^` rather than a space? Is it because you want the whole of the input to have to match the output? Is it because the `a` is followed by 0, rather than 1 or more, other letters? – Rawling Oct 02 '14 at 09:58

3 Answers

4

Yes, the regex will match.

Pattern: \ba\w*\b
String: a^7lowah

The * means "zero or more".

So the match is just the leading a.

\w* matches no word characters here, but because the quantifier means "zero or more", that does not matter: the engine skips that part of the pattern and can immediately assert a word boundary between a and ^.

You might want to change * to + instead.
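
To make the difference between the two quantifiers concrete, here is a minimal sketch that runs both patterns against the string from the question. The class name QuantifierDemo and the helper FirstMatch are hypothetical names chosen for illustration; only Regex.Match and the Match properties come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: returns the matched text, or null when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        const string input = "a^7lowah";

        // "*" allows zero word characters after the "a", so the lone "a" matches.
        Console.WriteLine(FirstMatch(input, @"\ba\w*\b") ?? "(no match)"); // a

        // "+" requires at least one word character after the "a", so nothing matches.
        Console.WriteLine(FirstMatch(input, @"\ba\w+\b") ?? "(no match)"); // (no match)

        // A word that really starts with "a" still matches with "+".
        Console.WriteLine(FirstMatch("apple pie", @"\ba\w+\b") ?? "(no match)"); // apple
    }
}
```

Note that with + a single-letter word such as "a" on its own no longer matches, which may or may not be what you want.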

Unihedron
  • +1, you were not only faster, but your answer describes the OP's problem better. – Philipp M Oct 02 '14 at 09:53
  • Thanks, changing the "*" to "+" works! It was indeed matching the a as a word. Because it thinks "^" is the end of the word. – Jamie Oct 02 '14 at 09:53
1

It matches only the a of your string.

Since a is a word character and ^ is not, the empty position between them is a word boundary, which is what \b matches.

In your case the a is matched because it is followed directly by that word boundary. The reason is that * matches zero or more repetitions of the preceding token, so \w* is allowed to match nothing.

See here.

If your x should be one or more characters rather than zero or more, change the pattern to \ba\w+\b.
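
One way to see where the word boundaries fall is to match \b on its own and print the position of each (empty) match. This is only a sketch; BoundaryDemo and BoundaryIndexes are hypothetical names introduced for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: collects every index at which \b matches.
    public static List<int> BoundaryIndexes(string input)
    {
        var result = new List<int>();
        foreach (Match m in Regex.Matches(input, @"\b"))
        {
            result.Add(m.Index); // each match is zero-length; Index is its position
        }
        return result;
    }

    public static void Main()
    {
        // Index:  0 1 2 3 4 5 6 7
        // Char:   a ^ 7 l o w a h
        Console.WriteLine(string.Join(", ", BoundaryIndexes("a^7lowah"))); // 0, 1, 2, 8
    }
}
```

The boundary at index 1 sits between a and ^, which is why \ba\w*\b can stop right after the first letter.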

Philipp M
1

The problem is not in your Regexp, it's in your interpretation of success. The regexp matches the "a" only, but that is still a match and Success will be true.
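
To check what was actually matched rather than only whether something matched, you can inspect the Match object. The sketch below also shows one possible way to require the whole input to match, by anchoring the pattern with ^ and $; whether that is the behavior you want is an assumption on my part. SuccessDemo and WholeInputMatches are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuccessDemo
{
    // Hypothetical helper: true only if the entire input is "a" followed by word characters.
    public static bool WholeInputMatches(string input)
    {
        return Regex.IsMatch(input, @"^a\w*$");
    }

    public static void Main()
    {
        Match m = Regex.Match("a^7lowah", @"\ba\w*\b");

        // Success only says that some substring matched.
        Console.WriteLine(m.Success); // True

        // Value, Index and Length show which substring it was.
        Console.WriteLine($"'{m.Value}' at index {m.Index}, length {m.Length}"); // 'a' at index 0, length 1

        // With anchors, the "^" in the middle of the input prevents a match.
        Console.WriteLine(WholeInputMatches("a^7lowah")); // False
        Console.WriteLine(WholeInputMatches("a7lowah"));  // True
    }
}
```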

Anders Bornholm
  • I just noticed this. I didn't know it would count a special character (in this case "^") as the end of a word, so a would be matched. When I put ^ at the beginning and $ at the end of the regex, it works just fine. – Jamie Oct 02 '14 at 09:49