
I am reading the Boundary Matchers section of the Regular Expressions lesson in The Java™ Tutorials. How can I find dog when it comes right after the end of the previous match, but not when it is at the start of the string?

For example, \Gdog finds 2 matches in dogdog, but I don't want to match the first dog, because it does not come after a previous match (there is no previous match).

Why does \Gdog match the first dog when there is no previous match?

One more question: there is a special symbol for the start of input, \A. How can I negate its meaning, i.e. "not at the start of the input"? I tried \a, but that did not work. Also, what does \a mean?

Thanks in advance.

Andy Brown
DPM
  • "*One more question*" - please try and ask just one question per post. – Andy Brown Aug 22 '15 at 13:04
  • possible duplicate of [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – Andy Brown Aug 22 '15 at 13:08
  • @AndyBrown, I know that I should post only one question, but it is actually coupled with the first one (in the first one I am asking for a solution, and the second one is there to find a concrete solution). I wrote it as a second question for readability. Thanks for the link. – DPM Aug 22 '15 at 18:55

1 Answer


The \G marker is under-documented. On the first match attempt there is no "previous match", so \G matches at the beginning of the input. Its actual meaning is therefore "match either at the beginning of input or right after the end of the previous match".
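The behaviour described above can be checked with a small sketch. The class name LastMatchDemo and the helper matchStarts are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastMatchDemo {
    // Hypothetical helper: collects the start index of every match find() reports.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // With no previous match, \G matches at the start of the input.
        System.out.println(matchStarts("\\Gdog", "dogdog"));  // [0, 3]
        // The space breaks the chain, so the second dog is not found.
        System.out.println(matchStarts("\\Gdog", "dog dog")); // [0]
        // Nothing matches at index 0, so nothing matches at all.
        System.out.println(matchStarts("\\Gdog", "xdogdog")); // []
    }
}
```

The second and third inputs show that \G only ever continues a chain of adjacent matches that begins at the start of the input.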

Note that if \Gdog did not match at the beginning of the string, it would not match anything in "dogdog" at all. The first "dog" is at the beginning of the string, so it would not be matched. And the second "dog" would not be matched either, because \G needs a previous match to end right before it, and the first "dog" didn't match...
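To see this chain effect in practice, the sketch below forbids a match at the start of the input with a negative lookbehind on \A and counts what is left. It also shows one possible workaround, which is an assumption of this sketch and not something the tutorial suggests: keep plain \G and discard the match that starts at index 0 in code. The names SkipFirstDemo, countMatches and chainedStartsAfterFirst are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipFirstDemo {
    // Hypothetical helper: counts how many times find() succeeds.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Assumed workaround: match with plain \G, then drop the match at index 0.
    static List<Integer> chainedStartsAfterFirst(String word, String input) {
        Matcher m = Pattern.compile("\\G" + Pattern.quote(word)).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            if (m.start() > 0) {
                starts.add(m.start());
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Forbidding the start of input breaks the whole chain.
        System.out.println(countMatches("(?<!\\A)\\Gdog", "dogdog"));   // 0
        // Filtering in code keeps the chain and skips only the first dog.
        System.out.println(chainedStartsAfterFirst("dog", "dogdog"));    // [3]
        System.out.println(chainedStartsAfterFirst("dog", "dogdogdog")); // [3, 6]
    }
}
```

Filtering after the fact lets the first "dog" still anchor the chain, which a pure regex restriction on \G cannot do.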

As for your second question, a negative lookbehind lets you express the opposite of \A: "(?<!\\A)" (written here as a Java string literal). A lowercase "marker" is not always the opposite of the uppercase "marker". The Pattern documentation lists \a as "the alert (bell) character", which means it matches a \u0007 in the input.
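A minimal sketch of both points follows. The class NotAtStartDemo and the helper firstMatchStart are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotAtStartDemo {
    // Hypothetical helper: start index of the first match, or -1 if there is none.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // "Not at the start of input": the dog at index 0 is skipped.
        System.out.println(firstMatchStart("(?<!\\A)dog", "dogdog")); // 3
        // \a matches the bell character (U+0007), not "anything but the start".
        System.out.println(firstMatchStart("\\a", "ring\u0007"));     // 4
        // \a does not match the letter a.
        System.out.println(firstMatchStart("\\a", "a"));              // -1
    }
}
```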

RealSkeptic
  • Thank you very much. In my opinion the documentation is wrong, because as you said (which is true) "match either at the beginning of input or after the previous match". – DPM Aug 22 '15 at 19:07