
I have a string like the one below:

xxSTART Here we have the first text in 1234 asdf xxENDxxSTART Here we have the second text 999 fffd xxENDxxSTART Here we have the third text 1234 9985Df xxENDxxSTART Here we have the fourth text 1234 asdf Dert xxEND

I'm using the following regex, `^(?:(.*?)\K(xxSTART)){3}(.*?xxEND)`, to get ONLY the third match: `xxSTART Here we have the third text 1234 9985Df xxEND`. This works well in http://www.regexr.com/v1/, but I read an article saying that `\K` is not available in C# (Support of \K in regex), and the article Translate Perl regular expressions to .NET says to use a look-behind (`(?<=…)`) instead. However, I can't work out how to use a look-behind in my regex. Can anybody help me, please?

Does anyone have an idea how to use `(?<=…)` in my regex `^(?:(.*?)\K(xxSTART)){3}(.*?xxEND)` to replace `\K`?

Thanks and regards,

AneEx

3 Answers


You don't really need a lookbehind here, you can match the xxSTART and still get the 3rd part that you want to get:

^(?:xxSTART.*?){3}\s*(.*?)xxEND

ideone demo
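The question is about C#, but the capture-group pattern above uses nothing .NET-specific, so it can be checked quickly in other engines too. Here is a minimal Python sketch (Python is my choice for illustration, not the asker's language; `DATA`, `THIRD_BODY` and `third_block_body` are hypothetical names) showing that the wanted text is in group 1, not in the overall match:

```python
import re

# Sample input copied from the question.
DATA = (
    "xxSTART Here we have the first text in 1234 asdf xxEND"
    "xxSTART Here we have the second text 999 fffd xxEND"
    "xxSTART Here we have the third text 1234 9985Df xxEND"
    "xxSTART Here we have the fourth text 1234 asdf Dert xxEND"
)

# Same pattern as above: consume three xxSTART markers, then capture
# everything up to the next xxEND in group 1.
THIRD_BODY = re.compile(r"^(?:xxSTART.*?){3}\s*(.*?)xxEND")


def third_block_body(text):
    """Return the text of the third block, or None if the pattern does not match."""
    match = THIRD_BODY.search(text)
    return match.group(1) if match else None


print(repr(third_block_body(DATA)))
```

Note that the overall match (group 0) runs from the start of the string through the third xxEND; only group 1 holds the third block's text.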

But if you really want to use a lookbehind (for example, because you don't want any capture groups, in which case you can also use a lookahead for the xxEND), you would use something like this:

(?<=^(?:xxSTART.*?){3}\s*).*?(?=xxEND)

ideone demo

Jerry
  • Thanks a lot for the reply, Jerry. Using the first regex, `^(?:xxSTART.*?){3}\s*(.*?)xxEND`, I get the first three occurrences: `xxSTART Here we have the first text in 1234 asdf xxENDxxSTART Here we have the second text 999 fffd xxENDxxSTART Here we have the third text 1234 9985Df xxEND`, and parsing that with the other one, `(?<=^(?:xxSTART.*?){3}\s*).*?(?=xxEND)` (a 'reversal' of the first regex), I get the last (third) occurrence. I wonder whether there is a way to combine the two regexes `^(?:xxSTART.*?){3}\s*(.*?)xxEND` + `(?<=^(?:xxSTART.*?){3}\s*).*?(?=xxEND)` to get ONLY the third match `..Here we have the third text 1234 9985..` – AneEx Apr 11 '14 at 17:53
  • @AneEx I don't understand you. Both regex give `Here we have the third text 1234 9985Df ` if you look at the code execution examples in my answer. How are you using the regex? – Jerry Apr 11 '14 at 17:59
  • Sorry Jerry, I'll try to be clearer. I uploaded 2 RegexBuddy screenshots showing my 'problem' with the regexes below. The first image (http://imagizer.imageshack.us/a/img853/1869/5q0v.png) shows the match of `^(?:xxSTART.*?){3}\s*(.*?)xxEND` in yellow, and the second (http://imagizer.imageshack.us/a/img534/492/k3gt.png) shows the match of `(?<=^(?:xxSTART.*?){3}\s*).*?(?=xxEND)` in yellow. I'd like to select the third occurrence in only one match, if possible :D – AneEx Apr 11 '14 at 18:46
  • @AneEx For the first one, you have to pick the first capture: [image](http://i.stack.imgur.com/pqPtz.png) and for the second one, you must not loop through the regex. I still don't understand what you are trying to do. Where are you using the regex? – Jerry Apr 11 '14 at 19:00
  • @AneEx I told you that you don't loop through the regex and the regex will work *fine*. But I guess if you want to be that determined... That one will match only one `(?<=^(?:xxSTART(?:(?!xxSTART).)*){3}\s*)(?:(?!xxSTART).)*?(?=xxEND)` no matter the number of loops. – Jerry Apr 11 '14 at 19:04
  • HAHAHHAHAHHAHA!!! Jerry, you are a regex teacher!!! Thanks, it works very well!!! Thanks a lot!! :) I'm so happy with your post! – AneEx Apr 11 '14 at 19:34
  • @AneEx Feel free to [accept my answer](http://meta.stackexchange.com/a/5235/192545) so your question is also marked as solved :) – Jerry Apr 11 '14 at 19:49

Simply use this:

^(?:xxSTART.*?xxEND){2}(xxSTART.*?xxEND)

Skip the first two blocks first, and then capture the third one. No lookbehind assertion is required here.
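As a rough cross-check outside .NET, here is a small Python sketch of the same idea (Python is used only for illustration; `DATA`, `THIRD_BLOCK` and `third_block` are hypothetical names). Unlike a pattern that captures only the inner text, group 1 here includes the xxSTART and xxEND markers, which is what the question asked for:

```python
import re

# Sample input copied from the question.
DATA = (
    "xxSTART Here we have the first text in 1234 asdf xxEND"
    "xxSTART Here we have the second text 999 fffd xxEND"
    "xxSTART Here we have the third text 1234 9985Df xxEND"
    "xxSTART Here we have the fourth text 1234 asdf Dert xxEND"
)

# Skip two complete blocks without capturing them, then capture the third whole.
THIRD_BLOCK = re.compile(r"^(?:xxSTART.*?xxEND){2}(xxSTART.*?xxEND)")


def third_block(text):
    """Return the third xxSTART...xxEND block with its markers, or None on no match."""
    match = THIRD_BLOCK.search(text)
    return match.group(1) if match else None


print(third_block(DATA))
```

This assumes the blocks follow each other directly, as they do in the sample string; any text between one xxEND and the next xxSTART would stop the pattern from matching.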

Sabuj Hassan

An easy method is to not impose this limitation inside the regex, but instead to do the counting outside:

use strict;
use warnings;

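# Slurp the whole DATA section into one string.
my $data = do { local $/; <DATA> };

my $count = 0;
# Walk the xxSTART ... xxEND blocks in order and stop at the third one.
while ($data =~ /(?<=xxSTART)(.*?)(?=xxEND)/g) {
    if (++$count == 3) {
        print $1;
        last;
    }
}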

__DATA__
xxSTART Here we have the first text in 1234 asdf xxENDxxSTART Here we have the second text 999 fffd xxENDxxSTART Here we have the third text 1234 9985Df xxENDxxSTART Here we have the fourth text 1234 asdf Dert xxEND

outputs:

 Here we have the third text 1234 9985Df
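
The same counting-outside-the-regex approach can be sketched in Python (an illustration only, not part of the Perl script above; `DATA`, `BLOCK_BODY` and `nth_block_body` are hypothetical names). The lookbehind here is fixed-width, so it should work in engines that reject variable-length lookbehinds:

```python
import re
from itertools import islice

# Sample input copied from the question.
DATA = (
    "xxSTART Here we have the first text in 1234 asdf xxEND"
    "xxSTART Here we have the second text 999 fffd xxEND"
    "xxSTART Here we have the third text 1234 9985Df xxEND"
    "xxSTART Here we have the fourth text 1234 asdf Dert xxEND"
)

# Matches the text between one xxSTART and the next xxEND.
BLOCK_BODY = re.compile(r"(?<=xxSTART)(.*?)(?=xxEND)")


def nth_block_body(text, n):
    """Return the text of the n-th block (1-based), or None if there are fewer than n."""
    matches = BLOCK_BODY.finditer(text)
    # Skip the first n - 1 matches lazily and take the next one, if any.
    match = next(islice(matches, n - 1, None), None)
    return match.group(1) if match else None


print(repr(nth_block_body(DATA, 3)))
```

Because the iterator is lazy, scanning stops as soon as the requested block is found, much like the `last` in the Perl loop.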
Miller