
I'm just trying to match the content between HTML tags with a regex. I'm using this simple lookaround pattern, and I don't understand why it doesn't match this string.

<a href=http://url.com title="link">this is a ling</a>

(?<=<a.*>)([ \w]*)(?=<.*\/a>)

Debuggex Demo

1 Answer


Lookbehinds in the flavors available on Debuggex (PCRE, JavaScript and Python) cannot be of variable width. You can use `(?<=<a>)`, which has a fixed width (3 characters), but not something that can vary in length such as `(?<=<a.*>)` (which could span 3 characters, or 4, or 5, etc.).

The regex is simply not valid, but Debuggex reports it as "no match" instead of flagging an error.
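To see the difference outside Debuggex, here is a minimal sketch using Python's built-in `re` module (one of the three flavors mentioned above). The variable names and the `<a>plain text</a>` test string are made up for illustration. Python refuses to compile the variable-width lookbehind instead of silently reporting no match, while the fixed-width version compiles but only works when the opening tag is exactly `<a>`.

```python
import re

html = '<a href=http://url.com title="link">this is a ling</a>'

# Variable-width lookbehind: Python's re module rejects the pattern
# at compile time instead of reporting "no match".
try:
    re.compile(r'(?<=<a.*>)([ \w]*)(?=<.*\/a>)')
    compiled = True
except re.error as exc:
    compiled = False
    error_message = str(exc)

# Fixed-width lookbehind: compiles, but only matches a bare <a> tag.
fixed = re.compile(r'(?<=<a>)([ \w]*)(?=</a>)')
bare_match = fixed.search('<a>plain text</a>')  # hypothetical input
attr_match = fixed.search(html)                 # tag has attributes

print(compiled)
print(bare_match.group(1) if bare_match else None)
print(attr_match)
```

The fixed-width pattern finds nothing in the string from the question, because the opening tag there carries attributes and so never appears as a literal `<a>`.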

Jerry
  • I understand. I must find another way to do this. Thanks for the answer. – user3491049 Apr 02 '14 at 20:06
  • @user3491049 You should be able to use something like `<a.*>([ \w]*)<.*\/a>` because you're getting a capture group anyway. Or if you can have the anchor `\K`, then you could perhaps use `<a.*>\K([ \w]*)(?=<.*\/a>)`. All the same, you should use a proper HTML parser if you're parsing HTML. – Jerry Apr 02 '14 at 20:07
  • @Jerry In this context the `a` tag can/should never be nested (assuming valid HTML) so a basic RegEx should be fine. – tenub Apr 02 '14 at 20:11
  • @tenub Fair enough. I was just advising that for general HTML parsing :) – Jerry Apr 03 '14 at 04:34
  • This answer has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Lookarounds". – aliteralmind Apr 10 '14 at 00:31
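The two suggestions from the comments (rely on the capture group instead of a lookbehind, or use a real HTML parser) can be sketched in Python as follows. This is only an illustration under some assumptions: the pattern uses `[^>]*` rather than `.*` so the opening tag cannot run past its own `>`, and the `AnchorText` class is a hypothetical helper built on the standard library's `html.parser`, not something from the thread.

```python
import re
from html.parser import HTMLParser

html = '<a href=http://url.com title="link">this is a ling</a>'

# Capture-group approach: match the tags too, then keep only group 1.
pattern = re.compile(r'<a\b[^>]*>([ \w]*)</a>')
match = pattern.search(html)
link_text = match.group(1) if match else None


class AnchorText(HTMLParser):
    """Hypothetical helper: collects the text inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <a> elements are currently open
        self.texts = []   # one entry per <a> element encountered

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.depth += 1
            self.texts.append('')

    def handle_endtag(self, tag):
        if tag == 'a' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that appears while an <a> element is open.
        if self.depth:
            self.texts[-1] += data


parser = AnchorText()
parser.feed(html)
parser.close()

print(link_text)
print(parser.texts)
```

The regex version is enough for a single, well-formed anchor like the one in the question. The parser version costs more code but does not depend on how the attributes or the link text happen to be written.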