
I created a pattern that matches a string of three digits (like 333) between <a> tags:

@((<a>(.?[^(<\/a>)].?))*)([0-9]{3})(((.*?)?</a>))@i

How can I invert the pattern above so that it matches the numbers that are not between <a> tags?

I tried using ?! (a negative lookahead), but it doesn't work.

Edit: Example input data:

lor <a>111</a> em 222 ip <a><link />333</a> sum 444 do <a>x555</a> lo <a>z 666</a> res
kicaj

2 Answers


You're trying to solve an HTML problem in the text domain, which is awkward. The right way is to use a DOM parser; an XPath expression can then select only the text you want:

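$html = 'lor <a>111</a> em 222 ip <a><link />333</a> sum 444 do <a>x555</a> lo <a>z 666</a> res';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select only the text nodes that have no <a> ancestor.
foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
    if (preg_match('/\d{3}/', $node->textContent)) {
        // do stuff with $node->textContent
    }
}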
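A slightly fuller sketch of the same idea, assuming PHP 7 or later with the DOM extension enabled. Instead of only testing each text node, it collects every three-digit run found outside <a> elements. The function name threeDigitsOutsideLinks is hypothetical, and libxml_use_internal_errors() is used only to keep parser warnings about loose markup out of the output.

```php
<?php
// Hypothetical helper: return every run of three digits that appears in a
// text node with no <a> ancestor.
function threeDigitsOutsideLinks(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $found = [];

    // Only text nodes that are not inside any <a> element are visited.
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        if (preg_match_all('/\d{3}/', $node->textContent, $m)) {
            $found = array_merge($found, $m[0]);
        }
    }
    return $found;
}

$input = 'lor <a>111</a> em 222 ip <a><link />333</a> sum 444 do <a>x555</a> lo <a>z 666</a> res';
echo implode(', ', threeDigitsOutsideLinks($input)), "\n"; // 222, 444
```

Because the parser decides what is "inside" an <a>, the <link /> element and the extra text in <a>x555</a> and <a>z 666</a> need no special handling.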
Ja͢ck

kicaj, this situation is very similar to this question about matching a pattern unless ...

With all the usual disclaimers about using regex to parse HTML, there is a simple way to do it.

Here's our simple regex (see demo):

<a.*?</a>(*SKIP)(*F)|\d{3}

The left side of the alternation | matches a complete <a ... </a> span and then deliberately fails; (*SKIP) makes the engine resume the search at the end of that span instead of at the next character, so nothing inside the tags can be matched. The right side matches groups of three digits, and we know they are the right digits because they were not consumed by the expression on the left.
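A minimal PHP sketch of the same regex, run with preg_match_all() on the example input from the question. It assumes PHP's PCRE engine, which supports the (*SKIP) and (*F) backtracking verbs; the ~ delimiter is just a choice that avoids escaping the slash in </a>.

```php
<?php
// Left branch: consume a whole <a>...</a> span, then (*SKIP)(*F) throws it away
// and forbids restarting inside it. Right branch: three digits anywhere else.
$pattern = '~<a.*?</a>(*SKIP)(*F)|\d{3}~';

$input = 'lor <a>111</a> em 222 ip <a><link />333</a> sum 444 do <a>x555</a> lo <a>z 666</a> res';

preg_match_all($pattern, $input, $matches);
echo implode(', ', $matches[0]), "\n"; // 222, 444
```

Note that . does not match newlines by default, so if a tag pair can span several lines you would need the s modifier.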

Note that if you want to match exactly three digits, and not three digits inside a longer run (e.g. 123 in 12345), you can add a negative lookbehind and a negative lookahead:

<a.*?<\/a>(*SKIP)(*F)|(?<!\d)\d{3}(?!\d)
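To see what the lookarounds change, here is a small comparison in PHP on a made-up input (not from the question) that contains a five-digit run outside the tags.

```php
<?php
// Made-up input: a five-digit run, a linked number, and a plain three-digit number.
$input = 'a 12345 b <a>777</a> c 888';

$plain = '~<a.*?</a>(*SKIP)(*F)|\d{3}~';
$exact = '~<a.*?</a>(*SKIP)(*F)|(?<!\d)\d{3}(?!\d)~';

preg_match_all($plain, $input, $m1);
preg_match_all($exact, $input, $m2);

echo implode(', ', $m1[0]), "\n"; // 123, 888  (123 is taken from 12345)
echo implode(', ', $m2[0]), "\n"; // 888       (no digit may touch the match)
```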

Reference

How to match (or replace) a pattern except in situations s1, s2, s3...

zx81