Matching all characters until a specific word (in this case an html tag) with Regex

Question

Possible Duplicate:
How to parse and process HTML with PHP?

I'm not very good with regex, but I found this code:

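<?php
$string = "some text (a(b(c)d)e) more text";
// \(           a literal "("
// (?>[^()]+    an atomic run of characters that are not parentheses,
//   |(?R))*    or a recursive match of the whole pattern (a nested pair)
// \)           a literal ")"
if(preg_match("/\((?>[^()]+|(?R))*\)/",$string,$matches))
{
    echo "<pre>"; print_r($matches); echo "</pre>"; // [0] => (a(b(c)d)e)
}
?>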

I'm trying to change the regex pattern so that it matches opening and closing HTML tags instead of parentheses, but I can't figure out how to mimic "[^()]+" so that it matches tags instead of parentheses.

The purpose of this would be to let me create a new HTML tag whose contents I can access regardless of how many times the tag is nested within itself.
Thank you.

score 0 · Accepted Answer · answered Feb 11 '12 at 14:01


[^()] defines a negated character class: a ^ at the start of a class means "any character except the following ones". So [^()]+ in your example means "one or more characters that are not parentheses".
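To see what the negated class matches on its own, here is a small sketch that runs [^()]+ over the string from the question with preg_match_all(); every run of text between parentheses becomes a separate match.

```php
<?php
$string = "some text (a(b(c)d)e) more text";

// [^()]+ : one or more characters that are neither "(" nor ")".
preg_match_all('/[^()]+/', $string, $runs);

// $runs[0] holds: "some text ", "a", "b", "c", "d", "e", " more text"
print_r($runs[0]);
```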

If you're matching the text inside an HTML tag, the equivalent is [^<>]+.
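Applying that substitution to the recursive pattern from the question gives the sketch below. It assumes the tag is a bare <div> with no attributes and that the only markup inside it is more nested <div> elements, because [^<>]+ stops at any other tag.

```php
<?php
$string = 'some text <div>Blah<div>foo</div>bar</div> more text';

// Same shape as the question's pattern:
//   "\(" -> "<div>", "\)" -> "</div>", [^()]+ -> [^<>]+
// (?R) recurses into the whole pattern, so each nested <div>...</div>
// is consumed as one unit. Assumption: bare <div> tags, no other markup inside.
$pattern = '~<div>(?>[^<>]+|(?R))*</div>~';

if (preg_match($pattern, $string, $matches)) {
    echo "<pre>"; print_r($matches); echo "</pre>";
    // $matches[0] is "<div>Blah<div>foo</div>bar</div>"
}
```

With the answer's own example, <div>Blah <a>foo</a>bar</div>, this pattern finds no match at all, because <a> is neither plain text nor a nested <div>; that limitation is one reason to prefer a real parser.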

If you have content like <div>Blah <a>foo</a>bar</div> and you want to match Blah <a>foo</a>bar, you can use a regexp like ~<div>(.+?)</div>~.
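A minimal sketch of that lazy match with preg_match(); capture group 1 holds the tag's contents. It assumes the markup fits on one line: . does not match newlines unless the s modifier is added (~<div>(.+?)</div>~s).

```php
<?php
$html = '<div>Blah <a>foo</a>bar</div>';

// (.+?) is lazy: it grows one character at a time and stops at the
// first point where "</div>" can match.
if (preg_match('~<div>(.+?)</div>~', $html, $m)) {
    echo $m[1], "\n"; // Blah <a>foo</a>bar
}
```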

A ? after a quantifier makes it lazy (non-greedy): the regexp "stops eating" as soon as it encounters the first </div>. This also means it cannot handle a <div> nested inside another <div>.
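The difference between lazy and greedy shows up as soon as the tag is nested. This sketch uses a hypothetical nested input (not from the question) and runs both quantifiers on it: the lazy one stops at the first </div>, the greedy one backtracks from the end to the last </div>, and neither actually tracks the nesting.

```php
<?php
// Hypothetical nested input.
$html = '<div>Blah <div>foo</div>bar</div>';

preg_match('~<div>(.+?)</div>~', $html, $lazy);   // stops at the first </div>
preg_match('~<div>(.+)</div>~', $html, $greedy);  // backtracks to the last </div>

echo $lazy[1], "\n";   // Blah <div>foo
echo $greedy[1], "\n"; // Blah <div>foo</div>bar
```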

Anyway... you should really use DOM and DOMXPath::query() rather than regex when parsing HTML. Here's some random tutorial from Google.
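A short sketch of the DOM approach (not the linked tutorial's code): DOMDocument::loadHTML() builds the tree, DOMXPath::query('//div') returns every <div> at any depth, and each element's inner HTML is rebuilt by serializing its child nodes with saveHTML(). The innerHtml() helper is a hypothetical name introduced here. A custom tag name could be queried the same way, though libxml's HTML parser reports unknown tags as errors, which is why internal error handling is switched on.

```php
<?php
$html = '<div>Blah <div>foo</div>bar</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Hypothetical helper: inner HTML = concatenation of each serialized child node.
function innerHtml(DOMNode $node): string {
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

// '//div' matches every <div> in document order, however deeply nested.
$contents = [];
foreach ($xpath->query('//div') as $div) {
    $contents[] = innerHtml($div);
}
print_r($contents);
```

Because the tree already encodes the nesting, the outer <div> yields its full contents and the inner one yields just its own text, with no regex bookkeeping.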


Vyktor


the example with <div>Blah <a>foo</a>bar</div> would suit my needs better if it could accurately parse
Blah
foo
bar
but thanks for the info about DOM, I'll look into it – Max Feb 11 '12 at 14:48
@Max DOM will give you better performance and so on... I can add an example of parsing it if you want. – Vyktor Feb 11 '12 at 14:50
