0

I'm using regex to match specific divs in a page and replace them with custom-formatted ones. I can't use DOMDocument because the pages we process are often malformed, and after running them through DOMDocument the pages are reformatted and no longer display the same.

I'm currently using the following which works perfectly:

preg_match('#(\<div id=[\'|"]'.$key.'[\'|"](.*?)\>)(.*?)\<\/div\>#s', $contents, $response);

To match div tags such as:

<div id="test"></div>
<div id="test" style="width: 300px; height: 200px;"></div>
etc...

The problem I'm encountering is tags where the id is after the style or class, example:

<div class="test" id="test"></div>

If I run the following, the regex seems to become greedy and matches a large chunk of HTML before the div tag. I'm not sure how to fix this:

preg_match('#(\<div(.*?)id=[\'|"]'.$key.'[\'|"](.*?)\>)(.*?)\<\/div\>#s', $contents, $response);

Does anyone have any ideas?

Andy Lester
Joe

2 Answers

4

You can use the ungreedy modifier (U). Also, inside the tag, use [^>]* instead of .*: it matches anything that is not >, and > ends the tag, so the match cannot run past the tag you are searching within. You don't need to escape / because it is not your delimiter (you are using # as the delimiter).

preg_match('#(<div[^>]*id=[\'|"]'.$key.'[\'|"][^>]*>)(.*)</div>#isU', $contents, $response);
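
To see the difference between the two patterns, here is a minimal sketch run against a hypothetical two-div sample. The sample markup and the variable names other than $key and $contents are assumptions for illustration. It also departs from the pattern above in three small ways: it passes $key through preg_quote(), it writes the quote class as [\'"] (inside a character class, | is a literal pipe), and it uses an explicit lazy (.*?) rather than the U modifier, which should behave the same for this input.

```php
<?php
// Hypothetical sample input; $key and $contents mirror the names used in the question.
$key = 'test';
$contents = '<div class="intro">Intro</div>' . "\n"
          . '<div class="box" id="test" style="width: 300px;">Hello</div>';

// Escape regex metacharacters in the id; '#' is passed because it is the delimiter.
$id = preg_quote($key, '#');

// Question's attempt: the lazy (.*?) may still cross ">", so the match
// starts at the first <div on the page and runs forward to the id.
$loose = '#(<div(.*?)id=[\'"]' . $id . '[\'"](.*?)>)(.*?)</div>#s';

// Suggested fix: [^>]* cannot leave the opening tag it started in.
// \s before id is an extra assumption so that e.g. data-id= is not matched.
$strict = '#(<div\b[^>]*\sid=[\'"]' . $id . '[\'"][^>]*>)(.*?)</div>#is';

preg_match($loose, $contents, $looseMatch);
preg_match($strict, $contents, $strictMatch);

echo $looseMatch[0], "\n---\n", $strictMatch[0], "\n";
```

With this sample, the first pattern captures both divs and the second captures only the div whose id is test. Note that either pattern stops at the first </div> it finds, so a target div that contains nested divs would be cut short.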
Maxim Krizhanovsky
0

Don't use regex for HTML parsing; there are DOM parsers for that, such as PHP DOM: http://www.php.net/manual/en/book.dom.php
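
For reference, a rough sketch of what that could look like for the div lookup in the question. The sample markup is hypothetical, and the XPath lookup is just one way to do it. As the question points out, the parser repairs malformed markup, so the serialized output may not match the original byte for byte.

```php
<?php
// Hypothetical malformed input: the <p> is never closed.
$contents = '<div class="box" id="test"><p>Hello</div>';

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($contents);
libxml_clear_errors();

// Look the div up by its id attribute with XPath.
$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@id="test"]')->item(0);

if ($node !== null) {
    echo $node->getAttribute('class'), "\n";
    echo trim($node->textContent), "\n";
    // The serialized markup may differ from the input (the parser repairs it).
    echo $doc->saveHTML($node), "\n";
}
```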

Rafał Walczak
  • Right, but to use them you need a valid DOM, which can mean running tidy first and then building the DOM object, and that can be a lot of overhead. Sometimes, especially when you're looking for specific bits of information, it doesn't make sense to go through the whole ritual of DOM parsing when you can write a simple regular expression in two lines of code. – Yitzhak Oct 17 '15 at 00:13
  • There's no need to be afraid of this stuff. It's only a chainsaw. – Yitzhak Oct 17 '15 at 00:17