I would like to extract all JavaScript code from a string with preg_match_all.

$pattern = '~<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>~su';
$success = preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

array(0 => '<script>alert("Hallo Welt 1");</script>');

The result still includes the <script> tags. I would like to exclude them and get only the code between them.

My Sample Online Regex with Sample Code.

Ryan
Severin

1 Answer

Regex is the wrong tool for parsing XML/HTML. You should use a DOM parser instead. XPath is a query language specialized for selecting nodes in DOM structures.

$html = <<<_EOS_
<script>alert("Hallo Welt 1");</script>
<div>Hallo Welt</div>
<script type ="text/javascript">alert("Hallo Welt 2");</script>
<div>Hallo Welt 2</div>
<script type ="text/javascript">
              alert("Hallo Welt 2");
</script>
_EOS_;

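// Wrap the fragment in a minimal document before handing it to the HTML parser.
$doc = new DOMDocument();
$doc->loadHTML("<!DOCTYPE html><html>$html</html>");
$xpath = new DOMXPath($doc);

// Select only the text nodes inside <script> elements, not the tags themselves.
$scripts = $xpath->query('//script/text()');

foreach ($scripts as $script) {
    var_dump($script->data);
}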
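
If the scripts are needed as an array, as preg_match_all would have returned them, the same approach could be wrapped in a small helper. This is only a sketch: the function name extractInlineScripts is made up for illustration, and the libxml_use_internal_errors() call is an optional addition that keeps parser warnings (for example about tags libxml does not know) from being printed. It also assumes the input is ASCII or otherwise compatible with the parser's default encoding.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the code of
// every inline <script> element in an HTML fragment, without the tags.
function extractInlineScripts(string $html): array
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML("<!DOCTYPE html><html>$html</html>");

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $scripts = [];
    foreach ($xpath->query('//script') as $node) {
        // textContent is empty for external scripts (<script src="...">),
        // so those are skipped.
        $code = trim($node->textContent);
        if ($code !== '') {
            $scripts[] = $code;
        }
    }
    return $scripts;
}

$html = '<script>alert("Hallo Welt 1");</script><div>Hallo Welt</div>'
      . '<script type="text/javascript"> alert("Hallo Welt 2"); </script>';

print_r(extractInlineScripts($html));
```

Trimming the content is a design choice that drops the indentation and line breaks around multi-line scripts; remove the trim() call if the original whitespace matters.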
Quasimodo's clone