
I have a simple PHP regex to extract all content within <body></body>, but it does not match the text below.

The regex is:

<body>(.*?)<\/body>

This is the text:

<!doctype html>
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="">
    <meta name="author" content="">

    <link href="http://localhost//themes/default/../dashboard/css/bootstrap.min.css" rel="stylesheet" type="text/css" />

    <link rel="stylesheet" type="text/css" href="http://localhost//assets/cache/default_product_groups_product_groups_mod.css" media="screen" />
    <link rel="shortcut icon" href="http://localhost//favicon.ico">
</head>

<body>
    <p dir='rtl'>
     <a target='_blank' href='https://zuz.mx/2e5y'>לרכישה מכאן במחיר 37.01$</a>
    </p>

    <input id="base_url_special" type="hidden" name="base_url_special" value="http://localhost//"/>

</body>
<script src="http://localhost//themes/default/js/jquery.min.js" type="text/javascript"></script>

<script src="http://localhost//assets/cache/default_fetchPG_product_groups_mod.js?_dt=1492617362" type="text/javascript"></script>
<script src="http://localhost//themes/default/../dashboard/js/bootstrap.min.js" type="text/javascript"></script>

</body>
</html>

This is the live example: https://regex101.com/r/joLaTm/1

Thomas Ayoub
Umair Ayub

2 Answers


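You have to set the single-line option (the s flag) to make . match newlines (see fork of your test). Alternatively, use [\S\s]* in place of .*, since that character class matches any character regardless of flags.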
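
As a rough sketch of both options in PHP, the snippet below runs the pattern from the question with and without the s modifier, then with the [\S\s] character class. The `$html` string is a shortened, hypothetical stand-in for the page in the question, and the variable names are illustrative only.

```php
<?php
// Hypothetical sample input: a shortened stand-in for the page in the question.
$html = "<html>\n<body>\n    <p>Hello</p>\n</body>\n</html>";

// Without the s modifier, . does not match line breaks, so the pattern fails.
$plain = preg_match('/<body>(.*?)<\/body>/', $html, $m1);

// Option 1: the s modifier (PCRE_DOTALL) lets . match newlines as well.
$dotall = preg_match('/<body>(.*?)<\/body>/s', $html, $m2);

// Option 2: [\S\s] matches any character, with or without the s modifier.
$charClass = preg_match('/<body>([\S\s]*?)<\/body>/', $html, $m3);

var_dump($plain, $dotall, $charClass);
```

Both fixes should capture the same text in group 1. The s modifier is usually the easier one to read.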

Thomas Ayoub

Look at the explanation section of your live example.

.* matches any character (except for line terminators)

Add the s flag to your regex. https://regex101.com/r/joLaTm/3

ceejayoz
  • Also, how do I exclude the BODY tags from the results? I tried this, but it only excludes the ending body tag: `/(=?)(.*?)(?=)/ism` – Umair Ayub Apr 19 '17 at 16:09
  • @Umair No need, that's already how it works. Don't use the "full match", use the "group 1" result, which is just the `(.*?)` portion of it. It'll probably be in `$matches[1]` assuming you're using `preg_match` in the way the docs advise. – ceejayoz Apr 19 '17 at 16:11
  • @ceejayoz There is a typo in the lookbehind the OP uses. – Thomas Ayoub Apr 19 '17 at 16:13
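
To illustrate the comments above, here is a minimal sketch of reading group 1 from `preg_match`, next to a lookaround variant. The `$html` string is hypothetical. The lookaround pattern is also an assumption about what the comment was aiming for, since the tags in the quoted regex did not survive. In PCRE a lookbehind is written `(?<=...)`, not `(=?...)`.

```php
<?php
// Hypothetical sample input, not the page from the question.
$html = "<body>\n<p>Hi</p>\n</body>";

// $matches[0] is the full match including the tags;
// $matches[1] is only what the (.*?) group captured.
preg_match('/<body>(.*?)<\/body>/is', $html, $matches);
$inner = $matches[1];

// Lookaround variant: (?<=...) is a lookbehind, (?=...) is a lookahead.
// Neither consumes characters, so the full match already excludes the tags.
preg_match('/(?<=<body>).*?(?=<\/body>)/is', $html, $around);
$innerViaLookaround = $around[0];

var_dump($inner === $innerViaLookaround);
```

The capturing-group form is usually simpler. The lookaround form is mainly useful when only the full match is available.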