I will have a one-line string of HTML code stored in a PHP variable. The string comes from an HTML page that normally has newlines and whitespace between tags. There can be one or more newlines and/or spaces between tags, as in this example:

<h1>tag1</h> 
       <p>Between h ad p we have \s and \n</p>

After applying a regex with preg_replace, I would like to have this:

<h1>tag1</h><p>Between h ad p we have \s and \n</p>

I have tried this regex, but it is not working.

$str=<<<EOF
<h1>tag1</h> 
           <p>Between h ad p we have \s and \n</p>

EOF;


$string =  trim(preg_replace('/(>\s+<)|(>\n+<)/', ' ', $str)); 

The entire code is available here: http://www.phpliveregex.com/p/7Pn

zwitterion
  • possible duplicate of [HTML minification?](http://stackoverflow.com/questions/728260/html-minification) – Glavić Nov 01 '14 at 06:12
  • Hi Glavic, it could work, but I need the regex behind the scenes. This site does the job: http://kangax.github.io/html-minifier/ – zwitterion Nov 01 '14 at 06:20
  • See the footer: *Source and bugtracker are hosted on Github.* – Glavić Nov 01 '14 at 06:26

3 Answers

There are two problems with

preg_replace('/(>\s+<)|(>\n+<)/', ' ', $str)

  • \s already includes \n, so there is no need for the second alternative.

  • (>\s+<) consumes both angle brackets, > and <, so replacing the match with a space removes the brackets along with the whitespace.

The output is

<h1>tag1</h p>Between h ad p we have \s and \n</p>

which is not what you want.
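To see the second problem in practice, here is a minimal sketch. The sample string is rebuilt with explicit quoting so that the \s and \n inside the paragraph text stay literal, and the variable name $broken is only illustrative.

```php
<?php
// Sample input modelled on the question: a space, a newline and indentation
// separate the closing heading tag from the opening paragraph tag.
$str = "<h1>tag1</h> \n       " . '<p>Between h ad p we have \s and \n</p>';

// The pattern matches ">", the whitespace run AND "<", so all three
// are replaced by the single space.
$broken = preg_replace('/(>\s+<)|(>\n+<)/', ' ', $str);

echo $broken, PHP_EOL;
// <h1>tag1</h p>Between h ad p we have \s and \n</p>
```

The closing > of the heading tag and the opening < of the paragraph tag are both gone, which is why the markup ends up broken.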

How to correct

Use the regex (>\s+<) with the replacement string ><, which gives the output

<h1>tag1</h><p>Between h ad p we have \s and \n</p>

For an example, see http://regex101.com/r/dI1cP2/2
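A minimal sketch of this fix in PHP, assuming the same sample input as in the question. The name $fixed is illustrative, and the capturing group is dropped because the replacement does not refer to it.

```php
<?php
// Same shape as the question's input; single quotes keep \s and \n literal.
$str = "<h1>tag1</h> \n       " . '<p>Between h ad p we have \s and \n</p>';

// Match ">", the whitespace run and "<", then put the two brackets back.
$fixed = preg_replace('/>\s+</', '><', $str);

echo $fixed, PHP_EOL;
// <h1>tag1</h><p>Between h ad p we have \s and \n</p>
```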

You can also use lookarounds to solve the issue.

The regex would be

(?<=>)\s+(?=<)

and the replacement string would be the empty string.

Explanation

(?<=>) asserts that the whitespace is preceded by >

\s+ matches one or more whitespace characters

(?=<) asserts that the whitespace is followed by <

Here the lookarounds do not consume the angle brackets, unlike the earlier regex.

See http://regex101.com/r/dI1cP2/3 for an example.
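The lookaround variant can be sketched in PHP as follows. The helper name stripInterTagWhitespace is hypothetical (it is not a PHP built-in), and the second input string is made up purely to show that whitespace inside text is left alone.

```php
<?php
// Hypothetical helper: removes whitespace runs that sit directly between
// a ">" and a "<", without touching the brackets themselves.
function stripInterTagWhitespace($html)
{
    return preg_replace('/(?<=>)\s+(?=<)/', '', $html);
}

$str = "<h1>tag1</h> \n       " . '<p>Between h ad p we have \s and \n</p>';
$result = stripInterTagWhitespace($str);
echo $result, PHP_EOL;
// <h1>tag1</h><p>Between h ad p we have \s and \n</p>

// Whitespace inside text content is not preceded by ">", so it survives.
$other = stripInterTagWhitespace("<p>keep  these   spaces</p>\n\n<p>next</p>");
echo $other, PHP_EOL;
// <p>keep  these   spaces</p><p>next</p>
```

Note that either pattern removes every whitespace run between two tags, including a single space between inline elements or whitespace inside a pre block, where it can affect how the page renders.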

nu11p01n73R

You can try with this:

echo preg_replace("/(?=\>\s+\n|\n)+(\s+)/", "", $str);
jogesh_pi

(?<=<\/h>)\s+

Try this and replace the match with an empty string. See the demo:

http://regex101.com/r/jI8lV7/1

vks