I have a lot of HTML files that contain unwanted line feeds. These break things like inline JavaScript and formatting within the pages. I want a way to strip out all line feeds that do not appear directly after an HTML tag, e.g. </div>. Does anyone know of a regex and/or program that may be able to achieve this?
Viewed 475 times
1
soulmerge
You might benefit from a minifier. See http://stackoverflow.com/questions/728260/html-minification/1102101. – David Andres Sep 16 '09 at 11:20
2 Answers
1
You may be able to use Notepad++'s search/replace function, with a regular expression to catch most of this.
Something like:
([^>])\n(.+)
Replaced with:
\1 \2
DisgruntledGoat
Depending on the line endings used in the HTML file, you may need to use ([^>])\r\n(.+) or ([^>])\r(.+) instead. – Brian Sep 16 '09 at 13:07
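If you would rather script this than run it by hand in an editor, here is a rough Python sketch of the same capture-group idea. It is not from the answer itself: Python is just my choice of language, `join_broken_lines` is a made-up name, and the pattern folds the three line-ending variants into one alternation. It also swaps the trailing `(.+)` for a lookahead so that several broken lines in a row are joined in a single pass.

```python
import re

# Capture the last character before the break (anything except ">" or a
# line-ending character), then match one break in any of the three styles.
# The lookahead requires the next line to be non-empty without consuming it,
# so the following break can still be matched in the same pass.
_BREAK = re.compile(r"([^>\r\n])(?:\r\n|\r|\n)(?=[^\r\n])")


def join_broken_lines(html: str) -> str:
    """Replace stray line breaks with a single space (hypothetical helper)."""
    return _BREAK.sub(r"\1 ", html)
```

Like the editor version, this replaces the break with a space rather than deleting it, and it leaves blank lines and breaks that follow a `>` alone.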
0
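You can use a negative lookbehind to match the line feeds:
<?php
$buffer = file_get_contents('test.html');
// Remove every line break that is not directly preceded by </div>.
// The second lookbehind keeps the "\n" of a "\r\n" pair that follows </div>;
// without it, CRLF files would be left with a lone "\r" after the tag.
$buffer = preg_replace('~(?<!</div>)(?<!</div>\r)(?:\r\n|\r|\n)~', '', $buffer);
file_put_contents('test.new.html', $buffer);
?>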
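The question asks about line feeds after any HTML tag, not only </div>. One way to generalise the lookbehind idea is to test for a preceding `>` instead of a specific tag. The Python sketch below is an assumption about what is wanted, not part of the answer: `strip_stray_breaks` is a made-up name, and a literal `>` in text or script (for example `a > b` at the end of a line) will be treated as if it closed a tag.

```python
import re

# Two fixed-width negative lookbehinds:
#   (?<!>)    rejects a break that directly follows ">"
#   (?<!>\r)  rejects the "\n" half of a "\r\n" pair that follows ">"
_STRAY_BREAK = re.compile(r"(?<!>)(?<!>\r)(?:\r\n|\r|\n)")


def strip_stray_breaks(html: str) -> str:
    """Delete line breaks that do not come right after a ">" (hypothetical helper)."""
    return _STRAY_BREAK.sub("", html)
```

Unlike a replace-with-space approach, this deletes the break outright, so words split across lines get glued together. Swap the empty replacement for `" "` if that matters for your pages.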
Lance Rushing
You may actually want something more like (?<=</[^>]+>)(\r?\n){2,}, i.e. any closing tag followed by more than one CRLF (where the CR is optional). – Neel Sep 29 '09 at 11:29
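One possible reading of this comment is "collapse runs of blank lines after a closing tag down to a single line break". Here is a small Python sketch under that assumption. Python's `re` module only accepts fixed-width lookbehinds, so the closing tag is matched with a capture group and put back in the replacement; `collapse_after_closing_tag` is a made-up name.

```python
import re

# Group 1: any closing tag such as </div> or </p>.
# Group 2: repeated two or more times; after the match it holds the last
# line break in the run, which is reused so the file's line-ending style
# is preserved.
_EXTRA_BREAKS = re.compile(r"(</[^>]+>)(\r?\n){2,}")


def collapse_after_closing_tag(html: str) -> str:
    """Keep one line break after a closing tag (hypothetical helper)."""
    return _EXTRA_BREAKS.sub(r"\1\2", html)
```

A single line break after a closing tag is left untouched, since the pattern only fires on two or more.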