99

Is there an online tool into which we can paste the HTML source of a page and have it minified?

I would do that for aspx files as it is not a good idea to make the webserver gzip them...

Peter Mortensen
Paulo
  • I'm not really sure what you're asking for. First you mention putting HTML source through it, but then you talk about ASPX pages. Are you trying to minify the output of the ASP.NET code, before it is sent to the browser? – Chad Birch Apr 08 '09 at 02:15
  • sorry..... actually, both... for static html pages and also aspx pages, which I believe would be better to use server side code? – Paulo Apr 08 '09 at 02:17
  • 19
    When is it a bad idea to have the server gzip? – Chuck Apr 08 '09 at 02:24
  • 5
    I read that because the aspx pages are not static files, it won't be cached by IIS and so it will gzip the page on every request... – Paulo Apr 08 '09 at 02:28
  • 23
    ...and is that a problem? Unless your server is already at 99.9% CPU, probably not. Gzipping is the usual thing to do and much more effective than any ‘minification’. – bobince Apr 08 '09 at 09:07
  • 2
    This seems to be pretty interesting: http://perfectionkills.com/experimenting-with-html-minifier/ http://kangax.github.com/html-minifier/ – StefanS Jul 13 '10 at 11:28
  • @EmerickRogul: see the section "Cleaning up presentational markup" at http://www.w3.org/People/Raggett/tidy/. For some kinds of HTML this will reduce the file size. But it might not qualify as "minifying". – LarsH Jun 24 '13 at 13:54
  • @Chuck when you use SSL, HTTP compression compromises it. See BREACH attack. – Neil McGuigan Oct 10 '13 at 23:20
  • 2
    The answers here are outdated, not to mention that some of them are wrong. Please check my [explanation about the problem and the proper tool](http://stackoverflow.com/a/22447444/1090562). – Salvador Dali Mar 17 '14 at 05:17

8 Answers

63

Perhaps try HTML Compressor. Here's a before-and-after table showing what it can do (including for Stack Overflow itself):

Sorry, markdown has no concept of tables

It offers many options for optimizing your pages, up to and including script minification (YUI Compressor, Google Closure Compiler, or your own compressor) where that would be safe. The default option set is quite conservative, so you can start with that and experiment with enabling more aggressive options.

The project is extremely well documented and supported.

Tim Post
59

Don't do this. Or rather, if you insist on it, do it after any more significant site optimizations are complete. Chances are very high that the cost/benefit for this effort is negligible, especially if you were planning to manually use online tools to deal with each page.

Use YSlow or Page Speed to determine what you really need to do to optimize your pages. My guess is that reducing bytes of HTML will not be your site's biggest problem. It's much more likely that compression, cache management, image optimization, etc will make a bigger difference to the performance of your site overall. Those tools will show you what the biggest problems are -- if you've dealt with them all and still find that HTML minification makes a significant difference, go for it.

(If you're sure you want to go for it, and you use Apache httpd, you might consider using mod_pagespeed and turning on some of the options to reduce whitespace, etc., but be aware of the risks.)

Zac Thompson
  • 25
    What is wrong with optimization if minified code is easy to read using automated beautification? –  Jan 01 '10 at 15:33
  • 12
    It probably isn't the biggest problem - but if it's a trivial process to run markup through a minifying set of regex's when compiling from dev to qa or prod, then why wouldn't you want to send out smaller markup documents? – Will Peavy Jan 05 '10 at 14:43
  • 26
    Not actually an answer to the original question :( – Chuck Le Butt Jun 23 '10 at 11:18
  • 7
    @Will, it's almost certainly *not* a trivial process to run HTML through minifying regexes, and even using a proper parser it's probably not trivial or fast. What's more, unlike JS/CSS minification, HTML minification won't be lossless: any tag can be styled as `white-space: pre`, and minification would destroy the pre-formatted text. – eyelidlessness Feb 08 '11 at 21:18
  • 3
    @eyelidlessness - I currently have thousands of pages that are minified by regexes before they are served. This function is not a complex or expensive part of the system. ... On the other hand, if you wanted to parse computed style in order to avoid minifying elements styled with `white-space:pre`, then yes, minifying HTML would be more complex. However, I'm not clear on why someone would want to use `white-space:pre` rather than using a `pre` or `code` element. – Will Peavy May 12 '11 at 17:41
  • @Will - `pre` affects a lot more than `white-space:pre` does. I'm glad your function works for you and is not complex, but in general I would never advise someone to spend resources on this unless they were sure it would pay off in a faster user experience. – Zac Thompson May 14 '11 at 06:26
  • @Zac - "pre affects a lot more than white-space:pre does" - When pre-formatted text is wrapped in a "pre" element, it makes it easier for both humans and machines to see that the text is intended to be pre-formatted. ... "I would never advise someone to spend resources on this unless they were sure it would pay off in a faster user experience." - Then we agree on this point. – Will Peavy May 15 '11 at 18:38
  • @Will But sometimes you don't want all the text to be pre-formatted, just whitespace preserved. Hence the difference. – Zac Thompson May 20 '11 at 23:00
  • @Zac - I think you're talking about the pre-line and pre-wrap properties. I've never had a use case for these. If someone wanted to write a minifier to handle these cases though - that minifier would either have to calculate computed style (that would be a pain to setup - you'd need to do something like integrate Gecko into the minifier), or would have to rely on a flag (for example, there might be a setting to ignore content inside an element with a class of "pre-wrap"). – Will Peavy May 27 '11 at 21:51
  • I managed to shave over 50 KB off one of my HTML pages by minifying alone... And yes, it won't do much for small pages, but on very large pages it's definitely worth minifying. Don't forget that with complex DOM trees, depending on how far you indent, you can shave off up to 30 whitespace characters a line, especially using tables! – AlexMorley-Finch Nov 21 '11 at 13:43
  • My HTML often has a lot of comments about the positioning of various elements. While the comments really make it elegant, I don't like the fact that comments go to the client. I reached this question while figuring out a solution. – Eastern Monk Jul 26 '12 at 19:18
  • Trivial you say??? I have a page with a very big table and its size is 1,2Mb. After minifying it, its only 123Kb... Isn't this a considerable reduction of size? I Think it is... – Alvaro Feb 20 '13 at 09:44
  • @Steve how big is the gzipped original? How big is the gzipped minified version? – Zac Thompson Feb 20 '13 at 21:24
  • @ZacThompson My mistake. I apologize. I was saving the "complete" page with Chrome and not only the generated source code. – Alvaro Feb 21 '13 at 09:49
  • @Steve no need to apologize! It's easy to think that a change is big when in fact it doesn't have a significant impact on the end user. That's why I'm encouraging measuring first. There's almost always something else that will have a bigger effect than HTML minification. – Zac Thompson Feb 21 '13 at 19:46
34

Here is a short answer to your question: you should minify your HTML, CSS and JS. There is an easy-to-use tool called Grunt, which lets you automate a lot of tasks, among them JS, CSS and HTML minification, file concatenation and many others.

The answers written here are extremely outdated, and some do not even make sense. A lot has changed since 2009, so I will try to answer this properly.

Short answer: you should definitely minify HTML. It is trivial today and gives approximately a 5% speedup. For the longer answer, read on.

Back in the old days people minified CSS/JS manually (by running it through some specific tool). It was kind of hard to automate the process and definitely required some skill. Given that a lot of high-profile sites even now do not use gzip (which is trivial), it is understandable that people were reluctant to minify HTML.

So why were people minifying js, but not html? When you minify JS, you do the following things:

  • remove comments
  • remove blanks (tabs, spaces, newlines)
  • change long names to short (var isUserLoggedIn to var a)

This gave a lot of improvement even in the old days. But in HTML you could not replace long names with short ones, and there was almost nothing to comment on at that time. So the only thing left was to remove spaces and newlines, which gives only a small improvement.

One wrong argument written here is that minification is pointless because content is served with gzip. This is totally wrong. Yes, gzip reduces the gain from minification, but why gzip comments and whitespace at all when you can trim them and gzip only the important part? It is like archiving a folder that contains junk you will never use: you could just zip it as-is, but it is better to clean it up first and then zip it.
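To check how much minification still saves once gzip is applied, it helps to measure your own pages rather than argue about it. Below is a minimal Python sketch of such a measurement. The helper names (`naive_minify`, `gzipped_size`, `size_report`) and the sample table are made up for illustration, and the regex-based minifier is deliberately naive (it would mangle `pre` blocks and inline scripts), so treat the numbers only as a rough indication.

```python
import gzip
import re


def naive_minify(html: str) -> str:
    """Strip HTML comments and collapse whitespace runs.

    Illustration only: this ignores <pre>, <textarea>, <script> and CSS
    such as white-space: pre, so it is not safe for real pages.
    """
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    html = re.sub(r"\s+", " ", html)
    return html.strip()


def gzipped_size(text: str) -> int:
    """Return the size in bytes of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))


def size_report(html: str) -> dict:
    """Compare raw and minified sizes, before and after gzip."""
    minified = naive_minify(html)
    return {
        "raw": len(html.encode("utf-8")),
        "minified": len(minified.encode("utf-8")),
        "raw_gzip": gzipped_size(html),
        "minified_gzip": gzipped_size(minified),
    }


# A made-up, heavily indented table standing in for a real page.
SAMPLE = (
    "<!-- page layout notes -->\n<table>\n"
    + "".join(
        f"        <tr>\n            <td>row {i}</td>\n"
        f"            <td>value {i}</td>\n        </tr>\n"
        for i in range(200)
    )
    + "</table>\n"
)

if __name__ == "__main__":
    for name, size in size_report(SAMPLE).items():
        print(f"{name:>14}: {size} bytes")
```

The interesting comparison is `raw_gzip` against `minified_gzip`: that difference is what the visitor actually saves when the server already compresses responses.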

Another argument for why minification is pointless is that it is tedious. Maybe this was true in 2009, but new tools have appeared since then. Right now you do not need to minify your markup manually. With things like Grunt it is trivial to install grunt-contrib-htmlmin (which relies on HTMLMinifier by @kangax) and configure it to minify your HTML. All you need is about 2 hours to learn Grunt and configure everything, and then everything is done automatically in less than a second. That 1 second (which you can even reduce to nothing by automating it with grunt-contrib-watch) is not so bad for an improvement of approximately 5% (even with gzip).

One more argument is that CSS and JS are static, while HTML is generated by the server, so you cannot pre-minify it. This was also true in 2009, but currently more and more sites look like single-page apps, where the server is thin and the client does all the routing, templating and other logic. So the server only gives you JSON and the client renders it. Here you have a lot of HTML for the page and for the different templates.

So to finish my thoughts:

  • Google is minifying HTML.
  • PageSpeed asks you to minify HTML.
  • It is trivial to do.
  • It gives an improvement of ~5%.
  • It is not the same as gzip.
kangax
Salvador Dali
  • 3
    Minifying HTML is absolutely **not** trivial, as whitespace is significant in HTML and whether any given whitespace can be removed depends on CSS. Also, thin clients are terrible and can’t, in my opinion, be given as a good argument against the troubles of minifying dynamic HTML. (One good way to do it is pick a template engine [Haml, Jade, etc.] that doesn’t include unnecessary whitespace in its rendered output in the first place.) – Ry- Aug 10 '14 at 16:57
  • @minitech minifying HTML is **trivial**, although there are a few possible problems with whitespace (like ``). First of all, you can always find a way to write valid HTML that is whitespace-agnostic. Also, you might be surprised to hear it, but a JS/CSS minifier can also introduce a bug, which does not mean that you should not use it. So there are two ways to solve your problem: learn to write whitespace-agnostic markup, and test your product before/after minification (CSS/HTML/JS). Also, in Minifier you can specify which whitespace you want to preserve. – Salvador Dali Aug 11 '14 at 05:50
  • Correct JavaScript minifiers on non-insane code (i.e. code that doesn’t read itself or cheat by timing) cannot introduce a bug. And no, there’s not always a way to write whitespace-agnostic HTML, specifically because HTML is, again, not whitespace-agnostic. At all. Make sure to test copying and pasting on this if you think margins will cut it. Specifying what whitespace I want to preserve sounds like a waste of time (except for Google)… – Ry- Aug 12 '14 at 01:05
  • @minitech can you show me CSS that is impossible to write in white-space agnostic way? I am minifying html for a long time, and have not seen problems so far. – Salvador Dali Aug 12 '14 at 01:08
  • `* { white-space: pre; }` is an obvious one, but if you’re removing all whitespace and not just collapsing it (replacing it with margins instead), text can copy incorrectly and wreak havoc on text browsers and screen readers. – Ry- Aug 12 '14 at 02:02
  • Sure, you can tailor a set of minification rules to fit your needs exactly, but fixing problems as they show up is terrible, and the aforementioned template engines let you achieve consistent and minimal results. – Ry- Aug 12 '14 at 02:05
  • @minitech I was looking for an example of what you want to achieve, not a particular rule. Programming is not about using a particular rule, pattern, language but rather solving a problem. What I am trying to tell that I can achieve the same effect making my code whitespace agnostic. – Salvador Dali Aug 12 '14 at 03:09
  • Paragraphs in a certain document are produced with indentation that uses spaces and specific hard wraps. I would like to maintain this indentation in the rendered document. – Ry- Aug 12 '14 at 04:49
  • @minitech use `<pre>` and define that everything inside of `pre` should not be minified. – Salvador Dali Aug 12 '14 at 06:11
  • No, I have to put them in `` elements. – Ry- Aug 12 '14 at 06:38
23

I wrote a web tool to minify HTML. http://prettydiff.com/?m=minify&html

This tool operates using these rules:

  • All HTML comments are removed
  • Runs of white space characters are converted to single space characters
  • Unnecessary white space characters inside tags are removed
  • White space characters between two tags, where one of these two tags is not a singleton, are removed
  • All content inside a style tag is presumed to be CSS and is minified as such
  • All content inside a script tag is presumed to be JavaScript, unless provided a different media type, and then minified as such
    • The CSS and JavaScript minification uses a heavily forked form of JSMin. This fork is extended to support CSS natively and also support SCSS syntax. Automatic semicolon insertion is supported for JavaScript minification, however automatic curly brace insertion is not yet supported.
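As a rough illustration of the first two rules only (comment removal and collapsing runs of white space), here is a small Python sketch. It is not the tool's actual code: the function name `minify_html` and the choice to leave `pre`, `textarea`, `script` and `style` contents untouched are assumptions made for this example, and the regex approach is a simplification compared with a real parser (for instance, a comment that itself contains a `<pre>` tag would confuse it).

```python
import re

# Elements whose contents this sketch leaves untouched (an assumption made
# for the example, not the rule set of any particular tool).
PROTECTED = re.compile(
    r"<(pre|textarea|script|style)\b[^>]*>.*?</\1\s*>",
    flags=re.DOTALL | re.IGNORECASE,
)
COMMENT = re.compile(r"<!--.*?-->", flags=re.DOTALL)
WHITESPACE_RUN = re.compile(r"\s+")


def _squeeze(fragment: str) -> str:
    """Remove comments, then turn each white space run into one space."""
    fragment = COMMENT.sub("", fragment)
    return WHITESPACE_RUN.sub(" ", fragment)


def minify_html(html: str) -> str:
    """Minify everything outside the protected elements."""
    out = []
    pos = 0
    for match in PROTECTED.finditer(html):
        out.append(_squeeze(html[pos:match.start()]))  # text before the block
        out.append(match.group(0))                     # protected block, verbatim
        pos = match.end()
    out.append(_squeeze(html[pos:]))                   # trailing text
    return "".join(out).strip()


if __name__ == "__main__":
    page = (
        "<div>\n   <!-- note -->\n   <p>Hello   world</p>\n"
        "<pre>  keep\n   this  </pre>\n</div>"
    )
    print(minify_html(page))
```

Removing white space between tags entirely (the fourth rule) is deliberately left out, because whether that is safe depends on how the elements are rendered.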
austincheney
8

This worked for me:

http://minify.googlecode.com/git/min/lib/Minify/HTML.php

It's not a ready-made online tool, but since it is a simple PHP include, it's easy enough to run yourself.

I would not save the compressed files, though; do this dynamically if you really have to. It's always a better idea to enable gzip compression on the server. I don't know how involved that is in IIS/.NET, but in PHP it's as trivial as adding one line to the global include file

Forethinker
adamJLev
6

CodeProject has a published sample project (http://www.codeproject.com/KB/aspnet/AspNetOptimizer.aspx?fid=1528916&df=90&mpp=25&noise=3&sort=Position&view=Quick&select=2794900) to handle some of the following situations...

  • Combining ScriptResource.axd calls into a single call
  • Compressing all client-side scripts based on the browser's capabilities, including gzip/deflate
  • A ScriptMinifier to remove comments, indentation, and line breaks
  • An HTML compressor to compress all HTML markup based on the browser's capabilities, including gzip/deflate
  • And, most importantly, an HTML Minifier to write the complete HTML on a single line and minify it as far as possible (under construction)
The Lazy DBA
3

For the Microsoft .NET platform there is a library called WebMarkupMin, which minifies HTML code.

In addition, there is a module for integrating this library into ASP.NET MVC: WebMarkupMin.Mvc.

Andrey Taritsyn
1

Try http://code.mini-tips.com/html-minifier.html; this is a .NET library for HTML minification.

HtmlCompressor is a small, fast and very easy to use .NET library that minifies given HTML or XML source by removing extra whitespace, comments and other unneeded characters without breaking the content structure. As a result, pages become smaller and load faster. A command-line version of the compressor is also available.