
Here's my HTML:

<a>View it in your browser</a>
<div id="html">
    <h1>Doggies</h1>
    <p style="color:blue;">Kitties</p>
</div>

How do I use JavaScript to make the href attribute of my link point to a base64 encoded webpage whose source is the innerHTML of div#html?

I basically want to do the same conversion done here (with the base64 checkbox checked) except for in JavaScript.

Rob W
Web_Designer

1 Answer


Characteristics of a data-URI

An unencoded data-URI with MIME type text/html takes one of these forms:

data:text/html,<HTML HERE>
data:text/html;charset=UTF-8,<HTML HERE>

Base64 encoding is not necessary. If your HTML contains non-ASCII characters, such as é, charset=UTF-8 has to be added.
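Since the question asks specifically for base64, here is a minimal sketch of that variant, assuming an environment where `btoa` and `TextEncoder` are available (current browsers, recent Node.js). The function name `toBase64DataURI` is made up for illustration. The HTML is converted to UTF-8 bytes first because `btoa` only accepts characters in the Latin-1 range.

```javascript
// Hypothetical helper: builds a base64 data-URI from an HTML string.
function toBase64DataURI(html) {
  // Encode to UTF-8 bytes so non-ASCII characters survive btoa.
  const bytes = new TextEncoder().encode(html);
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one char per byte (0-255)
  }
  // The ";base64" token has to come last, right before the comma.
  return 'data:text/html;charset=UTF-8;base64,' + btoa(binary);
}
```

The result is roughly a third larger than the raw HTML, which is one reason the unencoded forms above are often preferable.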

The following characters have to be escaped:

  • # - Firefox and Opera interpret this character as the marker of a hash (as in location.hash).
  • % - This character is used to escape characters. Escape this character to make sure that no side effects occur.

Additionally, if you want to embed the code in an anchor tag, the following characters should also be escaped:

  • " and/or ' - Quotes mark the value of the attribute.
  • & - The ampersand is used to mark HTML entities.
  • < and > do not have to be escaped inside an HTML attribute. However, if you are going to embed the link in HTML source, they should be escaped as well (%3C and %3E).
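As a quick check of the two lists above, this sketch shows how `encodeURIComponent` treats each of those characters. Note that it leaves the single quote untouched, so a link attribute delimited by single quotes still needs a manual replacement. The `escapes` object is just a scratch name for the example.

```javascript
// Map each character from the lists above to its encodeURIComponent output.
const escapes = {};
for (const ch of ['#', '%', '"', "'", '&', '<', '>']) {
  escapes[ch] = encodeURIComponent(ch);
}
// '#' -> '%23', '%' -> '%25', '"' -> '%22', '&' -> '%26',
// '<' -> '%3C', '>' -> '%3E', but "'" stays "'" (not escaped).
```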

JavaScript implementation

If you don't mind the size of the data-URI, the easiest method is to use encodeURIComponent:

var html = document.getElementById("html").innerHTML;
var dataURI = 'data:text/html,' + encodeURIComponent(html);
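To finish the job from the question, the URI still has to be assigned to the link. A minimal sketch, assuming the anchor is the first `<a>` on the page (the question's markup gives it no id, so `document.querySelector('a')` is a guess); `buildDataURI` is a name invented here:

```javascript
// Hypothetical helper wrapping the one-liner above.
function buildDataURI(html) {
  return 'data:text/html,' + encodeURIComponent(html);
}

// Only touch the DOM when one exists (i.e. when running in a browser).
if (typeof document !== 'undefined') {
  var source = document.getElementById('html');
  var link = document.querySelector('a'); // assumption: first link on the page
  if (source && link) {
    link.href = buildDataURI(source.innerHTML);
  }
}
```

Assigning through the `href` property rather than writing the attribute into markup means the quote and ampersand escaping for HTML attributes is not a concern here.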

If size matters, strip out runs of consecutive white-space (this is safe unless the HTML contains a <pre> element or an equivalent white-space style), then escape only the significant characters:

var html = document.getElementById("html").innerHTML;
html = html.replace(/\s{2,}/g, '')   // <-- Remove every run of 2+ white-space chars
           .replace(/%/g, '%25')     // <-- Escape %
           .replace(/&/g, '%26')     // <-- Escape &
           .replace(/#/g, '%23')     // <-- Escape #
           .replace(/"/g, '%22')     // <-- Escape "
           .replace(/'/g, '%27');    // <-- Escape ' (to be 100% safe)
var dataURI = 'data:text/html;charset=UTF-8,' + html;
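The chain above can be wrapped in a reusable function. This is only a sketch, and `compactDataURI` is a made-up name. Two details are worth seeing in action: `%` is escaped first so the `%XX` sequences added by later steps are not escaped again, and `\s{2,}` is greedy, so with the `g` flag a single pass removes each white-space run in full.

```javascript
// Hypothetical helper: same replacements as above, in the same order.
function compactDataURI(html) {
  var compact = html
    .replace(/\s{2,}/g, '') // greedy: a run of 6 spaces is one match
    .replace(/%/g, '%25')   // must run before the other escapes
    .replace(/&/g, '%26')
    .replace(/#/g, '%23')
    .replace(/"/g, '%22')
    .replace(/'/g, '%27');
  return 'data:text/html;charset=UTF-8,' + compact;
}

var example = compactDataURI('<h1>A</h1>\n      <p style="color:blue;">50% & #1</p>');
// example === 'data:text/html;charset=UTF-8,<h1>A</h1><p style=%22color:blue;%22>50%25 %26 %231</p>'
```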
Rob W
  • Thanks for the extensive answer. This was really helpful! :) – Web_Designer Feb 11 '12 at 19:42
  • Note Opera behaves similarly to Firefox regarding `#`. Chrome and Safari do not attach a special meaning to `#`. – Rob W May 01 '12 at 08:13
  • Small typo in the example at the bottom. If I'm not mistaken, `data:text/html,charset=UTF-8` should be `data:text/html;charset=UTF-8` – Braden Best Apr 07 '13 at 01:10
  • @B1KMusic Thanks for bringing it up. The comma must indeed be changed to a semicolon, and a trailing semicolon needs to be added. Revised answer. – Rob W Apr 07 '13 at 09:10
  • Would [encodeURIComponent](https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/encodeURIComponent) work instead of your many uses of the `replace` method? – Web_Designer Apr 07 '13 at 21:31
  • @Web_Designer Yes. That's stated in the section before the code block with the many `.replace`s. – Rob W Apr 07 '13 at 21:32
  • Don't you need to loop on 'Replace all consecutive spaces' multiple times? Original: ssssss. 1st pass: sss. 2nd pass: ss. 3rd pass: s. – johny why Apr 05 '20 at 17:30