Can I use unencoded ampersands (&) in html?

Question

I'm building a website where I have to work with less than perfect master data (I guess I'm not the only one :-))

In my case I have to render an XML file to HTML (using XSL). Sometimes the master data already uses HTML entities (e.g. `&eacute;` in French words), so I have to use `disable-output-escaping="yes"` there in order to avoid double encoding.

The easiest solution is to disable output escaping altogether, so I never run the risk of double encoding.

The only characters in this master data that lack encoding are the ampersands. But when I leave them 'raw' (so `&` rather than `&amp;`), all browsers seem to be OK with it.

So the question: what are the consequences of using unencoded ampersands in HTML?

You have a really awkward situation to deal with - my sympathies. Can you preprocess the master data before the XSL transformation? You could replace any bare ampersands with `&amp;`, using a simple regexp, so normalising the input before it gets to the XSL. — Tom Anderson, Jun 27 '12 at 07:57
@Peter it's possible these days for an asker to unilaterally self-close a question as a duplicate. I suggest doing so. — Mark Amery, Sep 19 '17 at 22:17
@Mark I got notified of your comment, I think the status is ok now? — Peter, Sep 20 '17 at 07:31

score 8 · Answer 1 · answered Jun 27 '12 at 07:51

It depends

The best research I have seen on this topic can be found here

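In HTML5 you must escape any ampersand that falls into the category below; other bare ampersands are allowed, though one that happens to form a valid character reference will be decoded as such: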

An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, followed by a U+003B SEMICOLON character (;), where these characters do not match any of the names given in the named character references section.
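The definition above can be checked mechanically. The following is a minimal sketch in Python, assuming the `html5` table from the standard library's `html.entities` module is an acceptable stand-in for the spec's named character references section; the function name `ambiguous_ampersands` is made up for illustration.

```python
import re
from html.entities import html5  # keys look like "amp;" or "copy"

# "&", then one or more ASCII alphanumerics, then ";"
_CANDIDATE = re.compile(r"&([0-9A-Za-z]+);")

def ambiguous_ampersands(text):
    """Return every '&name;' sequence whose name is not a known named reference."""
    return [
        m.group(0)
        for m in _CANDIDATE.finditer(text)
        if m.group(1) + ";" not in html5
    ]
```

For example, `ambiguous_ampersands("a &foo; b &amp; c & d")` reports only `&foo;`: `&amp;` is a known reference, and the ampersand followed by a space does not have the alphanumerics-plus-semicolon shape at all.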

Mathias Bynens, the author of the linked post, is a formidable and highly active developer. Any interesting question you have about the edge cases of the HTML, ECMAScript and CSS specs, there's a significant chance Mathias has written about it in pedantic detail, as he has done here. — Mark Amery, May 16 '15 at 20:55

Jon · Accepted Answer · 2012-06-27T08:00:16.723

AFAIK bare ampersands are illegal in HTML. With that out of the way, let's look at the consequences:

You are now relying on the browser's ability to detect and gracefully recover from the problem. Note that in order to do this, the browser has to guess: `& ` is "clearly" an ampersand followed by a space, and `&copy;` is clearly the copyright symbol. But what about the text fragment `edit&copy`? The browser I'm using right now mangles it.
If you are using XHTML, or if the content is ever going to be inserted into an XML document, the result will be a hard parser error.
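Both consequences can be reproduced outside a browser. This is a rough sketch in Python, assuming that `html.unescape` (documented as following the HTML5 rules for valid and invalid character references) approximates what a browser does with text content, and that `xml.etree.ElementTree` stands in for a strict XML parser; the helper names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

def decode_like_html5(text):
    """Decode character references using the standard library's HTML5 rules."""
    return html.unescape(text)

def parses_as_xml(fragment):
    """Return True if the fragment is well-formed inside a <p> element."""
    try:
        ET.fromstring(f"<p>{fragment}</p>")
        return True
    except ET.ParseError:
        return False

print(decode_like_html5("edit&copy"))     # edit© : the legacy name "copy" is decoded
print(decode_like_html5("fish & chips"))  # unchanged: "& " is not a reference
print(parses_as_xml("fish & chips"))      # False: a bare ampersand is not well-formed XML
print(parses_as_xml("fish &amp; chips"))  # True
```

The lenient path silently changes the text, while the strict path rejects the whole document.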

Since it's more difficult to detect and account for these cases manually than it is to replace all ampersands that are not part of entities (say with a regex), you should really do the latter.
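A minimal sketch of such a regex-based replacement, written in Python for illustration (the question's pipeline is XSL, so this would run as a preprocessing step). The pattern and the function name `escape_bare_ampersands` are assumptions, not part of the original answer. It treats anything shaped like a named or numeric reference as already encoded and does not check names against the real entity list.

```python
import re

# An ampersand that is NOT followed by something shaped like a reference:
# a name (&eacute;), a decimal code (&#233;) or a hex code (&#xE9;), each ending in ";".
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(text):
    """Replace bare ampersands with &amp; and leave existing references alone."""
    return _BARE_AMP.sub("&amp;", text)
```

Because `&amp;` itself has the shape of a reference, running the function twice gives the same result as running it once, which avoids the double encoding the question is worried about. The trade-off is that literal text which merely looks like a reference is left untouched.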

the browser does not have to *guess* as there is a spec for that, at least in HTML5 — Razor, Jun 27 '12 at 08:11

score 4 · Answer 3 · edited May 23 '17 at 12:17


See Do I really need to encode '&' as '&amp;'?

To summarize: Yes you can, but strictly speaking it is not legal (except in HTML5 where it is legal as long as it doesn't "look like" a character entity).

Supr · answered Jun 27 '12 at 07:53

thanks for pointing out the url – Peter Jun 27 '12 at 08:14
