
The W3C Validator is reporting an error about the ampersands in some of my URLs. For example:

<link href="min/?f=static/css/reset.css,static/css/main5.css&2" rel="stylesheet">

According to the HTML5 spec, however, the validator is wrong:

An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more alphanumeric ASCII characters, followed by a U+003B SEMICOLON character (;), where these characters do not match any of the names given in the named character references section.
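As a rough illustration, the quoted definition can be checked mechanically. Below is a minimal Python sketch that flags `&name;` sequences whose name is not in the standard library's `html.entities.html5` table of named character references. The function name `find_ambiguous_ampersands` is hypothetical. The sketch only implements the definition. It does not model the contexts where the spec forbids such ampersands.

```python
import re
from html.entities import html5  # named character references, keys like "amp;" and "lt;"

# An ampersand, one or more ASCII alphanumerics, then a semicolon.
_CANDIDATE = re.compile(r"&([A-Za-z0-9]+);")


def find_ambiguous_ampersands(text: str) -> list[str]:
    """Return every '&name;' sequence whose name is not a known named character reference."""
    return [
        m.group(0)
        for m in _CANDIDATE.finditer(text)
        if m.group(1) + ";" not in html5
    ]


href = "min/?f=static/css/reset.css,static/css/main5.css&2"
print(find_ambiguous_ampersands(href))          # [] because '&2' is not followed by ';'
print(find_ambiguous_ampersands("a&b=1&foo;"))  # ['&foo;'] because 'foo' is not a known name
print(find_ambiguous_ampersands("x &amp; y"))   # [] because 'amp' is a known name
```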

This site comes to the same conclusion and states that no validators currently implement the spec correctly.

Is there anything definitive on this?

dcaswell
Chuck Le Butt

2 Answers


I believe you're correct: according to the following thread, this is a bug in the HTML5 validator. I don't know whether this counts as "definitive", since it's not an official statement (though I do recognise the names on the replies as those of reputable members here on SO).

What you're trying to do is indeed valid in HTML5 (in specific cases, such as yours).

Here's an excerpt from that bug report for reference:

Original thread post:

<img src="http://codx.altervista.org/scripts/php/image.phpimg=/membri/codx/grafica/articles_covers/cover_t3dc1360866428.jpg&h=96" alt="Teeter 3D contesT" />

"Simply, that & does not have to be escaped to &amp;" - Source.

Thread answer/explanation:

That's right, in HTML5. It's a bug in the validator that it says otherwise, see http://lists.w3.org/Archives/Public/www-validator/2013Mar/0009.html. The unstable development version of the validator, http://qa-dev.w3.org:8888/ has this bug fixed (and your document validates in it).

This may, in part, reflect the nature of HTML5 validator as experimental software that checks against some "specification" which is not identified in public and which may change at any moment without notice and often does. - Source.

Note: the excerpts above are unchanged, but they are only snippets, not the full question and answer. Future readers should see the thread linked at the top of this answer for the full question and an explanation of why this behaviour occurs.

dsgriffin
  • It would be great to find a definitive statement on this because even the unstable development version now fails such examples :-/ – Chuck Le Butt Jun 28 '13 at 22:16
  • @DjangoReinhardt You're right, it doesn't. Check out the bug report for that here - http://bugzilla.validator.nu/show_bug.cgi?id=841 – dsgriffin Jun 28 '13 at 22:47
  • Hmm. It seems it's actually a bit of a grey area at the moment? I'm going to un-accept your answer to see if anyone else might come along with more information. – Chuck Le Butt Jun 28 '13 at 23:57
  • @DjangoReinhardt It's understandable. I would post the bug report myself, but I think it'd be marked as a dupe – dsgriffin Jun 28 '13 at 23:58

First, the HTML 5 specification is constantly changing, so validators, and the validity of this answer, can be expected to break over time.

That being said, I repeat the quote that defines an “ambiguous ampersand”:

An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more alphanumeric ASCII characters, followed by a U+003B SEMICOLON character (;), where these characters do not match any of the names given in the named character references section.

In other words, it is something that looks like a named character reference but whose name is unknown to the specification. Having defined the term, the specification then states where such ambiguous ampersands must not occur:

  • textarea, title: Escapable raw text elements can have text and character references, but the text must not contain an ambiguous ampersand.
  • MathML, SVG elements: … but the text must not contain the character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand.
  • Ordinary non-empty HTML elements: … but the text must not contain the character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand.
  • Attribute values: … with the additional restriction that the text cannot contain an ambiguous ampersand.

The bullet points have been quoted from the specification, too. Please search the specification for “ambiguous ampersand” for the full sentences that have been omitted here.

The HTML 5 specification does, however, allow ambiguous ampersands in raw text elements (the script and style elements). And just because HTML 5 defines “ambiguous ampersands”, and browsers cope with them most of the time, does not mean they are valid for general use.

So do escape “ambiguous ampersands” to make them unambiguous, except inside script and style elements.
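A minimal Python sketch of that advice, assuming the input is ordinary text or an attribute value (not script or style content). It rewrites the `&` of every `&name;` whose name is unknown as `&amp;` and leaves known references alone. The function name `disambiguate` is hypothetical. The name lookup uses the standard library's `html.entities.html5` table.

```python
import re
from html.entities import html5  # named character references, keys like "amp;" and "copy;"

# Candidate ambiguous ampersand: '&', ASCII alphanumerics, ';'.
_CANDIDATE = re.compile(r"&([A-Za-z0-9]+);")


def disambiguate(text: str) -> str:
    """Escape the '&' of every '&name;' whose name is unknown; keep known references."""

    def fix(m: re.Match) -> str:
        if m.group(1) + ";" in html5:
            return m.group(0)  # known reference such as &amp; or &copy;
        return "&amp;" + m.group(1) + ";"  # unknown name: make the ampersand literal

    return _CANDIDATE.sub(fix, text)


print(disambiguate("Tom &jerry; &amp; friends"))
# Tom &amp;jerry; &amp; friends
```

Script and style elements are left out because their content is raw text: character references are not decoded there, so rewriting `&` as `&amp;` would change the content rather than escape it.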

Let's come back to your case. You do not have an “ambiguous ampersand”, because your ampersand is followed by 2 and then the end of the attribute value, not by alphanumerics terminated by a semicolon. Since that sequence is absent, the ampersand is to be taken literally and kept as is. Your ampersand should therefore be considered valid according to the HTML 5 specification.
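One way to see that `&2` is taken literally is a small Python check with the standard library's `html.unescape`, which applies the HTML5 character reference rules. It is only a stand-in for a real parser. Attribute values have an extra rule for legacy references that this sketch does not model, although that rule does not affect `&2`.

```python
import html

# The href value from the question, exactly as written in the markup.
raw = "min/?f=static/css/reset.css,static/css/main5.css&2"

# '&2' matches no named or numeric character reference, so it is left untouched.
print(html.unescape(raw) == raw)  # True

# The escaped form decodes to the same URL, so escaping changes nothing for the browser.
escaped = "min/?f=static/css/reset.css,static/css/main5.css&amp;2"
print(html.unescape(escaped) == raw)  # True
```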

Remark: I'd suggest escaping your ampersand nevertheless, as you are relying on a detail of an unstable specification. Additionally, I would not expect all software to follow the specification that closely. I would rather go with the simpler rule of escaping ampersands wherever they occur, as I cannot see that causing any trouble.
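Following that simpler rule, here is a minimal Python sketch that escapes every ampersand when generating the tag. It uses the standard library's `html.escape`, which by default also escapes `<`, `>`, `"` and `'`. The helper name `link_tag` is hypothetical.

```python
import html


def link_tag(href: str) -> str:
    """Build a stylesheet <link> whose href is escaped for use in a quoted attribute."""
    # html.escape turns every & into &amp; (quote=True by default also covers " and '),
    # so no ampersand is left for a parser to read as the start of a character reference.
    return f'<link href="{html.escape(href)}" rel="stylesheet">'


print(link_tag("min/?f=static/css/reset.css,static/css/main5.css&2"))
# <link href="min/?f=static/css/reset.css,static/css/main5.css&amp;2" rel="stylesheet">
```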

Augustus Kling