I'm currently experimenting with delivering XHTML5. At the moment the page I'm working on is delivered as XHTML 1.1 Strict to capable browsers. For those that don't accept XML-encoded data I fall back to HTML 4.01 Strict.
While experimenting with HTML5 for both variants, delivering as HTML5 works more or less as expected. The first issue I have when delivering as XHTML5, however, is with the HTML entities. FF4 says &uuml; is an undefined entity, because there is no HTML5 DTD.
I read that the HTML5 wiki currently recommends:
Do not use entity references in XHTML (except for the 5 predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;)
I do need &lt; and &gt; in certain places. Hence my question: what is the best way in PHP to decode all but the five entities named above? html_entity_decode() decodes all of them, so is there a reasonable way to exclude some?
UPDATE:
I went with a simple replace / replace back approach for the moment, so unless there really is an elegant way the question is solved enough for my immediate needs.
function non_html5_entity_decode($string)
{
    // Hide the five predefined XML entities behind placeholders.
    $string = str_replace("&amp;", '@@@AMP',
              str_replace("&apos;", '@@@APOS',
              str_replace("&lt;", '@@@LT',
              str_replace("&gt;", '@@@GT',
              str_replace("&quot;", '@@@QUOT', $string)))));
    // Decode every remaining entity.
    $string = html_entity_decode($string);
    // Put the five predefined entities back.
    $string = str_replace('@@@AMP', "&amp;",
              str_replace('@@@APOS', "&apos;",
              str_replace('@@@LT', "&lt;",
              str_replace('@@@GT', "&gt;",
              str_replace('@@@QUOT', "&quot;", $string)))));
    return $string;
}
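A possible alternative to the replace / replace back approach, as a rough sketch only: match each named entity reference with a regular expression, decode it individually, and re-escape the result if it turns out to be one of the characters XML reserves. The function name decode_named_entities_for_xml is made up for illustration. The sketch assumes PHP 7 or later (for the type declarations; ENT_HTML5 and ENT_XML1 need PHP 5.4+) and UTF-8 input. Numeric references such as &#252; are left untouched on purpose, since they are valid in XML without a DTD.

```php
<?php
// Hypothetical helper (name is illustrative, not from any library).
// Decodes named entity references except those that resolve to the
// characters XML reserves (& < > " '), which stay escaped.
function decode_named_entities_for_xml(string $text): string
{
    $result = preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',           // named references only
        function (array $m): string {
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($decoded === $m[0]) {
                return $m[0];                  // unknown name: leave as is
            }
            // Re-escape & < > " ' using the XML names; other characters
            // (e.g. "ü") pass through unchanged.
            return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $text
    );
    // preg_replace_callback returns null on a PCRE error; fall back to input.
    return $result ?? $text;
}

echo decode_named_entities_for_xml('M&uuml;ller &amp; S&ouml;hne &lt;b&gt;'), "\n";
// prints: Müller &amp; Söhne &lt;b&gt;
```

Compared with the placeholder version, this does not depend on strings like @@@AMP never occurring in the input, at the cost of one callback per entity reference.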