
encodeForHtml() (new in CF10) vs htmlEditFormat(), how are they different?

James A Mohler
Henry

2 Answers


I think it is the same as the encodeForHTML function in Java's OWASP ESAPI. It is the more secure choice for avoiding XSS attacks when you output content in HTML.

<cfsavecontent variable="htmlcontent">
<html>
    <head>
        <script>function hello() {alert('hello')}</script>
    </head>
    <body>
        <a href="#bookmark">Book Mark &amp; Anchor</a><br/>
        <div class="xyz">Div contains & here.</div>
        <IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>
        <IMG SRC=&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041>
</body>
</html></cfsavecontent>

<cfoutput>#htmleditformat(htmlcontent)#</cfoutput>
<br />
<cfoutput>#encodeforhtml(htmlcontent)#</cfoutput>
Pritesh Patel
  • Seems strange that they would not just enhance the pre-existing tag via another attribute to make it more secure or just plain enhance it out of the box. – Snipe656 May 15 '12 at 14:17
  • Well, encodeForHtml() is part of a set: encodeForCss(), encodeForJavascript(), encodeForHtmlAttribute(), etc. It's also supposed to escape more than the original htmlEditFormat(). – ale May 15 '12 at 14:48
  • Since they produce different output, they added a new function as part of the aforementioned set rather than modifying the existing one. This helps maintain backward compatibility with existing code. – Justin Scott May 15 '12 at 15:15
  • I think it also comes at the risk of cluttering the language when you have multiple functions (offhand, these HTML ones mentioned, then two JS ones) that at first glance appear to do the same thing but in fact don't. It leaves me wondering at what point I would have a valid reason to use HTMLEditFormat() over EncodeForHTML(). When 10 was still in beta and I read through the docs, that very question entered my mind and has been lurking there ever since. – Snipe656 May 15 '12 at 15:59
  • Same can be said for the new encodeForURL() vs URLFormat(). What do you guys think? http://stackoverflow.com/questions/10604987/should-encodeforhtml-encodeforurl-be-used-from-cf10-onward-in-flavor-of-h – Henry May 15 '12 at 16:25
  • @Henry, I think so, yes. This is again mainly for security. – Pritesh Patel May 15 '12 at 16:56
  • The more I think about this, what bothers me the most is why the old functions are not specifically flagged as deprecated. It implies to me that some use is still seen for the older ones in new development, rather than always using the newer functions. – Snipe656 May 15 '12 at 18:05
  • Early on I encountered a bug with the new functions and have been using the old ones ever since, as a matter of trust: https://bugbase.adobe.com/index.cfm?event=bug&id=3566150. It is supposedly fixed now, though, so I should probably change my habits. I agree the old ones should be left in place but marked deprecated, not replaced or dropped. – TheCycoONE Jan 09 '15 at 17:20

EncodeFor* functions are based on the OWASP ESAPI libraries. The main difference is that HTMLEditFormat() merely replaces a few "bad" characters, such as &, < and >, with their entity equivalents (&amp;, &lt; and &gt;), whereas EncodeForHTML() is smarter: it encodes a wider range of characters and, when canonicalization is requested through its optional second argument, it can recognize content that is already encoded and avoid double-encoding it.

For example, if a user submitted the following content to your site:

<div>
Here is <i>test</i> html content includes<br/>
<script>alert('hello')</script>
Notice how &amp; rendered with both functions.
</div>

Both HTMLEditFormat() and EncodeForHTML() would properly escape the '<' and '>' characters. But HTMLEditFormat() would blindly encode the & again such that your output looks like:

... how &amp;amp; rendered ...

With encodeForHTML() and canonicalization enabled (second argument set to true), it would instead look like:

... how &amp; rendered ...

HTMLEditFormat() couldn't tell that the ampersand was already encoded, so it encoded it a second time. This is a trivial example, but it demonstrates how the ESAPI libraries are smarter and, therefore, more secure.
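
To check the double-encoding behavior on your own server, a minimal sketch along these lines may help. It is not the original example: the variable names and the sample string are made up for illustration. It assumes, going by Adobe's documentation, that encodeForHTML() takes an optional second argument (canonicalize) that defaults to false, so the call without it is included for comparison.

```cfml
<!--- Hypothetical input that already contains an encoded ampersand. --->
<cfset userInput = "Tom &amp; Jerry <b>bold</b>">

<!--- Legacy function: escapes the markup characters it knows about. --->
<cfset legacy = htmlEditFormat(userInput)>

<!--- New function, no second argument: the input is encoded as given. --->
<cfset encodedAsIs = encodeForHTML(userInput)>

<!--- New function with canonicalize = true: the input is decoded to its
      simplest form first and then encoded. --->
<cfset encodedCanonical = encodeForHTML(userInput, true)>

<!--- The results are escaped once more here purely so that the browser
      displays the raw result strings instead of rendering them. --->
<cfoutput>
<pre>
htmlEditFormat:               #htmlEditFormat(legacy)#
encodeForHTML:                #htmlEditFormat(encodedAsIs)#
encodeForHTML (canonicalize): #htmlEditFormat(encodedCanonical)#
</pre>
</cfoutput>
```

Comparing how the ampersand comes out on the three lines shows whether a given call double-encodes it on your version of ColdFusion.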

Bottom line, there's no reason to use HTMLEditFormat() in CF10+. For maximum protection, you should replace the Format functions with the Encode functions.

The complete example above and more background are at isummation: http://www.isummation.com/blog/day-2-avoid-cross-site-scripting-xss-using-coldfusion-10-part-1/
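
As a rough sketch of what replacing the Format functions with the Encode functions can look like in a template, the snippet below picks the encoder that matches each output context. The variable userName, its sample value, and the profile.cfm page are hypothetical; the encodeFor* functions themselves are the CF10 built-ins mentioned in this thread.

```cfml
<!--- Hypothetical untrusted value, e.g. taken from a form or URL. --->
<cfset userName = "O'Brien <admin> & co">

<cfoutput>
    <!--- HTML body context --->
    <p>Hello, #encodeForHTML(userName)#</p>

    <!--- HTML attribute context --->
    <input type="text" name="userName" value="#encodeForHTMLAttribute(userName)#">

    <!--- URL parameter context --->
    <a href="profile.cfm?name=#encodeForURL(userName)#">Profile</a>

    <!--- JavaScript string context --->
    <script>var userName = '#encodeForJavaScript(userName)#';</script>
</cfoutput>
```

The point of the set is that each context has its own escaping rules, so one function for every location, as with HTMLEditFormat(), is not enough.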