216

If I only need to deal with URL encoding, should I use EscapeUriString?

Robert MacLean
user496949
  • 10
    Always escape each individual **value** using `Uri.EscapeDataString()`, as explained in @Livven's answer. With other approaches, the system simply does not have enough information to produce the intended result for every possible input. – Timo Jun 16 '16 at 12:34

5 Answers

277

I didn't find the existing answers satisfactory so I decided to dig a little deeper to settle this issue. Surprisingly, the answer is very simple:

There is (almost) no valid reason to ever use Uri.EscapeUriString. If you need to percent-encode a string, always use Uri.EscapeDataString.*

* See the last paragraph for a valid use case.

Why is this? According to the documentation:

Use the EscapeUriString method to prepare an unescaped URI string to be a parameter to the Uri constructor.

This doesn't really make sense. According to RFC 2396:

A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics.

While the quoted RFC has been obsoleted by RFC 3986, the point still stands. Let's verify it by looking at some concrete examples:

  1. You have a simple URI, like this:

     http://example.org/
    

Uri.EscapeUriString won't change it.

  2. You decide to manually edit the query string without regard for escaping:

     http://example.org/?key=two words
    

Uri.EscapeUriString will (correctly) escape the space for you:

    http://example.org/?key=two%20words

  3. You decide to manually edit the query string even further:

     http://example.org/?parameter=father&son
    

However, this string is not changed by Uri.EscapeUriString, since it assumes the ampersand signifies the start of another key-value pair. This may or may not be what you intended.

  4. You decide that you in fact want the value of parameter to be father&son, so you fix the previous URL manually by escaping the ampersand:

     http://example.org/?parameter=father%26son
    

However, Uri.EscapeUriString will escape the percent character too, leading to a double encoding:

    http://example.org/?parameter=father%2526son

As you can see, using Uri.EscapeUriString for its intended purpose makes it impossible to use & as part of a key or value in a query string instead of as a separator between multiple key-value pairs.
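The four cases above can be reproduced with a short sketch like the one below. It assumes a runtime where Uri.EscapeUriString is still available (newer .NET versions mark it obsolete, hence the suppressed warning), and the variable names are only illustrative. It should print the results listed above, followed by a URL in which the value was escaped on its own with Uri.EscapeDataString.

```csharp
using System;

// Uri.EscapeUriString is marked obsolete in newer .NET versions (SYSLIB0013);
// the warning is suppressed here only to demonstrate its behavior.
#pragma warning disable SYSLIB0013

string[] inputs =
{
    "http://example.org/",                        // case 1: unchanged
    "http://example.org/?key=two words",          // case 2: space is escaped
    "http://example.org/?parameter=father&son",   // case 3: '&' is left alone
    "http://example.org/?parameter=father%26son", // case 4: '%' is escaped again
};

foreach (string input in inputs)
{
    Console.WriteLine(Uri.EscapeUriString(input));
}

// Escaping the value on its own removes the ambiguity about the ampersand.
string value = Uri.EscapeDataString("father&son");
Console.WriteLine("http://example.org/?parameter=" + value);
```

The last line is the only one where the caller states explicitly that the ampersand is data rather than a separator.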

This is because, in an attempt at making it suitable for escaping full URIs, it ignores reserved characters and only escapes characters that are neither reserved nor unreserved, which, BTW, is contrary to the documentation. This way you don't end up with something like http%3A%2F%2Fexample.org%2F, but you do end up with the issues illustrated above.


In the end, if your URI is valid, it does not need to be escaped to be passed as a parameter to the Uri constructor, and if it's not valid then calling Uri.EscapeUriString isn't a magic solution either. Actually, it will work in many if not most cases, but it is by no means reliable.

You should always construct your URLs and query strings by gathering the key-value pairs, percent-encoding each key and each value, and then concatenating them with the necessary separators. You can use Uri.EscapeDataString for this purpose, but not Uri.EscapeUriString, since it doesn't escape reserved characters, as mentioned above.
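As a minimal sketch of that approach: the helper name BuildQuery and the sample pairs are made up for illustration and are not part of .NET. Each key and value is escaped individually, and only then joined with = and &.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: percent-encode every key and value separately,
// then join the pairs with the reserved separators '=' and '&'.
static string BuildQuery(IEnumerable<KeyValuePair<string, string>> items) =>
    string.Join("&", items.Select(p =>
        Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

var pairs = new[]
{
    new KeyValuePair<string, string>("key", "two words"),
    new KeyValuePair<string, string>("parameter", "father&son"),
};

string url = "http://example.org/?" + BuildQuery(pairs);
Console.WriteLine(url);
// http://example.org/?key=two%20words&parameter=father%26son
```

Because the separators are added after escaping, an ampersand inside a value can never be mistaken for a pair separator.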

Only if you cannot do that, e.g. when dealing with user-provided URIs, does it make sense to use Uri.EscapeUriString as a last resort. But the previously mentioned caveats apply – if the user-provided URI is ambiguous, the results may not be desirable.

Rolf Bjarne Kvinge
Livven
  • 5
    Wow, thank you for finally clarifying this issue. The previous two answers were not very helpful. – EverPresent Dec 28 '15 at 20:53
  • 4
    Exactly right. EscapeUriString (like EscapeUrl's default behavior in Win32) was created by someone who didn't understand URIs or escaping. It's a misguided attempt to create something that takes a malformed URI and *sometimes* turns it into the intended version. But it doesn't have the information it needs to do this reliably. It also frequently gets used in place of EscapeDataString which is also very problematic. I wish EscapeUriString did not exist. Every use of it is wrong. – Brandon Paddock Feb 05 '16 at 01:19
  • 4
    nicely explained +1 it is way better than accepted link only answer – Ehsan Sajjad Apr 25 '16 at 17:23
  • 1
    This answer needs more attention. It is the correct way to do it. The other answers have scenarios where they do not produce the intended results. – Timo Jun 16 '16 at 12:31
  • I will be an alternate voice of reason here. Coming from JavaScript where there are two distinct functions [encodeURI](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI) and [encodeURIComponent](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent), this answer and some of the comments like "I wish EscapeUriString did not exist" appear mis-guided... – Crescent Fresh Nov 14 '17 at 22:59
  • 1
    ...Sure `encodeURI`/`Uri.EscapeUriString` is not needed as often as `encodeURIComponent`/`Uri.EscapeDataString` (since when are you dealing with blind urls that must be used in a uri context), but that does not mean it doesn't have its place. – Crescent Fresh Nov 14 '17 at 23:02
  • 1
    Point #3: "it assumes the ampersand signifies the start of another key-value pair" is a bit misleading. Key-value pair syntax is a web framework thing, not a URI thing. I think it's more accurate to say spaces are escaped (point #2) because they are _illegal_ in a URI; ampersands are not, because they are not. – Todd Menier Dec 04 '17 at 03:22
  • 1
    @CrescentFresh You haven't actually explained where `encodeURI`/`Uri.EscapeUriString` are needed. Can you give a single use case where `encodeURIComponent`/`Uri.EscapeDataString` are **not** the best solution for the problem? – imgx64 Mar 14 '18 at 06:32
  • Uri.EscapeDataString worked for me too. I was previously using WebUtility.HtmlEncode(str) to escape form input, however this was causing exceptions on the server of this form "A potentially dangerous Request.Form value was detected from the client". One example is for single quotes - encoded to &#39; by HtmlEncode, but correctly (and safely) encoded to %27 by Uri.EscapeDataString. – Eeeeed Dec 03 '18 at 10:36
  • @CrescentFresh You're right, a valid use would be as a best-effort when dealing with user-provided URIs. I added that to the answer. Are there any other you could think of? – Livven Nov 11 '19 at 15:02
  • @Livven - even as a best-effort solution for user-provided URIs, EscapeUriString is probably not a good idea. It's not clearly documented; and whatever processing you need to do for user-provided URIs is likely going to exceed that method anyhow. e.g. let's say your user enters `google.com/?q=bla bla` - EscapeUriString isn't going to do anything useful, unlike most browsers, that will. The tiny niche for implementing a browser url bar is so specialized, .net simply shouldn't have a method for that, and even if you're doing that - don't use Uri.EscapeUriString. It's still not good enough. – Eamon Nerbonne Jun 10 '20 at 15:12
  • @Livven Then there's the fact that even when EscapeUriString does "something" - what destination server won't do that better? If it's comprehensible enough to best-effort escape, then let the target server deal with it. Finally consider that the "real-world" use case for EscapeUriString is simply making a bug by accident. Best be clear about its usefulness therefore - just don't use it. Ever. – Eamon Nerbonne Jun 10 '20 at 15:14
  • @CrescentFresh People don't use EscapeUriString correctly - https://github.com/search?p=99&q=EscapeUriString&type=Code for some additional reasons why you should be 100% clear it's just not a good idea to *ever* use this. Nobody is using it correctly. Can you find even one case where it's at least clearly harmless and has any reasonable effect whatsoever? I can see a ton that are clearly wrong, and bet you could find a few exploitable security holes just on the basis of those search results. Don't use it; it's dangerous and useless - even as a best-effort fallback. – Eamon Nerbonne Jun 10 '20 at 15:16
  • After wondering why it seemed "&" was not encoded, then it seemed like it was being encoded, I think I mixed up these two, causing a bit of panic. Looking closely at documentation, it seems like Uri.EscapeString is now marked Obsolete. @BrandonPaddock, seems like this is close to what you were hoping. – bobwki Jun 10 '20 at 15:45
121

Use EscapeDataString always (for more info about why, see Livven's answer below)

Edit: removed dead link to how the two differ on encoding

Jcl
  • 3
    I'm not sure that link actually provides more information as it's regarding unescaping rather than escaping. – Steven Aug 30 '13 at 15:52
  • 1
    It's basically the same difference. If you actually read the article, there's a table around the middle that actually escapes (not unescapes) to show the differences (comparing with `URLEncode` too). – Jcl Aug 30 '13 at 15:57
  • 2
    It's still not clear to me -- what if I'm not escaping a whole URI but only part of it -- (i.e. the *data* for a query string parameter)? Am I escaping data for the URI, or does EscapeDataString imply something completely different? – BrainSlugs83 Nov 10 '13 at 03:37
  • 4
    ... did some testing looks like I want EscapeDataString for a URI parameter. I tested with the string "I heart C++" and EscapeUriString did not encode the "+" characters, it just left them as is, EscapeDataString correctly converted them to "%2B". – BrainSlugs83 Nov 10 '13 at 03:42
  • @BrainSlugs83 yes, that's on the article I linked at. If you want to be more specific, you should be using `HttpUtility.UrlEncode` if what you are encoding is a URL... that will also change spaces into `+` (which is correct for a URL, more so than %20 -although both will work-) and will use the more correct lowercase too. As documentation states, `EscapeUriString` does not convert RFC2396 reserved characters (that includes `+`, but also others: more info [here](http://www.ietf.org/rfc/rfc2396.txt) ) – Jcl Nov 11 '13 at 16:10
  • 1
    I'm not encoding URLs or URIs, I'm encoding data that goes into the value of a query string parameter of a URL (again, that data is not a URL or URI). As far as personal preference goes: using "+" for " " in a URL is evil, because some functions (as you mention) will randomly leave them in -- and on the server side, it can be ambiguous -- whereas "%20" and "%2B" are explicit -- there's no chance to get the decoding wrong. – BrainSlugs83 Nov 25 '13 at 05:54
  • Yeah, well, it's a matter of standards (there are RFCs that define these kinds of encodings). The problem is that browsers have historically been pretty loose on their support of encodings. The functions are not "randomly" encoding or decoding... they follow some standards or not, and it's usually documented :-) – Jcl Nov 25 '13 at 07:18
  • 1
    Here's a sample of running it and the other encoding methods that shows differences https://dotnetfiddle.net/12IFw1 – Maslow Sep 17 '14 at 18:02
  • 7
    This is a bad answer. You should never use EscapeUriString, it doesn't make any sense. See Livven's answer below (and upvote it). – Brandon Paddock Feb 05 '16 at 01:20
  • 1
    By StackOverflow standards, this is a terrible answer. It doesn't actually explain the difference, gives confusing (and incorrect) advice, and leaves everything up to an external link. If that link becomes dead in the future, this answer will no longer be valid or correct. – SlugFiller Oct 05 '17 at 14:56
  • I have updated the answer to link to the obviously more correct answer below. Also removed the dead link – Jcl Jan 15 '19 at 17:15
60

The plus (+) characters can reveal a lot about the difference between these methods. In the query string of a simple URI, the plus character is conventionally read as a "space". Consider querying Google for "happy cat":

https://www.google.com/?q=happy+cat

That's a valid URI (try it), and EscapeUriString will not modify it.

Now consider querying Google for "happy c++":

https://www.google.com/?q=happy+c++

That's a valid URI (try it), but it produces a search for "happy c", because the two pluses are interpreted as spaces. To fix it, we can pass "happy c++" to EscapeDataString and voila*:

https://www.google.com/?q=happy+c%2B%2B

* The encoded data string is actually "happy%20c%2B%2B"; %20 is hex for the space character, and %2B is hex for the plus character.

If you're using UriBuilder as you should be, then you'll only need EscapeDataString to properly escape some of the components of your entire URI. @Livven's answer to this question further proves that there really is no reason to use EscapeUriString.
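A minimal sketch of that combination, using the "happy c++" search from above. The variable names are illustrative, and the Query value is assigned without a leading ? on the assumption that UriBuilder adds it.

```csharp
using System;

// Escape only the data (the search terms), never the URI as a whole.
string terms = "happy c++";
string escaped = Uri.EscapeDataString(terms);   // "happy%20c%2B%2B"

// UriBuilder assembles the components around the already-escaped value.
var builder = new UriBuilder("https://www.google.com/")
{
    Query = "q=" + escaped
};

string url = builder.Uri.AbsoluteUri;
Console.WriteLine(url);
// https://www.google.com/?q=happy%20c%2B%2B
```

Reading the result through builder.Uri.AbsoluteUri rather than builder.ToString() keeps the default port out of the output.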

Seth
  • Thanks. What about when you have an absolute URI string that you need to encode, for example `"https://www.google.com/?q=happy c++"`. Looks like I manually need to split on "?", or is there a better way? – wensveen Mar 09 '15 at 11:59
  • If you're passing the entire URL as a parameter to another URL, then use `EscapeDataString`. If the URL you provided is the actual URL, then yes you want to just split on `?`. – Seth Mar 09 '15 at 15:21
7

Comments in the source address the difference clearly. Why this info isn't brought forward via XML documentation comments is a mystery to me.

EscapeUriString:

This method will escape any character that is not a reserved or unreserved character, including percent signs. Note that EscapeUriString will also do not escape a '#' sign.

EscapeDataString:

This method will escape any character that is not an unreserved character, including percent signs.

So the difference is in how they handle reserved characters. EscapeDataString escapes them; EscapeUriString does not.

According to the RFC, the reserved characters are: :/?#[]@!$&'()*+,;=

For completeness, the unreserved characters are alphanumeric and -._~

Both methods escape characters that are neither reserved nor unreserved.
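The two character sets can be checked against Uri.EscapeDataString with a quick sketch like this. It assumes a runtime that follows RFC 3986 (as far as I know, older .NET Framework versions followed RFC 2396 and left a few of these characters unescaped), and the variable names are only illustrative.

```csharp
using System;
using System.Linq;

const string reserved = ":/?#[]@!$&'()*+,;=";
const string unreserved = "-._~";

// Every reserved character should come back as a %XX triplet...
foreach (char c in reserved)
{
    Console.WriteLine($"{c} -> {Uri.EscapeDataString(c.ToString())}");
}

// ...while unreserved marks and ASCII alphanumerics pass through unchanged.
bool unreservedUntouched = (unreserved + "aZ09")
    .All(c => Uri.EscapeDataString(c.ToString()) == c.ToString());
Console.WriteLine(unreservedUntouched);
```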

I disagree with the general notion that EscapeUriString is evil. I think a method that escapes only illegal characters (such as spaces) and not reserved characters is useful. But it does have a quirk in how it handles the % character. Percent-encoded characters (% followed by 2 hex digits) are legal in a URI. I think EscapeUriString would be far more useful if it detected this pattern and avoided encoding % when it's immediately followed by 2 hex digits.
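To make that idea concrete, here is a rough sketch of such a method. EscapeIllegalOnly is a hypothetical helper, not something .NET provides, and it inherits the ambiguity discussed elsewhere on this page: a literal % that happens to be followed by two hex digits is assumed to be an existing escape.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of .NET: escapes characters that are neither
// reserved nor unreserved, but leaves existing %XX triplets untouched.
static string EscapeIllegalOnly(string input)
{
    // Reserved and unreserved punctuation per RFC 3986; ASCII letters and
    // digits are checked separately below.
    const string allowedMarks = ":/?#[]@!$&'()*+,;=-._~";
    var sb = new StringBuilder(input.Length);

    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        bool asciiAlphaNum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                             || (c >= '0' && c <= '9');
        bool existingTriplet = c == '%' && i + 2 < input.Length
                               && Uri.IsHexDigit(input[i + 1])
                               && Uri.IsHexDigit(input[i + 2]);

        if (asciiAlphaNum || allowedMarks.IndexOf(c) >= 0 || existingTriplet)
        {
            sb.Append(c); // legal as-is (the hex digits of a triplet follow as alphanumerics)
        }
        else if (char.IsHighSurrogate(c) && i + 1 < input.Length
                 && char.IsLowSurrogate(input[i + 1]))
        {
            // Escape a surrogate pair as one code point.
            sb.Append(Uri.EscapeDataString(input.Substring(i, 2)));
            i++;
        }
        else
        {
            // Everything else, including a stray '%', is percent-encoded.
            // (Lone surrogates are left to EscapeDataString, whose handling
            // of them differs between runtimes.)
            sb.Append(Uri.EscapeDataString(c.ToString()));
        }
    }
    return sb.ToString();
}

Console.WriteLine(EscapeIllegalOnly("http://example.org/?parameter=father%26son&key=two words"));
// http://example.org/?parameter=father%26son&key=two%20words
Console.WriteLine(EscapeIllegalOnly("100% sure"));
// 100%25%20sure
```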

Todd Menier
2

A simple example

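    // \x20 is a space; the value also contains non-ASCII characters.
    var data = "example.com/abc?DEF=あいう\x20えお";

    // Leaves reserved characters (/ ? =) alone; escapes the space and the non-ASCII text.
    Console.WriteLine(Uri.EscapeUriString(data));
    // Escapes everything except unreserved characters.
    Console.WriteLine(Uri.EscapeDataString(data));
    // Form-style encoding: the space becomes '+'.
    Console.WriteLine(System.Net.WebUtility.UrlEncode(data));
    // Same, but with lowercase hex digits in the output below.
    Console.WriteLine(System.Web.HttpUtility.UrlEncode(data));

    /*
    =>
    example.com/abc?DEF=%E3%81%82%E3%81%84%E3%81%86%20%E3%81%88%E3%81%8A
    example.com%2Fabc%3FDEF%3D%E3%81%82%E3%81%84%E3%81%86%20%E3%81%88%E3%81%8A
    example.com%2Fabc%3FDEF%3D%E3%81%82%E3%81%84%E3%81%86+%E3%81%88%E3%81%8A
    example.com%2fabc%3fDEF%3d%e3%81%82%e3%81%84%e3%81%86+%e3%81%88%e3%81%8a
    */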
Learning