
In my program, when my servlet gets the parameters, it should open a URLConnection to a PHP script and get the generated XML back. Everything works, but my program does not seem to support special characters. I believe the key point is the servlet. What should I do to make the servlet support special characters? For example:

String urlString = "XXXXXXX";   
URL url= new URL(urlString);    
URLConnection connection = url.openConnection();

I want the URL that connects to the .php script to be something like:

http://XXX.php?name=José&type=artist

But the result shows that the URL seems to have been changed to:

http://XXX.php?name=Jos&type=artist

It ignored the special character é. What should I do?

BalusC
lkkeepmoving

1 Answer


Only the following characters are allowed in URLs:

RFC 3986 section 2.2 Reserved Characters (January 2005)
!   *   '   (   )   ;   :   @   &   =   +   $   ,   /   ?   #   [   ]
RFC 3986 section 2.3 Unreserved Characters (January 2005)
A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   u   v   w   x   y   z
0   1   2   3   4   5   6   7   8   9   -   _   .   ~   

Other characters must be URL-encoded.

Basically, you must URL-encode every single request parameter value (and strictly speaking every parameter name too, but since in your case the names are apparently hardcoded and already URL-safe, that's not necessary).

String name = "José";
String type = "artist";
// ...
String url = String.format("http://XXX.php?name=%s&type=%s", 
    URLEncoder.encode(name, "UTF-8"), 
    URLEncoder.encode(type, "UTF-8"));

Note that the charset to use depends on what the target server supports. A lot of legacy servers are still configured to use ISO-8859-1 for that. In that case, replace "UTF-8" throughout the above example with "ISO-8859-1". Contact the server admin when not sure. The URL-encoded form of é is %C3%A9 in UTF-8 and %E9 in ISO-8859-1.
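To see the difference between the two charsets, here is a minimal sketch that builds the same URL both ways. The class name EncodedQueryDemo and the helper buildUrl are made up for this illustration, and http://XXX.php is just the placeholder from the question. The é is written as \u00e9 so that the result does not depend on the encoding of the source file itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodedQueryDemo {

    // Hypothetical helper (not part of any library): URL-encodes both
    // parameter values with the given charset and appends them to the base URL.
    public static String buildUrl(String base, String name, String type, String charset) {
        try {
            return String.format("%s?name=%s&type=%s",
                    base,
                    URLEncoder.encode(name, charset),
                    URLEncoder.encode(type, charset));
        } catch (UnsupportedEncodingException e) {
            // Thrown only when the charset name is unknown to the JVM.
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String name = "Jos\u00e9"; // "José", escaped to avoid source-encoding issues

        // prints http://XXX.php?name=Jos%C3%A9&type=artist
        System.out.println(buildUrl("http://XXX.php", name, "artist", "UTF-8"));

        // prints http://XXX.php?name=Jos%E9&type=artist
        System.out.println(buildUrl("http://XXX.php", name, "artist", "ISO-8859-1"));
    }
}
```

The resulting string can then be passed to new URL(...) as in the question. Keep in mind that URLEncoder produces form encoding (a space becomes +), which is what query string parameters normally expect.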

BalusC
  • My servlet gets the José and then passes it to JavaScript. Do you know why the JavaScript creates new HTML that shows Jos� ? Thank you! – lkkeepmoving Apr 08 '13 at 21:09
  • You specified the wrong encoding at some point. Carefully read [Unicode - How to get the characters right?](http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html) to learn about all possible points where it can go wrong and how to fix it. – BalusC Apr 08 '13 at 21:15