
I receive a GET request in this web service:

@GET
@Path("/nnnnnn")
public Response pfpfpfpf(@BeanParam NNNNNN n)

The class NNNNNN has:

@QueryParam("parameter")
private String parameter;

There is a getter and a setter for that parameter.

When I send a GET request with a query parameter, it is bound automatically to my NNNNNN object, and everything works.

But now I am sending Japanese strings in the query URL. I encode the parameter as UTF-8 before sending, so I have to decode it as UTF-8 on the server.

My question is: where should I call URLDecoder? I tried calling it in the getter of that parameter, but it didn't work; I kept getting something like C3%98%C2%B4%C3%98%C2 instead of the Japanese characters.

Mike
Marco Dinatsoli
  • You could decode on the call to the setter (the string you're receiving is URL encoded, you want to decode before storing). – adamdc78 Nov 02 '15 at 02:05
  • I can't reproduce the problem. It might be how you are sending it from the client. If I test in the browser, use the URL bar and let the browser encode the characters, it comes out fine. Do this: add `@Produces("text/plain; charset=UTF-8")` to your `pfpfpf` method, and just return `n.getParameter()` (without decoding). You will see it works fine. Type the url in the browser (without encoding the characters), and you will see the result the same as the query parameter – Paul Samsotha Nov 02 '15 at 02:43
  • You might be double encoding. You do it once, then the client (agent) does it again. That's why when decoding you still have an encoded string. – Paul Samsotha Nov 02 '15 at 02:46
  • @peeskillet how can u not be able to reproduce it man? just create a class, and its parameter with @ Queryparam annotation, and pass an object of that class to the method that can be called by "GET" – Marco Dinatsoli Nov 02 '15 at 16:41
  • Do what I said in my comment and see if you still get the same result – Paul Samsotha Nov 02 '15 at 16:41
  • @peeskillet actually that is not correct, i can't add text/plain, because my response is a response type, not a text type – Marco Dinatsoli Nov 02 '15 at 16:43
  • The point is not about the response, the point of the response it just to show in the browser that the result is the same as the request URL query param, meaning the query param is coming in just fine. Try it. If it works, then it's a problem with how you are sending it with the client you are using – Paul Samsotha Nov 02 '15 at 16:43
  • Sure.... But do what I said, you will see that it works. Then you know you have a problem with the client you are using. Here is something to think about: When you url encode the Japanese characters, you will get stuff like `C3%98%C2%B4%C3%98%C2`. This is what it means to url encode. It is possible that the client you are using is encoding it again. So instead of sending the encoded Japanese characters, you are sending an encoding of the encoding. So you have a double encoding. So when you get it on the server it is the double encoding. So when you decode it once, you have the first encoding – Paul Samsotha Nov 02 '15 at 16:53
  • That is the point I am trying to get at. If you type the Japanese character into the browser URL, you will see that it works. – Paul Samsotha Nov 02 '15 at 16:56
  • @peeskillet i did the request from the browser using arabic characters, and what i did is: `C3%99%C2%81%C3%99%C2%8A%C3%99%C2%` help please – Marco Dinatsoli Nov 05 '15 at 13:52
  • @peeskillet after decoding, i present the data on the browser and i get this `ÙÙ٠تÙرا Ø` – Marco Dinatsoli Nov 05 '15 at 14:08
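
The sequence quoted in the last comments is consistent with a charset mismatch rather than a missing URLDecoder call: UTF-8 bytes read as ISO-8859-1 and then percent-encoded as UTF-8 a second time. The sketch below reproduces that pattern with plain JDK classes, using the Arabic letters ف and ي as a sample. The class name `DoubleEncodingDemo` is hypothetical, the `Charset` overloads assume Java 10 or later, and this only illustrates the mechanism, not the actual client or server code from the question.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the code from the question.
final class DoubleEncodingDemo {

    // Simulates the fault: UTF-8 bytes are read as ISO-8859-1,
    // and the resulting string is percent-encoded as UTF-8 again.
    static String misencode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return URLEncoder.encode(misread, StandardCharsets.UTF_8);
    }

    // Reverses the fault: percent-decode, then turn the ISO-8859-1
    // characters back into the UTF-8 bytes they originally were.
    static String repair(String encoded) {
        String misread = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        byte[] utf8 = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0641\u064A"; // Arabic letters U+0641 and U+064A
        String broken = misencode(original);
        System.out.println(broken);                          // %C3%99%C2%81%C3%99%C2%8A
        System.out.println(repair(broken).equals(original)); // true
    }
}
```

Decoding the broken value once with URLDecoder only gets back the misread text, which matches the garbled output described above. The real fix is to make both sides agree on UTF-8 so that the misreading never happens.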

2 Answers


The solution that worked for me:

In the servlet, I had to do this:

request.setCharacterEncoding("UTF-8");

And then in the HTML page I had to add this:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Marco Dinatsoli

This is a good question, and it has the potential to clear up many doubts about how information is processed (encoded and decoded) between systems.

Before I proceed, I should say that you need a fair understanding of charsets, encodings, etc. You may want to read this answer for a quick overview.

This has to be looked at from two perspectives: browser and server.

Browser perspective of Encoding

Every browser renders information/text, and to render it correctly the browser has to know how to interpret the underlying bits/bytes (read the 3rd bullet of my answer on how the same bits can represent different characters in different encoding schemes).

Browser page encoding

  • Each browser has a default encoding associated with it. Check this on how to see the default encoding of your browser.
  • If you do not specify any encoding in your HTML page, the browser's default encoding takes effect and the page is rendered according to those encoding rules. So if the default encoding is ASCII and you are using Japanese or Chinese characters, or characters from a Unicode supplementary plane, you will see garbage.
  • You can tell the browser not to use its default encoding scheme but to use a specific one to render the website, using <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">.
    • This is exactly what you did/found, and it worked because this meta tag essentially overrode the browser's default encoding.
    • Another way to achieve the same effect is to leave out the meta tag and change the browser's default encoding instead. That also works, but it is not recommended; using the Content-Type meta tag in your JSP is the recommended approach.

Try playing around with the browser's default encoding and the meta tag using the simple HTML below.

<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    </head>
    <body>
        の, は, でした <br>
        昨夜, 最高
    </body>        
</html>
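
The effect of the meta tag can also be observed outside the browser. Here is a minimal sketch using only JDK classes (the class name `CharsetViewDemo` is made up for illustration). The same character becomes different bytes under different encodings, and the same bytes become different characters when decoded with the wrong one. The sample character is の, the first one in the HTML above, written as a Unicode escape so the source stays ASCII-only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answer.
final class CharsetViewDemo {

    // Formats bytes as space-separated upper-case hex, e.g. "E3 81 AE".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String no = "\u306E"; // hiragana "no" (U+306E)

        // One character, two encodings, two different byte sequences.
        System.out.println(hex(no.getBytes(StandardCharsets.UTF_8)));    // E3 81 AE
        System.out.println(hex(no.getBytes(StandardCharsets.UTF_16BE))); // 30 6E

        // The UTF-8 bytes read with the wrong charset turn into three
        // unrelated characters; read with the right one they round-trip.
        byte[] utf8 = no.getBytes(StandardCharsets.UTF_8);
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        String right = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(wrong.length());   // 3
        System.out.println(right.equals(no)); // true
    }
}
```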

Server perspective of Encoding

The server also has to know how to interpret the incoming stream of data, which basically means knowing which encoding scheme to use (the server part is tricky because there are several possibilities). Read the excerpt below, from here:

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications. In addition, the CGI specification contains rules for how web servers decode data of this type and make it available to applications.
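
The JDK's `URLEncoder` and `URLDecoder` implement this `application/x-www-form-urlencoded` scheme, so they offer a quick way to see the rules from the quoted passage in action. This is a minimal sketch, assuming Java 10 or later for the `Charset` overloads. The class name `FormEncodingDemo` is illustrative only, and the sample value is the string 昨夜, 最高 written with Unicode escapes.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answer.
final class FormEncodingDemo {

    // Form-encodes a value as UTF-8 (spaces become '+').
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Form-decodes a value as UTF-8 ('+' and %20 both become a space).
    static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "\u6628\u591C, \u6700\u9AD8";
        String encoded = encode(value);
        System.out.println(encoded); // %E6%98%A8%E5%A4%9C%2C+%E6%9C%80%E9%AB%98

        // Decoding with the charset used for encoding round-trips.
        System.out.println(decode(encoded).equals(value)); // true

        // Decoding with a different charset does not.
        String wrong = URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(value)); // false
    }
}
```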

This again has two parts: how the server should decode the incoming request stream, and how it should encode the outgoing response stream.

There are several ways to do this depending upon the use case, for example:

  • There are methods like setCharacterEncoding, setContentType etc. on the HTTP request and response objects, which can be used to set the encoding.
    • This is exactly what you did in your case: you told the server to use the UTF-8 encoding scheme for decoding the request data, because you are expecting characters outside ASCII. But this is not all, so please do read on below.
  • Set the encoding at the server or JVM level, using JVM attributes like -Dfile.encoding=utf8. Read this article on how to set the server encoding.
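
The options in the list above differ in scope. A JVM-wide default applies to any code that converts between bytes and text without naming a charset, so such code may behave differently from one machine to the next. The short sketch below (the class name `DefaultCharsetDemo` is hypothetical) prints the active default and shows the explicit alternative, which does not depend on `-Dfile.encoding` or the OS locale.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answer.
final class DefaultCharsetDemo {

    // Converts text to bytes and back using an explicitly named charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Depends on the JVM version, OS locale and -Dfile.encoding.
        System.out.println("Default charset: " + Charset.defaultCharset());

        String text = "\u6628\u591C"; // two kanji, U+6628 and U+591C

        // UTF-8 can represent the text, so it survives the round trip.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));    // true

        // US-ASCII cannot, so each character is replaced with '?'.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII).equals(text)); // false
    }
}
```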

In your case you were fetching the Japanese characters from the query string of the URL, and the query string is part of the HTTP request object, so calling request.setCharacterEncoding("UTF-8"); gave you the desired result. (The Servlet API documents this method as overriding the encoding of the request body, so whether it also affects query-string decoding can depend on the container and its configuration.)

But the same will not work for URL encoding, which is different from request encoding (your case). Consider the example below: neither System.out.println call shows the desired result, even after calling request.setCharacterEncoding("UTF-8");, because here the value is part of the URL itself. The URL will be something like http://localhost:7001/springapp/forms/executorTest/encodingTest/hellothere 昨夜, 最高 and this URL has no query string.

@RequestMapping(value = "/encodingTest/{quertStringValue}", method = RequestMethod.GET)
public ModelAndView encodingTest(@PathVariable("quertStringValue") String quertStringValue,
                                 ModelMap model,
                                 HttpServletRequest request) throws UnsupportedEncodingException {
    // The path variable has already been decoded and bound before this method runs.
    System.out.println("############### quertStringValue " + quertStringValue);
    // Too late to affect the path variable: it is already a String.
    request.setCharacterEncoding("UTF-8");
    System.out.println("############### quertStringValue " + quertStringValue);
    return new ModelAndView("ThreadInfo", "ThreadInfo", "@@@@@@@ This is my encoded output " + quertStringValue);
}
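
For a value carried in the path rather than the query string, the client has to percent-encode the path itself. One way to see what such a URL looks like on the wire is `java.net.URI` from the JDK. This is only a sketch: the class name `PathEncodingDemo` is made up, and the host, port and path are taken from the example URL above, with the Japanese text written as Unicode escapes.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of the original answer.
final class PathEncodingDemo {

    // Builds an http URI from raw components. The multi-argument
    // constructor quotes illegal characters such as spaces, and
    // toASCIIString() percent-encodes non-ASCII characters as UTF-8.
    static String toWireForm(String host, int port, String rawPath) {
        try {
            return new URI("http", null, host, port, rawPath, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns the decoded path of an already percent-encoded URI.
    static String decodedPath(String encodedUri) {
        return URI.create(encodedUri).getPath();
    }

    public static void main(String[] args) {
        String path = "/springapp/forms/executorTest/encodingTest/"
                + "hellothere \u6628\u591C, \u6700\u9AD8";
        String wire = toWireForm("localhost", 7001, path);
        System.out.println(wire);
        System.out.println(decodedPath(wire).equals(path)); // true
    }
}
```

Unlike form encoding, a space in the path comes out as %20 here, not as "+".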

Depending on the framework you are using, you may need additional configuration to specify a character encoding for requests or URLs, so that you can either apply your own encoding when the request does not already specify one, or enforce the encoding in every case. This is useful because current browsers typically do not set a character encoding on the request, even if one is specified in the HTML page or form.

In Spring, there is org.springframework.web.filter.CharacterEncodingFilter for configuring the request encoding. This similar question, which builds on the same fact, is also worth reading.

In a nutshell

Every computer program, whether an application server, web server, browser, IDE etc., understands only bits, so it needs to know how to interpret those bits to make sense of them, because the same bits can represent different characters depending on the encoding used. That is where "encoding" comes into the picture: it gives each character a unique representation, so that all computer programs, across different operating systems, know exactly how to interpret it.

hagrawal