7

I am decoding HTTP packets and have run into a problem with chunked bodies. Each HTTP packet I get has a header and a body. When Transfer-Encoding is chunked, I don't know how to decode the body.

Is there a useful API or class for dechunking the data in Java?

If someone experienced with HTTP decoding can show me how to do this, I would appreciate it.

CodingForever

4 Answers

12

Use a full-featured HTTP client such as Apache HttpComponents Client, or just the java.net.URLConnection that ships with Java SE (mini tutorial here). Both handle chunked encoding transparently and give you a "normal" InputStream back. HttpClient also comes with a ChunkedInputStream, which you can use to decorate your own InputStream.

If you really insist on writing your own library for this, then I'd suggest creating a class like ChunkedInputStream extends InputStream and writing the logic accordingly. You can find more detail on how to parse the format in this Wikipedia article.
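As a rough illustration of the home-grown route, here is a minimal sketch of such a decorator. The class name SimpleChunkedInputStream is hypothetical (it is not the Apache or JDK class), and the parser is deliberately lenient: it ignores chunk extensions, skips any trailer headers, and does not verify that each chunk is followed by an exact CRLF.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class name; a lenient sketch, not the Apache or JDK implementation.
final class SimpleChunkedInputStream extends InputStream {
    private final InputStream in;
    private int remaining = 0;          // bytes left in the current chunk
    private boolean firstChunk = true;
    private boolean finished = false;

    SimpleChunkedInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        if (finished) return -1;
        if (remaining == 0) {
            startNextChunk();
            if (finished) return -1;
        }
        int b = in.read();
        if (b < 0) throw new EOFException("Stream ended inside a chunk");
        remaining--;
        return b;
    }

    // Reads the next chunk-size line and updates the parser state.
    private void startNextChunk() throws IOException {
        if (!firstChunk) readLine();    // consume the CRLF that ends the previous chunk's data
        firstChunk = false;
        String sizeLine = readLine();
        int semicolon = sizeLine.indexOf(';');
        if (semicolon >= 0) sizeLine = sizeLine.substring(0, semicolon); // drop chunk extensions
        try {
            remaining = Integer.parseInt(sizeLine.trim(), 16);           // size is hexadecimal
        } catch (NumberFormatException e) {
            throw new IOException("Bad chunk size line: '" + sizeLine + "'", e);
        }
        if (remaining < 0) throw new IOException("Negative chunk size");
        if (remaining == 0) {
            // Last chunk: skip optional trailer lines up to the empty line (or end of input).
            while (!readLine().isEmpty()) { /* ignore trailer */ }
            finished = true;
        }
    }

    // Reads one line up to LF, dropping CR; returns "" at end of input.
    private String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) >= 0) {
            if (b == '\n') break;
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }
}
```

It would be used like any other decorator, e.g. `InputStream input = new SimpleChunkedInputStream(originalInput);`. Overriding only the single-byte `read()` keeps the sketch short; a production version would also override `read(byte[], int, int)` for speed.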

Community
BalusC
  • Actually, I am doing offline HTTP decoding and have just the header and body of the HTTP packet, which I need to decode. I need an API that takes the header and body and gives me the decoded data. Is there such an API? – CodingForever Sep 15 '10 at 13:38
  • The Wikipedia article describes what a chunk looks like. You can basically just split on CRLF (\r\n), which are the bytes 13 and 10. The first part is the header, which represents the chunk length in hex. The second part is the chunk data itself. You just collect and concatenate all those chunks. The `ChunkedInputStream` does exactly that. – BalusC Sep 15 '10 at 13:46
  • Sorry, there are two ChunkedInputStream classes. First: http://jigsaw.w3.org/Doc/Programmer/api/org/w3c/www/http/ChunkedInputStream.html Second: http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/ChunkedInputStream.html Which one is right for this? And do you have any information about how to use ChunkedInputStream? – CodingForever Sep 16 '10 at 08:20
  • The `ChunkedInputStream` part in my answer is clickable (as are all the other blueish parts). It's an `InputStream`, so you can just decorate another `InputStream` with it, e.g. `InputStream input = new ChunkedInputStream(originalInput);`. – BalusC Sep 16 '10 at 11:12
  • First of all, thanks for your answers, they really help. But I have another question: the ChunkedInputStream constructor takes a SessionInputBuffer (an interface). How do I convert the string (the chunked body) to this format? – CodingForever Sep 16 '10 at 11:51
  • Hmm, it has changed as of HttpClient 4.x. Well, either pick the HttpClient 3.x one, which takes an `InputStream`, or write your own. Decoding a chunked body is pretty trivial. Just split on CRLF and do the math. – BalusC Sep 16 '10 at 12:22
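For the offline case discussed in these comments, where the chunked body is already in memory, the decoding can be sketched with the standard library alone. The class and method names below (OfflineDechunker, dechunk) are hypothetical. One assumption worth stating: the sketch uses each hex length to skip over the chunk data instead of splitting the whole body on CRLF, because chunk data may itself contain CRLF bytes. Chunk extensions and trailer headers are ignored.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for already-captured bodies; not part of any library.
final class OfflineDechunker {

    // Decodes a complete chunked body held in memory and returns the payload bytes.
    static byte[] dechunk(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int eol = indexOfCrlf(body, pos);
            if (eol < 0) {
                throw new IllegalArgumentException("Missing CRLF after chunk size at offset " + pos);
            }
            String sizeLine = new String(body, pos, eol - pos, StandardCharsets.US_ASCII);
            int semicolon = sizeLine.indexOf(';');
            if (semicolon >= 0) sizeLine = sizeLine.substring(0, semicolon); // drop chunk extensions
            int size = Integer.parseInt(sizeLine.trim(), 16);                // size is hexadecimal
            pos = eol + 2;                                                   // move past the CRLF
            if (size == 0) break;                                            // last chunk; trailers ignored
            if (size < 0 || size > body.length - pos) {
                throw new IllegalArgumentException("Truncated chunk at offset " + pos);
            }
            out.write(body, pos, size);
            pos += size + 2;                                                 // skip data and its trailing CRLF
        }
        return out.toByteArray();
    }

    // Returns the index of the next "\r\n" at or after 'from', or -1 if there is none.
    private static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') return i;
        }
        return -1;
    }
}
```

If the captured body is held as a String, it would first need to be converted back to the original bytes (for example with ISO-8859-1, assuming that is how it was decoded), since the chunk sizes count bytes, not characters.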
1

If you are looking for a simple API try Jodd Http library (http://jodd.org/doc/http.html). It handles Chunked transfer encoding for you and you get the whole body as a string back.

From the docs:

HttpRequest httpRequest = HttpRequest.get("http://jodd.org");
HttpResponse response = httpRequest.send();

System.out.println(response);
Andrejs
1

Here is a quick-and-dirty alternative that requires no dependency except the Oracle JRE:

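import java.io.ByteArrayInputStream;
import java.io.IOException;
import sun.net.www.http.ChunkedInputStream;
import sun.net.www.http.HttpClient;

private static byte[] unchunk(byte[] content) throws IOException {
    ByteArrayInputStream bais = new ByteArrayInputStream(content);
    ChunkedInputStream cis = new ChunkedInputStream(bais, new HttpClient() {}, null);
    return readFully(cis);
}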

It uses the same sun.net.www.http.ChunkedInputStream that java.net.HttpURLConnection uses behind the scenes.

This implementation doesn't provide detailed exceptions (line numbers) when the content is malformed.

It works with Java 8 but may fail on later releases, because it depends on internal sun.* classes that are not part of the public API. You've been warned.

Could be useful for prototyping though.

You can choose any readFully implementation from Convert InputStream to byte array in Java.
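For completeness, one possible readFully is sketched below using only java.io. The holder class name Streams is hypothetical; the method name simply matches the call in the snippet above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical holder class for the helper used by unchunk().
final class Streams {

    // Reads the stream until end of input and returns everything that was read.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

On Java 9 and later, `InputStream.readAllBytes()` does the same job without a helper.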

Community
Vadzim
1

Apache HttpComponents

Oh, and if we are talking about the client side, HttpURLConnection does this as well.

Maurice Perry
  • I am doing offline HTTP decoding (of already captured packets) and have just a header and a body. So I need an API that takes just the header and body and gives me the decoded data. Is there any API like this? – CodingForever Sep 15 '10 at 13:40