I am using sockets to send a POST request to a server, and I read the response through an InputStream decoded as "UTF-8". Most of the response is fine and the HTML displays correctly. However, at seemingly random positions, codes such as "1ffa", "6e8", "1972", "90", and "0" appear on lines of their own as I read the response. Here is how I send the request and read the response:

    String hostname = "server";
    SocketFactory socketFactory = SSLSocketFactory.getDefault();
    Socket socket = new Socket(hostname, 8080);
    // Create streams to securely send and receive data to the server
    InputStream in = socket.getInputStream();
    OutputStream out = socket.getOutputStream();
    PrintWriter writer = new PrintWriter(out);
    String parameters = "params=" + URLEncoder.encode("paramsToEncode", "UTF-8");
    // HTTP requires CRLF line endings; println() would use the platform separator
    writer.print("POST /handlerServlet HTTP/1.1\r\n");
    writer.print("Host: " + hostname + "\r\n");
    writer.print("Content-Length: " + parameters.length() + "\r\n");
    writer.print("Content-Type: application/x-www-form-urlencoded\r\n");
    writer.print("Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n");
    writer.print("Keep-Alive: 115\r\n");
    writer.print("Connection: keep-alive\r\n");
    // A blank line ends the headers; the body follows with no trailing line break
    writer.print("\r\n" + parameters);
    writer.flush();
    // Read from in and write to out...
    String input = "";
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
    StringBuffer result = new StringBuffer();
    boolean startWriting = false;
    FileOutputStream outStream1 = new FileOutputStream(new File("/file1.txt"));
    Writer outWriter = new OutputStreamWriter(outStream1, "UTF-8");

    while ((input = reader.readLine()) != null) {
        result.append(input);
        outWriter.write(input + "\n");
        result.append('\n');
    }
    System.out.println(result.toString());
    outWriter.close();
    // Close the socket
    in.close();

Does anyone have a clue as to why I would see characters like this?

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <base href="http://server:8080/HW/YX+JpCEnNDe5B87CCyFj5KR7z9rqlwRK77aMm/44221331.htm">

    1ffa

    <meta http-equiv="Content-Type"  content="text/html; charset=ISO-8859-1">
    <title></title>
    </head>
    <body bgcolor="#ffffff">
    <!-- Created by Oracle Reports 21:14 Tue Jun 29 09:14:32 PM, 2010 -->
    ....
    <tr valign=top>
      <td height=10></td>
      <td width=80 colspan=3 align=center><font size=2 face="helvetica">V002A050001</font></td>
      <
    1ffa
    td></td>

As you can see, having these characters appear at random locations can cause erratic behavior in the rendered HTML.

Thanks.

Jaime Garcia
  • Why don't you use [`URLConnection`](http://stackoverflow.com/questions/2793150/how-to-use-java-net-urlconnection-to-fire-and-handle-http-requests) or [Apache HttpComponents Client](http://hc.apache.org/httpcomponents-client/)? They handle this more transparently. – BalusC Jun 30 '10 at 01:31

1 Answer

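Does your response include a header like this?

    Transfer-Encoding: chunked

If so, this is most likely HTTP chunked transfer encoding, and it is normal. Each of those hexadecimal values is the size in bytes of the chunk that follows (for example, 1ffa is 8186 bytes), and the final 0 marks the end of the body.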
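
If the response does carry `Transfer-Encoding: chunked`, the body has to be de-chunked on the raw bytes before it is decoded as text, because the chunk sizes count bytes, not characters. Here is a minimal sketch of that, not a full HTTP client. The class `ChunkedBodyDecoder` and its method names are hypothetical, and it assumes the status line and headers have already been consumed, so the stream is positioned at the first chunk-size line.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes a chunked HTTP body from a raw byte stream.
final class ChunkedBodyDecoder {

    // Reads one line up to LF, dropping CR; returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
            b = in.read();
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Assumes 'in' is positioned just after the blank line that ends the headers.
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            if (sizeLine == null) {
                throw new EOFException("stream ended before the final 0 chunk");
            }
            // Ignore optional chunk extensions after ';'
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                // Skip optional trailer headers up to the closing blank line
                String trailer;
                while ((trailer = readLine(in)) != null && !trailer.isEmpty()) {
                    // trailers are ignored in this sketch
                }
                return body.toByteArray();
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new EOFException("stream ended inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF that follows the chunk data
        }
    }

    public static void main(String[] args) throws IOException {
        // Two chunks ("Wiki", "pedia") followed by the terminating 0 chunk
        String raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
        byte[] body = decode(new ByteArrayInputStream(raw.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(new String(body, StandardCharsets.US_ASCII));
    }
}
```

With the code from the question, this would mean reading the body from `in` directly instead of through a `BufferedReader`, and converting the decoded bytes to a `String` only afterwards.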

Bruno
  • ...and you should process it differently based on the response header. Since the other side is apparently a `Servlet`, you can also set the `Content-Length` header beforehand so that it does not send the body in chunks. You can use `response.setContentLength()` for that. – BalusC Jun 30 '10 at 01:28
  • Indeed, setting the content length is a good workaround. I'd also suggest using an existing HTTP client library (unless there are constraints against that). There are plenty around and they tend to handle this well. – Bruno Jun 30 '10 at 01:31
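
To illustrate the client-library suggestion from the comments, here is a rough sketch using `java.net.HttpURLConnection`, which decodes chunked responses itself so the chunk sizes never reach the caller. The class name `FormPoster`, the method `post`, and the choice to pass the response charset as a parameter are assumptions made for this example. The `params` field name comes from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: posts one form field and returns the response body as text.
final class FormPoster {

    static String post(String url, String paramValue, Charset responseCharset) throws IOException {
        byte[] form = ("params=" + URLEncoder.encode(paramValue, "UTF-8"))
                .getBytes(StandardCharsets.UTF_8);

        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // Sends Content-Length with the exact byte count of the request body
        conn.setFixedLengthStreamingMode(form.length);

        try (OutputStream out = conn.getOutputStream()) {
            out.write(form);
        }

        // The stream returned here is already de-chunked by the library
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                body.write(buf, 0, n);
            }
            return new String(body.toByteArray(), responseCharset);
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as `FormPoster.post("http://server:8080/handlerServlet", "paramsToEncode", StandardCharsets.ISO_8859_1)` would mirror the request in the question. The ISO-8859-1 charset there is taken from the page's `meta` tag and should be checked against the actual `Content-Type` response header.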