I have the following Java code to read the source of a web page:
URL url = new URL(urlToParse);
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
urlToParse is passed as a parameter to this function and is equal to "http://www.omegatiming.com/file/download/?id=00010F0200FFFFFFFFFFFFFFFFFFFF03".
The code comes from here.
The output is gibberish: full of question marks and unknown characters.
I tried adding these five lines after the openConnection() line:
con.setRequestMethod("GET");
con.setDoOutput(true);
con.setReadTimeout(2000);
con.setChunkedStreamingMode(0);
con.connect();
from the solution offered here, but then I get this exception:
Exception in thread "main" java.io.FileNotFoundException: http://www.omegatiming.com/file/download/?id=00010F0200FFFFFFFFFFFFFFFFFFFF03
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1835)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
The exception is thrown by the line InputStream is = con.getInputStream();
Pasting this link into a browser opens the page, so the URL itself must be valid, yet calling con.getResponseCode() returns 404.
When I read the error body from getErrorStream(), it prints this:
<!DOCTYPE html>
<html>
<head>
<title>The resource cannot be found.</title>
<meta name="viewport" content="width=device-width" />
<style>
body {font-family:"Verdana";font-weight:normal;font-size: .7em;color:black;}
p {font-family:"Verdana";font-weight:normal;color:black;margin-top: -5px}
b {font-family:"Verdana";font-weight:bold;color:black;margin-top: -5px}
H1 { font-family:"Verdana";font-weight:normal;font-size:18pt;color:red }
H2 { font-family:"Verdana";font-weight:normal;font-size:14pt;color:maroon }
pre {font-family:"Consolas","Lucida Console",Monospace;font-size:11pt;margin:0;padding:0.5em;line-height:14pt}
.marker {font-weight: bold; color: black;text-decoration: none;}
.version {color: gray;}
.error {margin-bottom: 10px;}
.expandable { text-decoration:underline; font-weight:bold; color:navy; cursor:hand; }
@media screen and (max-width: 639px) {
pre { width: 440px; overflow: auto; white-space: pre-wrap; word-wrap: break-word; }
}
@media screen and (max-width: 479px) {
pre { width: 280px; }
}
</style>
</head>
<body bgcolor="white">
<span><H1>Server Error in '/' Application.<hr width=100% size=1 color=silver></H1>
<h2> <i>The resource cannot be found.</i> </h2></span>
<font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">
<b> Description: </b>HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable. Please review the following URL and make sure that it is spelled correctly.
<br><br>
<b> Requested URL: </b>/file/download/<br><br>
<hr width=100% size=1 color=silver>
<b>Version Information:</b> Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.34248
</font>
</body>
</html>
<!--
HttpException: A public action method 'download' was not found on controller 'SwissTiming.DocMgmt.DMSWeb.Controllers.FileController'.
at System.Web.Mvc.Controller.HandleUnknownAction(String actionName)
at System.Web.Mvc.Controller.<BeginExecuteCore>b__1d(IAsyncResult asyncResult, ExecuteCoreState innerState)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResultBase`1.End()
at System.Web.Mvc.Controller.EndExecuteCore(IAsyncResult asyncResult)
at System.Web.Mvc.Controller.<BeginExecute>b__15(IAsyncResult asyncResult, Controller controller)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResultBase`1.End()
at System.Web.Mvc.Controller.EndExecute(IAsyncResult asyncResult)
at System.Web.Mvc.Controller.System.Web.Mvc.Async.IAsyncController.EndExecute(IAsyncResult asyncResult)
at System.Web.Mvc.MvcHandler.<BeginProcessRequest>b__5(IAsyncResult asyncResult, ProcessRequestState innerState)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResultBase`1.End()
at System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult)
at System.Web.Mvc.MvcHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result)
at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
--><!--
This error page might contain sensitive information because ASP.NET is configured to show verbose error messages using <customErrors mode="Off"/>. Consider using <customErrors mode="On"/> or <customErrors mode="RemoteOnly"/> in production environments.-->
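For reference, here is a minimal sketch of how the status code and the error body above can be read. The names ResponseProbe, readAll and fetch are made up for illustration, the cast to HttpURLConnection is an assumption about how con is declared, and decoding the body as UTF-8 is also an assumption. The 2000 ms read timeout is the one from the lines added earlier.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
final class ResponseProbe {

    // Reads the whole stream as UTF-8 text (assumed encoding);
    // a null stream yields an empty string.
    public static String readAll(InputStream in) {
        if (in == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Prints the status code and returns the body. For 4xx/5xx responses
    // the body is only available through getErrorStream(), which may be null.
    public static String fetch(String urlToParse) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) new URL(urlToParse).openConnection();
        con.setReadTimeout(2000);
        int status = con.getResponseCode(); // this call sends the request
        System.out.println("HTTP status: " + status);
        InputStream body = status >= 400
                ? con.getErrorStream()
                : con.getInputStream();
        return readAll(body);
    }

    public static void main(String[] args) throws IOException {
        // Pass the URL to test as the only argument.
        if (args.length == 1) {
            System.out.println(fetch(args[0]));
        }
    }
}
```

Checking the status before calling getInputStream() avoids the FileNotFoundException, because getInputStream() is never called on an error response.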
That is where I am stuck; I cannot make sense of the problem. I don't even know where the ASP.NET error comes from.
Other attempts to work around the problem, none of which solved it:
1. Adding
httpConnection.setRequestProperty("User-Agent","Mozilla/5.0 ( compatible ) ");
httpConnection.setRequestProperty("Accept","*/*");
as suggested here. I also tried the User-Agent string from this, as suggested here.
I still get the FileNotFoundException in getInputStream().
2. Adding
System.setProperty("http.agent", "");
as mentioned here.
3. Back to the original problem (printing gibberish): I tried changing the InputStreamReader call this way:
new InputStreamReader(new URL("http://www.website.com").openStream(), "UTF-8")
as mentioned in the comment here, but it didn't change anything.
4. Adding the lines:
con.setRequestMethod("POST");
con.setDoInput(true);
I still get the FileNotFoundException.
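To make attempt 3 concrete, here is a small sketch of reading a stream with an explicit charset. CharsetRead and decode are hypothetical names, and the sample bytes in main are invented for the demonstration rather than taken from the site. The second sample shows a general property of UTF-8 decoding: bytes that are not valid UTF-8 come out as the replacement character U+FFFD no matter how the reader is constructed. Whether that is what happens with the response from this URL is not something I have verified.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
final class CharsetRead {

    // Decodes the whole stream with the given charset.
    public static String decode(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Invented sample: real text encoded as UTF-8 round-trips cleanly.
        byte[] text = "r\u00e9sultat".getBytes(StandardCharsets.UTF_8);
        System.out.println(
                decode(new ByteArrayInputStream(text), StandardCharsets.UTF_8));

        // Invented sample: 0x8B is not a valid UTF-8 sequence on its own,
        // so the decoder substitutes U+FFFD for it.
        byte[] notText = {(byte) 0x8b};
        String garbled =
                decode(new ByteArrayInputStream(notText), StandardCharsets.UTF_8);
        System.out.println(garbled.contains("\uFFFD"));
    }
}
```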
I'm pretty confused.
I'm not even sure whether I have an encoding problem (before I started adding things to the connection there was no exception, "just" wrong output) or some other problem with the connection that prevents me from reading from it.
If it is the latter, what is special about this specific URL? The pages that led me to it, e.g. http://www.omegatiming.com/Competition?id=00010F0200FFFFFFFFFFFFFFFFFFFFFF&sport=AQ&year=2015, could be parsed without a problem.
[here][1]: Using Java to pull data from a webpage?
[here][2]: Trying to read from a URL(in Java) produces gibberish on certain occaisions
[here][3]: URLConnection FileNotFoundException for non-standard HTTP port sources
[here][4]: Setting "User-Agent" parameters for URLConnection for querying Google from a Java application
[here][5]: Setting user agent of a java URLConnection
[here][6]: Trying to read from a URL(in Java) produces gibberish on certain occaisions
[this][1]: http://www.whatsmyuseragent.com/