I need to automate downloading a file from a website. The download button appears only after login, for which I was given a username and password. The login form has two additional hidden fields, one of which is csrf_token_login with a generated value:

<input type="hidden" name="csrf_token_login" value="nl9YERDFpecfITb8QwFWneoaefykxp2b" />

It is clear how to code this in Java (using java.net.HttpURLConnection) if I had only a login and password (there is an excellent explanation in Using java.net.URLConnection to fire and handle HTTP requests): submit a POST request, get the cookies and set them on every subsequent request. But how can I get the generated value of csrf_token_login from the login form and submit it with the other values?

Reading the login page with getInputStream() on its HttpURLConnection gives me the csrf value. But this also establishes the connection, which prevents me from setting the connection properties needed to post data:

private HttpURLConnection logUrlCon;
... 
BufferedReader logInput = new BufferedReader(new InputStreamReader(logUrlCon.getInputStream())); 
... // read and get csrf value OK

logUrlCon.setDoOutput(true); // throws java.lang.IllegalStateException: Already connected

Is there any way to get the csrf_token_login value generated in the login form AND post it with the username and password?

tv116

1 Answer

Read the login page content and extract the value using a regular expression. Your hidden field has a very distinctive form (a unique name, etc.), so it is well suited to regular-expression-based extraction.
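
As a rough illustration of the extraction step, here is a minimal sketch. The class name CsrfTokenExtractor and the method extractToken are made up for this example, and the pattern assumes that the name attribute comes before the value attribute, as in the markup shown in the question. If the site changes its attribute order, the pattern would need adjusting (or an HTML parser would be the safer choice).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsrfTokenExtractor {

    // Assumption: name="csrf_token_login" appears before value="..." inside the same <input> tag,
    // as in the question's markup.
    private static final Pattern TOKEN = Pattern.compile(
            "<input[^>]*name=\"csrf_token_login\"[^>]*value=\"([^\"]*)\"");

    // Returns the token value if the hidden field is present in the given HTML.
    public static Optional<String> extractToken(String html) {
        Matcher m = TOKEN.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<input type=\"hidden\" name=\"csrf_token_login\" "
                + "value=\"nl9YERDFpecfITb8QwFWneoaefykxp2b\" />";
        System.out.println(extractToken(html).orElse("(token not found)"));
    }
}
```

Returning an Optional keeps the "field not found" case explicit, which is useful if the server redirects to a page that has no login form.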

Hakan Serce
  • I tried to read login page. The problem is when I try to read as above or via URLConnection conn = logUrl.openConnection();... I get some data that do not include – tv116 May 29 '12 at 09:31
  • ... I get some data that do not include login.html fields. logUrl is http://www.hupx.hu/login.html, but conn.url is http://www.hupx.hu/mobile.html that provides some data. This is the problem - how to get the login.html input with login form (and its fields). – tv116 May 29 '12 at 09:55
  • I have overcome the redirection to the other page and managed to read the csrf value using getInputStream(). But then I am unable to write it (with the username and password) to the connection, because getInputStream() has already established the connection and returned the response. Is there any other way to get the csrf value from the login page, set up the POST request properties, and submit the login POST request with the csrf value and the other data? – tv116 Jul 10 '12 at 08:48
  • Please read the following page about how to read/write from/to a URLConnection: http://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html – Hakan Serce Jul 10 '12 at 14:33
  • It shows how to read from a URLConnection and how to write to (another) URLConnection and read the response from it. I need to code how to first read from a URLConnection AND then write to the SAME URLConnection while it is still open. My understanding is that this is not allowed, as in http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4192018 they say: _It is illegal to call getOutputStream() after calling getInputStream(). … The HttpURLConnection() cares about the ordering of this. The call to getInputStream() triggers … an HTTP request._ Therefore I got an exception when trying setDoOutput(true) – tv116 Jul 16 '12 at 09:10
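
Since an HttpURLConnection instance represents a single request, one possible way around the "Already connected" error is to use two connections: a GET to read the login page (token and cookies), then a fresh connection for the POST. The following is only a sketch under several assumptions: the class and method names are hypothetical, the form field names "username" and "password" are guesses (check the real form, including the second hidden field mentioned in the question), and the form is assumed to post back to the same URL.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsrfLoginSketch {

    // Assumption: name precedes value inside the <input> tag, as in the question.
    private static final Pattern TOKEN = Pattern.compile(
            "<input[^>]*name=\"csrf_token_login\"[^>]*value=\"([^\"]*)\"");

    // Encodes form fields as application/x-www-form-urlencoded.
    public static String formBody(Map<String, String> fields) {
        try {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8")).append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    // Builds a Cookie request header from Set-Cookie values, keeping only "name=value".
    public static String cookieHeader(List<String> setCookieValues) {
        StringBuilder sb = new StringBuilder();
        for (String c : setCookieValues) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(c.split(";", 2)[0]);
        }
        return sb.toString();
    }

    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) buf.write(chunk, 0, n);
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Returns the HTTP status code of the login POST.
    public static int login(String loginUrl, String user, String password) throws IOException {
        // Connection 1: GET the login page to obtain the token and the session cookies.
        HttpURLConnection get = (HttpURLConnection) new URL(loginUrl).openConnection();
        String html;
        try (InputStream in = get.getInputStream()) {
            html = readAll(in);
        }
        List<String> setCookies = new ArrayList<>();
        for (Map.Entry<String, List<String>> h : get.getHeaderFields().entrySet()) {
            if ("Set-Cookie".equalsIgnoreCase(h.getKey())) setCookies.addAll(h.getValue());
        }
        Matcher m = TOKEN.matcher(html);
        if (!m.find()) throw new IOException("csrf_token_login not found in login page");

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", user);          // assumed field name
        fields.put("password", password);      // assumed field name
        fields.put("csrf_token_login", m.group(1));
        byte[] body = formBody(fields).getBytes(StandardCharsets.UTF_8);

        // Connection 2: a NEW connection for the POST, configured before any I/O happens.
        HttpURLConnection post = (HttpURLConnection) new URL(loginUrl).openConnection();
        post.setRequestMethod("POST");
        post.setDoOutput(true);
        post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        if (!setCookies.isEmpty()) post.setRequestProperty("Cookie", cookieHeader(setCookies));
        try (OutputStream out = post.getOutputStream()) {
            out.write(body);
        }
        return post.getResponseCode();
    }
}
```

Sending the cookies from the first response along with the POST matters because a CSRF token is typically tied to the session in which it was issued. Installing a java.net.CookieManager via CookieHandler.setDefault(...) is another way to carry cookies across connections without handling the headers by hand.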