
My attempt to log into a website and download a specific file has hit a wall.

Specifically, I am logging into this website http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0

so that I can select specific variables and parameters before I download the file and save it as an Excel or CSV file.

In particular, I want to toggle the highlighted inputs, then select the type of crop, water supply, input level, time period, and geographic areas, and finally download the file via the 'Visualization and Download' button.

For example, I would like to get the data for Wheat (Crop), rain-fed (Water Supply), High (Input Level), 1961-1990 (Time Period, Baseline), United States of America (Geographic Areas). Then I want to save it as an Excel file.

This is my code so far:

# Import library
import requests

# Define url, username, and password
url = 'http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0'
user, password = 'Username', 'Password'
resp = requests.get(url, auth=(user, password))

Perhaps I'm too deep in the trenches of the whole process to see an easy, viable solution, but any help is greatly appreciated.

  • 1
    Possible duplicate of [How to "log in" to a website using Python's Requests module?](https://stackoverflow.com/questions/11892729/how-to-log-in-to-a-website-using-pythons-requests-module) – Adam Jul 14 '17 at 16:48
  • 1
    https://stackoverflow.com/questions/11892729/how-to-log-in-to-a-website-using-pythons-requests-module/17633072#17633072 – Adam Jul 14 '17 at 16:48

1 Answer


The website you linked uses a login form that is submitted with an HTTP POST request. In your code you have:

resp = requests.get(url, auth=(user, password))

which uses HTTP Basic authentication instead (see http://docs.python-requests.org/en/master/user/authentication/#basic-authentication). The site's login form ignores credentials sent that way.
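To see what the `auth=(user, password)` argument actually does, here is a small sketch that prepares such a request without sending it and inspects the header that requests attaches. The URL and credentials are placeholders chosen for illustration, not values from the site.

```python
import base64

import requests

# Placeholder URL and credentials, for illustration only.
request = requests.Request("GET", "http://example.com/", auth=("Username", "Password"))
prepared = request.prepare()  # builds the request locally, nothing is sent

# Basic authentication is just an Authorization header with base64-encoded credentials.
auth_header = prepared.headers["Authorization"]
scheme, encoded = auth_header.split(" ", 1)
decoded = base64.b64decode(encoded).decode("utf-8")

print(scheme)   # the authentication scheme named in the header
print(decoded)  # the credentials, recoverable by anyone who sees the header
```

Nothing in this header fills in the form fields on a login page, which is why a form-based site does not treat the request as a login.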

To log in to this site you need two things:

  • a persistent session cookie
  • an HTTP POST request to the login form URL

First, create a session object that will hold the cookies sent by the server (see http://docs.python-requests.org/en/master/user/advanced/#session-objects):

s = requests.Session()

Next, visit the site with a GET request. The server responds with a cookie for your session, and the session object stores it for later requests.

s.get(site_url)

The final step is to log in to the site. You can use Firebug or the Chrome Developer Console (depending on which browser you use) to examine which fields need to be sent: open the Network tab, submit the login form, and inspect the POST request.

s.post(site_url, data={'_username': 'user', '_password': 'pass'})

These two fields (_username, _password) seem to be valid for your site, but when I examined the data sent during the POST request, there were more fields. I don't know whether they are necessary.
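If the extra fields turn out to be required, one option is to read them from the login page itself and send them along with the credentials. The sketch below does this with the standard library's HTML parser. The sample HTML and its hidden field are hypothetical stand-ins, since the real form was not inspected here; only `_username` and `_password` come from the observation above.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are sent as empty strings.
            self.fields[name] = attributes.get("value") or ""


def build_login_payload(html, username, password):
    """Merge the form's pre-filled inputs with the user's credentials."""
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload["_username"] = username
    payload["_password"] = password
    return payload


# Hypothetical login form; the real page may use different hidden fields.
SAMPLE_HTML = """
<form method="post" action="/w/ctrl">
  <input type="hidden" name="_flow" value="Vwr">
  <input type="text" name="_username">
  <input type="password" name="_password">
</form>
"""

payload = build_login_payload(SAMPLE_HTML, "user", "pass")
print(payload)
```

In practice the HTML would come from the text of the first GET response, and the resulting dictionary would be passed as `data=` to `s.post`.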

After that you will be authenticated. The next step is to request the URL of the file you would like to download, using the same session:

s.get(file_url)
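Put together, the steps above might look like the following sketch. The function name, its parameters, and the output path are assumptions made for illustration; the form field names are the two observed above, and the site may require more. A successful status code does not by itself prove the login worked, so it is worth checking the saved content.

```python
import requests


def login_and_download(site_url, file_url, username, password, out_path, session=None):
    """Log in through the HTML form, then save the response for file_url to out_path."""
    if session is None:
        session = requests.Session()

    # The first GET lets the server set its session cookie on the session object.
    session.get(site_url).raise_for_status()

    # Submit the login form. Only these two field names were observed;
    # the real form may need additional fields.
    login = session.post(site_url, data={"_username": username, "_password": password})
    login.raise_for_status()

    # The same session carries the cookie, so this request is made as the logged-in user.
    download = session.get(file_url)
    download.raise_for_status()

    # Write the raw bytes; the file format is whatever the server returned.
    with open(out_path, "wb") as handle:
        handle.write(download.content)
    return out_path
```

A call would then look like `login_and_download(site_url, file_url, 'user', 'pass', 'gaez_download.csv')`, where `file_url` is the download link copied from the browser's Network tab and the file name is a placeholder.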

The link you provided contains a query string with several parameters, which probably correspond to the options you want to select. You can adjust them to download the file with the desired options.
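To work with those query-string options programmatically, the URL can be split into a dictionary, edited, and reassembled with the standard library. This is only a sketch: the meaning of each parameter is not documented here, so the changed value below is purely illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

url = (
    "http://www.gaez.iiasa.ac.at/w/ctrl?_flow=Vwr&_view=Welcome"
    "&fieldmain=main_lr_lco_cult&idPS=0&idAS=0&idFS=0"
)

# Split the URL and turn the query string into an editable dictionary.
parts = urlsplit(url)
params = dict(parse_qsl(parts.query))
print(params)

# Illustrative change only; what idPS controls on the site is an assumption to verify.
params["idPS"] = "1"

# Rebuild the URL with the modified query string.
new_url = urlunsplit(parts._replace(query=urlencode(params)))
print(new_url)
```

The same dictionary can also be passed as `params=` to `s.get` together with the base URL, which lets requests do the encoding.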

Warning

Note that this site does not use a secure HTTPS connection. Any credentials you provide will travel over the internet unencrypted and could be seen by someone who should not see them.

Little Bobby Tables