
I'm trying to use Python to scrape a website, but I have to login first before I can get to the page with the data on it.

The URL for the login page is:

https://tunein.com/account/login/?returnTo=https://amplifier.tunein.com/sessions/new&source=amplifier

I have read numerous threads which seem to answer the question, but I'm struggling to relate them to my own situation.

The code I have (from a response in this thread) is:

import requests

# Fill in your details here to be posted to the login form.
payload = {
    'Username': 'user',
    'Password': 'password'
}

# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post('https://tunein.com/account/login/?returnTo=https://amplifier.tunein.com/sessions/new&source=amplifier', data=payload)
    # print the html returned or something more intelligent to see if it's a successful login page.
    print(p.text)

I have looked at the page source to see what the names of the form fields are, hence the 'Username' and 'Password' keys in the payload variable.

When I run the script, p.text just returns the HTML of the same page so it obviously isn't logging in correctly. Any suggestions? Is there a better way to do it?

Edit:

The "Form Data" headers once I log in are:

Username: user
Password: pass
Remember: true
Remember: false
btnLogin: Sign In
ReturnTo: https://amplifier.tunein.com/sessions/new
Source: amplifier

Does this mean I have to add all of these to my payload variable?
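One way to test this is to send every field from the "Form Data" list, not just the username and password. The sketch below is only an assumption about what the server expects, based on the fields listed above. `build_login_payload` and `login` are hypothetical helper names, and `login` expects an existing `requests.Session()` to be passed in. Because `Remember` appears twice, the payload is a list of tuples rather than a dict, which cannot hold duplicate keys. `requests` accepts such a list for `data=`.

```python
from urllib.parse import urlencode

LOGIN_URL = ('https://tunein.com/account/login/'
             '?returnTo=https://amplifier.tunein.com/sessions/new&source=amplifier')


def build_login_payload(username, password):
    """Return the form fields as (name, value) pairs, in the order the browser sent them."""
    return [
        ('Username', username),
        ('Password', password),
        ('Remember', 'true'),    # the browser sent this key twice,
        ('Remember', 'false'),   # so both values are kept here
        ('btnLogin', 'Sign In'),
        ('ReturnTo', 'https://amplifier.tunein.com/sessions/new'),
        ('Source', 'amplifier'),
    ]


def login(session, username, password):
    """POST the form using an existing session object and return the response."""
    return session.post(LOGIN_URL, data=build_login_payload(username, password))


# Preview the form-encoded body without touching the network.
print(urlencode(build_login_payload('user', 'pass')))
```

With `requests` this would be called as `login(s, 'user', 'password')` inside the `with requests.Session() as s:` block. Whether these fields are enough for the site to accept the login is not confirmed.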

Community
Danilo
  • What does `p.status_code` say? – kylieCatt Sep 01 '15 at 14:39
  • It's possible there is an issue with the headers you're sending (or not sending). Do they have API documentation? – kylieCatt Sep 01 '15 at 14:43
  • No they have absolutely nothing useful, that's why I'm resorting to trying to login and then scrape the tables of each page once I'm in. – Danilo Sep 01 '15 at 14:48
  • It's pretty likely there is a header or some form data missing from your request that is preventing it from being accepted. In the network tab of your browser dev tools you can see what headers are being sent as well as form fields that you aren't sending. It's possible and in fact pretty likely this was done to prevent you from doing exactly what you are trying to do. – kylieCatt Sep 01 '15 at 14:54
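Following on from the comments, one way to find form fields that are not being sent is to read them from the login page itself instead of copying them by hand. The sketch below uses only the standard library. `FormFieldParser` and `extract_form_fields` are hypothetical names, and `SAMPLE_HTML` is made-up markup for illustration, not the real TuneIn page.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect (name, value) pairs for every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        if attrs.get('name'):
            # Inputs without a value attribute are recorded with an empty string.
            self.fields.append((attrs['name'], attrs.get('value') or ''))


def extract_form_fields(html):
    """Return the named input fields found in the given HTML, in document order."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical markup, only to show the idea; the real page will differ.
SAMPLE_HTML = '''
<form method="post">
  <input type="text" name="Username">
  <input type="password" name="Password">
  <input type="hidden" name="ReturnTo" value="https://amplifier.tunein.com/sessions/new">
  <input type="hidden" name="Source" value="amplifier">
</form>
'''

fields = extract_form_fields(SAMPLE_HTML)
print(fields)
```

In practice the HTML would come from `s.get(...)` on the login URL within the same `requests.Session()`, so that any cookies set by that page are sent back with the POST. After posting, `p.status_code`, `p.url` and `p.history` show whether the server redirected away from the login page.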

0 Answers