
I am trying to write a web-scraping program in Python. However, the pages I want to scrape are behind a login. I have an account and have been trying to follow the help posted here. I think I have done everything right, but I cannot get past the login. My code is posted below:

#!/usr/bin/env python

import requests, sys, lxml.html

#logging in
s = requests.Session()
login_url = 'https://login.fidelity.com/ftgw/Fas/Fidelity/RtlCust/Login/'

payload = {
    'ssn' : 'USERNAME',
    'pin' : 'PASSWORD'
}

s.post(login_url, data=payload, headers=dict(referer='https://login.fidelity.com'))

#page to scrape
response = s.get('https://fixedincome.fidelity.com/ftgw/fi/FIBondDetails?requestType=&displayFormat=TABLE&cusip=30382LDK1&ordersystem=TORD&preferenceName=')

print(response.content)  # redirected to the login page
binzabinza
  • How do you know it doesn't work? See how to create a [mcve]. – Peter Wood Jul 13 '17 at 15:20
  • check the response to post request - is it successful? it is possible the website tries to block web scrapers so you may have to go further to impersonate a web browser user-agent etc – Anentropic Jul 13 '17 at 15:23
  • The response.content is the login page - When I try to GET the url I want to scrape, it redirects to login page. At least I'm pretty sure that's what is happening. – binzabinza Jul 13 '17 at 15:38
  • @Anentropic so I added in headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}, but I still have the same problem. – binzabinza Jul 13 '17 at 15:42
  • until you know the post to login has succeeded there's no point doing the get – Anentropic Jul 13 '17 at 15:44
  • Well if the POST succeeded, I should have the correct response from the GET right? Since I'm using a session? – binzabinza Jul 13 '17 at 16:52
  • Yes, it sounds like the POST is failing and you're not logged in, so the obvious thing to do would be to check the response to the POST request in case there is an error message giving a clue to the problem. – Anentropic Jul 18 '17 at 10:19
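
The advice in the comments above (check the response to the POST before attempting the GET) could be sketched as follows. This is a minimal sketch, not a verified recipe: `summarize_response` is a hypothetical helper name, and it relies only on the `status_code`, `url`, `history`, and `text` attributes that a requests `Response` provides.

```python
def summarize_response(resp, preview_chars=200):
    """Collect the facts that hint at whether a login POST worked.

    `resp` is expected to look like a requests.Response; only its
    status_code, url, history and text attributes are used.
    """
    return {
        'status_code': resp.status_code,
        # Final URL after any redirects; a login URL here suggests failure.
        'final_url': resp.url,
        # Status codes of the intermediate redirect responses, oldest first.
        'redirects': [hop.status_code for hop in resp.history],
        # Start of the body, where an error message would usually show up.
        'preview': resp.text[:preview_chars],
    }

# Usage with the session from the question (performs a real request,
# so it is left commented out):
# r = s.post(login_url, data=payload, headers=dict(referer='https://login.fidelity.com'))
# print(summarize_response(r))
```

If `final_url` still points at the login page, or `preview` starts with the login form's HTML, the session is not authenticated and the later GET will be redirected.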

1 Answer


You are missing a few things.

The login URL is

login_url = 'https://login.fidelity.com/ftgw/Fas/Fidelity/RtlCust/Login/Response/dj.chf.ra'

And you need to pass these two additional params in the post

'DEVICE_PRINT' : 'version%3D3.4.2.0_1%26pm_fpua%3Dmozilla%2F5.0+(x11%3B+linux+x86_64%3B+rv%3A41.0)+gecko%2F20100101+firefox%2F41.0%7C5.0+(X11)%7CLinux+x86_64%',
'SavedIdInd' : 'N',

Also, the field names are SSN and PIN (upper case).

I tried this url after that and it works for me.

response = s.get('https://oltx.fidelity.com/ftgw/fbc/oftop/portfolio')

print(response.content)
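
Putting the corrections above together, the login step might look like the sketch below. `build_login_payload` and `login` are hypothetical helper names, and the `DEVICE_PRINT` value is passed in rather than hard-coded, on the assumption that you copy the fingerprint string your own browser sends. The site may well have changed since this was written.

```python
LOGIN_URL = 'https://login.fidelity.com/ftgw/Fas/Fidelity/RtlCust/Login/Response/dj.chf.ra'


def build_login_payload(username, password, device_print):
    """Return the form fields described above, with upper-case SSN and PIN."""
    return {
        'SSN': username,
        'PIN': password,
        'DEVICE_PRINT': device_print,
        'SavedIdInd': 'N',
    }


def login(session, username, password, device_print):
    """POST the credentials through `session` and return the response."""
    payload = build_login_payload(username, password, device_print)
    return session.post(LOGIN_URL, data=payload,
                        headers={'referer': 'https://login.fidelity.com'})

# Usage (performs real network requests, so it is left commented out):
# import requests
# s = requests.Session()
# r = login(s, 'USERNAME', 'PASSWORD', 'DEVICE_PRINT string copied from your browser')
# print(r.content)
```

Passing the session in as an argument keeps the cookies it collects available for the later `s.get(...)` on the page you actually want to scrape.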

mwahal
  • BTW, after you log in successfully, expect to see this output in r.content after the POST: { "status": { "result": "success", "nextStep": "Finish", "context": "RtlCust" } }. Code: r = s.post(login_url, data=payload, headers=dict(referer='https://login.fidelity.com')); print(r.content) – mwahal Jul 26 '17 at 14:03
  • Wow that worked perfectly. How did you figure out the correct login_url? I figured out the other two parameters I needed to pass, but I always kept using the url in my original post. – binzabinza Jul 26 '17 at 18:09
  • Just look at the login page and the form it posts. You can actually remove the dj.chf.ra and it will still work. In the page's HTML, look at the form's action attribute: form id="Login" name="Login" action="/ftgw/Fas/Fidelity/RtlCust/Login/Response" method="post" role="form" – mwahal Jul 26 '17 at 18:34
  • As of this date I don't see this working anymore. @mwahal can you confirm it still works for you? – paulperry Oct 21 '18 at 02:15
  • I have not used it since then. I will see if I can find a solution. – mwahal Oct 22 '18 at 17:11
  • The simplest solution is to precede the post request with a get to fidelity website. That will instantiate a cookie session. I just did that and it worked. – mwahal Oct 22 '18 at 21:44
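
The last comment (a GET before the POST, so the session picks up its cookies) can be combined with the success message quoted in the earlier comment, roughly as sketched below. Which page to GET first is an assumption: the comment only says "a get to fidelity website", so the login page URL from the question is used here. `login_succeeded` and `login_with_cookie_priming` are hypothetical names.

```python
import json

# Assumption: the login page from the question is the page to GET first.
LOGIN_PAGE = 'https://login.fidelity.com/ftgw/Fas/Fidelity/RtlCust/Login/'
# Form action quoted in the comments (without the dj.chf.ra suffix).
LOGIN_POST = 'https://login.fidelity.com/ftgw/Fas/Fidelity/RtlCust/Login/Response'


def login_succeeded(body):
    """Return True if `body` is the JSON success message from the login POST."""
    if isinstance(body, bytes):
        body = body.decode('utf-8', 'replace')
    try:
        return json.loads(body)['status']['result'] == 'success'
    except (ValueError, KeyError, TypeError):
        # Not JSON (for example the HTML login page) or an unexpected shape.
        return False


def login_with_cookie_priming(session, payload):
    """GET the login page to obtain cookies, then POST the login form."""
    session.get(LOGIN_PAGE)
    r = session.post(LOGIN_POST, data=payload,
                     headers={'referer': 'https://login.fidelity.com'})
    return login_succeeded(r.content)

# Usage (performs real network requests, so it is left commented out):
# import requests
# s = requests.Session()
# print(login_with_cookie_priming(s, payload))
```

Because both requests go through the same session object, any cookies set by the first GET are sent automatically with the POST.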