My goal is to scrape data from consumerreports.org, so I am using requests and BeautifulSoup for this project. Web scraping aside, I am having a lot of trouble logging in to consumerreports.org through requests.
Here is my code. It writes the response to the POST and the response to the follow-up GET into two text files, so I can check whether the login succeeded.
import os.path

import requests

# Declare the necessary variables
# The two output files are used to check whether the login succeeded
save_path = '/Users/myName/Documents/Webscraping Project/'
login_url = 'https://www.consumerreports.org/cro/index.htm'
my_url = 'https://www.consumerreports.org/cro/index.htm'
pName = os.path.join(save_path, 'post text file.txt')
rName = os.path.join(save_path, 'response text file.txt')

# Log in using the Session class from the requests package
with requests.Session() as s:
    payload = {"userName": "myName@university.edu", "password": "my_password"}
    p = s.post(login_url, data=payload)
    print(p.text)
    r = s.get(my_url)

# Save both responses to see whether the login was successful.
# Write the decoded text directly: str(p.text.encode('utf-8')) would write a b'...' literal.
with open(pName, "w", encoding="utf-8") as post_file:
    post_file.write(p.text)
with open(rName, "w", encoding="utf-8") as response_file:
    response_file.write(r.text)
print('Files created.')
This is what print(p.text) showed:
<!DOCTYPE html>
<html>
<head>
<title>405 Not allowed.</title>
</head>
<body>
<h1>Error 405 Not allowed.</h1>
<p>Not allowed.</p>
<h3>Guru Meditation:</h3>
<p>XID: #some number </p>
<hr>
<p>Varnish cache server</p>
</body>
</html>
I also searched 'response text file.txt' with Ctrl+F and could tell that the login had not succeeded.
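Rather than saving the responses and searching them by hand, the same check can be scripted. The sketch below is a minimal, hypothetical helper: the name check_login and the marker text "Sign Out" are assumptions (the real site may show different text to a logged-in user). It only looks at a status code and an HTML string, so it can be fed p.status_code and r.text from the code above.

```python
from http import HTTPStatus


def check_login(status_code: int, html: str, marker: str = "Sign Out") -> str:
    """Return a short diagnosis of a login attempt.

    `marker` is a hypothetical piece of text assumed to appear only
    on pages served to a logged-in user.
    """
    if status_code == HTTPStatus.METHOD_NOT_ALLOWED:
        # 405: the server refuses the HTTP method that was used (here, POST) for this URL
        phrase = HTTPStatus.METHOD_NOT_ALLOWED.phrase
        return f"{status_code} {phrase}: this URL does not accept that method"
    if status_code >= 400:
        return f"{status_code}: request failed"
    if marker in html:
        return "logged in"
    return "not logged in (marker not found)"


# Usage with the variables from the code above:
#   print(check_login(p.status_code, p.text))
#   print(check_login(r.status_code, r.text))
```

Checking the status code first separates "the request itself was rejected" (the 405 above) from "the request went through but the page still shows a logged-out view".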
It seems the web server does not accept the POST method, at least for this URL, and that is why it returns the 405 error. However, I don't know how to proceed from here. While searching online, I found a suggestion to use
response = requests.get(login_url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'})
which sends a browser-like User-Agent header, apparently as a way to "log in". I'm still fairly new to Python, so any advice would be appreciated.
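Two things may be worth checking. First, a 405 on a POST most likely means the request went to a URL that only serves a page (here index.htm) rather than to the endpoint the sign-in form actually submits to. The form's action attribute gives that URL, and its input names (including any hidden token fields) give the payload keys. Second, a User-Agent header only changes how the client identifies itself; it does not log anything in on its own. The sketch below uses only the standard library's html.parser (the question uses BeautifulSoup, which could do the same) to pull the action URL, method, and field names out of the form that contains a password box. The sample HTML, the /login/submit path, the field names, and the example.com URL are all hypothetical, not taken from consumerreports.org; on the real site you would feed it the HTML of the actual sign-in page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormParser(HTMLParser):
    """Collect every <form> on a page with its action, method and input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").upper(),
                "fields": {},
                "has_password": False,
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            if (attrs.get("type") or "").lower() == "password":
                self._current["has_password"] = True
            name = attrs.get("name")
            if name:
                # Hidden inputs (e.g. anti-forgery tokens) must be sent back too
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_login_form(html, page_url):
    """Return (absolute action URL, method, fields) of the first form with a password input."""
    parser = FormParser()
    parser.feed(html)
    for form in parser.forms:
        if form["has_password"]:
            # Resolve a relative action against the page URL
            return urljoin(page_url, form["action"]), form["method"], form["fields"]
    return None


# Hypothetical sign-in page: a search form followed by the login form
SAMPLE = """
<form action="/search" method="get"><input name="q"></form>
<form action="/login/submit" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="userName">
  <input type="password" name="password">
</form>
"""

action, method, fields = find_login_form(SAMPLE, "https://www.example.com/cro/index.htm")
print(action, method, fields)
# Next step (not run here): fill in the credentials and POST to `action`
# with the same Session, e.g.
#   fields.update(userName="...", password="...")
#   s.post(action, data=fields, headers={"User-Agent": "..."})
```

Picking the form that contains a password input avoids grabbing an unrelated form, such as a site search box, that appears earlier on the page.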