I'm trying to scrape a web forum, and I'm having trouble accessing pages that are behind a login. Inspecting the login page, I found that the IDs of the username and password input elements change each time I refresh the page. My current strategy is to:
1. Create a requests session
2. Make a GET request for the forum login page
3. Use BeautifulSoup to extract the IDs of the username and password input elements
4. Use the extracted IDs as the keys, and my account username and password as the values, of a payload dict that is passed into a POST request to the login page
5. Make a GET request for a page on the forum
I'm running into a problem at step 4: the POST request returns status code 400 (Bad Request), indicating that I'm doing something wrong.
Here's an MWE, in which the variables KIWIFARMS_USERNAME and KIWIFARMS_PASSWORD have been changed so they're not my actual account username and password:
import requests
from bs4 import BeautifulSoup

# login url for forum, and fake forum credentials (they're real in my script)
LOGIN_URL = 'https://kiwifarms.net/login/'
KIWIFARMS_USERNAME = 'username'
KIWIFARMS_PASSWORD = 'password'

with requests.Session() as session:
    # step 1
    r = session.get( LOGIN_URL )
    # step 2
    soup = BeautifulSoup( r.content, 'lxml' )
    # step 3
    username_id = soup.find( 'input', { 'autocomplete' : 'username' } )[ 'id' ]
    password_id = soup.find( 'input', { 'type' : 'password' } )[ 'id' ]
    payload = {
        username_id : KIWIFARMS_USERNAME,
        password_id : KIWIFARMS_PASSWORD }
    # step 4
    post = session.post( LOGIN_URL, data = payload )
    # failure of step 4 (prints 400)
    print( post.status_code )
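One thing I noticed while debugging: the login form also contains hidden inputs besides the two visible fields, and I suspect one of them is a CSRF token the server expects back (XenForo forums appear to use a hidden field named _xfToken), which might explain the 400. Here's a standalone snippet showing how I tried collecting every named input from the form into a payload — the sample HTML and its field names/values are made up for illustration, not copied from the real page:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the real login page; the actual field
# names, IDs, and values here are assumptions for illustration.
SAMPLE_HTML = """
<form action="/login/login" method="post">
  <input type="text" id="abc123" name="login" autocomplete="username">
  <input type="password" id="def456" name="password">
  <input type="hidden" name="_xfToken" value="token-value">
</form>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
form = soup.find('form')

# Collect every named input, keeping any pre-filled values — hidden
# fields often carry CSRF tokens that the server expects back.
payload = {
    tag['name']: tag.get('value', '')
    for tag in form.find_all('input')
    if tag.has_attr('name')
}
print(payload)
```

The idea would be to start from this full payload, then overwrite the username/password entries with my credentials before POSTing, so any hidden tokens are preserved.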
I've looked at a lot of pages and links, including this, this, this, and this, but I still can't figure out why my POST request is getting a 400 Bad Request error.
I have a version of this working in Selenium, but I'd really like to understand the mistake I'm making and get this working with requests. Any help would be greatly appreciated.