
I'm trying to scrape a web forum, and having trouble accessing pages that are behind a login. Inspecting the elements of the login page, I found that the ID of the username and password input elements change each time I refresh the page. My current strategy is to

  1. Create and use a requests session
  2. Make GET request for the forum login page
  3. Use BeautifulSoup to extract the IDs of the username and password input elements
  4. Use the extracted IDs as the keys, and my account username and password as values, for a payload dict that is passed into a POST request for the login page
  5. Make GET request for a page on the forum

I'm running into a problem in step 4: the status code of the POST request is 400, indicating that I'm doing something wrong.

Here's an MWE, in which the variables KIWIFARMS_USERNAME and KIWIFARMS_PASSWORD have been changed to not be my actual account username and password:

import os

import requests
from bs4 import BeautifulSoup

# login url for forum, and fake forum credentials (they're real in my script)
LOGIN_URL = 'https://kiwifarms.net/login/'
KIWIFARMS_USERNAME = 'username'
KIWIFARMS_PASSWORD = 'password'

with requests.Session( ) as session:

  # step 2: GET the forum login page
  r = session.get( LOGIN_URL )

  # step 3 (parse the page)
  soup = BeautifulSoup( r.content, 'lxml' )

  # step 3 (extract the input IDs)
  username_id = soup.find( 'input', { 'autocomplete' : 'username' } )[ 'id' ]
  password_id = soup.find( 'input', { 'type' : 'password' } )[ 'id' ]

  payload = {
    username_id: KIWIFARMS_USERNAME,
    password_id : KIWIFARMS_PASSWORD }

  # step 4
  post = session.post( LOGIN_URL, data = payload )

  # failure of step 4 (prints 400)
  print( post.status_code )

I've looked at a lot of pages and links, including this, this, this, and this, but I still can't figure out why my POST request is getting a 400 Bad Request error.

I have a version of this working in Selenium, but I'd really like to know the mistake I'm making and get this working using Requests. Any help would be greatly appreciated.

  • The general way to solve this sort of problem is to inspect how a browser login works using a network tracing program like Telerik Fiddler, then make sure your code provides the needed headers and data. – barny Mar 08 '20 at 09:00
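
For example, a minimal sketch of that approach with requests (the header names and values below are placeholders; copy whatever your browser actually sent in the trace):

import requests

session = requests.Session()

# placeholders copied from a browser's network trace of the login
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (placeholder)',
    'Referer': 'https://kiwifarms.net/login/',
})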

2 Answers


The website generates a _xfToken during login, and you are also missing some form data in your POST request.

Here I maintain the session using requests.Session(), parse the value of _xfToken from the GET request, and then pass it along in the POST request.

import requests
from bs4 import BeautifulSoup


def Main():
    with requests.Session() as req:
        # GET the login page and parse the hidden CSRF token (_xfToken) out of the form
        r = req.get("https://kiwifarms.net/login/login")
        soup = BeautifulSoup(r.text, 'html.parser')
        token = soup.find("input", {'name': '_xfToken'}).get("value")
        # form data the login form actually submits
        data = {
            'username': 'test',
            'password': 'test',
            'remember': '1',
            '_xfRedirect': '/',
            '_xfToken': token
        }
        # POST the credentials together with the token, within the same session
        r = req.post("https://kiwifarms.net/login/login", data=data)
        print(r)


Main()

Output:

<Response [200]>

If you check r.text, you will see that we are on the right track.

<div class="blockMessage blockMessage--error blockMessage--iconic">
The requested user could not be found.
</div>

That confirms we are doing it correctly, since I didn't pass a valid username/password.
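
If you want to check for that programmatically rather than by eye, you can look for the error block in the response. A minimal sketch, assuming XenForo keeps rendering login errors in a blockMessage--error element as in the snippet above:

from bs4 import BeautifulSoup

def login_error(html):
    # return the text of the login error block, or None if no error was rendered
    soup = BeautifulSoup(html, 'html.parser')
    error = soup.find('div', class_='blockMessage--error')
    return error.get_text(strip=True) if error else None

# after the POST above:
# print(login_error(r.text))  # e.g. "The requested user could not be found."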


You're trying to POST to https://kiwifarms.net/login/, while the login form action is /login.

I got the same error when I had url/login/ in the URL. The status code changed to 200 when I simply changed it to url/login (basically, I just removed the redundant trailing slash!).
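
Applied to the question's MWE, you don't even have to hard-code the corrected URL: you can resolve the form's action attribute against the page URL. A minimal sketch, assuming the login form is the first <form> on the page:

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

LOGIN_URL = 'https://kiwifarms.net/login/'

with requests.Session() as session:
    r = session.get(LOGIN_URL)
    soup = BeautifulSoup(r.content, 'lxml')

    # assumption: the login form is the first <form> on the page
    form = soup.find('form')

    # resolve the action attribute against the page URL instead of
    # posting back to LOGIN_URL with its trailing slash
    action_url = urljoin(LOGIN_URL, form['action'])
    print(action_url)

    # build the payload as in the question (plus the _xfToken from the
    # other answer), then POST to action_url instead of LOGIN_URL:
    # post = session.post(action_url, data=payload)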
