105

I am trying to post a request to log in to a website using the Requests module in Python, but it's not really working. I'm new to this, so I can't figure out if I should make my username and password cookies or use some type of HTTP authorization thing I found (??).

from pyquery import PyQuery
import requests

url = 'http://www.locationary.com/home/index2.jsp'

So now, I think I'm supposed to use "post" and cookies....

ck = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}

r = requests.post(url, cookies=ck)

content = r.text

q = PyQuery(content)

title = q("title").text()

print(title)

I have a feeling that I'm doing the cookies thing wrong...I don't know.

If it doesn't log in correctly, the title of the home page should come out to "Locationary.com" and if it does, it should be "Home Page."

If you could maybe explain a few things about requests and cookies to me and help me out with this, I would greatly appreciate it. :D

Thanks.

...It still didn't really work yet. Okay...so this is what the home page HTML says before you log in:

</td><td><img src="http://www.locationary.com/img/LocationaryImgs/icons/txt_email.gif">    </td>
<td><input class="Data_Entry_Field_Login" type="text" name="inUserName" id="inUserName"  size="25"></td>
<td><img src="http://www.locationary.com/img/LocationaryImgs/icons/txt_password.gif"> </td>
<td><input  class="Data_Entry_Field_Login"  type="password" name="inUserPass"     id="inUserPass"></td>

So I think I'm doing it right, but the output is still "Locationary.com"

2nd EDIT:

I want to be able to stay logged in for a long time and whenever I request a page under that domain, I want the content to show up as if I were logged in.

Marcus Johnson

6 Answers

260

I know you've found another solution, but for those like me who find this question, looking for the same thing, it can be achieved with requests as follows:

Firstly, as Marcus did, check the source of the login form to get three pieces of information - the url that the form posts to, and the name attributes of the username and password fields. In his example, they are inUserName and inUserPass.

Once you've got that, you can use a requests.Session() instance to make a post request to the login url with your login details as a payload. Making requests from a session instance is essentially the same as using requests normally; it simply adds persistence, allowing you to store and use cookies etc.

Assuming your login attempt was successful, you can simply use the session instance to make further requests to the site. The cookie that identifies you will be used to authorise the requests.

Example

import requests

# Fill in your details here to be posted to the login form.
payload = {
    'inUserName': 'username',
    'inUserPass': 'password'
}

# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post('LOGIN_URL', data=payload)
    # Print the HTML returned, or do something more intelligent, to see if the login succeeded.
    print(p.text)

    # An authorised request.
    r = s.get('A protected web page url')
    print(r.text)
    # etc...
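
For the "something more intelligent" check mentioned in the comment above, one option is to compare the page title, since the question says the title is "Home Page." when logged in and "Locationary.com" otherwise. This is only a sketch using the standard library; the helper name page_title is made up for illustration, and other sites will need a different success marker.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def page_title(html):
    """Return the stripped <title> text of an HTML document ('' if none)."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Usage with the session from the example above (not run here):
#     p = s.post('LOGIN_URL', data=payload)
#     logged_in = page_title(p.text) == 'Home Page.'
```

Checking the title is fragile if the site changes its pages, so treat it as a quick sanity check rather than proof of a valid session.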
ZygD
tigerFinch
  • 14
    The question is however, how to get the POST login form? How can I know if it is called inUserName rather than username, USERNAME etc? – lsheng Apr 04 '14 at 06:43
  • 4
    @Twinkle look at the HTML source for the form to see what they're called there. – Aaron Schumacher Apr 07 '14 at 13:05
  • 3
    s.text doesn't seem to work, but I'm still giving you some voting love for showing me this lovely with requests... syntax – Software Prophets Jun 16 '14 at 21:03
  • s.text does not work because it should be something like this: `p = s.post('LOGIN_URL.....` and then `p.text` – Sebastian Feb 18 '15 at 15:58
  • @Sebastian quick question man so Im tinkering around with this and im trying the fb page. and it won't let me in. can you help me out? `https://www.facebook.com/login.php?login_attempt=1` thats the value of the action attribute of the form `email` name value the email input field "pass"` name value the password field after inputing the credentials it still won't let me in what's missing here? – Halcyon Abraham Ramirez Jul 22 '15 at 05:55
  • 2
    @HalcyonAbrahamRamirez I don't think this is the right place for you to seek help. I suggest reading question about you challenge specifically like: http://stackoverflow.com/questions/21928368/login-to-facebook-using-python-requests and if you can't solve it open your own question. – Sebastian Jul 23 '15 at 09:35
  • @Sebastian no worries I got's it i just used chrom developer tools and pasted the parsed form data to requests it's all good. thanks – Halcyon Abraham Ramirez Jul 23 '15 at 19:46
  • 1
    what if the username and password inputs don't have name or id attributes? – stackPusher Dec 09 '17 at 01:58
  • @AaronSchumacher How can I see what username and password are called for this url? https://www.morningstar.com/members/login.html – tommy.carstensen Jan 12 '18 at 14:15
  • Could this possibly work with a google login considering that google doesn't have its login inputs on a single page? – Noah Covey Aug 09 '18 at 02:54
  • @tigerfinch is this feasible with proxy? – brainLoop Nov 12 '18 at 05:35
  • @tigerfinch How do you pass a user-agent header into the post request? – find_all Apr 18 '20 at 05:50
  • If you want to see what exactly is needed for a successful login: open your favorite browser (Chrome for me), go to a login page, open the developer console, go to the network tab and clear log, and finally login with your credentials. Explore the HTTP requests and one of them must be the desired login URL, where credentials are being sent. Now you can look at its header and find the section with form data (= payload). I think this is a better way than just looking at page source, because there could be some JavaScript affecting a final payload. – jirinovo Oct 02 '20 at 07:58
51

If the information you want is on the page you are directed to immediately after login...

Let's call your ck variable payload instead, like in the python-requests docs:

payload = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
url = 'http://www.locationary.com/home/index2.jsp'
requests.post(url, data=payload)

Otherwise...

See https://stackoverflow.com/a/17633072/111362 below.
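
To see why cookies=ck in the question did not log in while data=payload can, it may help to build both requests locally and look at where the credentials end up. The sketch below only prepares the requests and sends nothing over the network; the credential values are placeholders.

```python
import requests

url = 'http://www.locationary.com/home/index2.jsp'
creds = {'inUserName': 'alice', 'inUserPass': 'secret'}  # placeholder values

# prepare() builds the request locally; nothing is sent over the network.
as_cookies = requests.Request('POST', url, cookies=creds).prepare()
as_form = requests.Request('POST', url, data=creds).prepare()

# With cookies=, the credentials travel in a Cookie header and there is no form body.
print(as_cookies.headers.get('Cookie'))
print(as_cookies.body)

# With data=, the credentials are form-encoded into the request body,
# which is what a browser does when it submits a login form.
print(as_form.headers.get('Content-Type'))
print(as_form.body)
```

A login form handler reads the submitted form fields, so only the second request resembles what the browser sends.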

Community
katy lavallee
  • I got it to work a different way using urllib, urrlib2, and cookielib and some HTTP Headers. – Marcus Johnson Aug 10 '12 at 11:43
  • 24
    Sadly I can't delete this because it's the accepted answer. I don't think I understood the question when I posted this (it was clarified after), so not sure why it's accepted. My answer only works if the data you need is on the page you get redirected to after login. @tigerFinch has a much better answer. – katy lavallee Mar 10 '15 at 16:10
42

Let me try to make it simple. Suppose the URL of the site is http://example.com/ and you need to log in by filling in a username and password. Go to the login page, say http://example.com/login.php, view its source code, and search for the action URL. It will be in the form tag, something like

 <form name="loginform" method="post" action="userinfo.php">

Now take userinfo.php and make it an absolute URL, which will be 'http://example.com/userinfo.php'. Then run a simple Python script:

import requests
url = 'http://example.com/userinfo.php'
values = {'username': 'user',
          'password': 'pass'}

r = requests.post(url, data=values)
print(r.content)
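
The manual steps above (find the form's action, make it absolute, note the field names) can also be sketched in code with the standard library. The helper name describe_login_form is hypothetical, and the sketch assumes the login form is the first form on the page and is present in the static HTML rather than built by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _FormParser(HTMLParser):
    """Records the first <form>'s action and the names of its <input> fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.field_names = []
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_form = self._seen_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.field_names.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def describe_login_form(page_url, html):
    """Return (absolute action URL, list of input names) for the first form."""
    parser = _FormParser()
    parser.feed(html)
    parser.close()
    if parser.action is None:
        raise ValueError("no <form> found in the page")
    # urljoin resolves a relative action against the page it was found on.
    return urljoin(page_url, parser.action), parser.field_names


sample = """
<form name="loginform" method="post" action="userinfo.php">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
"""
print(describe_login_form("http://example.com/login.php", sample))
# ('http://example.com/userinfo.php', ['username', 'password'])
```

The returned URL and field names are what you would plug into url and values in the script above.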

I hope that this helps someone somewhere someday.

  • 1
    nice one - note that sometimes inspecting the element of the name / pass field might reveal the file called rather than the button (mine just said 'action' on the button inspection, the url was shown from inspecting the usr / pass fields) – baxx Dec 04 '15 at 20:20
  • 2
    If you're using chrome, open the devtools on the network tab and after making the request you can inspect the actual values, with what keys and where were they sent to, this is useful for forms that don't use traditional mechanics and instead use javascript/ajax to process the form. – Roberto Arosemena Aug 06 '16 at 00:52
  • 1
    in this case any idea on how to make the web page pop up direct instead of print the page content? –  Jul 19 '17 at 10:07
  • You will need to use the ```webbrowser``` module – R. Barrett Jan 08 '20 at 19:25
  • Also his above ```print r.content``` is wrong he should be using ```print(r.content)``` – R. Barrett Jan 08 '20 at 19:25
  • Hi, so I am trying to do this but I just get the login page HTML back. I tried the original URL then using the action tag as you describe (I think) but still the same effect. #REQUEST_URL = 'http://www.hagdms.com/index.cfm?fuseaction=gad.obDetail&observation_ID=529405' REQUEST_URL = 'http://www.hagdms.com/index.cfm?CFID=0&CFTOKEN=074E7EBC-935E-4471-9054CAE59A85FB85fuseaction=gad.obDetail&observation_ID=529405' Can you confirm if I've got the right absolute URL? – Dan Jan 29 '21 at 14:36
  • So I used this identical code for Freecycle https://pybit.es/requests-session.html and it worked. So something is different with www.hagdms.com – Dan Jan 29 '21 at 14:57
8

The requests.Session() solution assisted with logging into a form with CSRF Protection (as used in Flask-WTF forms). Check if a csrf_token is required as a hidden field and add it to the payload with the username and password:

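import requests
from bs4 import BeautifulSoup

# Placeholder base URL: replace with the site you are logging in to.
server_name = 'https://example.com'

payload = {
    'email': 'email@example.com',
    'password': 'passw0rd'
}

with requests.Session() as sess:
    # Fetch the sign-in page first so the session receives its cookie and CSRF token.
    res = sess.get(server_name + '/signin')
    signin = BeautifulSoup(res.content, 'html.parser')
    payload['csrf_token'] = signin.find('input', id='csrf_token')['value']
    res = sess.post(server_name + '/auth/login', data=payload)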
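
If the token field is not called csrf_token, or the form carries several hidden inputs, a more general variant is to copy every hidden input into the payload. The following is a rough sketch using only the standard library; hidden_fields is a hypothetical helper, and it collects hidden inputs from the whole page rather than from one specific form.

```python
from html.parser import HTMLParser


class _HiddenInputParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            # A missing or empty value attribute is sent as an empty string.
            self.hidden[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden input names and values found in html."""
    parser = _HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.hidden


# Usage inside the session block above (not run here):
#     res = sess.get(server_name + '/signin')
#     payload.update(hidden_fields(res.text))
#     res = sess.post(server_name + '/auth/login', data=payload)
```

On pages with more than one form this may pick up fields that belong elsewhere, so compare the result with what the browser actually submits.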
naaman
7

Find out the names of the inputs used on the website's form for usernames <...name=username.../> and passwords <...name=password../> and replace them in the script below. Also replace the URL to point at the site you want to log into.

login.py

#!/usr/bin/env python

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
payload = { 'username': 'user@email.com', 'password': 'blahblahsecretpassw0rd' }
url = 'https://website.com/login.html'
requests.post(url, data=payload, verify=False)

Passing verify=False turns off SSL certificate verification for the request, and disable_warnings(InsecureRequestWarning) suppresses the warning that would otherwise be printed each time the script connects to a site with an unverified SSL certificate.

Extra:

To run this script from the command line on a UNIX-based system, place it in a directory, e.g. ~/home/scripts, and add this directory to your path in ~/.bash_profile or a similar file used by the terminal.

# Custom scripts
export CUSTOM_SCRIPTS=$HOME/home/scripts
export PATH=$CUSTOM_SCRIPTS:$PATH

Then create a link named login that points to the Python script login.py inside ~/home/scripts:

ln -s ~/home/scripts/login.py ~/home/scripts/login

Close your terminal, start a new one, and run login.

DBMage
David
1

Some pages may require more than a login and password. There may even be hidden fields. The most reliable way is to use the browser's inspect tool and watch the network tab while logging in, to see what data is being sent.
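
Once you have copied the form data of the login request from the network tab, a small helper can list which fields the browser sent that your payload lacks. This is a sketch under assumptions: missing_fields is a made-up name, and the captured string below is an invented example of a form-encoded body, not data from any real site.

```python
from urllib.parse import parse_qsl


def missing_fields(captured_body, payload):
    """Return names the browser sent that are absent from payload, in order."""
    missing = []
    # keep_blank_values=True so fields submitted with an empty value still count.
    for name, _value in parse_qsl(captured_body, keep_blank_values=True):
        if name not in payload and name not in missing:
            missing.append(name)
    return missing


# Invented example of a body copied from the browser's network tab.
captured = "inUserName=me%40example.com&inUserPass=secret&csrf_token=abc123&remember="
payload = {"inUserName": "me@example.com", "inUserPass": "secret"}

print(missing_fields(captured, payload))
# ['csrf_token', 'remember']
```

Any name it reports is a candidate for a hidden or script-generated field that the server may expect.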

LoMaPh