
I read the answer to the question: "How to “log in” to a website using Python's Requests module?"

The answer reads: "Firstly check the source of the login form to get three pieces of information - the url that the form posts to, and the name attributes of the username and password fields."

How can I find out what the name attributes of the username and password fields are on this morningstar.com page? https://www.morningstar.com/members/login.html

I have the following code:

import requests

url = 'http://www.morningstar.com/members/login.html'
url = 'http://beta.morningstar.com'

with open('morningstar.txt') as f:
    username, password = f.read().splitlines()

with requests.Session() as s:
    payload = login_data = {
        'username': username,
        'password': password,
        }
    p = s.post(url, data=login_data)
    print(p.text)

But - among other things - it prints:

This distribution is not configured to allow the HTTP request method that was used for this request. The distribution supports only cachable requests.

What should url and data be for the post?

There is another answer that uses Selenium, but is it possible to avoid that?

Zoe
tommy.carstensen
  • Possible duplicate of [Morningstar.com login using Python Requests](https://stackoverflow.com/questions/44377692/morningstar-com-login-using-python-requests) – tommy.carstensen Jan 12 '18 at 15:12
  • Yep, I don't know if it's possible with `requests`. It seems that a POST request is submitted to `https://sso.morningstar.com/sso/json/msusers/authenticate` with the login data in `X-OpenAM` headers. – t.m.adam Jan 12 '18 at 15:14
  • But I don't think it's a duplicate, because the post you linked uses `selenium` – t.m.adam Jan 12 '18 at 15:21

2 Answers


This was kind of hard; I had to use an intercepting proxy, but here it is:

import requests

s = requests.Session()
auth_url = 'https://sso.morningstar.com/sso/json/msusers/authenticate'
login_url = 'https://www.morningstar.com/api/v2/user/login'
username = 'username'
password = 'password'

# Request 1: OPTIONS (CORS preflight) to the SSO endpoint
headers = {
    'Access-Control-Request-Method': 'POST',
    'Access-Control-Request-Headers': 'content-type,x-openam-password,x-openam-username',
    'Origin': 'https://www.morningstar.com'
}
s.options(auth_url, headers=headers)

# Request 2: POST to the SSO endpoint; the credentials go in X-OpenAM-* headers
headers = {
    'Referer': 'https://www.morningstar.com/members/login.html',
    'Content-Type': 'application/json',
    'X-OpenAM-Username': username,
    'X-OpenAM-Password': password,
    'Origin': 'https://www.morningstar.com',
}
s.post(auth_url, headers=headers)

# Request 3: POST to the site's own login endpoint, reusing the session cookies
data = {"productCode": "DOT_COM", "rememberMe": False}
r = s.post(login_url, json=data)

print(s.cookies)
print(r.json())

By now you should have an authenticated session. You should see a bunch of cookies in s.cookies and some basic info about your account in r.json().


The site changed the login mechanism (and probably their entire CMS), so the above code doesn't work any more. The new login process involves one POST and one PATCH request to /umapi/v1/sessions, then a GET request to /umapi/v1/users.

import requests

sessions_url = 'https://www.morningstar.com/umapi/v1/sessions'
users_url = 'https://www.morningstar.com/umapi/v1/users'

userName = 'my email'
password = 'my pwd'
data = {'userName':userName,'password':password}

with requests.session() as s:
    r = s.post(sessions_url, json=data)
    # The response should be 200 if creds are valid, 401 if not
    assert r.status_code == 200
    s.patch(sessions_url)
    r = s.get(users_url)
    # print(r.json())  # contains account details

The URLs and other required values, such as the POST data, can be found in the web browser's developer console (Ctrl+Shift+I), under the Network tab.
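The POST, PATCH, GET sequence described above can also be wrapped in a small helper. This is only a sketch under assumptions: the names `login` and `LoginError` are made up for illustration, the endpoints are the ones quoted in this answer and may have changed again, and the helper accepts any object that offers `post`, `patch` and `get` the way a `requests.Session` does.

```python
SESSIONS_URL = 'https://www.morningstar.com/umapi/v1/sessions'
USERS_URL = 'https://www.morningstar.com/umapi/v1/users'


class LoginError(Exception):
    """Hypothetical error type: the sessions endpoint did not answer 200."""


def login(session, user_name, password):
    """Run the POST, PATCH, GET sequence on a requests.Session-like object.

    Returns the response of the final GET, which should hold the account details.
    """
    r = session.post(SESSIONS_URL, json={'userName': user_name, 'password': password})
    if r.status_code != 200:
        # Per the answer above, 401 means the credentials were rejected.
        raise LoginError('login failed with HTTP status %d' % r.status_code)
    session.patch(SESSIONS_URL)
    return session.get(USERS_URL)


# Usage (needs the third-party requests package and real credentials):
#     import requests
#     with requests.Session() as s:
#         account = login(s, 'my email', 'my pwd')
#         print(account.json())
```

Raising an exception instead of using `assert` keeps the check active even when Python runs with `-O`, which strips assert statements.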

t.m.adam
  • It works! Afterwards one can do `r = s.get('some other morningstar url')` while logged in. You didn't answer my question as stated, but you answered my real question. I feel so dumb when I read great answers like these. Thank you so much! – tommy.carstensen Jan 12 '18 at 17:26
  • How could I have come up with this answer myself? – tommy.carstensen Jan 12 '18 at 17:34
  • Usually you'd have to monitor the network traffic in your browser (Inspect > Network), but in this case you'll need an intercepting proxy, Wireshark or a similar tool, because of the many requests and redirects. In my code you'll notice three requests: one OPTIONS and two POST. This was really challenging for me too! – t.m.adam Jan 12 '18 at 17:49
  • It seems this guy is having the same problem: https://stackoverflow.com/questions/57103517/morningstar-with-python-requests-to-get-10-year-financial-data – tommy.carstensen Aug 22 '19 at 19:50
  • It seems they changed their CMS, so I had to update the code. Note that the new content is dynamic, so it may not be easy to scrape it with Requests. – t.m.adam Aug 22 '19 at 23:16

As seen in the page's code, the username input field is:

<input id="uim-uEmail-input" name="uEmail" placeholder="E-mail Address" data-msat="formField-inputemailuEmail-login" type="email">

the password input field is:

<input id="uim-uPassword-input" name="uPassword" placeholder="Password" data-msat="formField-inputpassworduPassword-login" type="password">

In each tag, the name attribute is the value that follows name=:

Username: "uEmail"
Password: "uPassword"
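The same lookup can be done in Python once the markup is at hand. The following is a minimal sketch using only the standard library; `InputNameCollector` and `find_input_names` are illustrative names, and the sample HTML is just the two tags quoted above. It assumes the markup was copied from the browser's Inspect view, since fields that are rendered by JavaScript may not appear in the raw page source.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect (type, name) pairs from every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        if "name" in attr_map:
            # Browsers treat an <input> without a type as a text field.
            self.fields.append((attr_map.get("type", "text"), attr_map["name"]))


def find_input_names(html):
    """Return the (type, name) pairs of all named <input> tags in html."""
    parser = InputNameCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


# The two tags quoted above, as copied from the browser's Inspect view.
SAMPLE_HTML = '''
<input id="uim-uEmail-input" name="uEmail" placeholder="E-mail Address" data-msat="formField-inputemailuEmail-login" type="email">
<input id="uim-uPassword-input" name="uPassword" placeholder="Password" data-msat="formField-inputpassworduPassword-login" type="password">
'''

print(find_input_names(SAMPLE_HTML))
# prints [('email', 'uEmail'), ('password', 'uPassword')]
```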

Ajax1234
  • Ahhh, yes. I used `Inspect` in Google Chrome, and it showed up. I couldn't see it, when looking at the page source. It answers my question, but I still can't log in. Do you mind, if I wait a second, before I accept your answer? Thanks! – tommy.carstensen Jan 12 '18 at 15:05
  • @tommy.carstensen no problem! Glad to help! – Ajax1234 Jan 12 '18 at 15:05
  • Your answer was correct for my original question as stated, but the other answer https://stackoverflow.com/a/48231042/778533 answers my real question, and I will choose his answer as the correct one. But thank you for answering the original version of the question as worded, prior to me editing it. Thanks! – tommy.carstensen Jan 12 '18 at 17:33