
I'm writing an AliExpress web scraper in Python using Requests and BeautifulSoup. It works well, but I've run into a problem: at random, I get redirected to a login page. My plan is to log in at the start of the session before scraping, but I don't know how to log in.

The login page (https://login.aliexpress.com) asks only for a username and password. However, when I submit them from my code and then test whether I'm logged in by requesting https://home.aliexpress.com/index.htm and looking at the HTML, it fails: I'm redirected back to the login page.

My code after trying multiple solutions to no avail:

import requests

LOGIN_URL = "https://login.aliexpress.com/"
LOGIN_INFO = {
    "loginId": "myemail@email.com",
    "password": "mypassword"
}


with requests.Session() as sess:
    # Go to the login page (this also sets the initial cookies)
    sess.get(LOGIN_URL)

    # Attempt to log in with my login info
    sess.post(LOGIN_URL, data=LOGIN_INFO)

    # Go to the 'My AliExpress' page to verify a successful login
    success = sess.get("https://home.aliexpress.com/index.htm")

    # Manually check the HTML to see if I was sent to the login page again
    print(success.text)
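
The manual check in the last step could probably be automated. The sketch below assumes that a failed login shows up as an ordinary HTTP redirect, so that the final URL of the response (success.url in the code above) ends up on the login host; a redirect done in JavaScript would not be caught this way. The names LOGIN_HOST and is_login_url are made up for illustration.

```python
from urllib.parse import urlparse

# Assumption: the login page lives on this host, as in the question.
LOGIN_HOST = "login.aliexpress.com"


def is_login_url(url):
    """Return True if `url` points at the login host."""
    return urlparse(url).hostname == LOGIN_HOST


# With Requests, the final URL after redirects is available as response.url,
# so the check would be called as: is_login_url(success.url)
print(is_login_url("https://login.aliexpress.com/?return_url=x"))
print(is_login_url("https://home.aliexpress.com/index.htm"))
```

This only says where the request ended up, not why the login failed, but it avoids reading the whole page by eye.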

This is pretty much what's left after my many failed attempts. Some of the things I've tried are:

  1. Looked at the cookies after the 'sess.get(LOGIN_URL)' call. It returns the following, but I don't know what to do with it (in key : value format):

    • ali_apache_tracktmp :
    • ali_apache_track :
    • xman_f : t52Eyo+p3qf6E6fdmL5yJ81g2icRn+2PYjjrWYHlqlDyXAixo92Z5KHMZV8SCV7vP4ZjxEmuTQesVWkqxUi3SpFU1qbRyNRd+d0pIIKVhrIDri2oaWrt6A==
    • JSESSIONID : 30678741D7473C80BEB85825718FB1C6
    • acs_usuc_t : acs_rt=343aef98b0ca4ae79497e31b11c82c29&x_csrf=1b5g78e7fz2rt
    • xman_us_f : x_l=0
    • ali_apache_id : 23.76.146.14.1510893827939.187695.4
    • xman_t : PSIYMbKN2UyuejZBfmP9o5hdmQGoSB0UL0785LnRBxW0bdbdMmtW2A47hHbgTgD7TmFp7QVsOW4kXTsXMncy+iKisKfqagqb4yPxOVFdw+k=
  2. Tried looking for a CSRF token and only found the text after '_csrf=' in the 5th bullet above. Tried using it and it didn't work.

  3. Looked at the HTML form sent when you log in, but I don't know HTML and can only tell that it has a lot more fields than the ones I've seen other people use for other websites (Image of Form Data from Chrome here).

  4. Replaced "mypassword" in my code with the text from the password2 field in the image above, and changed the "password" key to "password2" too.

  5. Googled for a few hours but didn't find anything that would work.
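
For reference, pulling the token from item 2 out of the cookie can be sketched as follows. This assumes the acs_usuc_t value is a query-string style list of key=value pairs joined by '&', which is how it looks in the dump above; extract_csrf is a made-up helper name. It only shows how to read the value, and as noted, sending it did not get me logged in.

```python
from urllib.parse import parse_qs


def extract_csrf(acs_usuc_t):
    """Return the x_csrf field of an acs_usuc_t cookie value, or None."""
    fields = parse_qs(acs_usuc_t)
    values = fields.get("x_csrf")
    return values[0] if values else None


# Cookie value copied from the list above.
cookie_value = "acs_rt=343aef98b0ca4ae79497e31b11c82c29&x_csrf=1b5g78e7fz2rt"
print(extract_csrf(cookie_value))  # prints 1b5g78e7fz2rt
```

In the session from my code, the cookie value itself would come from sess.cookies.get("acs_usuc_t").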

At this point, I'm at my wits' end, so any help on how to proceed would be very much appreciated. I'm not the best coder (still learning) and don't know HTML beyond what I've picked up from a few scraping tutorials. I was hoping to figure it out myself, but hours later I still haven't solved it and realized I could really use the help.

I'm using Python 3.5. If any more info is needed, let me know. My brain has just about turned to mush after being stuck and awake for so long.

hpati1117

1 Answer


I suspect this will not work the way you want it to. Even if you somehow get past the login form, the following page presents a "slider verification", which to my knowledge Requests is unable to do anything about. (If there is a method, please let me know.)

I have been trying to use cookies instead:

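import requests

# Placeholder values; substitute your own URL, headers and proxies.
url = "https://home.aliexpress.com/index.htm"
headers, proxies = {"User-Agent": "Mozilla/5.0"}, None
session = requests.Session()
cj = requests.cookies.RequestsCookieJar()
cj.set('KEY', 'VALUE')  # cookie name and value copied from a logged-in browser
session.cookies = cj
response = session.get(url, timeout=5, headers=headers, proxies=proxies)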

Previously the scraper worked for a time using just headers and proxies, but recently it always prompts for a login. I have tried setting all the keys and values from the cookies as well, to no avail.

One idea would be to use Selenium to log in and capture the cookies, then pass them to a Requests session.

AntoG has a solution to do this: https://stackoverflow.com/a/42114843
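
As a rough sketch of that idea (my own illustration, not necessarily how the linked answer does it): Selenium's driver.get_cookies() returns a list of dicts with keys such as 'name', 'value', 'domain' and 'path', and those can be copied into a Requests session. The function names are made up, and the sketch assumes Chrome with a matching driver is installed and that you log in and solve the slider by hand in the opened browser window.

```python
import requests


def session_from_browser_cookies(browser_cookies):
    """Build a requests.Session holding the given Selenium-style cookies."""
    session = requests.Session()
    for cookie in browser_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


def login_with_selenium(login_url="https://login.aliexpress.com/"):
    """Open a browser, wait for a manual login, return a Requests session."""
    from selenium import webdriver  # imported here so the helper above works without Selenium

    driver = webdriver.Chrome()
    try:
        driver.get(login_url)
        input("Log in in the browser window, then press Enter here... ")
        return session_from_browser_cookies(driver.get_cookies())
    finally:
        driver.quit()
```

Usage would be something like session = login_with_selenium() followed by session.get(url, headers=headers). I can't promise the site accepts the copied cookies for long, so treat this as a starting point.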

Gio