I'm writing an AliExpress web scraper using Python and the Requests module along with BeautifulSoup and I got it working well, however I've run into a problem - I get redirected to a login page randomly. My solution to this is to simply log in at the start of my session before scraping, but I don't know how to log in.
The login page (https://login.aliexpress.com) requires only the username and password, but when I try to enter them with my code and test to see if I'm logged in by going to https://home.aliexpress.com/index.htm and looking at the html, it fails as it redirects me back to the login page.
My code after trying multiple solutions to no avail:
import requests
LOGIN_URL = "https://login.aliexpress.com/"
LOGIN_INFO = {
"loginId": "myemail@email.com",
"password": "mypassword"
}
with requests.Session() as sess:
#go to login page
sess.get(LOGIN_URL)
#attempt to log in with my login info
sess.post(LOGIN_URL, data=LOGIN_INFO)
#go to 'My AliExpress' page to verify successful login
success = sess.get("https://home.aliexpress.com/index.htm")
#manually check html to see if I was sent to the login page again
print(success.text)
This is pretty much what's left after my many failed attempts. Some of the things I've tried are:
Looking at the cookie after the 'sess.get(LOGIN_URL)', it returns this but I don't know what to do with it (in key:value format):
- ali_apache_tracktmp :
- ali_apache_track :
- xman_f : t52Eyo+p3qf6E6fdmL5yJ81g2icRn+2PYjjrWYHlqlDyXAixo92Z5KHMZV8SCV7vP4ZjxEmuTQesVWkqxUi3SpFU1qbRyNRd+d0pIIKVhrIDri2oaWrt6A==
- JSESSIONID : 30678741D7473C80BEB85825718FB1C6
- acs_usuc_t : acs_rt=343aef98b0ca4ae79497e31b11c82c29&x_csrf=1b5g78e7fz2rt
- xman_us_f : x_l=0
- ali_apache_id : 23.76.146.14.1510893827939.187695.4
- xman_t : PSIYMbKN2UyuejZBfmP9o5hdmQGoSB0UL0785LnRBxW0bdbdMmtW2A47hHbgTgD7TmFp7QVsOW4kXTsXMncy+iKisKfqagqb4yPxOVFdw+k=
Tried looking for a csrf token and only found the text after '_csrf=' in the 5th bullet above. Tried using it and it didn't work.
Looked at the html form sent when you log in but I don't know html and can only recognize it has a lot more fields than the ones I've seen other people use for other websites (Image of Form Data from Chrome here).
Changing the "myPassword" in my code to the text in the password2 field in image above and changing the "password" key to "password2" too.
Googled for a few hours but didn't find anything that would work.
At this point, I'm at my wits end, so any help on how to proceed would be very much appreciated. I'm not the best coder (still learning), don't know html outside of what I've learned from a few tutorials about scraping, and was hoping to figure it out myself, but hours later I still haven't solved it and realized I could really use the help.
I'm using python 3.5. If there's any more info needed, let me know. Brain is just about turned completely to mush after being stuck and awake for so long.