
I want to follow a link from a website using Python. Here is the flow:

  1. I open the main URL (e.g. www.url1.com).

  2. I scrape the page and find the button, which has a redirection link (www.url2.com).

  3. When I open this link in a browser, it redirects to www.url3.com and then immediately goes on to another URL, the one I actually need (www.url4.com).

  4. When I try the same flow using Python requests, it only gets as far as www.url3.com.

  5. I tried using the allow_redirects argument without any success.

Here is my code:

import requests

headers = {
    'User-Agent': '',
    'authority': '',
    'scheme': '',
    'accept': '',
    'x-requested-with': '',
    'cookie': '',
    'referer': '',
}


def download(req):
    # allow_redirects=True only follows HTTP (3xx) redirects
    resp = requests.get(req, headers=headers, allow_redirects=True)
    print(resp.text)

I also tried to print the redirect history using this answer, but it still only redirects me to url3.

Nazim Kerimbekov
Aya
  • If url3 redirects the browser using a `refresh meta tag`, `requests` will not follow it even with `allow_redirects` enabled, as it does not parse the HTML. [How to follow meta refresh in python](https://stackoverflow.com/questions/2318446/how-to-follow-meta-refreshes-in-python) – wuerfelfreak Aug 11 '19 at 08:44
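
To illustrate the comment above, here is a minimal sketch of following a meta refresh by hand. It assumes url3 really does use a `<meta http-equiv="refresh">` tag, which has not been confirmed for this site. The names `MetaRefreshParser`, `find_meta_refresh` and `get_following_meta_refresh` are hypothetical helpers, not part of `requests`; `session` is expected to be a `requests.Session()` (or anything with a compatible `get` method).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # content usually looks like "0; url=http://example.com/next"
        match = re.search(r"url\s*=\s*['\"]?([^'\"]+)", attrs.get("content") or "", re.I)
        if match:
            self.target = match.group(1).strip()


def find_meta_refresh(html, base_url):
    """Return the absolute meta-refresh target found in html, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    return urljoin(base_url, parser.target) if parser.target else None


def get_following_meta_refresh(session, url, max_hops=5, **kwargs):
    """GET url, then keep following meta refreshes for at most max_hops hops."""
    resp = session.get(url, allow_redirects=True, **kwargs)
    for _ in range(max_hops):
        target = find_meta_refresh(resp.text, resp.url)
        if target is None:
            break
        resp = session.get(target, allow_redirects=True, **kwargs)
    return resp
```

It could be called as `get_following_meta_refresh(requests.Session(), url, headers=headers)`. If the page redirects through JavaScript rather than a meta tag, this sketch will not find a target and simply returns the url3 response.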

1 Answer

It's quite difficult to give a full answer without the actual URLs you are using. That being said, I think the problem may be that you are not keeping track of cookies between requests. I would recommend using requests.Session() to send your requests, as it stores the cookies set by each response and sends them back on later requests.

All in all, I would recommend trying the following code:

import requests

# A Session keeps cookies across requests
session = requests.Session()

headers = {
    'User-Agent': '',
    'authority': '',
    'scheme': '',
    'accept': '',
    'x-requested-with': '',
    'cookie': '',
    'referer': '',
}


def download(req):
    # session is only read here, so no global statement is needed
    resp = session.get(req, headers=headers, allow_redirects=True)
    print(resp.text)

(PS: if you are scraping a website, I would highly recommend setting a real User-Agent in the headers instead of leaving it blank.)
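
To see where the chain actually stops, it can help to list every hop that requests followed. A minimal sketch, assuming `resp` is the response returned by `session.get` in the code above; `describe_redirects` is a hypothetical helper name, while `history`, `status_code` and `url` are standard attributes of a requests response.

```python
def describe_redirects(resp):
    """Return (status_code, url) for every hop requests followed, final response last."""
    hops = [(r.status_code, r.url) for r in resp.history]
    hops.append((resp.status_code, resp.url))
    return hops
```

For example, `for status, url in describe_redirects(resp): print(status, url)`. If the last entry is url3 with status 200, the step from url3 to url4 is not an HTTP redirect, so `allow_redirects` cannot follow it.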

Hope this helps

Nazim Kerimbekov
  • The referer might also change the behavior of url3 – wuerfelfreak Aug 11 '19 at 08:38
  • @wuerfelfreak indeed! It’s difficult to say without the actual urls/website. – Nazim Kerimbekov Aug 11 '19 at 09:03
  • @Fozoro thank you. I tried sessions but no luck yet. All the headers here are filled in. If you want the actual URLs, url3 (which opens on button click) is: https://www.egy.best/api?call=NaaapmDdapaqapVcaVUazFTqazzVVVazpVYpqpapjYapaqapVUwqjmVqVVKcVapqpapxgIapaqapmVDvEumzxghwBapqpapwDYpaqapeDYDwDukhqzapqpapmDwapaqapVccqVxpVupqpapKeLUmDxKupaqVcaVUVaqTaVzxaVVaU&auth=7445579426cd5bd4c63e7ae0b8a464f1 In the browser it redirects to another URL, while with Python requests the response is the HTML of url3 and there are no further redirections. – Aya Aug 16 '19 at 09:38