
Here is my scenario.

I have a lot of links. I want to know if any of them redirect to a different site (maybe a particular one) and get only those redirect URLs (I want to preserve them for further scraping).

I don't want to fetch the contents of the webpage; I only want the link it redirects to. If there are multiple redirects, I may want to follow the URLs only up to, say, the 3rd redirect, so that I don't end up in a redirect loop.

How do I achieve this? Can I do this in requests?

Requests has r.status_code, but it is only available after fetching the page.

Kotlinboy
  • It looks from https://stackoverflow.com/questions/20475552/python-requests-library-redirect-new-url like you can get a history of the redirect links, at least, although this doesn't answer your question about *only* getting the links. – bouteillebleu Nov 24 '17 at 11:07

1 Answer


You can use requests.head(url, allow_redirects=True), which fetches only the headers. If a response contains a Location header, requests follows the redirect and issues a HEAD request to the next URL.

import requests

# HEAD only fetches headers, so the page body is never downloaded
response = requests.head('http://httpbin.org/redirect/3', allow_redirects=True)

# response.history holds each intermediate redirect response, in order
for redirect in response.history:
    print(redirect.url)

# response.url is the final URL after all redirects were followed
print(response.url)

Output:

http://httpbin.org/redirect/3
http://httpbin.org/relative-redirect/2
http://httpbin.org/relative-redirect/1
http://httpbin.org/get
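
If you want to cap the number of redirects yourself (e.g. stop after the 3rd hop, as asked in the question), one option is to turn off automatic redirect handling and follow the Location header manually. Here is a minimal sketch under that assumption; the follow_redirects helper and the max_hops parameter are illustrative names, not part of requests:

import requests
from urllib.parse import urljoin

def follow_redirects(url, max_hops=3):
    """Follow up to max_hops redirects with HEAD requests and return the chain of URLs."""
    chain = [url]
    for _ in range(max_hops):
        # allow_redirects=False makes requests return the redirect response itself
        response = requests.head(url, allow_redirects=False)
        if not response.is_redirect:
            break
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, response.headers['Location'])
        chain.append(url)
    return chain

print(follow_redirects('http://httpbin.org/redirect/3'))

Alternatively, requests can cap automatic redirects for you: a requests.Session has a max_redirects attribute, and exceeding it raises requests.exceptions.TooManyRedirects.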
Farhan.K