
Here is my scenario.

I have a lot of links. I want to know if any of them redirect to a different site (maybe a particular one) and get only those redirect URLs (I want to preserve them for further scraping).

I don't want to fetch the contents of the webpage; I only want the link it redirects to. If there are multiple redirects, I may want to follow the URLs only up to, say, the 3rd redirect, so that I don't end up in a redirect loop.

How do I achieve this? Can I do this in requests?

Requests has r.status_code, but it is only available after fetching the page.

Kotlinboy
  • It looks from https://stackoverflow.com/questions/20475552/python-requests-library-redirect-new-url like you can get a history of the redirect links, at least, although this doesn't answer your question about *only* getting the links. – bouteillebleu Nov 24 '17 at 11:07

1 Answer


You can use requests.head(url, allow_redirects=True), which fetches only the headers. If a response contains a Location header, requests follows the redirect and issues a HEAD request to the next URL.

import requests

# HEAD only fetches headers, so the page body is never downloaded
response = requests.head('http://httpbin.org/redirect/3', allow_redirects=True)

# response.history holds each intermediate redirect response, in order
for redirect in response.history:
    print(redirect.url)

# response.url is the final URL after all redirects were followed
print(response.url)

Output:

http://httpbin.org/redirect/3
http://httpbin.org/relative-redirect/2
http://httpbin.org/relative-redirect/1
http://httpbin.org/get
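
If you want to cap the number of redirects yourself (e.g. stop after the 3rd hop, as asked in the question), one option is to turn off automatic redirect handling and follow the Location header manually. Here is a minimal sketch under that assumption; the follow_redirects helper and the max_hops parameter are illustrative names, not part of requests:

import requests
from urllib.parse import urljoin

def follow_redirects(url, max_hops=3):
    """Follow up to max_hops redirects with HEAD requests and return the chain of URLs."""
    chain = [url]
    for _ in range(max_hops):
        # allow_redirects=False makes requests return the redirect response itself
        response = requests.head(url, allow_redirects=False)
        if not response.is_redirect:
            break
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, response.headers['Location'])
        chain.append(url)
    return chain

print(follow_redirects('http://httpbin.org/redirect/3'))

Alternatively, requests can cap automatic redirects for you: a requests.Session has a max_redirects attribute, and exceeding it raises requests.exceptions.TooManyRedirects.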
Farhan.K