I want to find the final URL that a website redirects to, using Python. I'm mostly using requests and urllib2, but any approach is welcome.

The website I'm testing doesn't return a 302 response. It redirects directly using HTML, or maybe PHP.

I used the requests module for this, but it doesn't seem to treat HTML/PHP redirects as redirects: it only follows HTTP-level (3xx) ones.

My current code:

import requests

def get_real(domain):
    # .url is the final URL after any HTTP (3xx) redirects have been followed
    red_domain = requests.get(domain, allow_redirects=True).url
    return red_domain

print(get_real("some_url"))

If there is a way to achieve this, how? Thanks in advance!

EDIT: URL I'm trying: http://001.az. It's using HTML to redirect.

HTML Code inside it:

<HTML> <HEAD><META HTTP-EQUIV=Refresh CONTENT="0; url=http://fm.vc"></HEAD> </HTML>
ahmedg
  • Can you share the URL? – Andrej Kesely Jul 28 '20 at 22:35
  • Sure, I forgot to share it. I'm editing the question now. – ahmedg Jul 29 '20 at 06:48
  • Is there a way to achieve that? Yes, of course. Is there a simple way? Unfortunately no. You will have to mimic what a browser would do, meaning parse the page and search the HEAD block for a META tag whose HTTP-EQUIV attribute has the value Refresh. BeautifulSoup could help here; otherwise you will have to use a stock xml/html parser and pray for the HTML to be correct. Please say if BeautifulSoup is an option here. – Serge Ballesta Jul 29 '20 at 07:14
  • Thanks @SergeBallesta! Yes, we can use `BeautifulSoup` and/or `xml/html parser`. – ahmedg Jul 29 '20 at 07:19

1 Answer


BeautifulSoup can help detect HTML meta redirections:

from bs4 import BeautifulSoup

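# use requests to fetch the page and store its HTML text in html_text
...
# lowercase everything so that "Refresh" and "URL=" match whatever their case
# (note that this also lowercases the target URL itself)
soup = BeautifulSoup(html_text.lower(), "html5lib")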

try:
    content = soup.head.find('meta', {'http-equiv': 'refresh'}).attrs['content']
    ix = content.index('url=')
    url = content[ix+4:]
    # ok, we have to redirect to url
except (AttributeError, KeyError, ValueError):
    url = None

# if url is not None, fetch it and repeat the same check on the new page...
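
For the "loop" part, here is a minimal sketch of one way it could look. It is not part of the answer's code: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages, and the names MetaRefreshParser, meta_refresh_target, follow_meta_refresh, the fetch callable and max_hops are all hypothetical, chosen only for illustration. It assumes the refresh content has the usual "seconds; url=target" form.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Remember the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values,
        # so the URL keeps its original case.
        if tag != "meta" or self.target is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # content usually looks like "0; url=http://example.com"
        match = re.search(r"url\s*=\s*(.+)", attr.get("content") or "", re.I)
        if match:
            self.target = match.group(1).strip().strip("'\"")


def meta_refresh_target(html_text, base_url):
    """Return the absolute meta-refresh URL found in html_text, or None."""
    parser = MetaRefreshParser()
    parser.feed(html_text)
    if parser.target is None:
        return None
    return urljoin(base_url, parser.target)  # resolves relative targets


def follow_meta_refresh(url, fetch, max_hops=10):
    """Follow meta refreshes; fetch(url) must return (final_url, html_text)."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:  # redirect loop, stop here
            break
        seen.add(url)
        url, html_text = fetch(url)
        target = meta_refresh_target(html_text, url)
        if target is None:
            break
        url = target
    return url
```

With requests, fetch could be a small function that calls `requests.get(url)` and returns `(response.url, response.text)`, so HTTP redirects and HTML redirects are both taken into account. Passing fetch in as a parameter is only a design choice that keeps the sketch testable without network access.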
Serge Ballesta