
I've been trying to use the Python Requests module to get the source code for a particular page, and then scrape it for a particular link.

I get the source code just fine. However, when I try to scrape for the link, it prints the wrong thing. I checked by printing the text of the page that I got, and the link seems to have changed during the download. On the site, the link is of the form something.something/5642001/8bc1fa, but in the page that I got using Requests it has become something.something/5642001/128a67. The two are similar, but I cannot figure out why this change is occurring.

I don't think this is a case of dynamic JavaScript, since when I view the page source in the browser itself, the link is just fine. The trouble occurs only when Requests (or urllib, which I also tried) fetches the page for me.

I would really appreciate some help with working around this.

Image one: The original affected portion of the link, as seen in the page source in my browser. This is correct.

Image two: How the same thing looks when Requests fetches the whole page for me. Seriously confused.


DorianGray
  • Maybe the link is redirecting? See [this post](https://stackoverflow.com/questions/20475552/python-requests-library-redirect-new-url/20475639) which shows you how to get the history of links redirected when making a request. – Jay Mody Sep 09 '20 at 17:15
  • It does seem to be redirecting. I'll try out what the answers over there suggest and get back to you. Thanks! – DorianGray Sep 09 '20 at 17:25
  • I would try `requests.get(url, allow_redirects=False)` and see if that works for you. – Jay Mody Sep 09 '20 at 17:27
  • Sadly, it is still giving me the weirdly morphed link. I don't think the redirect is the problem, since it's a download link and the redirect might be related to that. It looks like the link morphs when Requests gets the page on which the link is present. The link is perfectly fine in the browser's page source, but morphed in what Requests gets me. What might be causing that? – DorianGray Sep 09 '20 at 17:34
  • Not entirely sure what's happening in this case. If you post the actual link it might be easier to figure out what's happening. My main suspicion is that the website you are trying to scrape doesn't want to be scraped so they've programmed it to change the links if it knows it's being requested by an automated service. – Jay Mody Sep 09 '20 at 17:51
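
One way to test the suspicions raised in the comments is sketched below. It fetches the page twice, once with Requests' default headers and once with a browser-like `User-Agent`, prints any redirects that were followed, and then reports which links differ between the two copies. This is only a rough sketch: the helper names (`extract_links`, `diff_links`, `fetch`, `compare_fetches`) and the `User-Agent` string are made up for illustration, and whether the site actually keys on the `User-Agent` header (rather than cookies, a session token, or something else) is an assumption, not something confirmed in this thread.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all <a href="..."> values in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def diff_links(html_a, html_b):
    """Return (links only in html_a, links only in html_b), each sorted."""
    a, b = set(extract_links(html_a)), set(extract_links(html_b))
    return sorted(a - b), sorted(b - a)


def fetch(url, headers=None, allow_redirects=True):
    """Fetch a page with Requests and print any redirects that were followed."""
    # Imported here so the link helpers above also work without Requests installed.
    import requests

    response = requests.get(
        url, headers=headers, allow_redirects=allow_redirects, timeout=10
    )
    # response.history holds the intermediate redirect responses, oldest first.
    for hop in response.history:
        print(hop.status_code, hop.url)
    return response.text


def compare_fetches(url):
    """Fetch url with default and browser-like headers and diff the links."""
    # Hypothetical browser-like header; copy the real one from your browser's
    # developer tools if you want a closer match.
    browser_headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
    plain = fetch(url)
    browser_like = fetch(url, headers=browser_headers)
    return diff_links(plain, browser_like)
```

Calling `compare_fetches` with the page's URL returns two lists: links that appear only in the default fetch and links that appear only in the browser-like fetch. If both lists are empty yet the link still differs from what the browser shows, the header is probably not the cause, and the link may simply be generated per request or per session.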

0 Answers