I have a link to a PDF file that I would like to download. I tried the following:

import requests

class Scraper:

    def __init__(self):
        """Init the class"""

    @staticmethod
    def download(full_url):
        """Download full url pdf"""
        with requests.Session() as req:

            # Init
            r = req.get(full_url, allow_redirects=True)
            localname = 'test.pdf'

            # Download (Content-Type check currently disabled:
            # r.headers['Content-Type'] == "application/pdf;charset=UTF-8")
            if r.status_code == 200:
                with open(localname, 'wb') as f:
                    f.write(r.content)

However, after downloading, when I try to open it on my computer I receive the message:

"Could not open [FILENAME].pdf because it is either not a supported file type or because the file has been damaged (...)"

  • What is the reason for this? Is it because the first time you visit this page you get redirected and you need to select some preferences?
  • How can we resolve this?
JohnAndrews

1 Answer

You haven't passed the parameters required to start the download. If you navigate to the URL in a browser, you will see that you need to click Continue before the download starts. What happens in the background is a GET request to the back end with the query parameters ?switchLocale=y&siteEntryPassthrough=true, and that request is what starts the download.

You can see this request in your browser's developer tools, under the Network tab.
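To make the role of those parameters concrete, here is a minimal sketch of how a parameter dictionary becomes the query string shown above. The helper name with_query and the example.com URL are illustrative assumptions, not part of the original site or of requests. It uses only the standard library and assumes the URL has no query string yet.

```python
from urllib.parse import urlencode


def with_query(url: str, params: dict) -> str:
    """Append params to url as a query string.

    Hypothetical helper for illustration; assumes url has no query yet.
    """
    return f"{url}?{urlencode(params)}"


params = {"switchLocale": "y", "siteEntryPassthrough": "true"}

# example.com is a placeholder, not the real download URL
print(with_query("https://example.com/report.pdf", params))
# https://example.com/report.pdf?switchLocale=y&siteEntryPassthrough=true
```

Passing the same dictionary through the params argument of requests.get has the same effect of adding these key/value pairs to the URL's query string.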

import requests


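# Query parameters the site sends when you click "Continue"
params = {
    'switchLocale': 'y',
    'siteEntryPassthrough': 'true'
}


def main(url, params):
    r = requests.get(url, params=params)
    r.raise_for_status()  # stop on HTTP errors instead of saving an error page
    with open("test.pdf", 'wb') as f:
        f.write(r.content)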


main("https://www.blackrock.com/uk/individual/literature/annual-report/blackrock-index-selection-fund-en-gb-annual-report-2019.pdf", params)
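A likely reason the originally saved file would not open is that the response body was the site's intermediate HTML page rather than a PDF. As a rough safeguard, the sketch below checks for the "%PDF-" signature that PDF files start with before writing anything to disk. The names looks_like_pdf and save_if_pdf are hypothetical helpers, and the check assumes the signature sits at the very first byte, which is the common case but not guaranteed for every file.

```python
from pathlib import Path

# PDF files begin with this signature, followed by a version such as 1.7
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if data starts with the PDF signature."""
    return data.startswith(PDF_MAGIC)


def save_if_pdf(data: bytes, path: str) -> bool:
    """Write data to path only when it looks like a PDF.

    Returns True if the file was written, False otherwise.
    """
    if not looks_like_pdf(data):
        return False
    Path(path).write_bytes(data)
    return True


print(looks_like_pdf(b"%PDF-1.7\n..."))     # True
print(looks_like_pdf(b"<!DOCTYPE html>"))   # False
```

In the download function above, calling save_if_pdf(r.content, "test.pdf") in place of the plain write would skip saving an HTML page under a .pdf name.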