Save pdf opened in browser using Selenium

Question

So I'm logging into a web app owned by my company and running a request to generate a pdf. This is all being done in Python using the Internet Explorer driver. I can only use IE because the company system does not work with any other browser.

Once I submit the request, a new IE window pops up with the pdf file I requested. I would like to save the pdf file to my computer. I realize it's not easy to work with downloads in IE, but there has to be a way to do it. I am also okay with saving it as a png or any other format, but the pdf is long (typically 2-5 pages), so a print screen or screenshot will not work.

Any suggestions on what I can do?

Below is a simple snippet of the code:

driver.implicitly_wait(5)

driver.find_element_by_name("invNumSrchTxt_H").send_keys("ABCDE")  #sending the parameters I need
driver.find_element_by_name("invDt_B").clear()  # Clearing out some preset params
driver.find_element_by_name("invDt_A").clear()


# This is where I click the button and this pops open a new IE window with my pdf file in it.
s=driver.find_element_by_name("Print_Invoice")
s.click()

score 0 · Answer 1 · answered Sep 07 '18 at 07:33

IE doesn't support configuring download settings, so saving through the browser would mean handling a popup. You can instead send the request directly using requests.

A possible implementation could be:

import os

import requests


def download_pdf_file(url, filename=None, download_dir=None):
    """
    Download the pdf file at url and
    save it in download_dir as filename.
    """
    if download_dir is None:  # default to the current user's Downloads folder
        download_dir = r'C:\Users\{}\Downloads'.format(os.getlogin())

    if filename is None:  # pick the first default filename not already in use
        index = 1
        while os.path.isfile(os.path.join(download_dir, f'pdf_{index}.pdf')):
            index += 1
        filename = f'pdf_{index}.pdf'

    response = requests.get(url)  # get pdf data
    response.raise_for_status()  # fail on an HTTP error instead of saving an error page
    with open(os.path.join(download_dir, filename), 'wb') as pdf_file:
        pdf_file.write(response.content)  # save it in a new file


driver.implicitly_wait(5)

driver.find_element_by_name("invNumSrchTxt_H").send_keys("ABCDE")  #sending the parameters I need
driver.find_element_by_name("invDt_B").clear()  # Clearing out some preset params
driver.find_element_by_name("invDt_A").clear()


# This is where I click the button and this pops open a new IE window with my pdf file in it.
s=driver.find_element_by_name("Print_Invoice")
s.click()

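# Selenium stays focused on the original window, so switch to the popup first.
# This assumes the popup is the last entry in window_handles.
driver.switch_to.window(driver.window_handles[-1])

download_pdf_file(driver.current_url,  # pdf url of the new window
                  filename='myfile.pdf',  # custom filename
                  download_dir='')  # empty string: save relative to the current working directory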
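
driver.current_url reports the URL of whichever window Selenium currently has focus on, and the order of driver.window_handles is not guaranteed. A minimal sketch of a more defensive way to identify the popup, assuming you record the handles before clicking. find_new_window is a hypothetical helper for illustration, not part of Selenium:

```python
def find_new_window(handles_before, handles_after):
    """Return the single handle present in handles_after but not in handles_before.

    Hypothetical helper: both arguments are lists of window-handle strings,
    such as the values of driver.window_handles before and after the click.
    """
    seen = set(handles_before)
    new_handles = [handle for handle in handles_after if handle not in seen]
    if len(new_handles) != 1:
        raise RuntimeError(
            f"expected exactly one new window, found {len(new_handles)}"
        )
    return new_handles[0]


# Intended use with the driver from the question (not executed here):
#   handles_before = driver.window_handles
#   driver.find_element_by_name("Print_Invoice").click()
#   popup = find_new_window(handles_before, driver.window_handles)
#   driver.switch_to.window(popup)
#   pdf_url = driver.current_url
```

The popup may take a moment to open, so in practice the second read of driver.window_handles may need to be retried or wrapped in a wait.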

Federico Rubbi

That won't really work: the pdf is served from our server, and the popup window it opens in doesn't seem to have a url associated with it, at least not one I could find. I used a modified version of this; it's ugly but it works: https://stackoverflow.com/a/39070754/7734550 – doddy Sep 07 '18 at 14:42
You can do it even if the pdf file is on your server. You just have to get the pdf file url. Watch this tutorial: https://www.youtube.com/watch?v=0gNGFEZ3tDM. At 7:10 he explains how to analyze requests. Open _network_ like he does and click on the pdf. You will see something like: Url: | Method: GET | Status: 200 | Type: application/pdf. – Federico Rubbi Sep 07 '18 at 14:56
I would like to point out how much faster and more efficient using requests is. – Federico Rubbi Sep 07 '18 at 14:58
Thank you. I will look into this as soon as I can access YouTube; my current method definitely leaves a lot to be desired in terms of speed. – doddy Sep 07 '18 at 15:32
Can you give me the url? I could get the pdf url and implement it in the answer if you want. – Federico Rubbi Sep 07 '18 at 15:37
Not sure if I can do that, sensitive company data and all that. But I will give this a shot. If I can recover a url, I should be able to download. Will comment here with what I find. – doddy Sep 07 '18 at 15:43
So I tried this, but that page has dev tools blocked. I cannot access the F12 menu or inspect elements on the page. – doddy Sep 07 '18 at 15:55
Perhaps you can try to right click on the pdf file and check if you can copy the url – Federico Rubbi Sep 07 '18 at 15:57
Right click pulls up a document properties option; under the "advanced" tab, the base URL is empty, as is basically everything else. I managed to get into dev tools for the calling page, but it won't process the pdf request unless I leave dev tools. – doddy Sep 07 '18 at 16:06
Mhh... did you try to call driver.current_url after you open the pdf file with _s.click()_? – Federico Rubbi Sep 07 '18 at 16:11
The pdf window just has a very general url. It basically looks like this: 'somewebsiteurl/downloadPDF.jsp' – doddy Sep 07 '18 at 17:18
Since the url is "somewebsiteurl/downloadPDF.jsp" we should send a request to this url, because it's basically Java code that sends pdf data. Could you try to do it and save the response in a pdf file? – Federico Rubbi Sep 07 '18 at 17:46
Tried exactly that. It basically throws an error saying you're running too many requests in parallel, even though I'm not. I suspect it's a security mechanism they have put in place. – doddy Sep 07 '18 at 18:51
Quite strange... you may be able to get past this by changing your headers, because the default User-Agent is something like 'python-requests'. Alternatively, you could use browsermob-proxy to log the request for the pdf file. Here is an implementation: https://stackoverflow.com/questions/38150738/python-browsermob-proxy-with-ie-captures-incorrect-har – Federico Rubbi Sep 07 '18 at 18:57
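
The two suggestions in this thread (request the downloadPDF.jsp url directly, and replace the default User-Agent) could be sketched as below. Reusing the browser's session cookies is an added assumption that the thread does not mention, and none of this is confirmed to satisfy the server's check. The sketch uses the standard library's urllib.request instead of requests so that it has no third-party dependency. build_headers, looks_like_pdf and fetch_pdf are hypothetical names, and the approach only helps if the pdf is served in response to a plain GET.

```python
import urllib.request


def build_headers(selenium_cookies, user_agent):
    """Build request headers from Selenium-style cookie dicts.

    selenium_cookies is a list of dicts with 'name' and 'value' keys,
    the shape returned by driver.get_cookies().
    """
    cookie_header = "; ".join(
        f"{cookie['name']}={cookie['value']}" for cookie in selenium_cookies
    )
    headers = {"User-Agent": user_agent}
    if cookie_header:
        headers["Cookie"] = cookie_header
    return headers


def looks_like_pdf(body):
    """A well-formed pdf starts with the bytes %PDF-."""
    return body.startswith(b"%PDF-")


def fetch_pdf(url, headers, path):
    """Fetch url with the given headers and save the body to path if it is a pdf."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        body = response.read()
    if not looks_like_pdf(body):
        # Likely an error page or a login page rather than the document.
        raise ValueError("server did not return a pdf")
    with open(path, "wb") as pdf_file:
        pdf_file.write(body)


# Intended use (not executed here; driver is the logged-in IE driver,
# already switched to the popup window):
#   user_agent = driver.execute_script("return navigator.userAgent")
#   headers = build_headers(driver.get_cookies(), user_agent)
#   fetch_pdf(driver.current_url, headers, "myfile.pdf")
```

Checking the first bytes before writing avoids silently saving an HTML error page under a .pdf name, which is what would happen if the server rejected the request.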
