I am trying to download an Excel file that I have the link to, but I am required to log in to the page before I can download it. I have successfully gotten past the login page with rvest, RCurl, and httr, but I am having an extremely difficult time downloading the file after I have logged in.

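library(rvest)

url <- "https://website.com/console/login.do"
download_url <- "https://website.com/file.xls"

## Start a session on the login page and grab the first form
session <- html_session(url)
form <- html_form(session)[[1]]

## `user` and `pass` hold the login credentials (defined elsewhere)
filled_form <- set_values(form,
                          userid = user,
                          password = pass)

## Submit the login form and keep the logged-in session
main_page <- submit_form(session, filled_form)

download.file(download_url, "./file.xls", method = "curl")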

When I run the download.file command, a file appears in my working directory, but it is not the file I am trying to download; it is just a corrupted .xls file with no data.

For reference, if I log in to the website via Chrome and paste the download link into the browser window after logging in, the file automatically starts downloading. If I do the same in IE, the file download dialog box pops up and asks whether I want to save the file.

Possibly relevant info:

  • This is for my computer at work, where cookies are disabled, so I cannot use a cookie from my browser
  • I have tried different methods with httr and RCurl based on numerous posts on SO, to no avail

Thanks in advance for your time!

dmunslow

1 Answer

Someone on /r/rstats actually found the answer for this question. The solution for my problem was as follows:

# After logging in with submit_form(), request the file within the same session
download <- jump_to(main_page, download_url)

# Write the raw response body to the current working directory
writeBin(download$response$content, basename(download_url))
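
For anyone on a newer rvest: as far as I know, html_session, set_values, submit_form and jump_to were renamed in rvest 1.0.0 to session, html_form_set, session_submit and session_jump_to. Below is a rough sketch of the same flow with those names, plus a sanity check on the downloaded bytes before writing them. The helper names looks_like_xls and download_after_login are made up for illustration, and the check assumes the server returns a legacy binary .xls (OLE2) file, which may not hold for every site.

```r
# Sketch only; the rvest calls assume rvest >= 1.0.0 is installed.

# Hypothetical helper: TRUE if the raw bytes start with the OLE2 signature
# used by legacy binary .xls files, FALSE otherwise (e.g. an HTML login page).
looks_like_xls <- function(bytes) {
  ole2_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  length(bytes) >= 8 && identical(bytes[1:8], ole2_magic)
}

# Hypothetical wrapper: log in, then fetch the file within the same session
# so the login cookies are sent along with the download request.
# The form field names (userid, password) are taken from the question.
download_after_login <- function(login_url, download_url, user, pass) {
  s <- rvest::session(login_url)
  form <- rvest::html_form(s)[[1]]
  filled_form <- rvest::html_form_set(form, userid = user, password = pass)
  main_page <- rvest::session_submit(s, filled_form)

  download <- rvest::session_jump_to(main_page, download_url)
  bytes <- download$response$content

  # Stop early instead of silently saving an error or login page as .xls
  if (!looks_like_xls(bytes)) {
    stop("Response does not look like an .xls file; the login may have failed.")
  }

  out_file <- basename(download_url)
  writeBin(bytes, out_file)
  invisible(out_file)
}
```

With the values from the question, this would be called as download_after_login(url, download_url, user, pass). The point in both versions is that the download request goes through the logged-in session, whereas download.file starts a separate request that does not carry the session's cookies.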

dmunslow