
If we visit https://aps.dac.gov.in/LUS/Public/Reports.aspx in Chrome with DevTools open, we can clearly see a cookie appear (in Chrome DevTools -> 'Application' -> 'Cookies').

If we attempt the same thing using httr::GET(), we expect to see the cookie, but we do not:

library(httr)

r <- GET("https://aps.dac.gov.in/LUS/Public/Reports.aspx")
r$cookies
# [1] domain     flag       path       secure     expiration name       value     
# <0 rows> (or 0-length row.names)
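One way to confirm that the server simply isn't sending a cookie on this first request (rather than httr dropping it) is to look at the raw `Set-Cookie` headers, including any sent on redirects, which httr keeps in `r$all_headers`. The sketch below is base R only, and `set_cookie_headers()` is a hypothetical helper, not part of httr.

```r
# Collect every Set-Cookie header from an httr response, including those
# sent on intermediate redirects (httr stores each hop in r$all_headers).
# Hypothetical helper, base R only.
set_cookie_headers <- function(r) {
  vals <- lapply(r$all_headers, function(hop) {
    h <- hop$headers
    unname(unlist(h[tolower(names(h)) == "set-cookie"]))
  })
  out <- unlist(vals)
  if (is.null(out)) character(0) else out
}

# Usage (live request; assumes httr is loaded as in the code above):
# r <- GET("https://aps.dac.gov.in/LUS/Public/Reports.aspx")
# set_cookie_headers(r)  # character(0) means this GET set no cookie
```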

Why is this, and how can we retrieve the cookie (along with the page HTML), preferably using httr and/or rvest? Other suggestions are welcome too, as long as they don't use an actual browser (headless or otherwise, including Selenium).

stevec

1 Answer


This happens because the cookie isn't actually set until the user submits the form. If we open Chrome DevTools and watch 'Application' -> 'Cookies' before and after submitting the form, we see the cookie appear only after submission.
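Since the cookie only appears after the form is posted, a browser-free approach is to submit the same form from R. Below is a minimal sketch using rvest (>= 1.0.0) sessions, assuming the page's first `<form>` is the one to submit. `submit_and_get_cookies()` is a hypothetical helper, and the field names passed through `...` are assumptions that must be read off the real form (e.g. by printing `html_form(read_html(url))`). `html_form()` picks up hidden inputs such as ASP.NET's `__VIEWSTATE`, so they are posted back automatically. If the page triggers its postback from JavaScript `__doPostBack()` calls, fields like `__EVENTTARGET` may also need setting by hand.

```r
library(rvest)  # session(), html_form(), html_form_set(), session_submit(); rvest >= 1.0.0

# Hypothetical helper: load the page in an rvest session, fill in one of its
# forms, post it back on the same connection, and return the cookies the
# server set together with the HTML it sent back.
submit_and_get_cookies <- function(url, ..., form_index = 1) {
  s <- session(url)                              # initial GET, keeps a curl handle
  form <- html_form(read_html(s))[[form_index]]  # includes hidden inputs (__VIEWSTATE, ...)
  form <- html_form_set(form, ...)               # field names must match the real form
  s <- session_submit(s, form)                   # POST with the same handle/cookie jar
  list(
    cookies = httr::cookies(s$response),         # data frame, same shape as r$cookies
    html    = read_html(s)                       # page returned after submission
  )
}

# Usage (live request; the field name and value are placeholders):
# res <- submit_and_get_cookies(
#   "https://aps.dac.gov.in/LUS/Public/Reports.aspx",
#   SomeField = "some value"
# )
# res$cookies
# res$html
```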

The before-and-after difference is easiest to demonstrate in a Chrome incognito window: incognito doesn't have access to the cookies from regular Chrome, so each new incognito window starts without the cookie and the demonstration can be repeated.
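httr reuses one pooled connection handle (with its cookie jar) per site for the whole R session, so cookies received by earlier requests are sent again on later ones. To mimic a fresh incognito window when repeating the experiment from R, we can give a request its own new handle. `fresh_get()` is a hypothetical wrapper around the real httr functions `GET()`, `handle()` and `handle_reset()`.

```r
library(httr)

# httr reuses a pooled handle (and its cookie jar) per site for the whole
# R session. Supplying a brand-new handle starts with an empty cookie jar,
# much like opening a fresh incognito window. fresh_get() is hypothetical.
fresh_get <- function(url, ...) {
  GET(url, ..., handle = handle(url))
}

# Usage (live requests):
# r <- fresh_get("https://aps.dac.gov.in/LUS/Public/Reports.aspx")
# cookies(r)
# Alternatively, drop the pooled handle so the next plain GET() starts clean:
# handle_reset("https://aps.dac.gov.in")
```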

stevec
    Did anybody find a way to get the cookie and scrape the page? I tried `httr::set_cookies()` but I do not understand how to use it ... – Dominik Vogel Oct 17 '20 at 08:44
  • @DominikVogel I couldn't even get the cookie in the first place for some reason. You could ask a new question with your example and link to this page – stevec Oct 17 '20 at 08:45
  • 1
    Already done: https://stackoverflow.com/questions/64391812/scrape-site-that-asks-for-cookies-consent-with-rvest – Dominik Vogel Oct 17 '20 at 08:51
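On the `httr::set_cookies()` question: it takes cookies as `name = value` arguments (or a named character vector via `.cookies`) and returns a config object that can be passed to `GET()`. The sketch below converts the data frame returned by `httr::cookies()` into that form. `cookie_config()` is a hypothetical helper, and the cookie name and value in the usage lines are placeholders.

```r
library(httr)

# Convert the data frame returned by httr::cookies() (it has `name` and
# `value` columns) into a config object that sends those cookies.
# cookie_config() is a hypothetical helper built on httr::set_cookies().
cookie_config <- function(cookie_df) {
  set_cookies(.cookies = setNames(as.character(cookie_df$value), cookie_df$name))
}

# Usage (live requests; cookie name and value are placeholders):
# GET("https://aps.dac.gov.in/LUS/Public/Reports.aspx",
#     set_cookies(SomeCookie = "some-value"))
# GET(url, cookie_config(cookies(previous_response)))
```

Because httr already reuses a per-site handle, cookies received earlier in the same R session are normally sent automatically; `set_cookies()` matters mainly when the cookie comes from elsewhere, such as a browser or a different handle.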