I am trying to scrape news articles like this one: https://www.lefigaro.fr/vox/societe/luc-ferry-une-convention-climat-pluraliste-20191016 . I have a paid subscription and am logging in with my own credentials so that I can scrape the full article text. I manage to fill in and submit the generic login form, but via R I only retrieve the free paragraphs. When I log in through Chrome, the same page shows the full text.
library(rvest)

# address of the generic login page
login <- "https://connect.lefigaro.fr/login?client=horizon_web&redirect_uri=https://www.lefigaro.fr/"

# create a web session and fill in my credentials
pgsession <- html_session(login)
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, email = "*******", password = "*******")
submit_form(pgsession, filled_form) # this seems to work
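To convince myself the login actually succeeds, I also thought about capturing the session that submit_form() returns and inspecting its HTTP status, rather than discarding the return value. This is only a sketch; I am assuming the returned session object exposes the underlying httr response as $response:

```r
# submit_form() returns the post-submission session; capture it
# instead of throwing it away
logged_in <- submit_form(pgsession, filled_form)

# a 4xx/5xx here would mean the form submission itself failed;
# a 200 alone does not prove the credentials were accepted
logged_in$response$status_code
```

In my case this did not raise any error, which is why I say the submission "seems to work".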
url <- "https://www.lefigaro.fr/vox/societe/luc-ferry-une-convention-climat-pluraliste-20191016"
p <- read_html(url)

title <- p %>% html_nodes(".fig-headline--premium") %>% html_text(trim = TRUE) # article title
time <- p %>% html_nodes("time") %>% html_text(trim = TRUE)
time <- time[[1]] # keep the first <time> element (publication date)
body <- toString(p %>% html_nodes(".fig-paragraph") %>% html_text(trim = TRUE))
body # I do not get the full text that I see in my browser as a subscriber
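My current suspicion is that read_html(url) opens a fresh, unauthenticated connection, so the cookies from the login session never reach the article page. A sketch of what I expected to work instead, navigating to the article inside the logged-in session with jump_to() (I am not certain this actually carries the subscriber cookies over):

```r
# navigate to the article within the authenticated session, so the
# request is sent with the login cookies rather than anonymously
article <- jump_to(pgsession, url)

# a session object can be queried like a parsed page
title <- article %>% html_nodes(".fig-headline--premium") %>% html_text(trim = TRUE)
body <- toString(article %>% html_nodes(".fig-paragraph") %>% html_text(trim = TRUE))
body # still unsure whether this would return the premium paragraphs
```

Is this the right way to reuse the login session, or is something else blocking the premium content?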