You can combine RSelenium and rvest. Here is a code snippet that collects the links on the first page.
1) Start Selenium. A good walkthrough is the Stack Overflow thread "can't execute rsDriver (connection refused)". In short: install Docker, pull the headless Chrome image, and start the container from a terminal with docker run -d -p 4445:4444 selenium/standalone-chrome
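The Docker step can be sketched as follows (run these in a terminal, not in R; the image name is the official selenium/standalone-chrome image, and the host port 4445 must match the port used in remoteDriver below):

```shell
# Pull the headless Chrome image (one-time setup)
docker pull selenium/standalone-chrome

# Start the container in the background, mapping the container's
# WebDriver port 4444 to port 4445 on the host
docker run -d -p 4445:4444 selenium/standalone-chrome

# Verify the container is running
docker ps
```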
2) Then, in RStudio, use the following lines to start RSelenium, open the page, click the Search button and harvest the links:
library(RSelenium)
library(rvest)
library(tidyverse)
remDr <- remoteDriver(remoteServerAddr = "localhost",
port = 4445L,
browserName = "chrome")
remDr$open()
remDr$navigate("http://rera.rajasthan.gov.in/ProjectSearch")
# find the search button and press Enter on it
search <- remDr$findElement(using = "id", value = "btn_SearchProjectSubmit")
search$sendKeysToElement(list("\uE007")) # "\uE007" is the WebDriver Enter key
# get the HTML rendered by the headless browser
html <- remDr$getPageSource()[[1]] %>% read_html()
# extract the links from the results grid
links <- html %>% html_nodes("#OuterProjectGrid td a") %>% html_attr("href")
Then you should implement the pagination, e.g. with map() from purrr.
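A minimal sketch of that pagination, assuming the results grid has a numbered pager whose links can be located by their link text (the page count n_pages, the locator, and the fixed wait are assumptions to check against the live page). It reuses the remDr session opened above:

```r
library(RSelenium)
library(rvest)
library(purrr)

# Helper: harvest the links from the page currently rendered in the browser
get_links <- function(remDr) {
  remDr$getPageSource()[[1]] %>%
    read_html() %>%
    html_nodes("#OuterProjectGrid td a") %>%
    html_attr("href")
}

# Assumption: pages 1..n_pages are reachable by clicking the page number
# in the pager; adjust n_pages and the locator to the real site.
n_pages <- 5
all_links <- map(seq_len(n_pages), function(i) {
  if (i > 1) {
    pager <- remDr$findElement(using = "link text", value = as.character(i))
    pager$clickElement()
    Sys.sleep(2) # crude wait for the grid to re-render; an explicit wait is safer
  }
  get_links(remDr)
}) %>% flatten_chr()
```

The Sys.sleep() is a blunt instrument: for a more robust version, poll with remDr$findElements() until the grid content changes before harvesting.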