I am trying to scrape many pages of a website using RSelenium. The code looks like this:
library(RSelenium)
library(rvest)

rD <- rsDriver(browser = 'firefox', port = 581L)
remDr <- rD$client

vec <- c('/de/shop/head-wc-rebels-irace-ski-set-1819-schwarz-00002001874564-p.html',
         '/de/shop/alpina-jump-20-qvm-skihelm-rosegold-00002001878075-p.html',
         '/de/shop/roxy-backyard-damen-snowboardhose-gelb-00002001878176-p.html',
         '/de/shop/giro-envi-mips-damen-skihelm-lila-00002001883070-p.html')

for (i in vec) {
  remDr$navigate(paste0('https://www.ochsnersport.ch', i))
  Sys.sleep(10)
  Produktinfo_html <- read_html(remDr$getPageSource()[[1]])
}
When you run this, you will see that the loop simply stops making progress when it reaches the third page, 'https://www.ochsnersport.ch/de/shop/roxy-backyard-damen-snowboardhose-gelb-00002001878176-p.html'. Something about that page is broken.
This is a minimal reproducible example; in practice I want to scrape far more pages than these, so broken pages like the one above will probably occur more often.
When remDr$getPageSource() tries to read that page, it takes ages to return the HTML and eventually fails with an error, which breaks the loop. I already tried wrapping the call in withTimeout() from the R.utils package to abort it after a few seconds, but that also throws an error and breaks the loop. So I really don't know how to continue.
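For reference, my timeout attempt looked roughly like this (a sketch, assuming R.utils is installed; the 5-second limit is an arbitrary choice on my part):

```r
library(R.utils)

for (i in vec) {
  remDr$navigate(paste0('https://www.ochsnersport.ch', i))
  Sys.sleep(10)
  # Abort getPageSource() if it runs longer than 5 seconds
  Produktinfo_html <- withTimeout({
    read_html(remDr$getPageSource()[[1]])
  }, timeout = 5, onTimeout = "error")
}
```

The problem is that when the timeout fires, withTimeout() signals a TimeoutException, and since nothing in the loop catches that condition, the loop still terminates.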
Thanks for any help!