I am trying to scrape many pages of a website using RSelenium. The code looks like this:

library(RSelenium)
library(rvest)

# Start a Selenium session with Firefox
rD <- rsDriver(browser = 'firefox', port = 581L)
remDr <- rD$client

# Product pages to scrape (relative URLs)
vec <- c('/de/shop/head-wc-rebels-irace-ski-set-1819-schwarz-00002001874564-p.html',
         '/de/shop/alpina-jump-20-qvm-skihelm-rosegold-00002001878075-p.html',
         '/de/shop/roxy-backyard-damen-snowboardhose-gelb-00002001878176-p.html',
         '/de/shop/giro-envi-mips-damen-skihelm-lila-00002001883070-p.html')

for (i in vec) {
  remDr$navigate(paste0('https://www.ochsnersport.ch', i))

  # Give the page time to load
  Sys.sleep(10)

  Produktinfo_html <- read_html(remDr$getPageSource()[[1]])
}

When you run this, you will see that the loop simply stops making progress when it reaches the third page, 'https://www.ochsnersport.ch/de/shop/roxy-backyard-damen-snowboardhose-gelb-00002001878176-p.html'. That page seems to be somehow broken.

This is a minimal reproducible example; I want to scrape far more pages than these, so broken pages like the one above will probably occur even more often.

When remDr$getPageSource() tries to read the mentioned page, it takes ages to read the HTML and eventually fails with an error, which breaks the loop. I already tried wrapping the call in withTimeout() from R.utils to abort it after a few seconds, but that also throws an error and breaks the loop. So I really don't know how to keep the loop going past such pages.
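Roughly, my withTimeout() attempt looked like this (a sketch; the 5-second timeout value is arbitrary, and vec/remDr are the objects from above):

```r
library(R.utils)

for (i in vec) {
  remDr$navigate(paste0('https://www.ochsnersport.ch', i))
  Sys.sleep(10)

  # Attempt: abort getPageSource() if it runs longer than 5 seconds
  src <- withTimeout(remDr$getPageSource()[[1]], timeout = 5)
  Produktinfo_html <- read_html(src)
}
```

This does interrupt the long-running call, but withTimeout() signals a TimeoutException when the limit is hit, so the loop still stops with an error instead of moving on to the next page.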

Thanks for any help!
