I am trying to scrape basic data from Auto Trader and I can't get it to work reliably. The outcome always depends on luck. I don't understand the error message because I didn't use summarise at all. Even when it works, it only scrapes a portion of the data I wanted.

Error in UseMethod("summarise_") : no applicable method for 'summarise_' applied to an object of class "factor"


library(rvest)
library(tidyverse)


df_base<-data.frame(title = character(), price = character(), mileage = character())
url1<-'https://www.autotrader.ca/cars/ab/?rcp=100&rcs='
url2 <- '&srt=3&pRng=1%2C&oRng=1000%2C&prx=-2&prv=Alberta&loc=alberta&hprc=True&wcp=True&sts=Used&showcpo=1&inMarket=advancedSearch'

scrape.sleep <-  function(call.period=c(0.5,1)) {
  delay <- runif(1,call.period[1],call.period[2])
  cat(paste0(" delay of ", round(delay,2)," seconds\n"))
  Sys.sleep(delay)
}


for (i in 1:50) {
  scrape.sleep(c(0.2, 0.5))
  url_string <- paste0(url1, 100 * i, url2)
  tpage <- read_html(url_string)

  # Extract each field once and reuse the result
  title   <- html_nodes(tpage, '.result-title span') %>% html_text()
  price   <- html_nodes(tpage, '.price-delta .price-amount') %>% html_text()
  mileage <- html_nodes(tpage, '.dealer-badges .kms') %>% html_text()

  # Keep a page only when all three selectors return the same number of items
  if (length(title) == length(price) && length(title) == length(mileage)) {
    df1 <- data.frame(title = title, price = price, mileage = mileage)
    df_base <- rbind(df_base, df1)
  }
}
barny
  • Thank you so much for your answer; I am really struggling. However, it still doesn't work for me. I want to scrape 100 pages with 100 records per page, but every time I try I only get about 2,000 rows. How many rows of data did you get? I'm wondering if I got blocked. – Shannon Tse Dec 09 '19 at 07:17
  • I ran the whole script and didn't get any error messages. I got a data frame with 3 columns and 5488 rows that looks fine. `summarise_` comes from the dplyr package. I guess it is used in one of the rvest functions. I'd suggest updating to the most recent rvest and dplyr versions. – Roccer Dec 09 '19 at 08:00
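
The update suggested in the last comment could look roughly like the sketch below. It assumes both packages are already installed and that a CRAN mirror is configured; whether updating actually resolves the `summarise_` error is not confirmed in this thread.

```r
# Report the currently installed versions of the two packages named in the comment.
for (pkg in c("rvest", "dplyr")) {
  cat(pkg, as.character(packageVersion(pkg)), "\n")
}

# Reinstall both from the configured CRAN mirror to get the latest releases.
# Restart the R session afterwards so the new versions are the ones loaded.
install.packages(c("rvest", "dplyr"))
```

Running the scraper again in a fresh session after the update shows whether the error was tied to the older package versions.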
