I wrote code that scrapes data for every day of the year and saves each day in a separate .xlms file.

start <- as.Date("25-01-19",format="%d-%m-%y")
end   <- as.Date("17-12-19",format="%d-%m-%y")

theDate <- start

while (theDate <= end)
{
  url <- (paste0("http://www.b3.com.br/pt_br/produtos-e-servicos/emprestimo-de-ativos/renda-variavel/emprestimos-registrados/renda-variavel-8AE490CA64CD50310164D1EFD6412F1C.htm?data=",format(theDate,"%d/%m/%y"),"&f=0"))

  site <- read_html(url)

  Info_Ajuste_HTML <- html_nodes(site,'table')

  Info_ajuste <- html_text(Info_Ajuste_HTML)

  head(Info_ajuste,20)

  t <- head(Info_Ajuste_HTML)

  lista_tabela <- site %>%
      html_nodes("table") %>%
      html_table(fill = TRUE) 

  str(lista_tabela)

      head(lista_tabela[[1]], 10)

       if (t =="character(0)") {
         theDate <- theDate + 1
       } else {
           ... code ...  

The URL is dynamic and changes for each day. The problem occurs on days when the site is offline: `head(Info_ajuste, 20)` returns `character(0)`, and `head(Info_Ajuste_HTML)` returns `{xml_nodeset (0)}`.

This happens because the code downloads a table, and on some days the site does not make that table available.

I need an `if` that skips the days that give this empty result.
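The empty result described above can be reproduced offline, which may help show why comparing against the string "character(0)" does not behave as hoped. This is only a sketch: the literal HTML page is a stand-in assumed for illustration, not the real B3 page, and the names `empty_page`, `tables` and `texts` are made up here.

```r
library(rvest)

# Stand-in page with no <table>, parsed from a literal string
# (assumption for illustration; the real page comes from read_html(url)).
empty_page <- read_html("<html><body><p>No data for this date</p></body></html>")

tables <- html_nodes(empty_page, "table")  # empty node set
texts  <- html_text(tables)                # empty character vector

length(tables)                   # 0: no nodes matched
identical(texts, character(0))   # TRUE: an empty vector, not the string "character(0)"
length(texts == "character(0)")  # 0: the comparison yields a zero-length logical
```

Because the comparison produces a zero-length logical, `if` has nothing to test. Checking `length()` of the node set is one way to detect the empty case.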

Ronak Shah
Take a look at the `tryCatch()` functionality [here](https://stackoverflow.com/questions/8093914/use-trycatch-skip-to-next-value-of-loop-upon-error). You can wrap your `read_html` call in it, and if an error occurs because there is no page for a given date, the loop can skip to the next date. – user2474226 Dec 17 '19 at 21:47
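The suggestion in this comment could look roughly like the following. It is a minimal sketch rather than the commenter's code: `read_html_or_null` is a hypothetical helper name, and the non-existent file path only stands in for a request that fails.

```r
library(rvest)

# Hypothetical helper (not from the original post): returns the parsed page,
# or NULL if read_html() raises an error (for example, when the site is down).
read_html_or_null <- function(url) {
  tryCatch(
    read_html(url),
    error = function(e) {
      message("Skipping ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a failed request: a path that does not exist makes read_html() fail.
site <- read_html_or_null("no-such-page.html")
is.null(site)
```

Inside the `while` loop, a `NULL` result could then be handled with `if (is.null(site)) { theDate <- theDate + 1; next }`, so that the date still advances before the loop skips ahead.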

1 Answer

You can check the length of `Info_Ajuste_HTML` and run the remaining code only if it captured at least one node.

library(rvest)

while (theDate <= end)
{
  url <- (paste0("http://www.b3.com.br/pt_br/produtos-e-servicos/emprestimo-de-ativos/renda-variavel/emprestimos-registrados/renda-variavel-8AE490CA64CD50310164D1EFD6412F1C.htm?data=",format(theDate,"%d/%m/%y"),"&f=0"))
  site <- read_html(url)
  Info_Ajuste_HTML <- html_nodes(site,'table')
  if (length(Info_Ajuste_HTML) > 0) { ### <- Added a check here
      Info_ajuste <- html_text(Info_Ajuste_HTML)
      head(Info_ajuste,20)
      t <- head(Info_Ajuste_HTML)
      ##rest of the code
      ##rest of the code
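   }
  theDate <- theDate + 1  ## always advance to the next day (drop any increment inside the block above)
}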
Ronak Shah
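As a variation on the length check, the dates can be generated up front with `seq()`, which removes the need to increment `theDate` by hand. The sketch below is not part of the original answer: `build_url` and `scrape_range` are hypothetical helper names, the URL and date format are copied from the question, and failed requests are not handled here.

```r
library(rvest)

base_url <- paste0(
  "http://www.b3.com.br/pt_br/produtos-e-servicos/emprestimo-de-ativos/",
  "renda-variavel/emprestimos-registrados/",
  "renda-variavel-8AE490CA64CD50310164D1EFD6412F1C.htm?data="
)

# Hypothetical helper: build the URL for one date, using the question's date format.
build_url <- function(date) {
  paste0(base_url, format(date, "%d/%m/%y"), "&f=0")
}

# Hypothetical helper: collect the tables for every day from start to end,
# skipping days on which the page contains no <table>.
scrape_range <- function(start, end) {
  dates <- seq(start, end, by = "day")
  results <- list()
  for (i in seq_along(dates)) {
    # Index into the vector: `for (d in dates)` would drop the Date class.
    site  <- read_html(build_url(dates[i]))
    nodes <- html_nodes(site, "table")
    if (length(nodes) == 0) next  # no table published for this day
    # fill = TRUE mirrors the question; newer rvest versions may warn that it is deprecated.
    results[[format(dates[i])]] <- html_table(nodes, fill = TRUE)
  }
  results
}
```

With `start` and `end` defined as in the question, `results <- scrape_range(start, end)` would return a list keyed by date, containing only the days that had a table.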