I am new to R and am currently working on an assignment dealing with web scraping.

I am supposed to read in all the sentences from this web page: https://www.cs.columbia.edu/~hgs/audio/harvard.html

This is my current code:

library(xml2)
library(rvest)
url <- 'https://www.cs.columbia.edu/~hgs/audio/harvard.html'
read_html(url)
sentences <- url %>%
  html_nodes("li") %>%
  html_text()

And every time I run it, I get this error:

Error in UseMethod("xml_find_all") : no applicable method for 'xml_find_all' applied to an object of class "character"

Can you please help me? I don't understand what I'm doing wrong.

סטנלי גרונן
help

1 Answer

You forgot to assign the result of read_html(url) to a variable (I imagine you intended to reuse the name url). As a result, url %>% html_nodes("li") is passed a character string instead of an xml_document, which is what the error is telling you (internally, rvest::html_nodes calls xml2::xml_find_all).

You could do this:

html <- read_html(url)

sentences <- html %>%
  html_nodes("li") %>%
  html_text()

Or this, if you only need to read the URL once:

sentences <- read_html(url) %>%
  html_nodes("li") %>%
  html_text()
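
To see the difference between the two object types for yourself, here is a minimal sketch that checks the class of each one. It uses a small inline HTML string (the hypothetical page_source, not the real Harvard sentences page) so that it runs without network access; the variable names page_source and doc are assumptions made for illustration.

```r
library(xml2)
library(rvest)

# Hypothetical inline HTML standing in for the real page (assumption: the
# sentences live in <li> elements, as the selector in the question implies)
page_source <- "<html><body><ol><li>First sentence.</li><li>Second sentence.</li></ol></body></html>"

# A plain string has class "character"; html_nodes() has no method for it,
# which is what triggers the xml_find_all error
class(page_source)

# read_html() parses the string into a document object that html_nodes() accepts
doc <- read_html(page_source)
class(doc)

# Same pipeline as above, now applied to the parsed document
sentences <- doc %>%
  html_nodes("li") %>%
  html_text()

sentences
```

With the real page you would pass url to read_html() instead of page_source; the rest of the pipeline stays the same.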
Gabriel Silva