I would like to find all the classes used in the webpage below. Is this possible with rvest, or will I need some regex/grepl anyway? I am able to scrape the info once I know the name of the class, but for pages with dynamically built class names it would be convenient to have an overview of the classes used.




language <- page %>% html_nodes(".C49FootnoteLangue") %>% html_text()
  • Do you want to find all classes and save their names in some array, or just apply a style to all the classes found? – Rolly Dec 31 '15 at 15:32
  • I would like to find all the classes used and save them in a structured way (list, df) for further processing. – Lod Dec 31 '15 at 15:52
  • `page %>% html_nodes("*") %>% html_attr("class") %>% unique()` ? – hadley Dec 31 '15 at 22:11
  • Does exactly what I was looking for. The possibility of using the CSS selector wildcard escaped me. Thanks (both for the answer and for rvest). – Lod Jan 02 '16 at 14:47



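Converting @hadley's comment to a community wiki (CW) answer: you can get a vector of all the classes on the page by selecting every element with the `*` wildcard (universal) selector, reading its `class` attribute, and keeping the unique values.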

Thus, the approach would look like:

library(rvest)

page <- read_html(doc_url)

page %>% 
  html_nodes("*") %>% 
  html_attr("class") %>% 
  unique()
#  [1] NA                          "component"                 "waitBlock"
#  [4] "waitBlockContainer"        "toggle_img"                "btn_impression"
#  [7] "document_language"         "outputEcli"                "C19Centre"
# [10] "C71Indicateur"             "C02AlineaAltA"             "C72Alineadroite"
# [13] "C75Debutdesmotifs"         "C01PointnumeroteAltN"      "C04Titre1"
# [16] "C03Tiretlong"              "C05Titre2"                 "C06Titre3"
# [19] "C07Titre4"                 "C48DispositifIntroduction" "C08Dispositif"
# [22] "C77Signatures"             "C49FootnoteLangue"
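One caveat worth checking on real pages: `html_attr("class")` returns the whole attribute string, so an element written as `class="component waitBlock"` shows up as one entry rather than two classes, and elements without a class contribute the `NA` seen above. Below is a minimal sketch, assuming rvest is installed and using a small hypothetical inline document in place of the real page, that selects only elements carrying a class (`[class]`), splits each attribute on whitespace, and tallies the result into a data frame (the "structured way" asked for in the comments):

```r
library(rvest)

# Hypothetical inline markup standing in for the real page; the first
# <div> carries two classes in a single attribute.
page <- read_html('<html><body>
  <div class="component waitBlock"><p class="C19Centre">x</p></div>
  <p class="C49FootnoteLangue">fr</p>
  <p class="C19Centre">y</p>
  <span>no class here</span>
</body></html>')

class_vec <- page %>%
  html_nodes("[class]") %>%   # only elements that have a class attribute (so no NA)
  html_attr("class") %>%      # raw attribute strings, e.g. "component waitBlock"
  strsplit("\\s+") %>%        # split multi-class attributes into single names
  unlist()

# Drop empty strings that leading whitespace in an attribute would leave behind
class_vec <- class_vec[nzchar(class_vec)]

# Unique class names, in document order
all_classes <- unique(class_vec)

# Structured overview: one row per class, with the number of elements using it
class_counts <- as.data.frame(table(class = class_vec), stringsAsFactors = FALSE)
```

On the real page you would use the `read_html(doc_url)` result instead of the inline string. Selecting `"[class]"` rather than `"*"` is just a convenience that avoids filtering out the `NA` afterwards; the `*` version from the answer works equally well if you drop `NA` before splitting.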