1

I have an internal company HTML webpage with div tags in the following format:

<div id="B4_6_2019">
<div id="B3_6_2019">

I would like to extract all the id values, so the end result would be B4_6_2019 B3_6_2019.

How would I do that? (The id values are all dates.)

QHarr
southwind

2 Answers

1

Try selecting every div, pulling its id attribute, and keeping only the ids that match the date pattern:

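library(dplyr)
library(rvest)

# url holds the address (or file path) of the page
url %>%
  read_html() %>%
  html_nodes("div") %>%   # select every div
  html_attr("id") %>%     # id of each div; NA when a div has none
  grep("^B\\d+_\\d+_\\d+", ., value = TRUE)  # keep ids shaped like B4_6_2019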
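
To try the pattern-matching approach without access to the internal page, here is a minimal sketch that parses an inline HTML string instead of a URL. The HTML fragment, including the extra "footer" div and the div without an id, is a made-up stand-in for the real page, and the variable names are illustrative only.

```r
library(rvest)

# Hypothetical stand-in for the internal page (not the real markup)
html <- '<html><body>
  <div id="B4_6_2019"></div>
  <div id="B3_6_2019"></div>
  <div id="footer"></div>
  <div></div>
</body></html>'

# Collect the id of every div; divs without an id yield NA
all_ids <- read_html(html) %>%
  html_nodes("div") %>%
  html_attr("id")

# Keep only ids shaped like B<digits>_<digits>_<digits>; NA never matches
ids <- grep("^B\\d+_\\d+_\\d+$", all_ids, value = TRUE)

# Single space-separated string, as asked for in the question
result <- paste(ids, collapse = " ")
print(result)
```

Anchoring the pattern with `$` as well as `^` is a small tightening assumed here; it rejects ids that merely start with a date-like prefix.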
Ronak Shah
1

Alternatively, use a CSS attribute selector with the ends-with operator ($=), which matches elements whose id value ends with a given substring:

library(rvest)

# Replace "url" with the address (or file path) of the page
page <- read_html("url")
ids <- page %>%
  html_nodes("[id$='_2019']") %>%  # any element whose id ends in _2019
  html_attr("id")
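
The selector above matches any element, not only divs, and only ids for 2019. As a hedged variation, the sketch below restricts the match to div elements and combines the starts-with (`^=`) and ends-with (`$=`) operators. The inline HTML, including the `span`, is invented for illustration and is not the real page.

```r
library(rvest)

# Hypothetical stand-in for the internal page (not the real markup)
html <- '<html><body>
  <div id="B4_6_2019"></div>
  <div id="B3_6_2019"></div>
  <div id="header"></div>
  <span id="note_2019"></span>
</body></html>'

page <- read_html(html)

# Only div elements whose id starts with "B" and ends with "_2019"
ids <- page %>%
  html_nodes("div[id^='B'][id$='_2019']") %>%
  html_attr("id")

print(ids)
```

If the page spans several years, dropping the `[id$='_2019']` part and filtering the ids with a regular expression afterwards may be the more flexible choice.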
QHarr