I'm trying to scrape a table, along with the hrefs inside that table, from a town assessor's website using the R package rvest. Although I've had luck scraping tables from other websites (e.g. Wikipedia), I can't get anything from the town assessor's site.
I am using RStudio v1.1.442 and R v3.5.0.
sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rvest_0.3.3 xml2_1.2.0 V8_2.2
loaded via a namespace (and not attached):
[1] httr_1.4.0 compiler_3.5.0 selectr_0.4-1 magrittr_1.5 R6_2.4.0 tools_3.5.0 yaml_2.2.0
[8] curl_3.3 Rcpp_1.0.1 stringi_1.4.3 stringr_1.4.0 jsonlite_1.6
I have tried to follow a few examples. First, the Wikipedia state-population example, which works fine.

library(rvest)

url <- "https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States_by_population"
population <- url %>%
  read_html() %>%
  html_nodes("#mw-content-text > div > table:nth-child(11)") %>%
  html_table()
population <- population[[1]]
I've also been able to scrape data from Yelp without issue. This, for example, gives me the names of the restaurants.

url <- "https://www.yelp.com/search?find_loc=New+York,+NY,+USA"
heading <- url %>%
  read_html() %>%
  html_nodes(".alternate__373c0__1uacp .link-size--inherit__373c0__2JXk5") %>%
  html_text()
The website I'm having trouble with is like this one, which shows the output of a search for properties on a specific street.

url <- "https://imo.ulstercountyny.gov/viewlist.aspx?sort=printkey&swis=all&streetname=Lake+Shore+Dr"
helpme <- url %>%
  read_html() %>%
  html_nodes("#tblList > tbody") %>%
  html_table()
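One thing I wondered about (this is just a guess on my part): browsers insert a `<tbody>` element when rendering a table, so the raw HTML that read_html() downloads may not contain one even though the inspector shows it. A minimal check, counting matches for each selector and trying the table without the tbody step:

```r
library(rvest)

url <- "https://imo.ulstercountyny.gov/viewlist.aspx?sort=printkey&swis=all&streetname=Lake+Shore+Dr"
page <- read_html(url)

# How many nodes does each selector match in the raw (un-rendered) HTML?
length(html_nodes(page, "#tblList"))          # the table itself
length(html_nodes(page, "#tblList > tbody"))  # may be 0 if tbody is browser-inserted

# Try extracting the table directly, skipping the tbody
page %>%
  html_node("#tblList") %>%
  html_table(fill = TRUE)
```
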
I would also like to pull out the hrefs from the anchor tags inside that table, using something like this (html_attr() already returns a character vector, so there is no html_text() step):

helpme <- url %>%
  read_html() %>%
  html_nodes("#tblList a") %>%
  html_attr("href")
Unfortunately, my attempts to scrape both the table and the hrefs come back empty.
Is there something strange about this website? I've used the Chrome browser inspector and SelectorGadget to help find the right CSS selectors. I've also tried the equivalent XPath. The result is the same either way.
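In case it's relevant, here's how I'd check the raw HTTP response with httr (already loaded as a dependency of rvest) to see whether the request succeeds at all and whether the HTML the server actually sends back contains the table. If it doesn't, I assume the content is being rendered by JavaScript or gated behind a session, and rvest alone wouldn't see it:

```r
library(httr)

url <- "https://imo.ulstercountyny.gov/viewlist.aspx?sort=printkey&swis=all&streetname=Lake+Shore+Dr"
resp <- GET(url, user_agent("Mozilla/5.0"))

status_code(resp)  # 200 would mean the request itself worked

# Is "tblList" anywhere in the HTML the server sent, before any JS runs?
raw_html <- content(resp, as = "text")
grepl("tblList", raw_html, fixed = TRUE)
```
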