
Website: https://www.goodreads.com/book/show/27841061-nevernight
Goal: to extract individual user ratings

When I inspect a user rating, I see this:

<span class="staticStars notranslate" title="did not like it">

If I can extract the title attribute, I can map it to a numeric rating:

library(rvest)

rate_map <- c('did not like it' = 1,
              'it was ok'       = 2,
              'liked it'        = 3,
              'really liked it' = 4,
              'it was amazing'  = 5)

url <- 'https://www.goodreads.com/book/show/27841061-nevernight'
gr_list <- read_html(url)
gr_list %>% html_node('.staticStars .notranslate') %>%
  html_attr('title')

This code returns "NA".

Can anyone tell me what I am doing wrong? Thanks.

animus
  • Possible duplicate: https://stackoverflow.com/questions/45450981/rvest-scrape-2-classes-in-1-tag – MrFlick Aug 29 '19 at 20:04

1 Answer


The CSS selector .staticStars .notranslate (with a space) means you are looking for a node with class notranslate nested inside a node with class staticStars. That is, it would match something like this:

<span class="staticStars"><span class="notranslate">foo</span></span>
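The difference between the two selectors can be checked without fetching the Goodreads page. This is a minimal sketch assuming the rvest package is installed; it parses a small, hypothetical HTML string (read_html(), from xml2 and re-exported by rvest, accepts literal markup as well as a URL). The class names match the question, but the markup and the titles in it are invented for illustration.

```r
library(rvest)

# Hypothetical markup: span A carries both classes on one node,
# span B is a notranslate span nested inside a staticStars span.
page <- read_html('
  <div>
    <span class="staticStars notranslate" title="liked it">A</span>
    <span class="staticStars"><span class="notranslate" title="inner">B</span></span>
  </div>')

# Descendant selector (space): notranslate nodes with a staticStars ancestor.
# Span A is not matched, because a node is not its own descendant.
descendant <- page %>% html_nodes('.staticStars .notranslate') %>% html_attr('title')

# Compound selector (no space): a single node that has both classes.
compound <- page %>% html_nodes('.staticStars.notranslate') %>% html_attr('title')

print(descendant)  # "inner"
print(compound)    # "liked it"
```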

If you want to match a single node that has both classes, make sure there is no space between the class selectors. Using html_nodes() instead of html_node() also returns every match rather than only the first:

library(rvest)

url <- 'https://www.goodreads.com/book/show/27841061-nevernight'
gr_list <- read_html(url)
gr_list %>% html_nodes('.staticStars.notranslate') %>%
  html_attr('title')

#  [1] NA                NA                "did not like it"
#  [4] "did not like it" "it was amazing"  "it was amazing" 
#  [7] "it was amazing"  "it was amazing"  "it was amazing" 
# [10] "did not like it" "it was amazing"  "really liked it"
# [13] "did not like it" "it was amazing"  "it was amazing" 
# [16] "it was amazing"  "did not like it" "it was amazing" 
# [19] "it was amazing"  "it was amazing"  "it was amazing" 
# [22] "it was amazing"  "it was amazing"  "it was amazing" 
# [25] "it was amazing"  "it was amazing"  "it was amazing" 
# [28] "it was amazing"  "it was amazing"  "liked it" 
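With the titles extracted, the rate_map idea from the question can turn them into numbers. A minimal base-R sketch, assuming rate_map is an R named vector keyed by the title strings; titles is a hand-typed sample in the style of the output above, not scraped data.

```r
# rate_map as an R named vector: names are the title strings
rate_map <- c('did not like it' = 1, 'it was ok' = 2, 'liked it' = 3,
              'really liked it' = 4, 'it was amazing' = 5)

# Hand-typed sample; the leading NAs stand in for nodes without a title
titles <- c(NA, NA, "did not like it", "did not like it", "it was amazing",
            "really liked it", "liked it")

# Look each title up by name; an NA title looks up to NA.
# unname() drops the names carried over from rate_map.
ratings <- unname(rate_map[titles])

# Keep only the nodes that actually had a rating title
ratings <- ratings[!is.na(ratings)]
print(ratings)  # 1 1 5 4 3
```

Indexing a named vector by a character vector does the whole lookup in one step, so no loop or switch() is needed.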
MrFlick
  • That was a mistake on my end. The output is still NA – animus Aug 29 '19 at 20:21
  • Well, the first node doesn't have a title. If you change `html_node` to `html_nodes` you'll get all the nodes and you'll see that most have a title. – MrFlick Aug 29 '19 at 20:23
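The behavior described in the last comment can be reproduced offline. In this sketch (assuming rvest is installed) the markup is hypothetical: the first matching span deliberately has no title attribute, standing in for the first staticStars node on the real page. html_node() returns only that first match, so html_attr() gives NA; html_nodes() returns every match.

```r
library(rvest)

# Hypothetical markup: the first matching span has no title attribute
page <- read_html('
  <div>
    <span class="staticStars notranslate">no title here</span>
    <span class="staticStars notranslate" title="it was amazing"></span>
    <span class="staticStars notranslate" title="liked it"></span>
  </div>')

# html_node() keeps only the first match, whose missing title becomes NA
first_only <- page %>% html_node('.staticStars.notranslate') %>% html_attr('title')

# html_nodes() keeps all matches; NA appears only where the title is absent
all_titles <- page %>% html_nodes('.staticStars.notranslate') %>% html_attr('title')

print(first_only)  # NA
print(all_titles)  # NA, "it was amazing", "liked it"
```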