
I am trying to run the equivalent of this JavaScript command:

document.querySelectorAll('div.Dashboard-section div.pure-u-1-1 span.ng-scope')[0].innerText

in R with the rvest package, using the following code:

library(rvest)

url <- read_html("")

url %>%
  html_nodes("div.Dashboard-section div.pure-u-1-1 span.ng-scope") %>%
  html_text()

but the result I get is this:

character(0)

whereas I was expecting this:

"Displaying results 1-25 of 10,897"

What can I do?

Jason_K
  • It's a dynamic webpage. You are reading the HTML before the data is loaded. See this answer: http://stackoverflow.com/a/29965233/3927604 – kanatti Aug 31 '16 at 13:52
  • You're in violation of their (albeit rather dumb) ToS where you are not permitted to do the following: _"Use robots or intelligent agents to access, search and/or systematically download any portion of IEEE Xplore."_ – hrbrmstr Aug 31 '16 at 14:04

1 Answer

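In a nutshell, the rvest package can fetch HTML, but it cannot execute JavaScript. The page you tried to fetch loads its data via AJAX, i.e. with JavaScript, after the initial HTML has been delivered, so the span you are selecting is not yet present in the HTML that rvest receives.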

As a workaround you could use the RSelenium package, as user neoFox suggested. Selenium WebDriver would start Firefox or Chrome for you, navigate to the page, wait until it has loaded, and get the data fragment from the HTML DOM.
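A minimal sketch of the RSelenium route, assuming a working Selenium/Firefox setup on your machine. The function names `extract_result_text` and `fetch_rendered_text` and the fixed `wait_seconds` delay are my own illustrative choices, not part of any package; the page URL is left as a parameter because the question does not show it.

```r
library(rvest)

# Pure helper: apply the selector from the question to an HTML string
# that has already been rendered by a browser.
extract_result_text <- function(page_source) {
  read_html(page_source) %>%
    html_nodes("div.Dashboard-section div.pure-u-1-1 span.ng-scope") %>%
    html_text()
}

# Drive a real browser so the page's JavaScript runs before the DOM is read.
# page_url must be supplied by the caller (it is elided in the question).
fetch_rendered_text <- function(page_url, wait_seconds = 5) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remDr <- driver$client
  # Close the browser and stop the Selenium server when the function exits.
  on.exit({
    remDr$close()
    driver$server$stop()
  })
  remDr$navigate(page_url)
  Sys.sleep(wait_seconds)  # crude fixed wait; an assumption, tune as needed
  extract_result_text(remDr$getPageSource()[[1]])
}
```

The parsing step is kept separate from the browser step so that the selector can be checked against a saved HTML string without starting Selenium.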

Or use the much smaller PhantomJS headless browser, which would save the loaded HTML page to a file without popping up a browser GUI. You can then read in and parse that HTML file with R.

Both need some serious configuration. Selenium is Java-based. PhantomJS requires you to read at least its documentation.

You could also inspect the page, find out which POST request the site is making, and send this POST yourself. Then fetch the JSON it returns and count the result items yourself.
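A rough sketch of that last approach with httr and jsonlite. Everything specific here is an assumption: the `endpoint`, the `payload`, and the `"records"` field name are hypothetical placeholders that you would replace with whatever your browser's network inspector actually shows for this site.

```r
library(jsonlite)

# Pure helper: count the items in one array of a JSON response.
# The default field name "records" is hypothetical.
count_records <- function(json_text, field = "records") {
  parsed <- fromJSON(json_text, simplifyVector = FALSE)
  length(parsed[[field]])
}

# Send the POST yourself and count the returned items.
# endpoint and payload are placeholders to be copied from the real request.
fetch_result_count <- function(endpoint, payload, field = "records") {
  response <- httr::POST(endpoint, body = payload, encode = "json")
  httr::stop_for_status(response)  # fail loudly on HTTP errors
  json_text <- httr::content(response, as = "text", encoding = "UTF-8")
  count_records(json_text, field)
}
```

Note that a single response may contain only one page of results, so the length of the array is not necessarily the overall total.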

knb