
The html_nodes() function fails as follows when the script is run as an executable with Rscript, but succeeds when run interactively. Does anybody know what could be different between the two runs?

The interactive run used a fresh session, and the source() call was the first statement run.

$ ./test-pdp.R
>
> ################################################################################
> # Setup
> ################################################################################
> suppressPackageStartupMessages(library(plyr))
> suppressPackageStartupMessages(library(dplyr))
> suppressPackageStartupMessages(library(stringr))
> suppressPackageStartupMessages(library(rvest))
> suppressPackageStartupMessages(library(httr))
>
>
> read_html("http://google.com") %>%
+     html_nodes("div") %>%
+     length()
Error in as.vector(x, "list") :
  cannot coerce type 'environment' to vector of type 'list'
Calls: %>% ... <Anonymous> -> lapply -> as.list -> as.list.default
Execution halted

Yet it succeeds when run as source() interactively:

> source("/Users/a6001389/Documents/projects/hottest-deals-page-scrape/src/test-pdp.R", echo=TRUE)
> #!/usr/bin/RScript
> options(echo=TRUE)
> ################################################################################
> # Setup
> ####################################################### .... [TRUNCATED] 
> suppressPackageStartupMessages(library(dplyr))
> suppressPackageStartupMessages(library(stringr))
> suppressPackageStartupMessages(library(rvest))
> suppressPackageStartupMessages(library(httr))
> read_html("http://google.com") %>%
+     html_nodes("div") %>%
+     length()
[1] 17

Thank you, Matt

mpettis
  • I haven't used rvest, but I have run into similar problems many times with `RSelenium`. It will probably break the piping, but you may want to experiment with `Sys.sleep(5)`. Occasionally I've had to go to `Sys.sleep(15)` or even 20 seconds to allow the page to load. – PavoDive Feb 12 '16 at 00:13
  • Try adding `library(methods)` to the start of your script – hadley Feb 12 '16 at 02:56
  • @hadley : adding `library(methods)` worked. I'd accept it if it were posted as an answer. And thank you. – mpettis Feb 12 '16 at 02:59
  • @PavoDive : Just saw hadley's solution, and that worked, so I didn't try yours. Thanks for responding though. – mpettis Feb 12 '16 at 02:59

2 Answers


Adding the line

library(methods)

to the start of the script, per Hadley Wickham's comment on the original question, solved this error. I do not know why it solved the error, but I am posting an answer so there is an easily referenced solution here. If someone posts an explanation of why this solves the problem, I will accept that answer.

I am copying the comment below from @mekki-macaulay into the answer text because it adds a lot of clarity:

This thread might shed some light on it. It seems that in some contexts RSCRIPT doesn't load package::methods by default, whereas interactive sessions do load it by default. It seems that the "when" is not clear, but explicitly calling library(methods) for all RSCRIPT executions seems to be the safe bet. See "Can use package interactively, but Rscript gives errors".

mpettis
  • This thread might shed some light on it. It seems that in some contexts `RSCRIPT` doesn't load `package::methods` by default, whereas interactive sessions do load it by default. It seems that the "when" is not clear, but explicitly calling `library(methods)` for all `RSCRIPT` executions seems to be the safe bet: http://stackoverflow.com/questions/19780515/can-use-package-interactively-but-rscript-gives-errors – Mekki MacAulay Feb 12 '16 at 15:56
  • `Rscript` (not `RSCRIPT` or `RScript`) _never_ loads the methods package by default. – hadley Feb 13 '16 at 21:08
  • I stand corrected on the proper use of caps for `Rscript`. My comment about the "when" was based on @KenWilliams' post in the thread I linked. His experience seems to suggest that it does load it by default sometimes. Has that changed in newer R versions? – Mekki MacAulay Feb 13 '16 at 21:46
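To check which situation applies on a particular installation, a small diagnostic like the following can be saved as a script and run both with `Rscript` and via `source()` in an interactive session. This is a sketch, not something posted in the thread: the helper `methods_attached()` is a hypothetical name for this example, and the script relies only on base R's `search()`, `interactive()`, `getOption("defaultPackages")` and `R.version.string`.

```r
#!/usr/bin/env Rscript
# Diagnostic sketch: compare how Rscript and an interactive session set up
# the search path. methods_attached() is a hypothetical helper for this example.
methods_attached <- function() "package:methods" %in% search()

cat("R version:          ", R.version.string, "\n")
cat("interactive session:", interactive(), "\n")
# Packages attached at startup (taken from R_DEFAULT_PACKAGES, if set).
cat("default packages:   ", paste(getOption("defaultPackages"), collapse = ", "), "\n")
cat("methods attached:   ", methods_attached(), "\n")

# The fix from this answer: attach methods explicitly. library() does nothing
# if the package is already attached, so this is safe in every context.
suppressPackageStartupMessages(library(methods))
cat("after library(methods):", methods_attached(), "\n")
```

If `methods attached` prints FALSE under Rscript but TRUE interactively, that would match the behaviour described in the comments above, and the explicit `library(methods)` line brings both runs in line.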

This is likely a side effect of how the magrittr `%>%` operator works. From the magrittr documentation, page 8 (`%>%` Pipe):

The magrittr pipe operators use non-standard evaluation. They capture their inputs and examines them to figure out how to proceed. First a function is produced from all of the individual right-hand side expressions, and then the result is obtained by applying this function to the left-hand side. For most purposes, one can disregard the subtle aspects of magrittr's evaluation, but some functions may capture their calling environment, and thus using the operators will not be exactly equivalent to the "standard call" without pipe-operators (Emphasis mine).

So try it without `%>%` to see whether the problem is that html_nodes() incorrectly captures the environment when run from the command line (as your error message suggests), whereas in the interactive session it can pick up the session's environment variables:

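library(rvest)  # provides read_html() and html_nodes()

# Same steps as the pipeline in the question, without %>%
google_node <- read_html("http://google.com")
div_nodes   <- html_nodes(google_node, "div")
length(div_nodes)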

Does that work when run as an executable script with Rscript?

Mekki MacAulay