I'm trying to convert a Python project (which uses Selenium to scrape tweets without going through the limited Twitter API) to R. It works fine in Python, but I want to recreate it in R. I'm new to R, but I have some MATLAB experience if that helps.

install.packages("RSelenium") # install RSelenium 1.7.1

As far as I'm aware, the package has been updated, so instead of startServer() I need to use other functions. But the answers I've found conflict with each other, and none of them work:

require(RSelenium) #used require() and library()
remDr <- remoteDriver(browserName = "chrome")
remDr$open()

I get error:

[1] "Connecting to remote server"
Error in checkError(res) : 
  Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused

I also tried:

require(RSelenium)
remDr <- rsDriver(browser = c("chrome"))

and I get:

checking Selenium Server versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking chromedriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking geckodriver versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
checking phantomjs versions:
BEGIN: PREDOWNLOAD
BEGIN: DOWNLOAD
BEGIN: POSTDOWNLOAD
[1] "Connecting to remote server"

The Chrome browser (61.0.3163.100) launches, but execution hangs on that last line, so I cannot run the next line of my code. The browser stays open for about half a minute before closing itself, and I get this error:

Selenium message:unknown error: unable to discover open pages
  (Driver info: chromedriver=2.33.506120 (e3e53437346286c0bc2d2dc9aa4915ba81d9023f),platform=Windows NT 6.1.7601 SP1 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 60.44 seconds
Build info: version: '3.6.0', revision: '6fbf3ec767', time: '2017-09-27T16:15:40.131Z'
System info: host: 'RENTEC-THINK', ip: '192.168.56.1', os.name: 'Windows 7', os.arch: 'amd64', os.version: '6.1', java.version: '1.8.0_144'
Driver info: driver.version: unknown

Error:   Summary: UnknownError
     Detail: An unknown server-side error occurred while processing the command.
     Further Details: run errorDetails method

I've tried several different things, including downloading a ChromeDriver binary (v2.33 should support Chrome v60-62: https://sites.google.com/a/chromium.org/chromedriver/downloads) and either passing its path to remoteDriver or adding the path as a system variable.

It's as if nothing I do works, as if the RSelenium update broke everything. Am I doing something stupid?

I've reached the point where, given all the inconsistent answers I've seen online, I'm trying different combinations of lines of code, mixing and matching everything in a desperate attempt to get this working through trial and error alone.

My next step is to find out where R installed RSelenium and then look at what is in the code :(

I also thought about Docker, but I'd rather not install a separate application just to get my code to work.

user3120554

2 Answers

1

The following worked for me. Note the browser, Selenium server, and driver versions:

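library(RSelenium)
library(magrittr)  # provides %>%

# Build the Selenium server command (retcommand = TRUE returns it as a string)
# and launch it in the background
wdman::selenium(port = 4444L, geckover = "0.24.0",
                version = "3.141.59", check = FALSE, retcommand = TRUE) %>%
  system(wait = FALSE, invisible = FALSE)

# Connect a Firefox client to the server on the same port
rmDrv <- remoteDriver(extraCapabilities = list(marionette = TRUE),
                      browserName = "firefox", port = 4444L)
rmDrv$open()

rmDrv$navigate("https://www.google.com")

rmDrv$close()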
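
Since the question is about Chrome, the same idea of pinning versions can probably be applied through rsDriver() as well. The following is only a sketch, not something I have run against the asker's setup: start_pinned_chrome is a hypothetical helper name, the version string "2.33" is taken from the question's error log, and check = FALSE assumes that driver version has already been downloaded.

```r
# Hypothetical wrapper: start a Selenium server plus Chrome client with a
# pinned chromedriver version instead of the default "latest".
start_pinned_chrome <- function(chromever, port = 4444L) {
  RSelenium::rsDriver(
    browser   = "chrome",
    port      = port,
    chromever = chromever,  # e.g. "2.33"; must match the installed Chrome
    check     = FALSE,      # assumes the binaries were downloaded earlier
    verbose   = FALSE
  )
}

# Assumed usage (not executed here):
# rd    <- start_pinned_chrome("2.33")
# remDr <- rd$client
# remDr$navigate("https://www.google.com")
# remDr$close()
# rd$server$stop()
```

If the version string does not match anything that has been downloaded, binman::list_versions("chromedriver") should show which versions are available locally.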
0

Try:

remDr <- remoteDriver(browserName = "chrome")
Sys.sleep(5)
remDr$open()

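Sometimes the client tries to connect before the Selenium server has finished starting, and you get the "Failed to connect to localhost port 4444: Connection refused" error. Note that the pause only helps if a server is already starting up on port 4444; remoteDriver() itself does not launch one.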
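
A fixed pause can be too short or needlessly long, so one alternative is to retry the connection until it succeeds. This is a minimal sketch: retry_until is a hypothetical helper name, and it assumes remDr$open() raises an ordinary R error while the server is unreachable, as the checkError message in the question suggests.

```r
# Hypothetical helper: call `action` until it succeeds or `attempts` tries
# have been used, pausing `delay` seconds between tries.
retry_until <- function(action, attempts = 10L, delay = 1) {
  stopifnot(attempts >= 1)
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = action()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(invisible(result$value))
    last_error <- result$error
    if (i < attempts) Sys.sleep(delay)  # wait before the next try
  }
  stop("still failing after ", attempts, " attempts: ",
       conditionMessage(last_error))
}

# Assumed usage with a client whose server is being started elsewhere:
# remDr <- RSelenium::remoteDriver(browserName = "chrome", port = 4444L)
# retry_until(function() remDr$open(silent = TRUE), attempts = 10L, delay = 1)
```

The retry only covers the window in which a server is still starting; if nothing is listening on the port at all, it will fail after the last attempt with the original connection error.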