Simply put: if I run

x <- read.csv(url)

and the URL exists, then R will return the contents of that URL. A good example, if you want to try it, might be "http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2008&d=03&e=4&f=2014&g=d&ignore=.csv". That particular URL, if assigned to url and run as above, loads a data.frame into x from the Yahoo website containing several years of IBM stock data.

But how can I tell, beforehand, whether any given URL will return a 404?

Something like:

is.404.or.not(url)

or maybe

status(connect.to(url))

Thanks!

Ben Bolker
wht_rbt_obj
  • Maybe this will help? You could modify the solutions to catch "error words" that show up on those particular pages perhaps: https://stackoverflow.com/questions/38114066/using-trycatch-and-rvest-to-deal-with-404-and-other-crawling-errors – mysteRious Apr 19 '18 at 22:52

1 Answer

You could use the RCurl package:

R> library(RCurl)
Loading required package: bitops
R> url.exists("http://google.com")
[1] TRUE
R> url.exists("http://csgillespie.org")
[1] FALSE

Alternatively, you could use the httr package:

R> library(httr)
R> http_status(GET("http://google.com"))
$category
[1] "success"

$message
[1] "success: (200) OK"

R> http_status(GET("http://csgillespie.org"))
$category
[1] "server error"

$message
[1] "server error: (503) Service Unavailable"
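
If you want something closer to the is.404.or.not(url) from the question, a minimal sketch built on httr might look like the block below. The names url_status, is_404 and timeout_sec are made up for illustration and are not part of httr. It assumes the server answers HEAD requests sensibly; some don't, in which case swapping HEAD for GET should work at the cost of downloading the body.

```r
library(httr)

# Hypothetical helper (not part of httr): returns the HTTP status code
# for a URL, or NA_integer_ if the request fails before any response
# arrives (DNS failure, timeout, refused connection, ...).
url_status <- function(url, timeout_sec = 10) {
  tryCatch(
    status_code(HEAD(url, timeout(timeout_sec))),
    error = function(e) NA_integer_
  )
}

# Hypothetical helper: TRUE only when the server explicitly answers 404.
# A failed request (NA status) gives FALSE, since no 404 was received.
is_404 <- function(url, ...) {
  isTRUE(url_status(url, ...) == 404L)
}
```

You could then guard the download with something like if (!is_404(url)) x <- read.csv(url). Checking url_status(url) for a 200 is probably the stricter test, since a URL can fail in ways other than a 404.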
csgillespie
  • That is a good idea. Now, I installed that package but R won't run the library(RCurl) command. I do notice in the readme for RCurl that, on Linux systems (I'm running Ubuntu), you often have to explicitly install "libcurl-devel". Now, the kicker is that "libcurl-devel" is what that library is called in the Red Hat RPM world, but I'm running Ubuntu. I have three choices: libcurl4-openssl-dev, libcurl4-nss-dev, and libcurl4-gnutls-dev. Any idea what the difference is between those? – wht_rbt_obj Apr 17 '14 at 17:33
  • httr gives the same result. That's because it and RCurl depend on some of the same packages that must be in "libcurl-devel". – wht_rbt_obj Apr 17 '14 at 17:37
  • Just use `apt-get install r-cran-rcurl` and all the dependencies will be taken care of. See http://stackoverflow.com/questions/7765429/unable-to-install-r-package-in-ubuntu-11-04/7765470 for a similar question. – csgillespie Apr 17 '14 at 17:40