I want to download the data from this website.

http://asphaltoilmarket.com/index.php/state-index-tracker/

But the request keeps timing out.

I have already tried the following methods, but each one times out:

url <- "http://asphaltoilmarket.com/index.php/state-index-tracker/"

# Attempt 1: rvest
library(rvest)
IndexData <- read_html(url)

# Attempt 2: RCurl
library(RCurl)
IndexData <- getURL(url)

# Attempt 3: httr + XML (parse the response body as text)
library(httr)
library(XML)
IndexData <- htmlParse(content(GET(url), as = "text"), asText = TRUE)

This website opens in the browser without any problem, and I am able to download this data using Excel and Alteryx.

ok1more

1 Answer

If by "get the data", you mean "scrape the table on that page", then you just need to go a little further.

First, check the site's robots.txt to see whether scraping is allowed. In this case, nothing in it prohibits scraping.
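
As a rough sketch of that check, the snippet below pulls the rule lines out of a robots.txt file so they are easy to read. `robots_rules` is a hypothetical helper, not part of any package, and the URL in the usage comment assumes the file sits at the site root, which is the usual convention.

```r
# Hypothetical helper: keep only the User-agent / Allow / Disallow lines
# from the lines of a robots.txt file.
robots_rules <- function(lines) {
  pattern <- "^[[:space:]]*(user-agent|allow|disallow)[[:space:]]*:"
  grep(pattern, lines, value = TRUE, ignore.case = TRUE)
}

# Usage (needs network access; the URL is an assumption):
# robots_rules(readLines("http://asphaltoilmarket.com/robots.txt", warn = FALSE))
```

Any `Disallow:` line that covers the page you want is a sign you should not scrape it.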

You've already got the HTML for the page; you just need to find the CSS selector for what you want. You can use your browser's developer tools or something like SelectorGadget to find the table and get its CSS selector.
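
If you'd rather look for the selector from R, one option is to list the `id` attribute of every table on the page. This is only a sketch: the inline HTML below is a made-up stand-in for the real page, so that the example runs without a network connection.

```r
library(rvest)

# Made-up stand-in for the downloaded page; in practice this would be
# the result of read_html() on the real URL.
page <- read_html('<html><body>
  <table id="tablepress-5"><tr><th>State</th></tr><tr><td>Alabama</td></tr></table>
  <table id="other"><tr><td>x</td></tr></table>
</body></html>')

# Collect the id of every <table>; each id can be used as the selector "#<id>".
table_ids <- page %>% html_nodes("table") %>% html_attr("id")
table_ids
#> [1] "tablepress-5" "other"
```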

After that, take the parsed HTML, extract the node you're interested in with html_node(), then convert it to a data frame with html_table().

library(magrittr)
library(rvest)

html <- read_html("http://asphaltoilmarket.com/index.php/state-index-tracker/")

html %>% 
  html_node("#tablepress-5") %>% 
  html_table()
#>             State     Jan     Feb     Mar     Apr     May     Jun     Jul
#> 1         Alabama $496.27 $486.86 $482.16 $498.62 $517.44 $529.20 $536.26
#> 2          Alaska $513.33 $513.33 $513.33 $513.33 $513.33 $525.84 $535.00
#> 3         Arizona $476.00 $469.00 $466.00 $463.00 $470.00 $478.00 $480.00
#> 4        Arkansas $503.50 $500.50 $494.00 $503.00 $516.50 $521.20 $525.00
#> 5      California $305.80 $321.00 $346.20 $365.50 $390.10 $380.50 $345.50
#> 6        Colorado $228.10 $301.45 $320.58 $354.12 $348.70 $277.55 $297.23
#> 7     Connecticut $495.00 $495.00 $495.00 $495.00 $502.50 $502.50 $500.56
#> 8        Delaware $493.33 $458.33 $481.67 $496.67 $513.33 $510.00 $498.33
#> 9         Florida $507.30 $484.32 $487.12 $503.38 $518.52 $517.68 $514.03
#> 10        Georgia $515.00 $503.00 $503.00 $517.00 $534.00 $545.00 $550.00 
Jake Kaupp
  • Thanks for the help with robots.txt and the code for extracting the table. However, `read_html` still does not work for me; it seems to have something to do with a proxy. I got it working by calling `download.file` first and then using `read_html`. – ok1more Dec 11 '19 at 19:12
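
The workaround described in the comment above might look something like the following sketch. `read_html_via_file` is a hypothetical helper name, and the idea that `download.file` handles the proxy differently from `read_html` is an assumption based on the comment, not something verified here.

```r
library(rvest)

# Hypothetical helper: save the page to a temporary file with download.file(),
# then parse the local copy instead of the remote URL.
read_html_via_file <- function(url) {
  tmp <- tempfile(fileext = ".html")
  download.file(url, destfile = tmp, quiet = TRUE)
  read_html(tmp)
}

# Usage (needs network access):
# html <- read_html_via_file("http://asphaltoilmarket.com/index.php/state-index-tracker/")
# html %>% html_node("#tablepress-5") %>% html_table()
```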