I'm relatively new to R programming and I'm trying to put some of the stuff I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website.

Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary market information for all outstanding bonds.

I have unsuccessfully tried to use two different R packages to submit a request to the US Treasury web server. Here are the two approaches I tried:

Attempt #1 (using RCurl):

library(RCurl)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
td.html <- postForm(url,
                    submit = "Show Prices",
                    priceDate.year  = 2014,
                    priceDate.month = 12,
                    priceDate.day   = 15,
                    .opts = curlOptions(ssl.verifypeer = FALSE))

This returns a web page, which is stored in td.html, but all it contains is an error message from the treasurydirect server. I know the server is working because when I submit the same request via my browser, I get the expected results.

Attempt #2 (using rvest):

library(rvest)

s <- html_session(url)
f0 <- html_form(s)
f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
test <- submit_form(s, f1)

Unfortunately, this approach doesn't even leave R and results in the following error message from R:

Submitting with 'submit'
Error in function (type, msg, asError = TRUE)  : <url> malformed

I can't seem to figure out how to see what "malformed" text is being sent to rvest so that I can try to diagnose the problem.

Any suggestions or tips for solving this seemingly simple task would be greatly appreciated!

2 Answers

Well, it appears to work with the httr library.

library(httr)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

fd <- list(
    submit = "Show Prices",
    priceDate.year  = 2014,
    priceDate.month = 12,
    priceDate.day   = 15
)

resp <- POST(url, body = fd, encode = "form")
content(resp)

The rvest library is really just a wrapper around httr. It looks like it doesn't do a good job of handling form URLs that consist of a path without the server name. So if you look at

f1$url
# [1] /GA-FI/FedInvest/selectSecurityPriceDate.htm

you see that it just has the path and not the server name. This appears to be confusing httr. If you do

f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
f1$url <- url
test <- submit_form(s, f1)

that seems to work. Perhaps it's a bug that should be reported to rvest. (Tested on rvest_0.1.0)
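
Instead of hard-coding the full address, the form's path can be resolved against the URL of the page it was read from. A minimal sketch, assuming the xml2 package is installed (it is not used anywhere in this answer, and the names `page_url`, `form_action` and `full_url` are only illustrative):

```r
library(xml2)

# Page the form was read from (same URL as in the question).
page_url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# What the parsed form reports as its target: a path with no scheme or host.
form_action <- "/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# Resolve the path against the page URL to get a complete address.
full_url <- url_absolute(form_action, page_url)
full_url
```

In this case the resolved address is the same as `page_url`, because the form posts back to its own page. The result could then be assigned with `f1$url <- full_url` before calling `submit_form(s, f1)`.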

MrFlick
  • Awesome!!! Thanks for the quick reply. Your httr solution works great and I get the results I wanted. I'm not familiar enough with rvest to feel comfortable submitting a bug report. BTW, your "fix" for rvest was successful, but I chose to use your httr solution because it was easier to use the results. – Daddy the Runner Dec 24 '14 at 06:37
  • @MrFlick Many thanks! I had the same problem with another login form and your description solved it. I just opened a [rvest issue](https://github.com/hadley/rvest/issues/52) and referenced this SO post. – alex23lemm Dec 31 '14 at 16:48

I know this is an old question, but adding the

style='POST'

parameter to postForm does the trick as well.
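
For context, postForm defaults to style = "HTTPPOST", which sends the fields as multipart/form-data, while style = "POST" sends them as application/x-www-form-urlencoded, the same encoding a plain browser form uses. A minimal sketch of the question's RCurl call with that one change, wrapped in a hypothetical helper named `fetch_prices` (the wrapper is an assumption for illustration, and it has not been checked against the current site):

```r
library(RCurl)

# Hypothetical wrapper, not part of the original question or answer:
# posts the price-date form as application/x-www-form-urlencoded.
fetch_prices <- function(year, month, day) {
  url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
  postForm(url,
           submit = "Show Prices",
           priceDate.year  = year,
           priceDate.month = month,
           priceDate.day   = day,
           style = "POST",  # the default, "HTTPPOST", sends multipart/form-data
           # Carried over from the question; turning off peer verification
           # weakens TLS checking and may not be necessary.
           .opts = curlOptions(ssl.verifypeer = FALSE))
}
```

Calling `td.html <- fetch_prices(2014, 12, 15)` would then make the same request as Attempt #1 in the question, with only the encoding changed.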

r-q