I need to get the school name and state from all the options in the dropdown menu.

The URL is http://www.speechanddebate.org/aspx/rankings.aspx?navid=608&pnavid=2

I was trying to code this in R by following an example I found on here but couldn't install a package properly.

I need a very basic how-to, preferably in R, on how to get school name and state from the dropdown menu.

ariel
  • `rvest` is nice for basic web scraping, but in this case it won't be able to change the dropdown for you because doing so doesn't alter the URL. For that you'll need something like `RSelenium`, which is a little more intense, though well-documented. – alistaire Apr 13 '16 at 06:59
  • Expanding on the use of `RSelenium`: http://stackoverflow.com/questions/26963927/dropdown-boxes-in-rselenium and http://stackoverflow.com/questions/31616734/read-values-in-dropdown-menu-element-with-rselenium. For this case, I think it's the way to go. Don't forget to check the ToU of the site you intend to scrape! – PavoDive Apr 13 '16 at 07:05

1 Answer

Here is a suggestion if you only want a list of the entries. It uses only base functions, since you seem to have difficulty installing packages (probably a firewall/proxy issue).

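# Read the raw page source, one element per line
urlink <- "http://www.speechanddebate.org/aspx/rankings.aspx?navid=608&pnavid=2"
alllines <- readLines(urlink, warn = FALSE)
# The entries start on the line after the "-- View All Districts --" placeholder
startidx <- grep("-- View All Districts --", alllines, fixed = TRUE)[1] + 1
# ... and end at the first closing </select> tag after that point
endindices <- grep("</select>", alllines, fixed = TRUE)
endidx <- endindices[endindices > startidx][1]
alllines[startidx:endidx]  # inspect the raw lines
# Keep the text between the first ">" and the following "<" on each line
entries <- sapply(alllines[startidx:endidx],
    function(s) strsplit(strsplit(s, ">")[[1]][2], "<")[[1]][1])
# Drop lines with no entry text (such as the </select> line) and the line names
mylist <- unname(entries[!is.na(entries)])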
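
The line-based approach above depends on each option sitting on its own line in the page source. As a rough alternative sketch, still in base R, the option labels can be pulled out with a regular expression after collapsing the source into one string. The sample markup below is made up for illustration (the district names and the `name` attribute are assumptions), so the real page may need a different pattern.

```r
# Made-up sample markup standing in for the page source; the real page may differ.
html <- c(
  '<select name="district">',
  '<option value="0">-- View All Districts --</option>',
  '<option value="1">Example District A</option>',
  '<option value="2">Example District B</option>',
  '</select>'
)

# Collapse to one string so the result does not depend on where lines break.
page <- paste(html, collapse = "\n")

# Match each <option ...>label</option> element, ignoring tag case.
pattern <- "<option[^>]*>[^<]*</option>"
tags <- regmatches(page, gregexpr(pattern, page, ignore.case = TRUE))[[1]]

# Strip the tags so only the visible label remains, then trim whitespace.
labels <- trimws(gsub("<[^>]+>", "", tags))

# Drop the placeholder entry.
labels <- labels[labels != "-- View All Districts --"]
print(labels)
```

To try this against the live page, `html` could be replaced with the result of `readLines(urlink)`. If each label turns out to combine the name and state with a consistent delimiter, `strsplit` could separate them, but that delimiter would have to be checked in the actual page source.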
chinsoon12