
I'm attempting to loop through a list of IDs to scrape some tables off Fangraphs. The code below works when I insert a single ID and remove the for loop, but when I put the for loop back it fails with `Error in open.connection(x, "rb") : HTTP error 400.` I've looked around in various places, including here and here, but nothing I try seems to work. I've also shortened my original list of 1000+ IDs to only 10 and still receive the error.

Can anyone help with this? It feels like this should be a pretty simple scraping task, given that the URL is exactly the same except for the ID and the page layout is pretty straightforward. Thanks so much in advance.

library(rvest)  # read_html(), html_nodes(), html_table() and the %>% pipe

for (id in pitchIDs$playerid) {
    url <- paste("https://www.fangraphs.com/statsd.aspx?playerid=", id, "&position=P&type=&gds=&gde=&season=all")
    gamelogs <- url %>%
        read_html() %>%
        html_nodes(xpath = '//*[@id="DailyStats1_dgSeason1_ctl00"]') %>%
        html_table()
    gamelogs$id <- id
}
Abb
  • You could try `paste0` so there is no separator when concatenating your URL. I am not sure a URL with spaces would work. If that's not it, can you provide a list of IDs for reproducibility? – cderv Jun 21 '18 at 20:30
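To see what the comment is pointing at, compare the two ways of building the URL for one made-up ID. `paste()` joins its arguments with `sep = " "` by default, so it puts a space on each side of the ID; `paste0()` joins them with no separator. The ID `1234` and the variable names `url_head` and `query_rest` are placeholders for illustration only, not real Fangraphs values. A raw space is not a legal URL character, which is a plausible (though not confirmed) reason for the server's 400 Bad Request.

```r
# Hypothetical player ID, used only to show how the URL is assembled
id <- 1234

url_head   <- "https://www.fangraphs.com/statsd.aspx?playerid="
query_rest <- "&position=P&type=&gds=&gde=&season=all"

# paste() uses sep = " " by default, so spaces end up around the ID
url_paste <- paste(url_head, id, query_rest)

# paste0() is paste(..., sep = ""), so the pieces are joined directly
url_paste0 <- paste0(url_head, id, query_rest)

print(url_paste)
print(url_paste0)

# Check each URL for a literal space before sending a request
grepl(" ", url_paste)   # TRUE
grepl(" ", url_paste0)  # FALSE
```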

1 Answer


It looks like I solved the problem. Perhaps `paste0` did the trick. Thank you, @cderv. See the code below.

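library(rvest)  # read_html(), html_nodes(), html_table() and the %>% pipe

data = c()
for (id in pitchIDs$playerid) {
  # paste0() joins the pieces with no separator, so no spaces end up in the URL
  page <- read_html(paste0("https://www.fangraphs.com/statsd.aspx?playerid=", id,
                           "&position=P&type=&gds=&gde=&season=all"))
  gamelogs <- page %>%
    html_nodes(xpath = '//*[@id="DailyStats1_dgSeason1_ctl00"]') %>%
    html_table()
  # html_table() returns a list of data frames; keep the first (only) one
  gamelogs <- gamelogs[[1]]
  gamelogs$id <- id
  if (is.data.frame(data)) {
    # Align column names with the accumulated result, then append
    names(gamelogs) = names(data)
    data = rbind(data, gamelogs)
  } else {
    # First iteration: start the result with this table
    data = gamelogs
  }
}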
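The loop above grows `data` with `rbind()` on every pass. A common alternative in R is to collect one data frame per ID with `lapply()`, then bind them once with `do.call(rbind, ...)`, wrapping each request in `tryCatch()` so a single bad ID does not abort the whole run. This is a minimal sketch, not the answer's method: `fetch_gamelog()` and `scrape_gamelogs()` are hypothetical helper names, the XPath is assumed to still match the Fangraphs table, and the fetch step is passed in as an argument so the binding logic can be checked without network access.

```r
# Hypothetical helper: fetch one player's game log table (needs rvest).
# The XPath comes from the question and may need updating if the page changed.
fetch_gamelog <- function(id) {
  url <- paste0("https://www.fangraphs.com/statsd.aspx?playerid=", id,
                "&position=P&type=&gds=&gde=&season=all")
  page <- rvest::read_html(url)
  node <- rvest::html_nodes(page, xpath = '//*[@id="DailyStats1_dgSeason1_ctl00"]')
  rvest::html_table(node)[[1]]  # html_table() returns a list of tables
}

# Hypothetical helper: scrape every ID, tag rows with the ID,
# skip IDs whose request fails, and bind all tables once at the end.
scrape_gamelogs <- function(ids, fetch = fetch_gamelog, pause = 0) {
  tables <- lapply(ids, function(id) {
    result <- tryCatch(fetch(id), error = function(e) {
      message("Skipping id ", id, ": ", conditionMessage(e))
      NULL
    })
    if (pause > 0) Sys.sleep(pause)  # optional delay between requests
    if (is.null(result)) return(NULL)
    result$id <- id
    result
  })
  tables <- Filter(Negate(is.null), tables)  # drop failed IDs
  do.call(rbind, tables)
}
```

Usage would look like `gamelogs <- scrape_gamelogs(pitchIDs$playerid, pause = 1)`. Note that `rbind()` on data frames matches columns by name, so this sketch assumes every table has the same headers; the answer's `names(gamelogs) = names(data)` step is one way to force that if they don't.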
Abb