I'm practicing my R skills (I come from a Stata background, which is probably why I'm running into this issue), and I'm having trouble defining objects inside a loop. I'd like to save each object as a .csv file later, which is why I think I need separate objects.

Here is the code I'm running:

for (year in c("34","38","50","54")){
        noquote(paste0("wcq_", year)) <- data.frame(read_lines(html_text(
        html_nodes(read_html(paste0("http://www.rsssf.com/tables/", year, 
        "q.html")), "pre"), trim = TRUE)))
}

A lot going on there, but here are the most important things:

The URL varies by the last two digits of the year I'm looking at, which is why I loop over that vector of years (the full set continues every four years through 1998, but I shortened it here to keep things simple).

I'm using data.frame and read_lines to scrape html information from the <pre> block of the websites and put it into a table (one variable, with a number of observations that varies across years depending on the number of lines in the <pre> block of the site).

Without the loop, code that works correctly for a single year looks like this:

wcq_34 <- data.frame(read_lines(html_text(html_nodes(
        read_html("http://www.rsssf.com/tables/34q.html"), "pre"), trim = TRUE)))

When I run the loop, I get the error:

"target of assignment expands to non-language object"
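
For context, the left-hand side of `<-` has to be a name (or a replacement call that ultimately targets a name), so a string built by `paste0()` cannot be assigned to directly. Here is a minimal sketch of the failure and of base R's `assign()`, which does take the object name as a string. The value `1:3` is only a placeholder, not the scraped data, and a named list is usually the more idiomatic choice than `assign()`.

```r
year <- "34"

# Reproduce the error without any scraping: the line parses,
# but evaluating the assignment fails.
msg <- tryCatch(
  eval(parse(text = 'paste0("wcq_", year) <- 1:3')),
  error = function(e) conditionMessage(e)
)
print(msg)

# assign() accepts the object name as a character string.
assign(paste0("wcq_", year), 1:3)   # placeholder value
print(get("wcq_34"))
```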

What am I doing wrong here? I've tried several different ways of tackling this issue, specified here:

First attempt, defining an object as "wcq_":

wcq_ <- "wcq_"

for (year in c("34","38","50","54")){
        noquote(paste0(wcq_, year)) <- data.frame(read_lines(html_text(
        html_nodes(read_html(paste0("http://www.rsssf.com/tables/", year,
        "q.html")), "pre"), trim = TRUE)))
}

This gives me the error:

could not find function "noquote<-"

I've also tried using lapply, but it doesn't seem to let me vary the URLs or name the objects by year, correct? Here's what I've tried:

wcq_years_pre2002 <- c(34,38,50,54)


for (year in c(34,38,50,54)){
        wcq_table_pre2002 <- lapply(wcq_years_pre2002, 
        data.frame(read_lines(html_text(
        html_nodes(read_html(paste0("http://www.rsssf.com/tables/",
        year,"q.html")), "pre"), trim = TRUE))))    }

This gives me the same "target of assignment expands to non-language object" error.
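
One likely problem with the `lapply` attempt is that its second argument has to be a function, whereas the code above passes the already-evaluated result of `data.frame(...)`. Below is a minimal sketch, using only base R, of how an anonymous function lets each element vary the URL. No pages are downloaded here, and the surrounding `for` loop becomes unnecessary.

```r
wcq_years_pre2002 <- c("34", "38", "50", "54")

# lapply() calls the function once per element; `year` is that element.
urls <- lapply(wcq_years_pre2002, function(year) {
  paste0("http://www.rsssf.com/tables/", year, "q.html")
})

# Name the list by year so elements can be looked up with [[ ]].
names(urls) <- wcq_years_pre2002
print(urls[["50"]])
```

The same pattern should carry over to the scraping call: put the `read_html(...)` pipeline inside the function body in place of `paste0(...)` alone.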

Thank you in advance for the help. Like I said, I think I'm just used to macros in Stata, so I'm not quite sure how to work with these loops. I posted for the first time yesterday and got so many useful comments, so I'm really grateful to this community for helping me deepen my skills in R :)

Julian
  • You don't need separate objects - use a list. See some discussion here at [How to make a list of data frames?](https://stackoverflow.com/a/24376207/903061) – Gregor Thomas Oct 18 '17 at 16:55
  • To start you off, your first line can become `year_vec = c("34","38","50","54"); wcq = lapply(year_vec, function(year) data.frame(read_lines(html_text( html_nodes(read_html(paste0("http://www.rsssf.com/tables/", year, "q.html"")), "pre"), trim = TRUE))))` – Gregor Thomas Oct 18 '17 at 16:58
  • Then maybe `names(wcq) = year_vec`. Now you can look at `wcq[["34"]]` or `wcq[["50"]]`, and do things like save them all as separate files with a for loop or `lapply`. – Gregor Thomas Oct 18 '17 at 17:03
  • Though I copy/pasted the first bit from your code, and I think you have an issue with two quotes in a row: `"q.html""` should be `"q.html"`. – Gregor Thomas Oct 18 '17 at 17:05
  • @Gregor, thank you so much, this is excellent advice and a great place to start. That link you provided was great as well. Good catch on the double quotes, I edited it to fix that. This worked wonderfully; I need to remember to keep using lists. – Julian Oct 18 '17 at 17:13
  • :) If you've got it working now, I'll close this question as a dupe of the one I linked. If not, I'd suggest editing a bit to show what still isn't working - I don't get any errors with `year_vec = c("34", "38", "50", "54"); wcq = lapply(year_vec, function(year) data.frame(read_lines(html_text(html_nodes(read_html(paste0("http://www.rsssf.com/tables/", year, "q.html")), "pre"), trim = TRUE)))); names(wcq) = year_vec`, though the output still needs some work. But it should be relatively easy work using `for` or `lapply` with everything in a single list. – Gregor Thomas Oct 18 '17 at 17:18
  • Yes, it's working now, feel free to close it. Thank you Gregor! – Julian Oct 18 '17 at 17:23
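
For later readers, the list-based approach from the comments, plus the CSV step the question asks about, might look roughly like the sketch below. To keep it runnable offline, the list is filled with placeholder lines rather than the scraped `<pre>` text. The column name `line`, the `wcq_<year>.csv` file names, and the use of `tempdir()` are assumptions for illustration only.

```r
year_vec <- c("34", "38", "50", "54")

# Stand-in for the scraping step: in the real script each element would be
# the data frame built from the page's <pre> block.
wcq <- lapply(year_vec, function(year) {
  data.frame(line = paste("placeholder line", 1:2, "for", year),
             stringsAsFactors = FALSE)
})
names(wcq) <- year_vec

# Write one CSV per year; tempdir() is used here only to avoid clutter.
out_dir <- tempdir()
for (year in names(wcq)) {
  out_file <- file.path(out_dir, paste0("wcq_", year, ".csv"))
  write.csv(wcq[[year]], out_file, row.names = FALSE)
}

print(list.files(out_dir, pattern = "^wcq_.*\\.csv$"))
```

Keeping everything in one named list means the per-year objects never need their own variable names; `wcq[["34"]]` takes the place of a separate `wcq_34`.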
