
As a disclaimer, I have read multiple threads and tried the different things suggested, none of which is working.

This is the code I am using; then I will explain what is happening and what I hope will happen:

library('rnoaa')
library('dplyr')
library('utils')
library('cgwtools')


data_type <- c('tmax','tmin','PRCP', 'SNOW', 'SNWD')

## Station ID for MSO is GHCND:USW00024153
## Station ID for GPI is GHCND:USC00244558
## Station ID for BTM is GHCND:USW00024135

for (i in 2009:2019){
  start_date <- paste(i, '-01-01', sep = "")
  end_date <- paste(i, '-12-31', sep = "")
  assign(paste('mso_data', i, sep = ""), ncdc(datasetid = 'GHCND', stationid = 'GHCND:USW00024153',
             datatypeid = data_type, startdate = start_date, 
             enddate = end_date, limit = 1000))
  a <- paste('mso_data', i, sep = "")

  
  if (i == 1948){
    save(a, file = 'mso_data.RData')
  }
  else {
    resave(a, file = 'mso_data.RData')
  }
}

mso_data <- ncdc(datasetid = 'GHCND', stationid = 'GHCND:USW00024153',
                 datatypeid = data_type, startdate = '2020-01-01', 
                 enddate = '2020-07-07', limit = 1000)
resave(mso_data, file = 'mso_data.RData')

Alright, so what I would like to do is download multiple years of climate data using the package RNOAA. In another post, someone showed me a different way to download this data, but in the end, to use their way I still need to fix how I save the data.

In RNOAA, the function ncdc() only allows a maximum of 1 year of data per download, so to download, for instance, 1948 - 2020, I devised the above code. Also, you will see the for loop is (2009:2019); I arbitrarily chose to download one decade at a time because the download process is time-intensive. I simply start the for loop at (1948:1959), then (1960:1969), etc...

I know all the code up to the saving works; each individual year of data is visible in my global environment. Where I am having a problem is in the saving. I have tried all of the following extensions (.RData, .Rda, .rds), which I found in different threads. When I then try to 'read in' that data, it does not exist, although I can see the file in the destination folder on my computer.

Originally, I was at least able to save the data from my final lines of code, that is, for the year 2020, all outside the for loop. But like I said, I am downloading each individual year of data; I have confirmed that.

Thanks

Metgeneer
    *"When I then try to 'read in' that data, it does not exist"*. (a) What code are you using to try to read in the data? (b) What do you mean by "not exist"? Do you get an error? What does it say? – Gregor Thomas Aug 05 '20 at 20:29
  • A couple more general questions/comments: (a) The file extension doesn't matter - it's just a label. You could use `.Metgeneer` as your file extension and everything would work fine (it would just be weird and confusing to other humans reading your code or looking at your files). (b) Is there a reason you want the years of data saved as separate objects within the same file? It seems like it would be easier to put them in one big list or data frame and save that after the loop, rather than modify the saved file at each iteration (see the sketch after these comments)... – Gregor Thomas Aug 05 '20 at 20:32
  • In standard R installations, this line is redundant: `library('utils')`. – Parfait Aug 05 '20 at 21:36
  • To add, read this canonical answer on using a [list of data frames](https://stackoverflow.com/a/24376207/1422451) from @GregorThomas. – Parfait Aug 05 '20 at 21:51
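
A minimal sketch of the pattern Gregor Thomas suggests (untested; it reuses the `data_type` vector and station ID from the question, collects each year's result in a named list, and saves one object after the loop instead of calling assign()/resave() on each iteration):

library(rnoaa)

mso_list <- list()
for (i in 2009:2019) {
  # store each year's download under its year name in one list
  mso_list[[as.character(i)]] <- ncdc(datasetid = 'GHCND',
                                      stationid = 'GHCND:USW00024153',
                                      datatypeid = data_type,
                                      startdate = paste0(i, '-01-01'),
                                      enddate = paste0(i, '-12-31'),
                                      limit = 1000)
}

# one object in one file; load('mso_data.RData') later restores `mso_list`
save(mso_list, file = 'mso_data.RData')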

2 Answers


I agree with the comments above - it sounds like it would make more sense to write a function for one year, run lapply (or even mclapply to run this in parallel) over the years to get a list of results, then either save the list object, export all its elements to the global environment, or combine them, e.g. with data.table::rbindlist, and save that table.

Example (not tested, since I have no API key):

library(rnoaa)
library(data.table)
getNoaa <- function(yr, type = c('tmax','tmin','PRCP', 'SNOW', 'SNWD')) 
  ncdc(datasetid = 'GHCND', 
    stationid = 'GHCND:USW00024153',
    datatypeid = type, 
    startdate = paste0(yr, '-01-01'), 
    enddate = paste0(yr, '-12-31'), limit = 1000)  
res <- setNames(lapply(2009:2019, getNoaa), paste0("Year", 2009:2019))

# this would export all individual list elements to the global environment:
list2env(res, envir = .GlobalEnv)

# this would combine the individual lists
res <- rbindlist(res, idcol="Year")
user12728748
  • So everything but the final command worked; here is the error I received (posted in full in the answer below): `res ...` – Metgeneer Aug 06 '20 at 19:26
  • It seems like the individual list elements returned by the ncdc function are not equivalent data.frames but S3 lists of length two, with a slot for metadata (meta) and a slot for data (data). You might have to write some lapply function operating on `res` to extract the relevant information, so that you can then combine these elements (see the sketch below). – user12728748 Aug 06 '20 at 19:39
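
For example, a minimal sketch of that extraction (untested; it assumes each element of `res` carries its table in `$data`, as described in the comment above):

library(data.table)

# keep only the $data slot of each ncdc() result, dropping the $meta slot
res_data <- lapply(res, function(x) as.data.frame(x$data))

# every element is now a plain data.frame, so the yearly tables can be stacked
res_combined <- rbindlist(res_data, idcol = "Year", fill = TRUE)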

So I tried answer #1 and here is the error I received:

res <- rbindlist(res, idcol="Year")
Error in rbindlist(res, idcol = "Year") : 
  Column 1 of item 1 is length 3 inconsistent with column 2 which is length 8. Only length-1 columns are recycled.

I went back and all 73 elements of "res" are 1 by 8 tibbles, so I am confused by the column error, unless it does not like the headers as compared to $data. When ncdc() downloads, it downloads a list, and the useful info is stored in $data.

Also, since I love to learn: from other languages, I have always used "for loops" for repetitive tasks. Can someone explain how lapply() accomplishes this? I am also assuming that setNames is a way to set multiple variables for downloading individual years.
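
As an aside on the lapply() question: lapply(X, FUN) calls FUN on each element of X and collects the return values into a list, much like a for loop that appends to a list; setNames() only attaches names to that list afterwards, it does not set up any download variables. A small illustration (not part of the original answer), reusing the getNoaa() helper defined above:

# both of these build the same named list of yearly downloads
res_loop <- list()
for (yr in 2009:2019) {
  res_loop[[paste0("Year", yr)]] <- getNoaa(yr)
}

res_apply <- setNames(lapply(2009:2019, getNoaa), paste0("Year", 2009:2019))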

Metgeneer