
I have a large set of csv files in a single directory. These files contain two columns, Date and Price. Each file name (filename.csv) contains the unique identifier of its data series. I understand that missing values in merged data series can be handled when these time series are zoo objects. I also understand that, by using na.locf(merge(...)), I can fill in the missing values with the most recent observations.

I want to automate the process of:

  1. loading the Date and Price columns of each *.csv file into R data frames;
  2. giving each distinct time series within the merged zoo "portfolio of time series" object an identity equal to its filename;
  3. merging these zoo time series objects using MergedData <- na.locf(merge( )).
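A minimal sketch of steps 2 and 3 on two small in-memory series, assuming the zoo package is installed. The series names IBM and MSFT and their values are made up for illustration. Naming the arguments to merge() is what gives each column its identity, and na.locf() then fills the gaps that the merge creates on dates where only one series has a price.

```r
library(zoo)

# Two hypothetical price series with different trading dates
# (names and values are invented for illustration only).
ibm  <- zoo(c(100, 101, 103), as.Date(c("2013-01-02", "2013-01-03", "2013-01-07")))
msft <- zoo(c(27, 28),        as.Date(c("2013-01-02", "2013-01-04")))

# Naming the arguments to merge() sets the column names of the result;
# dates missing from one series become NA in that column.
merged <- merge(IBM = ibm, MSFT = msft)

# Carry the last observation forward over the NAs created by the merge.
MergedData <- na.locf(merged)
print(MergedData)
```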

The ultimate goal, of course, is to use the fPortfolio package.

I've used the following statement to create a list of data frames of Date, Price pairs. The problem with this approach is that I lose the <filename> identifier of each time series.

  result <- lapply(files, read.csv)   # one data frame per file, but the list elements are unnamed
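With base R alone, one way to keep the identifier is to name the list after reading it, using each file name with its directory and extension stripped. A sketch, assuming all files sit in one directory; the demo directory and the two files written here (IBM.csv, MSFT.csv) are hypothetical.

```r
# Hypothetical demo: write two small csv files to a temporary directory.
dir <- file.path(tempdir(), "prices_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("2013-01-02", "2013-01-03"), Price = c(100, 101)),
          file.path(dir, "IBM.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("2013-01-02", "2013-01-04"), Price = c(27, 28)),
          file.path(dir, "MSFT.csv"), row.names = FALSE)

# All csv files in the directory, with full paths.
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Same read.csv call as before; then name each element "IBM", "MSFT", ...
result <- lapply(files, read.csv)
names(result) <- tools::file_path_sans_ext(basename(files))
str(result)
```

Because the names are computed from the same files vector that was read, they are guaranteed to line up with the data frames.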

I understand that I can write code to generate the R statements required to do all these steps instance by instance. I'm wondering if there is some approach that wouldn't require me to do that. It's hard for me to believe that others haven't wanted to perform this same task.

agstudy
user1981673
  • "The problem with this approach is that I lose the identifier of the data from the files": can you explain this? Otherwise, write your script in a source file, e.g. myprocess.R, and call it using Rscript myprocess.R – agstudy Jan 15 '13 at 22:23
  • What I mean is this: the csv filenames are not random but provide the identity of the data (e.g., IBM.csv would mean IBM daily returns). With the lapply() call above, I lose the names of the data. – user1981673 Jan 16 '13 at 17:15

2 Answers


Try this:

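library(zoo)
z <- read.zoo(files, header = TRUE, sep = ",")  # reads every file and merges them into one zoo object
z <- na.locf(z)                                 # carry the last observation forward over NAs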

I have assumed a header line and data lines like 2000-01-31,23.40. Use whatever read.zoo arguments are necessary to accommodate your file format.
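For example, if the dates were written day/month/year instead, a sketch like the following should work, assuming the zoo package is installed. The sample lines are hypothetical, and text= is used only so the example does not need a file on disk; with real files you would pass the file names exactly as above.

```r
library(zoo)

# Hypothetical input with day/month/year dates instead of ISO dates.
lines <- "Date,Price
31/01/2000,23.40
29/02/2000,24.10
31/03/2000,22.95"

# header and sep are passed on to read.table; format tells read.zoo
# how to convert the first column into a Date index.
z <- read.zoo(text = lines, header = TRUE, sep = ",", format = "%d/%m/%Y")
print(z)
```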

G. Grothendieck

You could keep the file names automatically by using sapply, which names its result after a character input vector. Here I will stick with lapply.

  1. Assuming that all your files are in the same directory, you can use list.files. It is very handy for this kind of workflow.
  2. I would use read.zoo to get zoo objects directly (avoiding a later coercion).

For example:

library(zoo)
zoo.objs <- lapply(list.files(path = MY_FILES_DIRECTORY,
                              pattern = "^zoo_.*\\.csv$", ## csv files whose names
                                                          ##   start with zoo_
                              full.names = TRUE),         ## full names: path + filename
                   read.zoo,
                   header = TRUE, sep = ",")              ## passed on to read.zoo

I now use list.files again to name the elements of the result:

names(zoo.objs) <- list.files(path = MY_FILES_DIRECTORY,
                              pattern = "^zoo_.*\\.csv$")
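To finish the workflow from the question (step 3), the named list can be merged in a single call. A sketch, assuming the zoo package is installed; the demo directory and the zoo_IBM.csv / zoo_MSFT.csv files are hypothetical. The names are computed from the same files vector used for reading, so they are guaranteed to line up, and the zoo_ prefix and .csv extension are stripped so that the columns come out as IBM and MSFT.

```r
library(zoo)

# Hypothetical demo files; in practice these are the zoo_*.csv files.
dir <- file.path(tempdir(), "zoo_demo")
dir.create(dir, showWarnings = FALSE)
writeLines(c("Date,Price", "2013-01-02,100", "2013-01-03,101"),
           file.path(dir, "zoo_IBM.csv"))
writeLines(c("Date,Price", "2013-01-02,27", "2013-01-04,28"),
           file.path(dir, "zoo_MSFT.csv"))

files <- list.files(dir, pattern = "^zoo_.*\\.csv$", full.names = TRUE)
zoo.objs <- lapply(files, read.zoo, header = TRUE, sep = ",")

# Strip the "zoo_" prefix and ".csv" extension to get the series identifiers.
names(zoo.objs) <- sub("^zoo_(.*)\\.csv$", "\\1", basename(files))

# do.call passes the list elements as named arguments to merge(),
# so the list names become the column names of the merged series.
MergedData <- na.locf(do.call(merge, zoo.objs))
print(MergedData)
```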
agstudy