17

I have two multivariate time series x and y, both covering approximately the same range in time (one starts two years before the other, but they end on the same date). Both series have missing observations in the form of empty columns next to the date column, and also in the sense that one of the series has several dates that are not found in the other, and vice versa.

I would like to create a data frame (or similar) with a column that lists all the dates found in x OR y, without duplicate dates. For each date (row), I would like to horizontally stack the observations from x next to the observations from y, with NA's filling the missing cells. Example:

>x
"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9

>y
"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34

# result I would like
"1987-01-01"   7.1    NA   3   55.3   66   45
"1987-01-02"   5.2    5    2   NA     NA   NA
"1987-01-03"   NA     NA   NA  77.3   87   34
"1987-01-06"   2.3    NA   9   NA     NA   NA

What I have tried: with the zoo package, I've tried the merge.zoo method, but this seems to just stack the two series next to each other, with the dates (as numbers, e.g. "1987-01-02" shown as 6210) from each series appearing in two separate columns.

I've sat for hours getting almost nowhere, so all help is appreciated.

EDIT: some code included below as per suggestion from Soumendra

atcoa <- read.csv(file = "ATCOA_full_adj.csv", header = TRUE)
atcob <- read.csv(file = "ATCOB_full_adj.csv", header = TRUE)
atcoa$date <- as.Date(atcoa$date)
atcob$date <- as.Date(atcob$date)

# only number of observations and the observations themselves differ 
>str(atcoa)
'data.frame':   6151 obs. of  8 variables:
 $ date        :Class 'Date'  num [1:6151] 6210 6213 6215 6216 6217 ...
 $ max         : num  4.31 4.33 4.38 4.18 4.13 4.05 4.08 4.05 4.08 4.1 ...
 $ min         : num  4.28 4.31 4.28 4.13 4.05 3.95 3.97 3.95 4 4.02 ...
 $ close       : num  4.31 4.33 4.31 4.15 4.1 3.97 4 3.97 4.08 4.02 ...
 $ avg         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ tot.vol     : int  877733 89724 889437 1927113 3050611 846525 1782774 1497998 2504466 5636999 ...
 $ turnover    : num  3762300 388900 3835900 8015900 12468100 ...
 $ transactions: int  12 9 24 17 31 26 34 35 37 33 ...

>atcoa[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
1 1987-01-02  4.31  4.28    4.31    NA    877733    3762300             12

# using timeSeries package
ts.atcoa <- timeSeries::as.timeSeries(atcoa, format = "%Y-%m-%d")
ts.atcob <- timeSeries::as.timeSeries(atcob, format = "%Y-%m-%d")

>str(ts.atcoa)
Time Series:          
 Name:               object
Data Matrix:        
 Dimension:          6151 7
 Column Names:       a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Row Names:          1970-01-01 01:43:30  ...  1970-01-01 04:12:35
Positions:          
 Start:              1970-01-01 01:43:30
 End:                1970-01-01 04:12:35
With:               
 Format:             %Y-%m-%d %H:%M:%S
 FinCenter:          GMT
 Units:              a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Title:              Time Series Object
 Documentation:      Wed Aug 17 13:00:50 2011

>ts.atcoa[1:1, ]
GMT
 a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 1970-01-01 01:43:30  4.31  4.28    4.31    NA    877733    3762300             12

# The following will create an object of class "data frame" and mode "list", which contains observations for the days mutual for the two series
>ts.atco <- timeSeries::merge(atcoa, atcob)  # produces same result as base::merge, apparently
>ts.atco[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions b.max b.min b.close b.avg b.tot.vol b.turnover b.transactions
1 1989-08-25  7.92  7.77    7.79    NA    269172    2119400             19  7.69  7.56    7.64    NA  81176693  593858000             12

EDIT: problem solved by (using zoo package)

atcoa <- read.zoo(read.csv(file = "ATCOA_full_adj.csv", header = TRUE))
atcob <- read.zoo(read.csv(file = "ATCOB_full_adj.csv", header = TRUE))

names(atcoa) <- c("a.max", "a.min", "a.close",
                   "a.avg", "a.tot.vol", "a.turnover", "a.transactions")
names(atcob) <- c("b.max", "b.min", "b.close",
                   "b.avg", "b.tot.vol", "b.turnover", "b.transactions")

atco <- merge.zoo(atcoa, atcob)

Thank you all for your help.

zx8754
  • 42,109
  • 10
  • 93
  • 154
Benjamin Allévius
  • 797
  • 1
  • 5
  • 14
  • 1
    You didn't show what you actually tried but `merge.zoo` certainly does not just stack zoo objects. There is a help page on `merge.zoo` which includes documentation and many examples: `?merge.zoo` . Also there are 5 vignettes (pdf documents) that come with zoo that give additional documentation and examples. See my post below for more info. – G. Grothendieck Aug 17 '11 at 08:47
  • Suggest you do it like this: `nms – G. Grothendieck Aug 17 '11 at 20:56

4 Answers4

11

Try this:

Lines.x <- '"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9'

Lines.y <- '"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34'

library(zoo)
# in reality x might be in a file and might be read via: x <- read.zoo("x.dat")
# ditto for y. See ?read.zoo and the zoo-read vignette if you need other args too
x <- read.zoo(text = Lines.x)
y <- read.zoo(text = Lines.y)
merge(x,  y)

giving:

           V2.x V3.x V4.x V2.y V3.y V4.y
1987-01-01  7.1   NA    3 55.3   66   45
1987-01-02  5.2    5    2   NA   NA   NA
1987-01-03   NA   NA   NA 77.3   87   34
1987-01-06  2.3   NA    9   NA   NA   NA
G. Grothendieck
  • 211,268
  • 15
  • 177
  • 297
3

You can create a timeSeries (timeSeries library) object from your dates, merge them (timeSeries default merge behaviour is different from zoo and xts and does exactly what you are asking for) and then make zoo/xts objects out of the result in case you don't want to stay with timeSeries.

One quick way to test is the following, assuming you have two zoo objects zz1 and zz2 -

library(timeSeries)
as.zoo(merge(as.timeSeries(zz1), as.timeSeries(zz2)))

Compare the output of the above command with

merge(zz1, zz2)

You can also cbind -

cbind(zz1, zz2)

provided there are no shared columns with same names. Even if such column are there, you can choose the columns by which you cbind, and you will get a zoo object.

cbind(zz1[, 1:2], zz2[, 2:3]) #Assuming other columns are common
Soumendra
  • 1,148
  • 1
  • 15
  • 26
  • 1
    Thank you. When I try your approach, I notice the following: when I use as.timeSeries on a data frame with dates in the format "%Y-%m-%d", the timeSeries object created will have the date format "%Y-%m-%d %H:%M:%S"; that I specify format="%Y-%m-%d" in the function seems to be ignored. If I then merge (using either base::merge or timeSeries::merge) two such timeSeries objects, I get a data frame with dates in the format "%Y-%m-%d", that contains only the dates that are mutual to both objects. However, the columns contain the information I want (like in the example). Any way around this? – Benjamin Allévius Aug 17 '11 at 09:49
  • Code has been included in the original post. – Benjamin Allévius Aug 17 '11 at 11:17
  • what is the difference between: `as.zoo(merge(as.timeSeries(zz1), as.timeSeries(zz2)))` and `merge(as.timeSeries(zz1), as.timeSeries(zz2))` ? I like your answer. – johnatasjmo Oct 09 '17 at 06:57
2

here, i found a more generic aproach from stat.ethz.ch

a <- ts(1:10, start=c(2014,6), frequency=12)
b <- ts(1:12, start=c(2015,1), frequency=12)

library(zoo)
m <- merge(a = as.zoo(a), b = as.zoo(b))

to get a ts object back:

as.ts(m)
Selcuk Akbas
  • 609
  • 1
  • 8
  • 18
1

How about this:

## Generate unique sorted time values.
i <- sort(unique(c(index(x), index(y))))

## Empty data matrix.
v <- matrix(nrow=length(i), ncol=6, NA)

## Pull in data items.
v[match(index(x), i), 1:3] <- coredata(x)
v[match(index(y), i), 4:6] <- coredata(y)

## Build new zoo object.
d <- zoo(v, order.by=i)
Ian Ross
  • 965
  • 6
  • 14
  • You may be onto something here, with some modifications: if I skip the index(x) and index(y) in the assignment to i, I will get a vector of the unique dates (if I just use the indices, it will not work since they are unrelated between the two objects - I must compare the dates instead). I can use this to fill the first column of a matrix like you suggest. However, I'm not sure if the "Pull in data items" stage still works: instead I should possibly compare the date column of x and y to the first column of the matrix, and assign elements if there is a match in either. – Benjamin Allévius Aug 17 '11 at 12:11