R: merge two irregular time series

Question

I have two multivariate time series x and y, both covering approximately the same range in time (one starts two years before the other, but they end on the same date). Both series have missing observations in the form of empty columns next to the date column, and also in the sense that one of the series has several dates that are not found in the other, and vice versa.

I would like to create a data frame (or similar) with a column that lists all the dates found in x OR y, without duplicate dates. For each date (row), I would like to horizontally stack the observations from x next to the observations from y, with NA's filling the missing cells. Example:

>x
"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9

>y
"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34

# result I would like
"1987-01-01"   7.1    NA   3   55.3   66   45
"1987-01-02"   5.2    5    2   NA     NA   NA
"1987-01-03"   NA     NA   NA  77.3   87   34
"1987-01-06"   2.3    NA   9   NA     NA   NA

What I have tried: with the zoo package, I've tried the merge.zoo method, but this seems to just stack the two series next to each other, with the dates (as numbers, e.g. "1987-01-02" shown as 6210) from each series appearing in two separate columns.

I've sat for hours getting almost nowhere, so all help is appreciated.

EDIT: some code included below as per suggestion from Soumendra

atcoa <- read.csv(file = "ATCOA_full_adj.csv", header = TRUE)
atcob <- read.csv(file = "ATCOB_full_adj.csv", header = TRUE)
atcoa$date <- as.Date(atcoa$date)
atcob$date <- as.Date(atcob$date)

# only number of observations and the observations themselves differ 
>str(atcoa)
'data.frame':   6151 obs. of  8 variables:
 $ date        :Class 'Date'  num [1:6151] 6210 6213 6215 6216 6217 ...
 $ max         : num  4.31 4.33 4.38 4.18 4.13 4.05 4.08 4.05 4.08 4.1 ...
 $ min         : num  4.28 4.31 4.28 4.13 4.05 3.95 3.97 3.95 4 4.02 ...
 $ close       : num  4.31 4.33 4.31 4.15 4.1 3.97 4 3.97 4.08 4.02 ...
 $ avg         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ tot.vol     : int  877733 89724 889437 1927113 3050611 846525 1782774 1497998 2504466 5636999 ...
 $ turnover    : num  3762300 388900 3835900 8015900 12468100 ...
 $ transactions: int  12 9 24 17 31 26 34 35 37 33 ...

>atcoa[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
1 1987-01-02  4.31  4.28    4.31    NA    877733    3762300             12

# using timeSeries package
ts.atcoa <- timeSeries::as.timeSeries(atcoa, format = "%Y-%m-%d")
ts.atcob <- timeSeries::as.timeSeries(atcob, format = "%Y-%m-%d")

>str(ts.atcoa)
Time Series:          
 Name:               object
Data Matrix:        
 Dimension:          6151 7
 Column Names:       a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Row Names:          1970-01-01 01:43:30  ...  1970-01-01 04:12:35
Positions:          
 Start:              1970-01-01 01:43:30
 End:                1970-01-01 04:12:35
With:               
 Format:             %Y-%m-%d %H:%M:%S
 FinCenter:          GMT
 Units:              a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Title:              Time Series Object
 Documentation:      Wed Aug 17 13:00:50 2011

>ts.atcoa[1:1, ]
GMT
 a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 1970-01-01 01:43:30  4.31  4.28    4.31    NA    877733    3762300             12

# The following will create an object of class "data frame" and mode "list", which contains observations for the days mutual for the two series
>ts.atco <- timeSeries::merge(atcoa, atcob)  # produces same result as base::merge, apparently
>ts.atco[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions b.max b.min b.close b.avg b.tot.vol b.turnover b.transactions
1 1989-08-25  7.92  7.77    7.79    NA    269172    2119400             19  7.69  7.56    7.64    NA  81176693  593858000             12

EDIT: problem solved by (using zoo package)

atcoa <- read.zoo(read.csv(file = "ATCOA_full_adj.csv", header = TRUE))
atcob <- read.zoo(read.csv(file = "ATCOB_full_adj.csv", header = TRUE))

names(atcoa) <- c("a.max", "a.min", "a.close",
                   "a.avg", "a.tot.vol", "a.turnover", "a.transactions")
names(atcob) <- c("b.max", "b.min", "b.close",
                   "b.avg", "b.tot.vol", "b.turnover", "b.transactions")

atco <- merge.zoo(atcoa, atcob)

Thank you all for your help.

You didn't show what you actually tried but `merge.zoo` certainly does not just stack zoo objects. There is a help page on `merge.zoo` which includes documentation and many examples: `?merge.zoo` . Also there are 5 vignettes (pdf documents) that come with zoo that give additional documentation and examples. See my post below for more info. — G. Grothendieck, Aug 17 '11 at 08:47

G. Grothendieck · Accepted Answer · 2017-04-08T00:24:30.213

11

Try this:

Lines.x <- '"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9'

Lines.y <- '"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34'

library(zoo)
# in reality x might be in a file and might be read via: x <- read.zoo("x.dat")
# ditto for y. See ?read.zoo and the zoo-read vignette if you need other args too
x <- read.zoo(text = Lines.x)
y <- read.zoo(text = Lines.y)
merge(x,  y)

giving:

           V2.x V3.x V4.x V2.y V3.y V4.y
1987-01-01  7.1   NA    3 55.3   66   45
1987-01-02  5.2    5    2   NA   NA   NA
1987-01-03   NA   NA   NA 77.3   87   34
1987-01-06  2.3   NA    9   NA   NA   NA

edited Apr 08 '17 at 00:24

answered Aug 17 '11 at 08:44

G. Grothendieck

211,268
15
177
297

For Lines.x and Lines.y, do you mean that these should be entered manually? The datasets are too large for that... – Benjamin Allévius Aug 17 '11 at 11:22
I think the comments of the answer above suggest that you use read.zoo to read in your files. x – Sameer Aug 17 '11 at 11:33
Ah, you are right. This solves the problem; I will update the post original post. – Benjamin Allévius Aug 17 '11 at 12:03
If you are using read.zoo you do not need to use read.csv as well as they both use read.table underneath. You can just write read.zoo("ATCOA_full_adj.csv", header=T). HTH – Sameer Aug 17 '11 at 19:13

Soumendra · Answer 2 · 2011-08-17T08:31:39.297

3

You can create a timeSeries (timeSeries library) object from your dates, merge them (timeSeries default merge behaviour is different from zoo and xts and does exactly what you are asking for) and then make zoo/xts objects out of the result in case you don't want to stay with timeSeries.

One quick way to test is the following, assuming you have two zoo objects zz1 and zz2 -

library(timeSeries)
as.zoo(merge(as.timeSeries(zz1), as.timeSeries(zz2)))

Compare the output of the above command with

merge(zz1, zz2)

You can also cbind -

cbind(zz1, zz2)

provided there are no shared columns with same names. Even if such column are there, you can choose the columns by which you cbind, and you will get a zoo object.

cbind(zz1[, 1:2], zz2[, 2:3]) #Assuming other columns are common

edited Aug 17 '11 at 08:31

answered Aug 17 '11 at 08:19

Soumendra

1,148
1
15
26

1

Thank you. When I try your approach, I notice the following: when I use as.timeSeries on a data frame with dates in the format "%Y-%m-%d", the timeSeries object created will have the date format "%Y-%m-%d %H:%M:%S"; that I specify format="%Y-%m-%d" in the function seems to be ignored. If I then merge (using either base::merge or timeSeries::merge) two such timeSeries objects, I get a data frame with dates in the format "%Y-%m-%d", that contains only the dates that are mutual to both objects. However, the columns contain the information I want (like in the example). Any way around this? – Benjamin Allévius Aug 17 '11 at 09:49
Code has been included in the original post. – Benjamin Allévius Aug 17 '11 at 11:17
what is the difference between: `as.zoo(merge(as.timeSeries(zz1), as.timeSeries(zz2)))` and `merge(as.timeSeries(zz1), as.timeSeries(zz2))` ? I like your answer. – johnatasjmo Oct 09 '17 at 06:57

Selcuk Akbas · Answer 3 · 2018-03-26T19:17:38.853

2

here, i found a more generic aproach from stat.ethz.ch

a <- ts(1:10, start=c(2014,6), frequency=12)
b <- ts(1:12, start=c(2015,1), frequency=12)

library(zoo)
m <- merge(a = as.zoo(a), b = as.zoo(b))

to get a ts object back:

as.ts(m)

edited Mar 26 '18 at 19:17

answered Mar 30 '16 at 08:22

Selcuk Akbas

609
1
8
18

is "m" at the end of 5th row a typo? – EngrStudent Aug 25 '17 at 13:45

score 1 · Answer 4 · answered Aug 17 '11 at 08:11

1

How about this:

## Generate unique sorted time values.
i <- sort(unique(c(index(x), index(y))))

## Empty data matrix.
v <- matrix(nrow=length(i), ncol=6, NA)

## Pull in data items.
v[match(index(x), i), 1:3] <- coredata(x)
v[match(index(y), i), 4:6] <- coredata(y)

## Build new zoo object.
d <- zoo(v, order.by=i)

answered Aug 17 '11 at 08:11

Ian Ross

965
6
14

You may be onto something here, with some modifications: if I skip the index(x) and index(y) in the assignment to i, I will get a vector of the unique dates (if I just use the indices, it will not work since they are unrelated between the two objects - I must compare the dates instead). I can use this to fill the first column of a matrix like you suggest. However, I'm not sure if the "Pull in data items" stage still works: instead I should possibly compare the date column of x and y to the first column of the matrix, and assign elements if there is a match in either. – Benjamin Allévius Aug 17 '11 at 12:11

R: merge two irregular time series

4 Answers4

Linked

Related