How to aggregate timeseries in Python?

Question

I have two different timeseries with partially overlapping timestamps:

import scikits.timeseries as ts
from datetime import datetime 
a = ts.time_series([1,2,3], dates=[datetime(2010,10,20), datetime(2010,10,21), datetime(2010,10,23)], freq='D')
b = ts.time_series([4,5,6], dates=[datetime(2010,10,20), datetime(2010,10,22), datetime(2010,10,23)], freq='D')

which represents following data:

Day:   20. 21. 22. 23.
  a:    1   2   -   3
  b:    4   -   5   6

I would like to calculate a weighted average on every day with coefficients a(0.3) and b(0.7), while ignoring missing values:

Day 20.: (0.3 * 1 + 0.7 * 4) / (0.3 + 0.7) = 3.1 / 1.  = 3.1
Day 21.: (0.3 * 2          ) / (0.3      ) = 0.6 / 0.3 = 2
Day 22.: (          0.7 * 5) / (      0.7) = 3.5 / 0.7 = 5
Day 23.: (0.3 * 3 + 0.7 * 6) / (0.3 + 0.7) = 3.1 / 1.  = 5.1

when I first try to align these timeseries:

a1, b1 = ts.aligned(a, b)

I get correctly masked timeseries:

timeseries([1 2 -- 3],
  dates = [20-Oct-2010 ... 23-Oct-2010],
  freq  = D)

timeseries([4 -- 5 6],
  dates = [20-Oct-2010 ... 23-Oct-2010],
  freq  = D)

but when I do a1 * 0.3 + b1 * 0.7, it ignores values, that are present in one timeseries only:

timeseries([3.1 -- -- 5.1],
   dates = [20-Oct-2010 ... 23-Oct-2010],
   freq  = D)

What should I do to receive the awaited?

timeseries([3.1 2. 5. 5.1],
   dates = [20-Oct-2010 ... 23-Oct-2010],
   freq  = D)

EDIT: The answer should be applicable also to more than two initial timeseries with different weights and differently missing values.

So if we have four timeseries with weights T1(0.1), T2(0.2), T3(0.3) and T4(0.4), their weights at a given timestamp will be:

            |  T1 |  T2 |  T3 |  T4 |
weight      | 0.1 | 0.2 | 0.3 | 0.4 |
-------------------------------------
all present | 10% | 20% | 30% | 40% |
T1 missing  |     | 22% | 33% | 45% |
T1,T2 miss. |     |     | 43% | 57% |
T4 missing  | 17% | 33% | 50% |     |
etc.

"more than two initial timeseries"? You mean T1, T2, T3? Isn't that just ((T1 *agg* T2) *agg* T3)? In which case, any number of time series can be aggregated by simply applying the solution as a reduction. If not, why not? — S.Lott, Oct 20 '10 at 14:28
@S.Lott - not really. How would you handle weights with T1(0.2), T2(0.2) and T3(0.6)? If at a given timestamp T1 is missing, then the 0.6 of T3 represents actually 75% (T2 has then 25%) and not 60% of the whole group. In your ((T1 agg T2) agg T3) logic this would not work. — eumiro, Oct 20 '10 at 14:40
@eumiro: Please **update** your question with this requirement. — S.Lott, Oct 20 '10 at 14:52
@eumiro: Why is your comment very detailed and the question not very detailed? Please **update** your question to be **complete**. — S.Lott, Oct 20 '10 at 15:01

eumiro · Accepted Answer · 2010-10-20T13:59:55.630

I have tried and found this:

aWgt = 0.3
bWgt = 0.7

print (np.where(a1.mask, 0., a1.data * aWgt) +
       np.where(b1.mask, 0., b1.data * bWgt)) / (np.where(a1.mask, 0., aWgt) +
                                                 np.where(b1.mask, 0., bWgt))

# array([ 3.1,  2. ,  5. ,  5.1])

This is applicable to the edited question with more than one initial timeseries. But hopefully someone will find better.

EDIT: And this is my function:

def weightedAvg(weightedTimeseries):
    sumA = np.sum((np.where(ts.mask, 0., ts.data * weight) for ts, weight in weightedTimeseries), axis=0)
    sumB = np.sum((np.where(ts.mask, 0., weight) for ts, weight in weightedTimeseries), axis=0)
    return np.divide(sumA, sumB)

weightedAvg(((a1, 0.3), (bb, 0.7)))
# array([ 3.1,  2. ,  5. ,  5.1])

Works for any number of timeseries ;-)

I think your `weightedAvg` is better than the one I suggested because it is simpler, and takes less memory. Very nice! — unutbu, Oct 20 '10 at 23:42

How to aggregate timeseries in Python?

1 Answers1