
Imagine that I have a set of measurements x0 ... xN of some quantity x, taken at times t0 ... tN, and that many processes are producing measurements like this. Let's assume that at time t I want to make an estimate of the current value of x, based on the assumption that there is no long-term trend I know about and that x can be predicted by an algorithm such as exponential smoothing. As we have many processes, and N can get very large, I can't store more than a few values per process (e.g. the previous state).

One approach here would be to adapt the normal exponential smoothing algorithm. If samples are taken regularly, I would maintain an estimator yn such that:

yn = α.yn-1 + ( 1 - α ). xn

This approach is not great where the sampling is irregular, as many samples arriving close together would have a disproportionate influence. Therefore this formula could be adapted to:

yn = αn.yn-1 + ( 1 - αn ). xn

where

αn = e^(-k.(tn - tn-1))

I.e., dynamically adjusting the smoothing constant depending on the interval between the previous two samples. I'm happy with this approach and it seems to work. It's the first answer given here, and a good summary of these sorts of techniques is given by Eckner in this 2012 paper (PDF).
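For concreteness, here is a minimal sketch of that irregular-interval smoothing step in Python (the function and variable names are mine, not from any particular library):

```python
import math

def smooth_irregular(y_prev, x_new, t_prev, t_new, k):
    """One update of exponential smoothing with an interval-dependent
    smoothing factor alpha_n = exp(-k * (t_n - t_{n-1}))."""
    alpha = math.exp(-k * (t_new - t_prev))
    return alpha * y_prev + (1.0 - alpha) * x_new

# Example: two samples 0.2 s apart, then one after a 5 s gap.
y = 10.0                                          # initial estimate
y = smooth_irregular(y, 12.0, 0.0, 0.2, k=0.5)    # short gap => alpha near 1, y_prev dominates
y = smooth_irregular(y, 11.0, 0.2, 5.2, k=0.5)    # long gap  => alpha small, x_new dominates
```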

Now, my question is as follows. I want to adapt the above to estimate the rate of an occurrence. Occasionally an event will occur. Using a similar exponential technique, I want to get an estimate of the rate that event occurs.

Two obvious strategies would be:

  • To use the first or the second technique using the delay between the last two events as the data series xn.
  • To use the first or the second technique using the reciprocal of the delay between the last two events (i.e. the rate) as the data series xn.

Neither of these turns out to be a good strategy as far as I can tell. Firstly, take an event that occurs every 500ms (on the one hand) and an event that occurs with alternating 200ms and 800ms delays (on the other). Clearly these both occur twice a second, so the rate estimate given should be the same. Ignoring the time from the last sample seems foolhardy, so I'll concentrate on the second technique. Using the delay (rather than the reciprocal) does not turn out to be a good predictor, because simulating the 200ms/800ms sample stream produces an estimate of about 1.5 (on the basis that the average of reciprocals is not the reciprocal of the average).

But, far more significantly, neither strategy copes with what is actually happening in practice, which is that suddenly all the events stop for a long while. The 'latest' value of y is thus the value at the last event, and the estimate of the rate never gets updated, so the rate appears constant. Of course, if I was analysing the data retrospectively this wouldn't be an issue, but I'm analysing it in real time.

I realise another way to do this would be to run some thread periodically (e.g. every 10 seconds) and count the number of occurrences in each 10-second interval. This is very resource-heavy on my end, as the statistics are not needed often, and I am loath to run a thread that polls everything due to mutex issues. Therefore I'd like to (somehow) use an algorithm which adjusts the state when it is read, based on (e.g.) the time since the last sample was taken. This seems a reasonable approach: if the performance is measured at times chosen independently of the samples, the measurement time will on average fall halfway through the period between samples, so a very crude unsmoothed estimate of the rate would be half the reciprocal of the time since the last sample. To complicate things further, my measurement times will not be independent of the samples.

I have a feeling this has a simple answer, but it's eluding me. I suspect the correct route is to assume that the events are Poisson distributed, and to derive an estimate for λ based on the interval since the last sample and some form of moving average, but my statistics is too rusty to make this work.

There is a near dupe of this question here, but the answer doesn't seem to be very satisfactory (I hope I explained why). I'd add that a Kalman filter seems way too heavyweight given I have one variable to estimate and know nothing about it. There are a number of other near dupes, most of which either suggest keeping large bins of values (not realistic here from a memory point of view) or do not address the two issues above.

abligh
  • Can you keep a list of, say, the last 10 or so timestamps at which the event occurred? – Daniel Brückner May 12 '14 at 19:06
  • How do the statistics of your events look? The reason I'm saying this is that spooky-long periods over which "all the events stop" can often indicate pathological statistics, like power-law statistics, for which these estimators never work well. – Raman Shah May 12 '14 at 19:06
  • @DanielBrückner I could keep a list of a (fixed) small number of events (e.g. 10), and say "how long did those take to occur". However, that wouldn't help with the issue where no more events come (at all) and I need to estimate the rate. – abligh May 12 '14 at 19:07
  • @RamanShah the events should come in at a roughly constant (probably Poisson distributed in the short term) rate from each of a number of sources, leading to a nice steady average. What I'm trying to measure is (a) big drop-offs in these indicating severe connectivity problems, (b) a reasonably accurate 'rate' in steady-state conditions so there is something to measure performance against. You could consider it a bit like interface counters on networking equipment (with many many interfaces). – abligh May 12 '14 at 19:11
  • So to clarify, you want your estimator to decay to zero when you haven't seen an event for a long time? I think I'm starting to understand your data and what you want from it, but I'm not certain. – Raman Shah May 12 '14 at 19:24
  • [Here](http://www.columbia.edu/~ww2040/EstimatingNonhomogPoisson96.pdf) is a paper dealing with your situation of estimating the parameter of an inhomogeneous Poisson process. They also mention the problem of the rate going to zero. – Daniel Brückner May 12 '14 at 19:27
  • @DanielBrückner thanks. Looks very useful. I shall read that. In the meantime you gave me an idea which I shall post below. – abligh May 12 '14 at 19:30
  • Once you start looking for "estimate inhomogeneous Poisson process", countless papers turn up. This seems to be a well-studied problem. There is even at least one [library](http://cran.r-project.org/web/packages/NHPoisson/NHPoisson.pdf). – Daniel Brückner May 12 '14 at 19:35
  • @DanielBrückner unless I'm misreading that paper, it appears to require that you store the timestamps of all the data. Are you suggesting that you apply it merely to the timestamps of the past N occurrences? Doesn't that mean the 'window' of the moving average would contract when the data becomes more frequent? My issue is the data could be much less than once a second, or tens of times a second, and I'd quite like the averages to have similar-ish time periods. – abligh May 12 '14 at 19:49
  • I made my first comment before I discovered the paper and I did not read it entirely. My idea was to keep the last n events and when you want to know the current rate you calculate it from the last n occurrences and the time since the last occurrence. So you will not have a variable containing the current rate but you will calculate it when needed and maybe cache the result for some time in case it is frequently used. – Daniel Brückner May 12 '14 at 19:55

2 Answers


First, if you assume that the occurrence rate of the events itself is constant (or that you're only interested in its long-term average), then you can simply estimate it as:

        λ* = N / (t − t0)

where t is the current time, t0 is the start of observations, N is the number of events observed since t0 and λ* is the estimate of the true frequency λ.

At this point, it's useful to note that the estimation formula given above may be reformulated as a ratio of integrals:

        λ* = integral( δevent(τ) dτ ) / integral( 1 dτ )

where the variable of integration τ ranges from t0 to t, and δevent(τ) = sum( δ(τ − ti), i = 1 .. N ) is a sum of N Dirac delta functions, with a single delta peak at the occurrence time ti of each event i.

Of course, this would be a completely useless way to calculate λ*, but it turns out to be a conceptually useful formulation. Basically, the way to view this formula is that the function δevent(τ) measures the instantaneous rate at which the number of events increases at time τ, while the second integrand, which is just the constant 1, measures the rate at which time increases over time (which, of course, is simply one second per second).


OK, but what if the frequency λ itself may change over time, and you want to estimate its current value, or at least its average over a recent period?

Using the ratio-of-integrals formulation given above, we can obtain such an estimate simply by weighting both integrands by some weighting function w(τ) which is biased towards recent times:

        λ*recent = integral( δevent(τ) w(τ) dτ ) / integral( w(τ) dτ )

Now, all that remains is to pick a reasonable w(τ) such that these integrals simplify to something easy to calculate. As it turns out, if we choose an exponentially decaying weighting function of the form w(τ) = exp(k(τ − t)) for some decay rate k, the integrals simplify to:

        λ*recent = k sum( exp(k(ti − t)), i = 1 .. N ) / ( 1 − exp(k(t0 − t)) )

In the limit as t0 → −∞ (i.e., in practice, when the total observation time (t − t0) is much larger than the weight decay timescale 1/k), this further simplifies to just:

        λ*recent = k sum( exp(k(ti − t)), i = 1 .. N )

Alas, naïvely applying this formula would still require us to remember all the event times ti. However, we can use the same trick as for calculating usual exponentially weighted averages — given the weighted average event rate λ*recent(t') at some earlier time t', and assuming that no new events have occurred between t' and t, we can calculate the current weighted average event rate λ*recent(t) simply as:

        λ*recent(t) = exp( k(t' − t) ) λ*recent(t')

Further, if we now observe a new event occurring at exactly time t, the weighted average event rate just after the event becomes:

        λ*recent(t) = k + exp( k(t' − t) ) λ*recent(t')


Thus, we get a very simple rule: all we need to store is the time tlast of the previous observed event, and the estimated recent event rate λ*last just after said event. (We may initialize these e.g. to tlast = t0 and λ*last = 0; in fact, with λ*last = 0, the value of tlast makes no difference, although for non-zero λ*last it does.)

Whenever a new event occurs (at time tnew), we update these values as:

        λ*last ← k + exp( k(tlast − tnew) ) λ*last
        tlast ← tnew

and whenever we wish to know the recent event rate average at the current time t, we simply calculate it as:

        λ*(t) = exp( k(tlast − t) ) λ*last
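To make the update and read rules concrete, here is a minimal sketch in Python (the class and method names are mine):

```python
import math

class EventRateEstimator:
    """Exponentially weighted recent event rate: the only state kept is
    t_last (time of the last event) and the estimate just after that event."""

    def __init__(self, k, t0):
        self.k = k              # decay rate of the weighting function, in 1/s
        self.t_last = t0        # initialise t_last to the start of observation
        self.rate_last = 0.0    # lambda*_last, initially zero

    def on_event(self, t_new):
        # lambda*_last <- k + exp( k (t_last - t_new) ) lambda*_last
        self.rate_last = self.k + math.exp(self.k * (self.t_last - t_new)) * self.rate_last
        self.t_last = t_new

    def rate(self, t):
        # lambda*(t) = exp( k (t_last - t) ) lambda*_last
        return math.exp(self.k * (self.t_last - t)) * self.rate_last
```

Note that rate() decays exponentially towards zero between events, which is what handles the "all the events suddenly stop" problem in the question.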


Ps. To correct for the initial bias towards the (arbitrary) initial value of tlast, we can add back the 1 / ( 1 − exp(k(t0 − t)) ) correction term that we simplified out earlier when we assumed that t ≫ t0. To do that, simply start from tlast = t0 and λ*last = 0 at time t0, update tlast as above, but calculate the estimated recent event rate average at time t as:

        λ*corr(t) = exp( k(tlast − t) ) λ*last / ( 1 − exp(k(t0 − t)) )

(Here, t0 denotes the time at which you start measuring events, not the occurrence of the first event.)

This will eliminate the initial bias towards zero, at the cost of increasing the early variance. Here's an example plot showing the effects of the correction, for k = 0.1 and a true mean event rate of 2:

[Plot: λ* over time, with and without initial bias correction]
The red line shows λ*(t) without the initial bias correction (starting from λ*(t0) = 0), while the green line shows the bias-corrected estimate λ*corr(t).
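Continuing the EventRateEstimator sketch above, the bias-corrected read described in this Ps. could look like the following (the helper name is mine; t0 is the time observation began):

```python
import math

def rate_corrected(est, t, t0):
    """Bias-corrected estimate: divide the raw estimate by (1 - exp(k (t0 - t))).
    `est` is the EventRateEstimator sketched earlier.
    The correction factor is 0 at t == t0, so only call this for t > t0."""
    return est.rate(t) / (1.0 - math.exp(est.k * (t0 - t)))
```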


Pps. As the plot above shows, λ*, as calculated above, will not be a continuous function of time: it jumps up by k whenever an event occurs, and decays exponentially towards zero when events do not occur.

If you'd prefer a smoother estimate, you can calculate an exponentially decaying average of λ* itself:

        λ**(t) = integral( λ*(τ) exp(k2(τ − t)) dτ ) / integral( exp(k2(τ − t)) dτ )

where λ* is the exponentially decaying average event rate as calculated above, k2 is the decay rate for the second average, and the integrals are over −∞ < τ ≤ t.

This integral can also be calculated by a step-wise update rule as above:

        λ**last ← W(Δt) λ*last + exp( −k2 Δt ) λ**last
        λ*last ← k1 + exp( −k1 Δt ) λ*last
        tlast ← tnew

where k1 and k2 are the decay rates for the first and second averages, Δt = tnew − tlast is the elapsed time between the events, and:

        W(Δt) = k2 ( exp( −k2 Δt ) − exp( −k1 Δt ) ) / ( k1 − k2 )

if k1 ≠ k2, or

        W(Δt) = k Δt exp( −k Δt )

if k1 = k2 = k (the latter expression arising from the former as the limit when (k1 − k2) → 0).

To calculate the second average for an arbitrary point in time t, use the same formula:

        λ**(t) = W(Δt) λ*last + exp( −k2 Δt ) λ**last

except with Δt = t − tlast.


As above, this estimate can also be bias-corrected by applying a suitable time-dependent scaling factor:

        λ**corr(t) = λ**(t) / ( 1 − S(t − t0) )

where:

        S(Δt) = ( k1 exp( −k2 Δt ) − k2 exp( −k1 Δt ) ) / ( k1 − k2 )

if k1 ≠ k2, or

        S(Δt) = ( 1 + k Δt ) exp( −k Δt )

if k1 = k2 = k.
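Here is a sketch of this second-order smoothing in Python, following the W(Δt) and S(Δt) formulas above (the class and method names are mine):

```python
import math

class DoublySmoothedRateEstimator:
    """Second-order exponentially smoothed event rate (lambda**), following
    the update rules above; k1 smooths the event stream, k2 smooths lambda* itself."""

    def __init__(self, k1, k2, t0):
        self.k1, self.k2 = k1, k2
        self.t0 = t0
        self.t_last = t0
        self.rate1 = 0.0    # lambda*_last
        self.rate2 = 0.0    # lambda**_last

    def _w(self, dt):
        if self.k1 != self.k2:
            return self.k2 * (math.exp(-self.k2 * dt) - math.exp(-self.k1 * dt)) / (self.k1 - self.k2)
        return self.k1 * dt * math.exp(-self.k1 * dt)           # limit as k1 -> k2

    def _s(self, dt):
        if self.k1 != self.k2:
            return (self.k1 * math.exp(-self.k2 * dt) - self.k2 * math.exp(-self.k1 * dt)) / (self.k1 - self.k2)
        return (1.0 + self.k1 * dt) * math.exp(-self.k1 * dt)   # limit as k1 -> k2

    def on_event(self, t_new):
        dt = t_new - self.t_last
        # Update lambda** first, since its rule uses the *old* lambda*.
        self.rate2 = self._w(dt) * self.rate1 + math.exp(-self.k2 * dt) * self.rate2
        self.rate1 = self.k1 + math.exp(-self.k1 * dt) * self.rate1
        self.t_last = t_new

    def rate(self, t):
        dt = t - self.t_last
        return self._w(dt) * self.rate1 + math.exp(-self.k2 * dt) * self.rate2

    def rate_corrected(self, t):
        # Bias-corrected read; only meaningful for t > t0 (the factor is 0 at t == t0).
        return self.rate(t) / (1.0 - self._s(t - self.t0))
```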

The plot below shows the effects of this smoothing. The red and green lines show λ*(t) and λ*corr(t) as above, while the yellow and blue lines show λ**(t) and λ**corr(t), as calculated with k1 = 0.1 (as above) and k2 = 0.2:

[Plot: λ* and λ** over time, with and without initial bias correction]

Ilmari Karonen
  • 1
    Fantastic; thanks. I'd pretty much got to the exponential 'scale down' of the previous estimate on each re-estimate, but had missed the way to scale that for later time t. – abligh May 12 '14 at 20:00
  • 1
    Yup, lovely! I've used this ratio-of-integrals formulation before for physics data processing, but the efficient updating and extraction of the average on demand is really nice. – Raman Shah May 12 '14 at 21:15
  • Just to add, I put this in the real code (rather than just a simulator) and it works really well. My only adaptation was to start with a slightly larger k and reduce it over the first 50 or so samples to the desired value. This gets the estimate closer (but no doubt with a larger SD) earlier in the run, as I have no good initial estimate of lambda to use initially. – abligh May 13 '14 at 07:30
  • @abligh: You can actually fix the initial bias simply by scaling the estimate by 1 / (1 − exp(k(t0 − t))); see the edit I just made above. – Ilmari Karonen May 13 '14 at 08:08
  • @IlmariKaronen thanks. I need to think about how to use that because some of my sample streams aren't necessarily running at all when the program starts, so for them t0 should be when the program starts, not the first sample, as clearly if the program runs for 10 minutes before the first sample, then gets 2 samples in one second, the best estimate of the rate is not 2! – abligh May 13 '14 at 09:26
  • @abligh: Yes, t0 should be the time at which you start counting events; the first actual event will generally occur some time later, at t1. (Ps. I added some extra detail to my answer above about using second-order smoothing to get rid of the jumps in the average when an event occurs.) – Ilmari Karonen May 13 '14 at 10:40
  • Amazing answer. *bow* – redlus Nov 22 '16 at 17:03

You could try this:

Keep an estimator zn so that at each event:

zn = (zn-1 + κ).e^(-κ.(tn - tn-1))

This will converge towards the event rate in s⁻¹. A slightly better estimator is then (as there is still some error/noise depending on whether you compute the estimate just before or just after an event):

wn = zn.e^(-κ/(2.zn))

In your example it will converge to 2 s⁻¹ (the reciprocal of 500 ms).

The constant κ is responsible for the smoothing and is in s⁻¹. Small values will smooth more. If your event rate is roughly on the order of one per second, a value of 0.01 s⁻¹ for κ is a good start.

This method has a starting bias, and z0 could be set to an estimate of the value for faster convergence. Small values of κ will keep the bias longer.
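A minimal sketch of this update rule in Python (the function names are mine), where dt is the time between consecutive events:

```python
import math

def update_z(z_prev, kappa, dt):
    """Applied at each event: z_n = (z_{n-1} + kappa) * exp(-kappa * dt),
    where dt is the time since the previous event."""
    return (z_prev + kappa) * math.exp(-kappa * dt)

def refined_estimate(z, kappa):
    """w_n = z_n * exp(-kappa / (2 z_n)); guard against z == 0 at start-up."""
    return 0.0 if z == 0.0 else z * math.exp(-kappa / (2.0 * z))
```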

There are much more powerful ways of analyzing Poisson-like distributions, but they often require large buffers. Frequency analysis, such as the Fourier transform, is one.

galinette
  • This is basically the same as the formula I gave, except that you're tracking the average as it was just *before* the most recent event, whereas I'm tracking the average just *after* the most recent event. Of course, either way works; they just differ by *k*. Anyway, +1 for being quicker. :-) – Ilmari Karonen May 12 '14 at 20:00
  • At least you prove it, I don't :) (I did it just intuitively and checked on a random series...) – galinette May 12 '14 at 20:08
  • After the last edit mine is more accurate as it compensates for the bias related to the interval – galinette May 12 '14 at 20:15
  • It appears that you're calculating something slightly different than me now: my λ* is a simple exponentially weighted average of the recent event rate, and will decay exponentially towards zero if no new events occur, whereas you seem to be projecting forwards from the last event to calculate an "average of the average" (and your estimate only changes when events happen). I suppose I could add some kind of smoothing term to my formula too, if the discontinuities in the average at each event are considered undesirable; I'll have to think about it a bit more... – Ilmari Karonen May 12 '14 at 20:26
  • Can you explain the derivation of w sub n, and how you would adjust if estimating after the time of the last sample? – abligh May 12 '14 at 20:52