How to Identify if the rate of an event (moving average) exceeds a threshold

Question

EDIT Added information

Originally this question was about a general, language- and platform-agnostic algorithm. However, I'm going to answer it myself, and the answer turns out to be specific to the tools in use.

This is for event detection on an IBM mainframe under z/OS, using the Ops/MVS automation tool running a REXX script.

So the answers posted may well apply in Python, Perl, bash, Java, etc.; it's just that the product used in this particular case has a built-in function that does the job.

End of added information

My question is very similar to this:

How to calculate continuous smooth event rate based on event times?

and this would be an answer:

This can be implemented with a moving average. Take your last N events, where N is the size of your averaging window. Compute the time difference between the first and the last of these N events. If you are measuring in seconds and want the rate in events per minute, divide 60 seconds by that time difference in seconds and multiply by N-1.

except I'd like to avoid storing information about previous events. I'm also only interested if the moving average exceeds a threshold, so I'm not interested in keeping a trend of rate.
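For reference, the moving-average recipe quoted above works out as in this Python sketch. `rate_per_minute` is a hypothetical helper; it assumes timestamps in seconds, sorted oldest first. Note that it needs the last N timestamps stored, which is exactly what this question wants to avoid.

```python
def rate_per_minute(timestamps):
    """Events per minute from the last N event timestamps, given in seconds.

    N events span N - 1 intervals over (last - first) seconds, so the rate is
    60 / (last - first) * (N - 1), as in the quoted answer.
    """
    n = len(timestamps)
    if n < 2:
        raise ValueError("need at least two events to measure a rate")
    span = timestamps[-1] - timestamps[0]
    if span <= 0:
        raise ValueError("events must span a positive time interval")
    return 60.0 / span * (n - 1)


# Hypothetical window of N = 4 events, 10 seconds apart: 3 intervals in 30 s.
print(rate_per_minute([0, 10, 20, 30]))  # 6.0
```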

So for example, I want to know if I get more than 3 events/min. This was my first approach:

When the first event comes in, I create a count of 1 and log the start time.
When another event comes in, I increment the count and calculate the rate from the count and the elapsed time.
If the rate exceeds the permitted value, generate an alert.

I realised this wouldn’t work because if you had an event a week ago and then nothing until 10 events in the last minute, the average ‘rate’ is 11 in a week, i.e. about 1.6/day, rather than the current rate of 10/min.
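A minimal Python sketch of this first approach reproduces the problem. `CumulativeRateMonitor` and `on_event` are hypothetical names, and timestamps are assumed to be in seconds: a single stale event dilutes the average so much that the later burst never trips the alert.

```python
class CumulativeRateMonitor:
    """First approach: one count and one start time, never reset."""

    def __init__(self, max_per_minute):
        self.max_per_minute = max_per_minute
        self.count = 0
        self.start = None  # time of the very first event

    def on_event(self, now):
        """Record an event at `now` (seconds); return True if it should alert."""
        if self.start is None:
            self.start = now
        self.count += 1
        elapsed = now - self.start
        if elapsed <= 0:
            return False  # no elapsed time yet, so no rate to compute
        rate = self.count / (elapsed / 60.0)  # events per minute since start
        return rate > self.max_per_minute


WEEK = 7 * 24 * 3600
monitor = CumulativeRateMonitor(max_per_minute=3)
monitor.on_event(0)  # one stale event a week ago
burst = [monitor.on_event(WEEK - 54 + 6 * i) for i in range(10)]  # 10 in the last minute
print(any(burst))  # False: 11 events averaged over a whole week
```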

So I'm thinking of trying the following:

When the first event comes in, I create a count of 1 and log the start time.
When another event comes in, if the time since the previous event exceeds the interval over which I want to measure the rate (1 min in my example), I effectively discard the previous event and record a count of 1 and the current time as the new start time (because if it's been over 1 min since the previous event, the rate can't exceed x/min, right?).
If the time since the previous event hasn't exceeded the monitoring interval, increment the count and calculate the rate from the count and the elapsed time.
If the rate exceeds the permitted value, generate an alert.
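The second approach can be sketched the same way. `ResettingRateMonitor` is a hypothetical name, and "the rate from the count and the elapsed time" is assumed to mean count divided by elapsed time, scaled to events per window.

```python
class ResettingRateMonitor:
    """Second approach: restart the count after a gap longer than the window."""

    def __init__(self, max_per_window, window=60.0):
        self.max_per_window = max_per_window  # e.g. 3 events per 60 s
        self.window = window
        self.count = 0
        self.start = None  # start of the current run of events
        self.last = None   # time of the previous event

    def on_event(self, now):
        """Record an event at `now` (seconds); return True if it should alert."""
        if self.last is None or now - self.last > self.window:
            # First event, or quiet for longer than the window: discard history.
            self.count = 1
            self.start = now
        else:
            self.count += 1
        self.last = now
        elapsed = now - self.start
        if elapsed <= 0:
            return False  # a single event has no measurable rate
        rate = self.count / elapsed * self.window  # events per window
        return rate > self.max_per_window


monitor = ResettingRateMonitor(max_per_window=3)
print(monitor.on_event(0))    # False: only one event so far
print(monitor.on_event(100))  # False: gap > 60 s, so the count restarts at 1
print(monitor.on_event(110))  # True: 2 events in 10 s extrapolates to 12/min
```

The last call shows a side effect of extrapolating from count and elapsed time: two events 10 s apart already look like 12/min, so comparing the raw count within the window against the threshold may be the more useful test.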

This seems simple, but other posts on SO (specifically this question: Estimating rate of occurrence of an event with exponential smoothing and irregular events and its accepted answer: https://stackoverflow.com/a/23617678/1430420) seem to imply that there's a lot more to it than I think.

Does this answer your question? [Efficient way to compute number of hits to a server within the last minute, in real time](https://stackoverflow.com/questions/11701008/efficient-way-to-compute-number-of-hits-to-a-server-within-the-last-minute-in-r). If not, you'll probably have to clarify what you mean by "moving average" (any sort of averaging would result in your example not triggering the threshold of 10). — Bernhard Barker, May 29 '19 at 12:06
@Dukeling It may do, except that the language I am using (Ops/MVS REXX on z/OS on an IBM mainframe) does not support arrays. Since asking, I've discovered that Ops/MVS has a built-in function to perform this task. I don't know if I should expand my question to include these details and then answer it, or just delete the question. — Steve Ives, May 29 '19 at 13:09
I think you need to make clear earlier that this is using Ops/MVS; in that case, I'd keep the question, as it's specific, and not answered by the other question. — Kevin McKenzie, May 30 '19 at 14:54

score 1 · Answer 1 · answered May 30 '19 at 14:21

Ops/MVS has this functionality built in via the 'OPSTHRSH' function:

https://docops.ca.com/ca-opsmvs/13-5/en/reference-information/command-and-function-reference/ops-rexx-built-in-functions/opsthrsh-function

For this particular scenario, we can invoke it as follows:

if OPSTHRSH('A',60) > 3 then do
    /* take action here */
end

OPSTHRSH('A',60) returns a count of how many times the current event has triggered for the current address space (task) within a 60-second period. If this value exceeds my trigger level, I take action. The count is reset 60 seconds after the first event is received.
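To make the described semantics concrete outside Ops/MVS, here is a hypothetical Python emulation of the behaviour as stated in this answer; it is not CA's implementation. `FixedWindowCounter`, `hit` and the key `"ADDRSPC1"` are made-up names, the key stands in for the event/address-space scope, and whether the reset happens at exactly 60 s or just after is an assumption.

```python
class FixedWindowCounter:
    """Hypothetical emulation of the behaviour described for OPSTHRSH('A', 60).

    Counts events per key; a key's count is reset once `window` seconds have
    passed since the first event of its current window. This is NOT CA's code.
    """

    def __init__(self, window=60.0):
        self.window = window
        self.windows = {}  # key -> [window_start, count]

    def hit(self, key, now):
        """Record an event for `key` at `now` (seconds); return the count."""
        state = self.windows.get(key)
        if state is None or now - state[0] >= self.window:
            state = [now, 0]  # open a new window starting at this event
            self.windows[key] = state
        state[1] += 1
        return state[1]


counter = FixedWindowCounter(window=60)
for t in (0, 10, 20, 30, 70):
    n = counter.hit("ADDRSPC1", t)  # hypothetical key for one address space
    if n > 3:
        print(f"alert at t={t}s: {n} events in the current window")
```

If the function behaves as described, it is a fixed-window counter rather than a sliding one: after a first event at 0 s, events at 55, 58, 61 and 64 s give counts 2, 3, 1, 2, so four events within 9 s never push a single window above 3.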

score 0 · Answer 2 · answered May 29 '19 at 14:25

Use the following pseudo-code:

boolean update(long timestamp, History h, int windowSize, int minEventsToTrigger) {
    h.removeOlderThan(timestamp - windowSize);
    h.addEvent(timestamp);
    return h.size() >= minEventsToTrigger;
}

Where h is a circular buffer storing timestamps with the following operations:

removeOlderThan(t): removes all events that happened before t. This operation is amortized O(1), since each event will be removed exactly once, and events (except the oldest) will never be queried more than once for removal.
addEvent(t): adds an event at the end of the buffer or, if the buffer is full, first removes the oldest event and then adds the new one. This operation is O(1), and discarding an old event for a newer one guarantees that sudden influxes of events will not overwhelm the system, require extra memory, or break this code: as long as minEventsToTrigger is smaller than the capacity of h, the results will always be correct.

This pseudo-code is, I believe, optimal in time, and probably also in space. Importantly, it does not require any kind of dynamic allocation.
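One way the pseudo-code might look in Python, as a sketch: `History`, `remove_older_than`, `add_event`, `size` and `update` are hypothetical transliterations of the names in this answer, timestamps are plain numbers, and the buffer is a list allocated once at construction, so no further allocation happens per event.

```python
class History:
    """Fixed-capacity circular buffer of timestamps, oldest at index `head`."""

    def __init__(self, capacity):
        self.buf = [0] * capacity  # allocated once, never resized
        self.head = 0              # index of the oldest stored timestamp
        self.count = 0             # number of timestamps currently stored

    def size(self):
        return self.count

    def remove_older_than(self, t):
        # Drop events strictly before t; each event is dropped at most once.
        while self.count and self.buf[self.head] < t:
            self.head = (self.head + 1) % len(self.buf)
            self.count -= 1

    def add_event(self, t):
        if self.count == len(self.buf):
            # Full: discard the oldest event to make room for the new one.
            self.head = (self.head + 1) % len(self.buf)
            self.count -= 1
        self.buf[(self.head + self.count) % len(self.buf)] = t
        self.count += 1


def update(timestamp, h, window_size, min_events_to_trigger):
    """Return True if at least min_events_to_trigger events fall in the window."""
    h.remove_older_than(timestamp - window_size)
    h.add_event(timestamp)
    return h.size() >= min_events_to_trigger


h = History(capacity=8)  # capacity must exceed min_events_to_trigger
results = [update(t, h, window_size=60, min_events_to_trigger=4)
           for t in (0, 20, 40, 50, 130)]
print(results)  # [False, False, False, True, False]
```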

The update function returns true if, given the new event, at least minEventsToTrigger events have been received within windowSize time units, and false otherwise. Note that it is intended to be invoked only when an event is received, and therefore can only detect rising edges accurately (falling edges will not be detected until the next event). If you wish to remedy this, you have two options:

poll regularly to see whether, after calling h.removeOlderThan(timestamp - windowSize) with timestamp set to the current time, the condition h.size() >= minEventsToTrigger ceases to be true. This can be wasteful if events are very infrequent; if you only do this while the alert is triggered, you may save a lot of unnecessary operations.
once an alert is triggered, use a timer to wake up just after the oldest event is due to expire, and re-check. This guarantees minimal delay between events expiring and the check for h.size() >= minEventsToTrigger.
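Both options can be illustrated with plain lists in Python. `in_window` and `alert_clear_time` are hypothetical helpers; the sketch assumes integer timestamps (so "just after" means one tick later) and that no new events arrive while the timer runs.

```python
def in_window(timestamps, now, window_size):
    """Polling option: the events that removeOlderThan(now - window_size) keeps."""
    return [t for t in timestamps if t >= now - window_size]


def alert_clear_time(timestamps, window_size, min_events_to_trigger, tick=1):
    """Timer option, simulated: wake just after the oldest event expires,
    drop expired events, and re-check; repeat while the alert still holds.

    Returns the wake-up time at which the alert is first seen to clear, or
    None if it was never triggered.
    """
    pending = sorted(timestamps)
    now = None
    while len(pending) >= min_events_to_trigger:
        now = pending[0] + window_size + tick  # just after the oldest expires
        pending = in_window(pending, now, window_size)
    return now


print(len(in_window([0, 20, 40, 50], now=60, window_size=60)) >= 4)  # True
print(alert_clear_time([0, 20, 40, 50], window_size=60, min_events_to_trigger=4))  # 61
```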

I don't want to store information about previous events (as far as possible), as the language I am using lacks arrays etc. — Steve Ives, May 30 '19 at 11:09
You can make do with 3 timestamps (if you are only interested in detecting 3 events within 1 minute); you do not need an array for that, but you do need 3 read/write memory locations. The algorithm is the same. — tucuxi, May 30 '19 at 13:12
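To illustrate the comment above, a Python sketch with three plain variables in place of an array; `ThreeEventDetector` and the slot names `t1`..`t3` are hypothetical. Each new event shifts the slots along, so only the last three timestamps are ever kept, and the alert fires when the oldest of them is still inside the window. For the question's "more than 3 events/min", a fourth slot would be added in the same way.

```python
class ThreeEventDetector:
    """Detect 3 events within `window` seconds using three scalar slots."""

    def __init__(self, window=60.0):
        self.window = window
        self.t1 = None  # oldest of the last three event times
        self.t2 = None
        self.t3 = None  # most recent event time

    def on_event(self, now):
        """Record an event at `now` (seconds); return True if it should alert."""
        # Shift the slots down by one and keep the new timestamp.
        self.t1, self.t2, self.t3 = self.t2, self.t3, now
        # Three events fall inside the window iff the oldest of them is recent enough.
        return self.t1 is not None and now - self.t1 <= self.window


detector = ThreeEventDetector(window=60)
print([detector.on_event(t) for t in (0, 30, 70, 80, 90)])
# [False, False, False, True, True]
```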
