
I have a dataframe as shown below:

index                value
2003-01-01 00:00:00  14.5
2003-01-01 01:00:00  15.8
2003-01-01 02:00:00     0
2003-01-01 03:00:00     0
2003-01-01 04:00:00  13.6
2003-01-01 05:00:00   4.3
2003-01-01 06:00:00  13.7
2003-01-01 07:00:00  14.4
2003-01-01 08:00:00     0
2003-01-01 09:00:00     0
2003-01-01 10:00:00     0
2003-01-01 11:00:00  17.2
2003-01-01 12:00:00     0
2003-01-01 13:00:00   5.3
2003-01-01 14:00:00     0
2003-01-01 15:00:00   2.0
2003-01-01 16:00:00   4.0
2003-01-01 17:00:00     0
2003-01-01 18:00:00     0
2003-01-01 19:00:00   3.9
2003-01-01 20:00:00   7.2
2003-01-01 21:00:00   1.0
2003-01-01 22:00:00   1.0
2003-01-01 23:00:00  10.0

The index is a datetime and the column records the rainfall value (unit: mm) in each hour. I would like to calculate the "average wet spell duration", which means the average number of continuous hours with non-zero values in a day, so the calculation is

(2 + 4 + 1 + 1 + 2 + 5) / 6 (events) = 2.5 (hr)

and the "average wet spell amount", which means the average of sum of the values in continuous hours in a day.

{ (14.5 + 15.8) + (13.6 + 4.3 + 13.7 + 14.4) + (17.2) + (5.3) + (2.0 + 4.0) + (3.9 + 7.2 + 1.0 + 1.0 + 10.0) } / 6 (events) = 21.32 (mm)

The dataframe above is just an example; the dataframe I actually have is a much longer time series (more than one year, for example). How can I write a function that calculates the two values mentioned above in a better way? Thanks in advance!

P.S. Some values may be NaN, and I would like to just ignore them.

    What is your desired output supposed to look like? – RPT Mar 18 '18 at 20:05
  • @R.p.T just two values, one is duration (hr) (float) and one is amount (mm) (float) – Chi Mar 18 '18 at 20:08
  • You want to find the average `rainfall value` of a day, or within a certain hourly range, or a week, or what? – RPT Mar 18 '18 at 20:09
  • I want to find the duration, that is, the average length of each run of continuous hours in a day – Chi Mar 18 '18 at 20:20
  • and the average amount is the average of the sums of values over each run of continuous hours in a day – Chi Mar 18 '18 at 20:22

2 Answers


I believe this is what you are looking for. I have added explanations to the code for each step.

# move the datetime index into a column (so the .dt accessor is available)
df = df.reset_index()

# create helper columns defining contiguous blocks and day
df['block'] = (df['value'].astype(bool).shift() != df['value'].astype(bool)).cumsum()
df['day'] = df['index'].dt.normalize()

# group by day to get unique block count and value count
session_map = df[df['value'].astype(bool)].groupby('day')['block'].nunique()
hour_map = df[df['value'].astype(bool)].groupby('day')['value'].count()

# map to original dataframe
df['sessions'] = df['day'].map(session_map)
df['hours'] = df['day'].map(hour_map)

# calculate result
res = df.groupby(['day', 'hours', 'sessions'], as_index=False)['value'].sum()
res['duration'] = res['hours'] / res['sessions']
res['amount'] = res['value'] / res['sessions']

Result

         day  sessions  duration  value     amount
0 2003-01-01         6       2.5  127.9  21.316667
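Since the question asks for a function, here is a minimal sketch of the same idea wrapped up for reuse. `wet_spell_stats` is a hypothetical helper (not part of the answer above); it assumes the datetimes live in the index and that NaN hours count as dry.

import pandas as pd

def wet_spell_stats(series):
    # hypothetical helper; assumes `series` is hourly rainfall
    # with a DatetimeIndex, and treats NaN as dry (assumption)
    wet = series.fillna(0) > 0
    block = (wet != wet.shift()).cumsum()           # label contiguous runs
    frame = pd.DataFrame({'value': series, 'block': block,
                          'day': series.index.normalize()})
    grouped = frame[wet].groupby('day')
    spells = grouped['block'].nunique()             # wet spells per day
    return pd.DataFrame({
        'duration': grouped.size() / spells,        # avg spell length (hr)
        'amount': grouped['value'].sum() / spells,  # avg spell rainfall (mm)
    })

# e.g. wet_spell_stats(df.set_index('index')['value'])
# -> duration 2.5, amount 21.316667 for the sample day

For the sample day this returns duration 2.5 and amount 21.316667, matching the table above.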
  • I have a problem: does the `dt` (second line) mean "datetime"? The command window shows that 'DatetimeIndex' object has no attribute 'datetime' – Chi Mar 30 '18 at 16:51
  • Looks like you need to elevate your index to a series, e.g. `df = df.reset_index()` or `df['index'] = df.index`. `dt` is an accessor object of a `datetime` series, see [`pd.Series.dt`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.html). I'm always disappointed with `pandas` when you can do something with a series but not an index, but apparently it's nontrivial for the dev team to fix. – jpp Mar 30 '18 at 16:54
  • One more question: is it possible to calculate the frequency of continuous hours of zero by modifying something? – Chi Mar 30 '18 at 20:17
  • I'm sure it is. But you should check and ask separately. – jpp Mar 30 '18 at 20:19
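As a minimal sketch of that modification, assuming the same `df` with the `day` helper column from the answer above: invert the wet mask so the blocks label dry runs instead.

# treat NaN as dry here too (assumption), then label contiguous zero blocks
dry = ~df['value'].fillna(0).astype(bool)
df['dry_block'] = (dry != dry.shift()).cumsum()

# number of distinct dry spells per day
dry_spells = df[dry].groupby('day')['dry_block'].nunique()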

I am not exactly sure what you are asking for, but I think what you are looking for is `resample()`. If I misunderstood your question, please correct me.

From Creating pandas dataframe with datetime index and random values in column, I have created a random time series dataframe.

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# hourly timestamps spanning one day from now
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(1), freq='H')

# random integer values, one per hour, seeded for reproducibility
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'Day': days, 'Value': data})
df = df.set_index('Day')

View the dataframe

Day                         Value
2018-03-18 20:18:08.205546  29
2018-03-18 21:18:08.205546  56
2018-03-18 22:18:08.205546  82
2018-03-18 23:18:08.205546  13
2018-03-19 00:18:08.205546  35
2018-03-19 01:18:08.205546  53
2018-03-19 02:18:08.205546  25
2018-03-19 03:18:08.205546  23
2018-03-19 04:18:08.205546  21
2018-03-19 05:18:08.205546  12
2018-03-19 06:18:08.205546  15
2018-03-19 07:18:08.205546  9
2018-03-19 08:18:08.205546  13
2018-03-19 09:18:08.205546  87
2018-03-19 10:18:08.205546  9
2018-03-19 11:18:08.205546  63
2018-03-19 12:18:08.205546  62
2018-03-19 13:18:08.205546  52
2018-03-19 14:18:08.205546  43
2018-03-19 15:18:08.205546  77
2018-03-19 16:18:08.205546  95
2018-03-19 17:18:08.205546  79
2018-03-19 18:18:08.205546  77
2018-03-19 19:18:08.205546  5
2018-03-19 20:18:08.205546  78

Now, resample your dataframe:

# resample into 2 hours and drop the NaNs
df.resample('2H').mean().dropna() 

It gives you,

Day                 Value
2018-03-18 20:00:00 42.5
2018-03-18 22:00:00 47.5
2018-03-19 00:00:00 44.0
2018-03-19 02:00:00 24.0
2018-03-19 04:00:00 16.5
2018-03-19 06:00:00 12.0
2018-03-19 08:00:00 50.0
2018-03-19 10:00:00 36.0
2018-03-19 12:00:00 57.0
2018-03-19 14:00:00 60.0
2018-03-19 16:00:00 87.0
2018-03-19 18:00:00 41.0
2018-03-19 20:00:00 78.0

Similarly, you can resample into days, hours, minutes, etc., which I leave up to you. You might need to take a look at the `resample` documentation.
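For instance, a sketch of a daily aggregate under the same setup, using the `'D'` (calendar day) frequency alias:

# resample into calendar days; mean() skips NaN values by default
df.resample('D').mean()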

  • Not like this. I would like to calculate the average number of continuous hours which have values in a day. – Chi Mar 18 '18 at 20:31