
I want to calculate a binary field churn_flag that indicates whether a user has churned from the game or is still playing.

  1. I calculated the maximum date in the data:

    max_time = data['time'].max()
    

    Result:

    Timestamp('2017-07-12 01:18:50') (future date)
    
  2. I calculated each user's maximum date:

    data_max_time = pd.DataFrame(data.groupby(['id'])['time'].max()).reset_index() 
    data_max_time.columns = ['id','user_max_time']
    

    Result:

    2017-07-11 10:33:11 dtype:datetime64[ns]
    
  3. I need to check whether the difference between these two dates is longer or shorter than 2 days. I tried to solve it with:

    (np.datetime64(final_data['max_time'],'D')-np.datetime64(final_data['user_max_time'],'D'))< (np.timedelta64(2,'D'))
    

    Result:

    ValueError: Could not convert object to NumPy datetime 
    

How can I calculate a True/False (1/0) field for each user?

Cœur
Raya
  • Possible duplicate of [Converting between datetime, Timestamp and datetime64](https://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64) – Mel Jul 04 '17 at 09:03

1 Answer


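I believe converting to NumPy is not necessary; pandas alone is enough. Subtracting the per-user maximum from the overall maximum gives timedeltas that can be compared directly with pd.Timedelta: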

import pandas as pd

# Sample data: 10 events spaced 28.5 hours apart, spread over three user ids
rng = pd.date_range('2017-04-03 15:00:07', periods=10, freq='28.5H')
data = pd.DataFrame({'time': rng, 'id': [1,1,2,2,2,5,5,5,1,2]})
print (data)
   id                time
0   1 2017-04-03 15:00:07
1   1 2017-04-04 19:30:07
2   2 2017-04-06 00:00:07
3   2 2017-04-07 04:30:07
4   2 2017-04-08 09:00:07
5   5 2017-04-09 13:30:07
6   5 2017-04-10 18:00:07
7   5 2017-04-11 22:30:07
8   1 2017-04-13 03:00:07
9   2 2017-04-14 07:30:07

# Latest timestamp in the whole data set
max_time = data['time'].max()

# Latest timestamp per user; the result is a Series indexed by id, so no column renaming is needed
data_max_time = data.groupby('id')['time'].max()
print (data_max_time)
id
1   2017-04-13 03:00:07
2   2017-04-14 07:30:07
5   2017-04-11 22:30:07
Name: time, dtype: datetime64[ns]

print (max_time - data_max_time)
id
1   1 days 04:30:00
2   0 days 00:00:00
5   2 days 09:00:00
Name: time, dtype: timedelta64[ns]


# True where the user's last activity is less than 2 days before max_time (still playing)
df = (max_time - data_max_time < pd.Timedelta(2, unit='D')).reset_index(name='a')
print (df)
   id      a
0   1   True
1   2   True
2   5  False
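
The column a above is True for users who are still playing, so the churn_flag asked for in the question is its negation. A minimal sketch follows, assuming a user counts as churned when their last activity is 2 or more days before the latest timestamp in the data. The names user_max_time and churn_flag are illustrative, and the frequency is passed as a pd.Timedelta instead of the '28.5H' string only to avoid alias differences between pandas versions:

```python
import pandas as pd

# Same sample data as above: 10 events spaced 28.5 hours apart
rng = pd.date_range('2017-04-03 15:00:07', periods=10,
                    freq=pd.Timedelta(hours=28.5))
data = pd.DataFrame({'time': rng, 'id': [1, 1, 2, 2, 2, 5, 5, 5, 1, 2]})

# Latest timestamp overall and latest timestamp per user
max_time = data['time'].max()
user_max_time = data.groupby('id')['time'].max()

# Assumption: churned means the last activity is 2 or more days old.
# The boolean comparison is cast to int to get a 1/0 field.
is_churned = (max_time - user_max_time) >= pd.Timedelta(2, unit='D')
churn_flag = is_churned.astype(int).reset_index(name='churn_flag')

print(churn_flag)
```

With the sample data, only user 5 (last seen 2 days 09:00:00 before max_time) should get churn_flag 1. Whether exactly 2 days counts as churned is a choice; swap >= for > if it should not.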
jezrael