
I have two datetime (Timestamp) columns in my dataframe, df['start'] and df['end'], and I'd like to get the duration between the two dates. So I create the duration column:

df['duration'] = df['start'] - df['end']

However, the duration column now holds numpy.timedelta64 values instead of the datetime.timedelta I expected.

>>> df['duration'][0]
numpy.timedelta64(0,'ns')

Whereas subtracting the two scalars directly gives:

>>> df['start'][0] - df['end'][0]
datetime.timedelta(0)

Can someone explain why the array subtraction changes the timedelta type? Is there a way to keep datetime.timedelta, since it is easier to work with?

EdChum
Zhen Sun
  • possible duplicate of [Converting between datetime, Timestamp and datetime64](http://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64) – philshem Dec 01 '14 at 09:40
  • Whilst that question will no doubt be useful, it's **not** a duplicate. – Ffisegydd Dec 01 '14 at 09:55

1 Answer


This was one of the motivations for implementing a Timedelta scalar in pandas 0.15.0. See full docs here

In pandas >= 0.15.0, a timedelta64[ns] Series is still stored as np.timedelta64[ns] under the hood, but this is completely hidden from the user: scalars come back as Timedelta, a subclass of datetime.timedelta (basically a useful superset of timedelta and the numpy version).
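
Because Timedelta subclasses datetime.timedelta, an element pulled out of a duration Series can usually be used wherever a plain timedelta is expected. The following is a minimal sketch, assuming a reasonably recent pandas; the variable names (start, end, duration, first) are made up for illustration and are not from the question.

```python
import datetime

import pandas as pd

# Two small illustrative Series of timestamps (hypothetical data).
start = pd.Series(pd.to_datetime(["2013-01-02", "2013-01-05"]))
end = pd.Series(pd.to_datetime(["2013-01-01", "2013-01-01"]))

# Element-wise subtraction gives a timedelta64-backed Series.
duration = start - end

# A scalar taken from that Series is a pandas Timedelta ...
first = duration.iloc[0]

# ... which is also an instance of datetime.timedelta.
is_timedelta = isinstance(first, datetime.timedelta)

# It compares and combines with plain datetime.timedelta objects.
same = first == datetime.timedelta(days=1)
total = first + datetime.timedelta(hours=12)

print(type(first).__name__, is_timedelta, same, total)
```
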

In [1]: df = pd.DataFrame([[pd.Timestamp('20130102'),pd.Timestamp('20130101')]],columns=list('AB'))

In [2]: df['diff'] = df['A']-df['B']

In [3]: df.dtypes
Out[3]: 
A        datetime64[ns]
B        datetime64[ns]
diff    timedelta64[ns]
dtype: object

# this will return a Timedelta in 0.15.2
In [4]: df['A'][0]-df['B'][0]
Out[4]: datetime.timedelta(1)

In [5]: (df['A']-df['B'])[0] 
Out[5]: Timedelta('1 days 00:00:00')
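
If a pure datetime.timedelta is still wanted (for example, to pass to code that checks the exact type), one option is to convert explicitly. This is a sketch under the assumption of a recent pandas version; the column names mirror the question, the data is invented, and the helper variables (py_first, py_all, seconds) are hypothetical.

```python
import datetime

import pandas as pd

# Illustrative frame with the question's column names (made-up values).
df = pd.DataFrame({
    "start": [pd.Timestamp("2013-01-01"), pd.Timestamp("2013-01-01")],
    "end": [pd.Timestamp("2013-01-02"), pd.Timestamp("2013-01-03 06:00")],
})
df["duration"] = df["end"] - df["start"]

# Convert a single element to a plain datetime.timedelta.
py_first = df["duration"].iloc[0].to_pytimedelta()

# Convert every element; iterating the Series yields Timedelta scalars.
py_all = [td.to_pytimedelta() for td in df["duration"]]

# Often a numeric duration is enough and avoids object conversion entirely.
seconds = df["duration"].dt.total_seconds()

print(type(py_first), py_all, seconds.tolist())
```

Staying with the timedelta64 column and using the .dt accessor is usually faster than converting to Python objects, so the conversion is probably best kept for the boundary with code that needs the standard-library type.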
Jeff