
I have a DataFrame with two columns of time information. The first is the epoch time in seconds, and the second is the same time as a formatted string, like "2015-06-01T09:00:00+08:00", where "+08:00" denotes the timezone offset.

I'm aware that time formats are a horrible mess in Python, and matplotlib.pyplot seems to recognise only the datetime type. I tried several ways to convert the string time to datetime, but none of them worked. pd.to_datetime converts to datetime64, pd.Timestamp converts to Timestamp, and even when I combined the two functions, the output was always either datetime64 or Timestamp, never datetime. I also tried the method suggested in this answer, but that didn't work either. It's kind of driving me up the wall now.

Could anybody suggest a quick way to do this? Thanks!

Here is a minimal example:

import matplotlib.pyplot as plt
import time
import pandas as pd
df = pd.DataFrame([[1433120400, "2015-06-01T09:00:00+08:00"]], columns=["epoch", "strtime"])

# didn't work
df["usable_time"] = pd.to_datetime(df["strtime"])  

# didn't work either
df["usable_time"] = pd.to_datetime(df["strtime"].apply(lambda s: pd.Timestamp(s)))  

# produced a strange type called "struct_time". Don't think it'd be compatible with pyplot
df["usable_time"] = df["epoch"].apply(lambda x: time.localtime(x))  

# attempted to plot with pyplot
df["usable_time"] = pd.to_datetime(df["strtime"])
plt.plot(x=df["usable_time"], y=[0.123])
plt.show()
Vim
  • Can you explain why `df["usable_time"] = pd.to_datetime(df["strtime"])` doesn't work? – jezrael Aug 12 '18 at 16:58
  • @jezrael This way it converts to `Timestamp`, and pyplot doesn't seem to understand that format. – Vim Aug 12 '18 at 17:03
  • OK, can you add your plotting code? – jezrael Aug 12 '18 at 17:05
  • Pyplot can read Pandas `datetime64` formats without a problem. If you do `plt.plot(df.usable_time, df.epoch)` a graph is rendered without error. (You can add an extra point or two to verify that a line is plotted.) – andrew_reece Aug 12 '18 at 17:05
  • @jezrael Yes. I have added the plotting part. – Vim Aug 12 '18 at 17:09
  • @andrew_reece It didn't raise an explicit error, but the output graph is empty and the x-axis is labelled with float numbers where it should show dates instead. – Vim Aug 12 '18 at 17:12
  • @andrew_reece By the way, may I ask which `usable_time` method you used in your working example? – Vim Aug 12 '18 at 17:13
  • @Vim see my answer below. – andrew_reece Aug 12 '18 at 17:20

2 Answers

UPDATE (per comments)
It seems like the confusion here stems from the fact that plt.plot() takes x and y as positional arguments, not keyword arguments. In other words, the correct call is:

plt.plot(x, y)

Or, alternately:

plt.plot('x_label', 'y_label', data=obj) 

But not:

plt.plot(x=x, y=y)

There's a separate discussion of why this quirk of Pyplot exists here; also see ImportanceOfBeingErnest's comments below.
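A minimal sketch contrasting the three call styles above, assuming a headless Agg backend and the two-row DataFrame used in the original answer below. How the keyword form fails depends on the matplotlib version: older releases such as 2.2.2 silently return an empty list (the empty plot described in the question), while newer releases may raise a TypeError instead, so the sketch handles both.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame([[1433120400, "2015-06-01T09:00:00+08:00"],
                   [1433130400, "2015-07-01T09:00:00+08:00"]],
                  columns=["epoch", "strtime"])
df["usable_time"] = pd.to_datetime(df["strtime"])

fig, ax = plt.subplots()

# 1) Positional x and y: draws one line with datetime x values
lines_pos = ax.plot(df["usable_time"], df["epoch"])

# 2) Column names looked up in data=df: also draws one line
lines_data = ax.plot("usable_time", "epoch", data=df)

# 3) x=/y= passed as keywords: nothing is plotted. Older matplotlib
#    returns an empty list; newer versions raise TypeError instead.
try:
    lines_kw = ax.plot(x=df["usable_time"], y=df["epoch"])
except TypeError:
    lines_kw = []
```

The first two forms each produce one `Line2D`; the third produces none, which is why the axes were empty and fell back to plain float ticks.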

Original
This isn't really an answer, more of a demonstration that Pyplot doesn't have an issue with Pandas datetime data. I've added an extra row to df to make the plot clearer:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([[1433120400, "2015-06-01T09:00:00+08:00"],
                   [1433130400, "2015-07-01T09:00:00+08:00"]], 
                  columns=["epoch", "strtime"])

df["usable_time"] = pd.to_datetime(df["strtime"])  

df.dtypes
epoch                   int64
strtime                object
usable_time    datetime64[ns]
dtype: object

plt.plot(df.usable_time, df.epoch)

[Image: the resulting plot]

Versions used: pandas 0.23.3, matplotlib 2.2.2.
andrew_reece
  • It could simply be that OP is using `plt.plot(x=..., y=...)` and you are using `plt.plot(..., ...)`. – ImportanceOfBeingErnest Aug 12 '18 at 17:18
  • @ImportanceOfBeingErnest Yeah, this seems to be the issue. I didn't know they were different! Thanks. – Vim Aug 12 '18 at 17:20
  • @ImportanceOfBeingErnest will you post an answer that elucidates the difference? (Or does that already exist?) – andrew_reece Aug 12 '18 at 17:22
  • @ImportanceOfBeingErnest But I still don't understand why they are different... The first and second parameters of plot are just x and y. Would you mind explaining? – Vim Aug 12 '18 at 17:23
  • Probably such answers already exist somewhere but are hard to find, because these problems hide behind totally different questions, as can be seen in this case as well. The signature of `plot` is [`plot([x], y, [fmt], data=None, **kwargs)`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html), so `x` and `y` are not named or keyword arguments and therefore cannot be (mis)used as such. Maybe this information can just be added to the answer? – ImportanceOfBeingErnest Aug 12 '18 at 17:29
  • Thanks @ImportanceOfBeingErnest! I found another post that goes into the details a bit more. Updated answer to include a link to that post, the link you provided to the docs, and a quick overview. – andrew_reece Aug 12 '18 at 17:39

You can use to_pydatetime (from the dt accessor or Timestamp) to get back native datetime objects if you really want to, e.g.:

pd.to_datetime(df["strtime"]).dt.to_pydatetime()

This will return an array of native datetime objects:

array([datetime.datetime(2015, 6, 1, 1, 0)], dtype=object)

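Note that in this output the +08:00 offset has been applied and the result is naive UTC, which is why the hour is 1 rather than 9; newer pandas versions (0.24 and later) keep the offset instead. In any case, pyplot can work with pandas datetime Series directly, so this conversion shouldn't be necessary.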
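A minimal sketch of the conversion described above, which also shows why the "never datetime" observation in the question is less of a problem than it looks: `pd.Timestamp` is itself a subclass of `datetime.datetime`. It assumes a reasonably recent pandas and passes `utc=True` (not part of the answer) so the result doesn't depend on how a given pandas version handles the +08:00 offset. It also uses the per-element `Timestamp.to_pydatetime()` rather than `.dt.to_pydatetime()`, because recent pandas versions emit a FutureWarning about the accessor form's return type changing.

```python
import datetime

import pandas as pd

s = pd.Series(["2015-06-01T09:00:00+08:00", "2015-07-01T09:00:00+08:00"])

# utc=True converts the +08:00 times to UTC, so 09:00+08:00 becomes 01:00 UTC
ts = pd.to_datetime(s, utc=True)

# Elements of a datetime Series are pd.Timestamp objects,
# and pd.Timestamp subclasses datetime.datetime
first = ts.iloc[0]
print(isinstance(first, datetime.datetime))  # True

# Timestamp.to_pydatetime() returns a plain datetime.datetime
native = [t.to_pydatetime() for t in ts]
print(type(native[0]).__name__)  # datetime
print(native[0])  # 2015-06-01 01:00:00+00:00
```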

YS-L