
I have a dataset like this:

time               MachineId  
1530677359000000000 01081081  
1530677363000000000 01081081  
1530681023000000000 01081090  
1530681053000000000 01081090  
1530681531000000000 01081090

My code is as follows:

import pandas as pd
from datetime import datetime
import time
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdate

df = pd.read_csv('acn.csv')

df['time'] = pd.to_datetime(df['time'], unit='ns')  # convert the epoch nanosecond time to datetime format

print(df.head())

Output:

   time            MachineId   
0 2018-07-04 04:09:19  1081081.0  
1 2018-07-04 04:09:23  1081081.0  
2 2018-07-04 05:10:23  1081090.0   
3 2018-07-04 05:10:53  1081090.0  
4 2018-07-04 05:18:51  1081090.0 
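A minimal, self-contained sketch of the loading step above, assuming acn.csv is comma-separated with exactly the two columns shown (the rows are inlined with io.StringIO so it runs without the file). Passing dtype={"MachineId": str} to read_csv is an addition, not part of the original code; it keeps the leading zero that the output above loses when MachineId is parsed as a number.

```python
import io

import pandas as pd

# Assumption: acn.csv is comma-separated with the two columns shown above.
# The sample rows are inlined so the sketch runs without the file.
csv_text = """time,MachineId
1530677359000000000,01081081
1530677363000000000,01081081
1530681023000000000,01081090
1530681053000000000,01081090
1530681531000000000,01081090
"""

# dtype=str keeps MachineId as text, so the leading zero survives
df = pd.read_csv(io.StringIO(csv_text), dtype={"MachineId": str})

# Convert epoch nanoseconds to datetime64 values
df["time"] = pd.to_datetime(df["time"], unit="ns")
print(df.head())
```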

Now I want to convert the time column to numeric values so that I can plot time against MachineId:

dates = plt.dates.date2num(df['time'])
df.plot(kind='scatter',x='dates',y='MachineId')
plt.show()

which throws this error:

AttributeError: 'module' object has no attribute 'dates'

How can I convert the datetime values to numbers so that the plot can be made?

J...S
surya rahul
  • I made an edit to your question to clarify the wording. This edit is pending a peer review. I think the first two big blocks in your question (the original df and the pandas code which calls pd.to_datetime) are not at all relevant to your problem, and should probably be deleted as well, but I left them in for now. – Jeff Ellen Jul 16 '18 at 08:20
  • @surya-rahul You might be interested in [this chart](https://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64). And why do you want to convert it a second time instead of using your datetime column `plt.plot(df.time, df.MachineId, "ro")`? – Mr. T Jul 16 '18 at 09:37
  • In general, it is a bad idea to mix pandas and matplotlib datetime objects: https://stackoverflow.com/a/44214830/8881141 – Mr. T Jul 16 '18 at 09:39
  • @Mr.T `plt.plot(df.time, df.MachineId, "ro")` gives a scatter plot, but what is the "ro" parameter here? And how would I make other plots in this case, such as histograms or boxplots? – surya rahul Jul 16 '18 at 09:46
  • @suryarahul The documentation for plt.plot ( https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html ) explains that the format string (in this case "ro") controls the color and style of the markers. The matplotlib documentation will also tell you how to make boxplots. However, that still won't fix the AttributeError; my answer explains its cause and solution, so please accept it. – Jeff Ellen Jul 16 '18 at 15:48
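A short sketch of the two points raised in these comments, using the question's five sample rows (MachineId kept as strings, an assumption). The "ro" format string means red ("r") circle markers ("o") with no connecting line; for the histogram, the times are first converted with matplotlib.dates.date2num and the axis is then told to display those numbers as dates. The Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The question's five sample rows, with time already converted to datetime64
df = pd.DataFrame({
    "time": pd.to_datetime([1530677359000000000, 1530677363000000000,
                            1530681023000000000, 1530681053000000000,
                            1530681531000000000], unit="ns"),
    "MachineId": ["01081081", "01081081", "01081090", "01081090", "01081090"],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# "ro": "r" = red, "o" = circle markers; no line style, so no connecting line
ax1.plot(df["time"], df["MachineId"], "ro")

# Histogram of event times: date2num gives floats (days), then
# xaxis_date() formats those floats as dates on the x-axis
ax2.hist(mdates.date2num(df["time"].to_numpy()), bins=5)
ax2.xaxis_date()

fig.autofmt_xdate()
```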

2 Answers


You got the following error:

AttributeError: 'module' object has no attribute 'dates'

Your error message is telling you that matplotlib.pyplot.dates (plt.dates) doesn't exist: the module object you imported as plt has no attribute called 'dates'.

So you need to fix that error before you worry about converting anything. Did you mean to call matplotlib.dates.date2num instead? In your code you have the following:

import matplotlib.dates as mdate

So maybe you meant to call mdate.date2num instead? That should eliminate the AttributeError.
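A minimal sketch of that fix, assuming the time column has already been converted with pd.to_datetime as in the question: date2num is called through the matplotlib.dates module (imported as mdate), not through pyplot. The result is a float array of days since matplotlib's date epoch; that epoch changed in matplotlib 3.3, so only the spacing between the values should be relied on here.

```python
import matplotlib.dates as mdate
import pandas as pd

# The question's time column after pd.to_datetime(..., unit='ns')
times = pd.to_datetime(pd.Series([1530677359000000000, 1530677363000000000,
                                  1530681023000000000, 1530681053000000000,
                                  1530681531000000000]), unit="ns")

# date2num is in matplotlib.dates (imported as mdate), not in matplotlib.pyplot
dates = mdate.date2num(times.to_numpy())

# Floats in days; the first two events are 4 seconds apart
print((dates[1] - dates[0]) * 86400)
```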

If that doesn't work for you, you could try what is suggested in the link provided by one of the commenters and use pandas' to_pydatetime. I'm not familiar with it, but in this example page it is accessed as Series.dt.to_pydatetime().
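A sketch of the Series.dt.to_pydatetime() route, assuming the same converted time column. The method is reached through the .dt accessor, and the resulting datetime.datetime objects can then be passed to date2num. Newer pandas releases emit a FutureWarning here because the return type of this method is changing from an array to a Series; wrapping the result in list() works either way.

```python
import matplotlib.dates as mdate
import pandas as pd

times = pd.to_datetime(pd.Series([1530677359000000000, 1530677363000000000,
                                  1530681023000000000, 1530681053000000000,
                                  1530681531000000000]), unit="ns")

# to_pydatetime is reached through the .dt accessor of a datetime Series;
# list() accepts both the ndarray and the Series that pandas versions return
py_times = list(times.dt.to_pydatetime())

# date2num also accepts a list of datetime.datetime objects
dates = mdate.date2num(py_times)
print(dates)
```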

All of this converting is only necessary because you are trying to use df.plot; maybe you should consider calling matplotlib directly. For example, could you just use plt.plot_date instead? Pandas is excellent, but its plotting interface isn't as mature as the rest of the library. As an example (I'm not saying this is the exact problem you are having), there is a known bug in pandas regarding plotting dates. There is also an older Stack Overflow thread where someone stubs out a plt.plot_date call for you.
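As a sketch of calling matplotlib directly, assuming MachineId is kept as strings: current matplotlib versions accept datetime64 values on an axis as-is, so no date2num step is needed. Recent matplotlib releases deprecate plot_date in favour of plot or scatter, so this sketch uses ax.scatter; the "%H:%M" tick format is an arbitrary choice for the sample rows, which all fall on one day.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([1530677359000000000, 1530677363000000000,
                            1530681023000000000, 1530681053000000000,
                            1530681531000000000], unit="ns"),
    "MachineId": ["01081081", "01081081", "01081090", "01081090", "01081090"],
})

fig, ax = plt.subplots()

# matplotlib converts datetime64 values itself, so no date2num call is needed
ax.scatter(df["time"], df["MachineId"])

# Show only hours and minutes: all sample rows fall on the same day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.set_xlabel("time")
ax.set_ylabel("MachineId")
```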

Jeff Ellen
  • Is this an answer or a question? – Mr. T Jul 16 '18 at 08:28
  • I think it's the answer. The original question asks for how to convert `datetime` to `numeric`, but that code is irrelevant without fixing the source of the attribute error. Once that is fixed, I don't think the conversion will be needed. – Jeff Ellen Jul 16 '18 at 08:30
  • To me it looks like a comment asking for clarifications. – Mr. T Jul 16 '18 at 08:32
  • I'm providing a viable alternative exactly following the suggestion of """What, specifically, is the question asking for? Make sure your answer provides that – or a viable alternative. The answer can be “don’t do that”, but it should also include “try this instead”. Any answer that gets the asker going in the right direction is helpful, but do try to mention any limitations, assumptions or simplifications in your answer. """ from here: https://meta.stackexchange.com/help/how-to-answer – Jeff Ellen Jul 16 '18 at 08:34
  • Actually the datetime format is not getting plot when I am assigning the "time" column on x-axis, it is asking for the time column to be in numeric format so I am using `dates = plt.dates.date2num(df['time'])` to change the datetime format to numeric but instead I am getting a error as `AttributeError: 'module' object has no attribute 'dates'` @JeffEllen – surya rahul Jul 16 '18 at 09:18
  • @suryarahul, please read what I wrote more carefully. You are getting AttributeErrors because you're trying to call things that don't exist. There is no plt.dates either. There is matplotlib.dates.date2num; many people import matplotlib as "import matplotlib as mpl", so you might see mpl.dates.date2num in other example code. I edited my answer to try to make this clearer. – Jeff Ellen Jul 16 '18 at 15:40
  • Hello @JeffEllen, the following code gives `ValueError: scatter requires x column to be numeric`: `import pandas as pd from datetime import datetime import matplotlib.pyplot as plt import matplotlib.dates as mdate df= pd.read_csv('acn.csv') df['time']=pd.to_datetime(df['time'], unit='ns') mdate.date2num(df['time']) df.plot(kind='scatter',x='time',y='M') plt.show()` – surya rahul Jul 17 '18 at 07:02
  • @suryarahul So now that you have eliminated the AttributeError you can get to your original question. Like it shows in the link I provided at the end, you might also need to do something to use `to_pydatetime()` from `pandas`, so maybe something like `mdate.date2num(df['time'].to_pydatetime())` instead of `pd.to_datetime()` – Jeff Ellen Jul 17 '18 at 08:48
  • Using `mdate.date2num(df['time'].to_pydatetime())` instead of `pd.to_datetime()` gives `AttributeError: 'Series' object has no attribute 'to_pydatetime'` @JeffEllen – surya rahul Jul 17 '18 at 09:54
  • I didn't say it was exactly that; I said something like that. I have never used it. https://stackoverflow.com/questions/22825349/converting-between-datetime-and-pandas-timestamp-objects shows another example. – Jeff Ellen Jul 17 '18 at 20:58
  • So this is what I did. I searched to see where to_pydatetime was: http://pandas.pydata.org/pandas-docs/stable/genindex.html On that page: there are three valid uses of to_pydatetime. DatetimeIndex, Series.dt, and Timestamp. Since you're trying to use it in a series, I wonder what Series.dt is. So I look that up: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.html And that looks like it might be what you want. Try `df['time'].dt.to_pydatetime()` instead. – Jeff Ellen Jul 17 '18 at 21:01
  • @suryarahul I found an even better link, I put it at the end of my answer, because I think that might be your solution. – Jeff Ellen Jul 17 '18 at 21:04
  • Hey @JeffEllen, after converting the epoch `time` column of my dataset to `datetime` objects, `df['time'].dt.to_pydatetime()` runs without error, but when I try to plot with `df.plot(kind='scatter',x='time',y='M')` I get `ValueError: scatter requires x column to be numeric`. So `df['time'].dt.to_pydatetime()` doesn't turn the `time` column into a numeric type. I feel I am close, but still so far from getting this plot. :( – surya rahul Jul 18 '18 at 06:20
  • I'm trying to help you learn, not just give you the answer (both because that will be better for you overall, and because it's easier for me). What you could be doing better: (1) searching the internet and/or Stack Overflow; I found these links in just a few seconds. (2) Reading the docs for the functions you are trying to call. (3) Using a debugger, or at least print statements, to follow what is happening. For example, your initial question seemed to indicate that you don't understand how imports work. (4) Being flexible. Python has more than one way to do everything. Maybe df.plot isn't a good fit. – Jeff Ellen Jul 18 '18 at 09:43
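A sketch of how the ValueError above can be avoided while keeping df.plot, assuming MachineId is numeric as in the question's output. In the code from the comment, the return value of mdate.date2num(df['time']) is discarded, so x='time' still points at the datetime column; storing the numbers in a new column (time_num, a hypothetical name) gives df.plot the numeric x values it asks for, and xaxis_date() turns the tick labels back into dates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.dates as mdate
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([1530677359000000000, 1530677363000000000,
                            1530681023000000000, 1530681053000000000,
                            1530681531000000000], unit="ns"),
    # Numeric IDs, as in the question's output (leading zeros already lost)
    "MachineId": [1081081, 1081081, 1081090, 1081090, 1081090],
})

# Keep the result of date2num: calling it without assigning leaves 'time' unchanged
df["time_num"] = mdate.date2num(df["time"].to_numpy())

# The scatter plot now gets a numeric x column
ax = df.plot(kind="scatter", x="time_num", y="MachineId")

# Display the float x values as clock times again
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdate.DateFormatter("%H:%M"))
```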

You can also plot dates directly. For example, if you want the dates on the x-axis, pass them in as ax.plot(df.time, ids). I think this might be the closest thing to what you are looking for.

  • At what point in the question? `df.time` undergoes several conversions. – Mr. T Jul 16 '18 at 08:24
  • Obviously after the final change! Also, this piece of code: `dates = plt.dates.date2num(df['time'])` is wrong, since instead of plt.dates you need plt.time, because that is the name of the column. One more suggestion, since I did something similar: I would drop the time of day, keep only the dates, and then plot machine_id against date. – Nikolas Pitsillos Jul 16 '18 at 09:11
  • changing `dates = plt.dates.date2num(df['time'])` to `dates = plt.time.date2num(df['time'])` gives error `'module' object has no attribute 'date2num'` – surya rahul Jul 16 '18 at 09:37
  • You can also refer to this: https://stackoverflow.com/questions/27993540/converting-pandas-datetimeindex-to-float-days-format-with-matplotlib-dates-dat Hope this helps. – Nikolas Pitsillos Jul 16 '18 at 11:26
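A sketch of the suggestion to drop the time of day and plot MachineId against the date, using the question's sample rows. dt.normalize() sets each timestamp to midnight while keeping the datetime64 dtype (dt.date would instead give Python date objects). With these five rows everything falls on 2018-07-04, so the plot shows a single date; a longer time range would show several.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([1530677359000000000, 1530677363000000000,
                            1530681023000000000, 1530681053000000000,
                            1530681531000000000], unit="ns"),
    "MachineId": ["01081081", "01081081", "01081090", "01081090", "01081090"],
})

# normalize() moves each timestamp to midnight and keeps the datetime64 dtype
df["date"] = df["time"].dt.normalize()

fig, ax = plt.subplots()
ax.plot(df["date"], df["MachineId"], "o")  # one marker per event, no line
ax.set_xlabel("date")
ax.set_ylabel("MachineId")
```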