I have a Pandas DataFrame with two columns containing angles in the range [-pi, pi). I need to calculate the instantaneous angular velocity on each row, which I can do using diff(). However, this naive approach fails when my data crosses the discontinuity from pi to -pi.

I'm trying to use numpy.unwrap() on my columns, but when I try the code below I get a ValueError.

angle_data["theta"].apply(np.unwrap)
<Traceback message> 
ValueError: diff requires input that is at least one dimensional

This also occurs if I copy the columns to a Pandas Series and try to use apply(np.unwrap). I can work around this by doing

angle_data["theta"] = pd.Series(np.unwrap(angle_data["theta"]))

or by using apply on multiple columns at once, but I'd like to know why the apply(np.unwrap) method doesn't work for a Pandas Series.

Apollo42
  • Instead of using diff, maybe you could use np.arctan2 as in [here](https://stackoverflow.com/questions/1878907/the-smallest-difference-between-2-angles#answer-2007279) (cos and sin both exist in numpy) – tgrandje Dec 08 '20 at 11:41
  • Thanks, I haven't seen that method before and might give it a try, although from the follow-up comments on that answer it might be too much of a performance hit compared to the other workarounds I've found. – Apollo42 Dec 08 '20 at 14:01
  • Don't worry too much about performance. I have been using it for similar problems on big datasets (for "mean" angles in fact) and this hasn't been a problem (afaik, as it is straight from numpy it is already faster than most python instructions...) – tgrandje Dec 08 '20 at 20:34
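
The arctan2 suggestion from the comments above can be sketched as follows. This is an illustration only, assuming the angles are in radians and that the true change between consecutive samples is always smaller than pi. The sample values and the name wrapped_diff are made up for the example.

```python
import numpy as np

# Hypothetical sample: angles in [-pi, pi) that cross the pi / -pi boundary
theta = np.array([3.0, 3.1, -3.1, -3.0])

# Naive difference: the middle step shows a spurious jump of about -6.2
naive = np.diff(theta)

# Map each raw difference back into (-pi, pi] via its sine and cosine
wrapped_diff = np.arctan2(np.sin(naive), np.cos(naive))

print(naive)
print(wrapped_diff)
```

For data like this, the result should agree with taking np.diff of the unwrapped angles, so either route can be used to remove the jump.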

1 Answer

From the docs:

Help on function unwrap in module numpy:

unwrap(p, discont=3.141592653589793, axis=-1)
    ...
    Parameters
    ----------
    p : array_like
        Input array.
    ...

What your traceback is saying is that apply does not pass the whole column to unwrap. It iterates over the column and calls unwrap on each individual value, and a single number is not the at least one-dimensional array_like that the doc requires for p.

You can see what is happening by using a custom print function like this:

def my_print(x):
    # Show exactly what apply passes in on each call
    print(x)
    print('-'*50)

df['theta'].apply(my_print)  # my_print is called once per element

You will see that the values of the column are passed as the argument one at a time. In other words, you are looping as you would through a list, which is quite inefficient.
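
The same point can be checked without printing. The sketch below uses made-up sample values. It records the number of dimensions of each object that apply hands to the function, then calls np.unwrap on one such scalar to trigger the ValueError from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's column
df = pd.DataFrame({"theta": [3.0, 3.1, -3.1, -3.0]})

# np.ndim reports 0 for a scalar: apply passes single values, not arrays
dims = df["theta"].apply(lambda x: np.ndim(x)).tolist()
print(dims)

# Calling unwrap on one scalar, as apply effectively does, raises ValueError
error_message = None
try:
    np.unwrap(df["theta"].iloc[0])
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```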

You already found the right way to use unwrap: apply it directly to the whole series, np.unwrap(df['theta']), so there is no element-by-element loop in Python.

This is how numpy functions are generally meant to be used: on whole arrays or columns at once. Dropping the "apply" method in favour of such calls usually brings large performance gains.

So as a rule of thumb: stay away from "apply" when you can (and most of the time, you can indeed) and stick to numpy or built-in functions from pandas.
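
Putting this together for the original problem, a minimal sketch might look like the following. The sample values, the time column and the new column names are assumptions for illustration; only theta comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: timestamps in seconds and wrapped angles in radians
angle_data = pd.DataFrame({
    "time": [0.0, 0.1, 0.2, 0.3],
    "theta": [3.0, 3.1, -3.1, -3.0],
})

# Unwrap the whole column in one call; to_numpy() hands numpy a plain array,
# and assigning an array back keeps the DataFrame's existing index
angle_data["theta_unwrapped"] = np.unwrap(angle_data["theta"].to_numpy())

# Angular velocity as change in angle over change in time (first row is NaN)
angle_data["omega"] = (
    angle_data["theta_unwrapped"].diff() / angle_data["time"].diff()
)

print(angle_data)
```

Assigning the plain array, rather than wrapping it in a new pd.Series, avoids any index alignment surprises if the DataFrame does not use the default integer index.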

tgrandje
  • Just curious, you say that numpy functions don't iterate over array-like objects, how does that work? Surely they still have to iterate over it in some fashion? Is it the way that the arrays are accessed that makes the difference? – Apollo42 Dec 09 '20 at 06:58
  • Not sure about the mechanisms involved. It is called "vectorization"; I suspect this is linked to 3 factors: clever python coding, a core written in C (faster than python), and matrix operations. Some of it can be read [here](https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html) (for example...) – tgrandje Dec 09 '20 at 20:04
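
As a rough illustration of the question in the comments above: a vectorized numpy call still visits every element, but the loop runs inside numpy's compiled code instead of the Python interpreter. The sketch below uses arbitrary sample data and only checks that both approaches give the same result; it makes no timing claims.

```python
import numpy as np

values = np.linspace(-np.pi, np.pi, 1000)  # arbitrary sample data

# Explicit Python loop: the interpreter handles one element per iteration
looped = np.array([np.sin(v) for v in values])

# Vectorized call: one Python-level call, the loop runs in compiled code
vectorized = np.sin(values)

print(np.allclose(looped, vectorized))
```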