
When I try to calculate the exponential moving average (EMA) of financial data in a DataFrame, pandas' `ewm` approach seems to be incorrect.

The basics are well explained in the following link: http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages

Turning to the pandas documentation, the approach taken (with the `adjust` parameter set to False) is as follows:

   weighted_average[0] = arg[0];
   weighted_average[i] = (1-alpha) * weighted_average[i-1] + alpha * arg[i]

In my view this is incorrect. `arg` should be (for example) the closing values; however, arg[0] should be the first average (i.e. the simple average of the first `period` closing values), NOT the first closing value. arg[0] and arg[i] can therefore never come from the same data. Using the `min_periods` parameter does not seem to resolve this.
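For reference, the initialization described above (the StockCharts method) can be written out by hand. This is a minimal sketch, assuming a multiplier of 2 / (period + 1) and a seed equal to the simple average of the first `period` closes; the helper name `sma_seeded_ema` and the sample values are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def sma_seeded_ema(closes: pd.Series, period: int) -> pd.Series:
    """Hypothetical helper: EMA seeded with the SMA of the first `period` closes."""
    values = closes.to_numpy(dtype=float)
    alpha = 2.0 / (period + 1)  # StockCharts multiplier
    out = np.full(len(values), np.nan)
    if len(values) >= period:
        # First EMA value: simple average of the first `period` closes
        out[period - 1] = values[:period].mean()
        # Later values: the same recursion pandas uses with adjust=False
        for i in range(period, len(values)):
            out[i] = (1 - alpha) * out[i - 1] + alpha * values[i]
    return pd.Series(out, index=closes.index, name=closes.name)


sample = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # illustrative values
print(sma_seeded_ema(sample, period=3))
```

Only the seed differs from pandas' `adjust=False` recursion; every later step is identical.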

Can anyone explain how (or whether) pandas can be used to calculate the EMA of data properly?

jeronimo
  • Related GitHub issue: https://github.com/pydata/pandas/issues/13638 – naught101 Aug 16 '16 at 02:20
  • [pandas issue 13638 is still open](https://github.com/pydata/pandas/issues/13638); you can follow it, upvote it, and contribute code if you want to see it implemented. – smci Jan 18 '19 at 21:09

4 Answers


There are several ways to initialize an exponential moving average, so I wouldn't say pandas is doing it wrong, just different.

Here is one way to calculate it the way you want, where `s` is the Series of closing prices:

In [20]: s.head()
Out[20]: 
0    22.27
1    22.19
2    22.08
3    22.17
4    22.18
Name: Price, dtype: float64

In [21]: span = 10

In [22]: sma = s.rolling(window=span, min_periods=span).mean()[:span]

In [24]: rest = s[span:]

In [25]: pd.concat([sma, rest]).ewm(span=span, adjust=False).mean()
Out[25]: 
0           NaN
1           NaN
2           NaN
3           NaN
4           NaN
5           NaN
6           NaN
7           NaN
8           NaN
9     22.221000
10    22.208091
11    22.241165
12    22.266408
13    22.328879
14    22.516356
15    22.795200
16    22.968800
17    23.125382
18    23.275312
19    23.339801
20    23.427110
21    23.507635
22    23.533520
23    23.471062
24    23.403596
25    23.390215
26    23.261085
27    23.231797
28    23.080561
29    22.915004
Name: Price, dtype: float64
chrisb
  • For a reason I can't fully explain, I needed a slightly different form of this line of code (@chrisb): `pd.concat([sma, rest]).ewm(alpha=1/span, adjust=False).mean()`. With this, it did what I expected: it took the previous average, multiplied it by (span - 1), added the new value, and divided the total by span. – Julian7 Aug 17 '19 at 19:18
  • Using ewm gives me a TypeError: unsupported operand type(s) for /: 'EWM' and 'EWM'. Why is this, and how can I solve it? I'm on Windows, using Spyder from Anaconda as my IDE, with pandas 1.0.5. – Aswin Babu Jun 28 '20 at 19:07
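Julian7's observation follows from the algebra: with alpha = 1 / span, (1 - alpha) * prev + alpha * x equals (prev * (span - 1) + x) / span, which is Wilder-style smoothing. By contrast, `ewm(span=span)` uses alpha = 2 / (span + 1). A small sketch checking that pandas' `adjust=False` output matches that running average (the sample data are arbitrary):

```python
import pandas as pd

span = 4
values = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0])  # arbitrary sample data

# pandas: recursive form with alpha = 1 / span
pandas_result = values.ewm(alpha=1 / span, adjust=False).mean()

# By hand: previous average * (span - 1), plus the new value, divided by span
manual = [values.iloc[0]]
for x in values.iloc[1:]:
    manual.append((manual[-1] * (span - 1) + x) / span)

print(pd.DataFrame({"pandas": pandas_result, "manual": manual}))
```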

You can compute the EWMA with pandas' `ewm` function using either `alpha` or the span-based coefficient.

Formula using alpha: (1 - alpha) * previous_val + alpha * current_val, where alpha = 1 / period

Formula using the coefficient (span): ((current_val - previous_val) * coeff) + previous_val, where coeff = 2 / (period + 1)
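The two formulas are the same recursion written two ways; they differ only in the smoothing factor. Writing $y_t$ for the average, $x_t$ for the current value, $N$ for the period and $k$ for the factor:

```latex
(x_t - y_{t-1})\,k + y_{t-1} = (1 - k)\,y_{t-1} + k\,x_t,
\qquad
k = \frac{1}{N}\ \text{(alpha form)},
\qquad
k = \frac{2}{N + 1}\ \text{(span form)}
```

So `ewm(alpha=1/period)` and `ewm(span=period)` produce different averages only because $k$ differs.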

Here is how to compute both formulas with pandas, seeding the average with the simple average of the first `period` values:

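import pandas as pd

# base: column to smooth, target: output column, period: window length,
# alpha: True for alpha = 1 / period, False for span (coeff = 2 / (period + 1))
con = pd.concat([df[base].iloc[:period].rolling(window=period).mean(), df[base].iloc[period:]])

if alpha:
    df[target] = con.ewm(alpha=1.0 / period, adjust=False).mean()
else:
    df[target] = con.ewm(span=period, adjust=False).mean()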
arkochhar
  • I don't understand what "base" is in your code; you probably don't need it at all. Also, it would be safer to convert period to float, in particular for Python 2. Otherwise, good answer. – FLab Oct 02 '17 at 16:00
  • Apologies for being ambiguous. `base` is the base column in the DataFrame on which you want to compute the EWMA. – arkochhar Oct 03 '17 at 17:34
  • Small correction: `df[target] = con.ewm(alpha=1.0 / period, adjust=False).mean()` – AbhijitG Dec 27 '17 at 12:11

Here's an example of how Pandas calculates both adjusted and non-adjusted ewm:

import numpy as np
import pandas as pd

name = 'closing'
series = pd.Series([1, 2, 3, 5, 8, 13, 21, 34], name=name).to_frame()
period = 4
alpha = 2 / (1 + period)  # the alpha pandas derives from span=period

# Both averages start from the first observation
series[name + '_ewma'] = np.nan
series.loc[0, name + '_ewma'] = series[name].iloc[0]

series[name + '_ewma_adjust'] = np.nan
series.loc[0, name + '_ewma_adjust'] = series[name].iloc[0]

for i in range(1, len(series)):
    # adjust=False: recursive form
    series.loc[i, name + '_ewma'] = (1 - alpha) * series.loc[i - 1, name + '_ewma'] + alpha * series.loc[i, name]

    # adjust=True: explicit weights (1 - alpha)**(i - t) over all observations so far
    adjusted_weights = np.array([(1 - alpha) ** (i - t) for t in range(i + 1)])
    series.loc[i, name + '_ewma_adjust'] = np.sum(series[name].iloc[:i + 1].values * adjusted_weights) / adjusted_weights.sum()

print(series)
# Maximum absolute difference from pandas' own result (should be ~0)
print("max diff adjust=False -> ", np.abs(series[name + '_ewma'] - series[name].ewm(span=period, adjust=False).mean()).max())
print("max diff adjust=True  -> ", np.abs(series[name + '_ewma_adjust'] - series[name].ewm(span=period, adjust=True).mean()).max())

The mathematical formulas are discussed at https://github.com/pandas-dev/pandas/issues/8861
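The loop above recomputes every weight at each step, so it is quadratic in the series length. The same `adjust=True` result can be obtained in one pass by running two recursions, one for the weighted sum of values and one for the sum of weights, and dividing them. This is a sketch assuming a series without NaNs; the helper name `ewma_adjusted` is hypothetical.

```python
import numpy as np
import pandas as pd


def ewma_adjusted(values: pd.Series, period: int) -> pd.Series:
    """Hypothetical helper: one-pass equivalent of ewm(span=period, adjust=True).mean().

    Assumes `values` contains no NaNs.
    """
    alpha = 2.0 / (1 + period)
    decay = 1.0 - alpha
    weighted_sum = 0.0  # S_t = x_t + decay * S_{t-1}
    weight_total = 0.0  # W_t = 1 + decay * W_{t-1}
    out = np.empty(len(values))
    for i, x in enumerate(values.to_numpy(dtype=float)):
        weighted_sum = x + decay * weighted_sum
        weight_total = 1.0 + decay * weight_total
        out[i] = weighted_sum / weight_total
    return pd.Series(out, index=values.index, name=values.name)


closing = pd.Series([1, 2, 3, 5, 8, 13, 21, 34], name="closing")
print(pd.DataFrame({
    "one_pass": ewma_adjusted(closing, period=4),
    "pandas": closing.ewm(span=4, adjust=True).mean(),
}))
```

Unrolling both recursions gives exactly the weights (1 - alpha)**(i - t) used in the loop above.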

Ben

If you are calculating an EMA of an EMA (as in the MACD formula), you will get bad results, because the second and subsequent EMAs always take their seed from index 0 to period, even though their input already starts with NaNs there, so no valid SMA is found. I use the following solution:

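import pandas as pd

# df and period are assumed to exist; df['Close'] may start with NaNs
# (e.g. when it is itself an EMA)
sma = df['Close'].rolling(period, min_periods=period).mean()
# Shift the seed window past the leading NaNs: idx_start = number of leading NaNs in df['Close']
idx_start = sma.isna().sum() + 1 - period
idx_end = idx_start + period
sma = sma.iloc[idx_start:idx_end]
rest = df['Close'].iloc[idx_end:]
ema = pd.concat([sma, rest]).ewm(span=period, adjust=False).mean()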
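The same idea can be written without index arithmetic by locating the first non-NaN value directly. This is a minimal sketch, assuming any NaNs in the input occur only at the start (as they do when the input is itself a seeded EMA); the helper `seeded_ema` and the linear sample prices are hypothetical. The usage lines apply it to a MACD-style signal line, the 9-period EMA of the 12/26 MACD line.

```python
import numpy as np
import pandas as pd


def seeded_ema(series: pd.Series, period: int) -> pd.Series:
    """Hypothetical helper: SMA-seeded EMA that skips leading NaNs.

    Assumes NaNs, if any, appear only at the start of `series`.
    """
    values = series.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    valid = np.flatnonzero(~np.isnan(values))
    if len(valid) < period:
        return pd.Series(out, index=series.index, name=series.name)
    start = valid[0]  # position of the first non-NaN value
    seed_pos = start + period - 1
    # Seed: simple average of the first `period` valid values
    out[seed_pos] = values[start:seed_pos + 1].mean()
    alpha = 2.0 / (period + 1)
    for i in range(seed_pos + 1, len(values)):
        out[i] = (1 - alpha) * out[i - 1] + alpha * values[i]
    return pd.Series(out, index=series.index, name=series.name)


close = pd.Series(np.linspace(100.0, 130.0, 60))  # illustrative prices
macd_line = seeded_ema(close, 12) - seeded_ema(close, 26)
signal = seeded_ema(macd_line, 9)  # EMA of an EMA-derived series
print(signal.first_valid_index())  # 33
```

The MACD line is NaN until position 25, so the signal's seed covers positions 25 to 33 instead of the all-NaN window 0 to 8.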