I'd like to know how I can recursively (iteratively) compute variance, so that I may calculate the standard deviation of a very large dataset in javascript. The input is a sorted array of positive integers.

You can estimate the variance by randomly sampling the dataset. See [Wikipedia](http://en.wikipedia.org/wiki/Variance#Estimating_the_variance) for details. – vadim123 Apr 28 '13 at 03:56

Unless the data are being made available to you one at a time, recursive methods for computing the variance usually require more computation than straightforward calculatiom. Since the data set is large, one suggestion is to calculate the sum and the sum of squares simultaneously so that only one pass through the array is needed rather than two (as in compute $\sum_i x_i$ and divide by $n$ to get $\bar{x}$. Then compute $\sum_i (x_i\bar{x})^2$). – Dilip Sarwate Apr 28 '13 at 04:03

The values in the data set are too large to compute the sum of all values at once. – Rose Perrone Apr 28 '13 at 04:25
4 Answers
Recall that, for every $n\geqslant1$, $$ \bar x_n=\frac1n\sum_{k=1}^nx_k, $$ and $$ \bar\sigma^2_n=\frac1n\sum_{k=1}^n(x_k\bar x_n)^2=\frac1n\sum_{k=1}^nx_k^2(\bar x_n)^2. $$ Hence simple algebraic manipulations starting from the identities $$ (n+1)\bar x_{n+1}=n\bar x_n+x_{n+1}, $$ and $$ (n+1)(\bar\sigma^2_{n+1}+(\bar x_{n+1})^2)=n(\bar\sigma^2_n+(\bar x_n)^2)+x_{n+1}^2, $$ lead to $$ \bar x_{n+1}=\bar x_n+\frac{x_{n+1}\bar x_n}{n+1}, $$ and $$ \bar\sigma^2_{n+1}=\bar\sigma^2_n+(\bar x_n)^2(\bar x_{n+1})^2+\frac{x_{n+1}^2\bar\sigma^2_n(\bar x_n)^2}{n+1}. $$ Thus, $(n,\bar x_n,x_{n+1})$ yield $\bar x_{n+1}$ and $(n,\bar\sigma^2_n,\bar x_n,\bar x_{n+1},x_{n+1})$ yield $\bar\sigma^2_{n+1}$.
 271,033
 27
 280
 538

2@hackape: The answer is correct. It differs from Schatzi's answer because Schatzi incorrectly treated the data as a sample. The question doesn't talk about a sample; it's not about estimating the variance of a population from a sample, but about calculating the variance of the data. – joriki Jan 06 '20 at 09:24


@joriki: I couldn't match the final answer of Did with that of DolphinDream. I think $\bar x_{n+1}$ should be replaced with $x_{n+1}$ in the last equation. – Hosein Rahnama Jan 13 '22 at 12:25
There are two problems in the preceding answer, the first being the formula for the variance is incorrect(see the formula below for the correct version) and the second is that the formula for the recursion ends up subtracting large, nearly equal, numbers.
The definition for unbiased estimates of mean($\bar x$) and variance($\sigma^2$) for a sample of size n are: $$ \bar x_n=\frac1n\sum_{k=1}^nx_k, $$ and $$ \bar\sigma^2_n=\frac1{n1}\sum_{k=1}^n(x_k\bar x_n)^2 $$
Define the recursion variables
$$ M_n = n \bar x_n=\sum_{k=1}^nx_k, $$ and $$ S_n = (n1)\bar\sigma^2_n=\sum_{k=1}^n(x_k\bar x_n)^2 $$
The recursion relation for $M_{n+1}$ is obvious $$ M_{n+1} = M_n + x_{n+1} $$ and the recursion relation for $S_n$ is obtained via $$ S_{n+1} = (x_{n+1}\bar x_{n+1})^2+\sum_{k=1}^n(x_k\bar x_{n+1})^2\phantom{XXXXXX}\\ \phantom{S_{n+1}} = (x_{n+1}\bar x_{n+1})^2+\sum_{k=1}^n(x_k\bar x_n+\bar x_n\bar x_{n+1})^2\\ \phantom{S_{n+1}XXXXXXXXXX} = (x_{n+1}\bar x_{n+1})^2+\sum_{k=1}^n(x_k\bar x_n)^2+2(\bar x_n\bar x_{n+1})\sum_{k=1}^n(x_n\bar x_n) + \sum_{k=1}^n(\bar x_n \bar x_{n+1})^2\\ $$ And since $$S_n = \sum_{k=1}^n(x_k\bar x_n)^2$$ $$\sum_{k=1}^n(\bar x_n\bar x_{n+1})^2 = n(\bar x_n\bar x_{n+1})^2$$ and $$\sum_{k=1}^n(x_k\bar x_n) = 0$$ this simplifies to $$ S_{n+1} = S_n+(x_{n+1}\bar x_{n+1})^2 +n(\bar x_n \bar x_{n+1})^2 $$
Now, this recursion relation has the nice property that it $S_n$ a sum of squared terms, and thus cannot be negative. Written in terms of $M_n$ and $S_n$, the recursion relations are: $$ M_{n+1} = M_n + x_{n+1} $$ $$ S_{n+1} = S_n+\left(x_{n+1}\frac{M_n+x_{n+1}}{n+1}\right)^2 +n\left(\frac{M_n}{n} \frac{M_n+x_{n+1}}{n+1}\right)^2 $$ and we can further simplify the recursion relation for $S_n$ to \begin{eqnarray} S_{n+1} &= S_n + \left(\frac{n x_{n+1}M_n}{n+1}\right)^2+n\left(\frac{M_nn x_{n+1}}{n(n+1)}\right)^2\\ &=S_n+ \left(1+\frac1n\right)\left(\frac{n x_{n+1}M_n}{n+1}\right)^2\\ &=S_n+ \frac{(n x_{n+1}M_n)^2}{n(n+1)} \end{eqnarray}
So we have the simple recursion relations: $$ M_{n+1} = M_n + x_{n+1} $$ $$ S_{n+1} = S_n + \frac{(n x_{n+1}M_n)^2}{n(n+1)}$$
with the mean given by $$\bar x_n = \frac1n M_n$$ and the unbiased estimate of the variance is given by $$\sigma_n^2 = \frac1{n+1}S_n$$.
 69
 1
 4

2To the contrary, the formula for the variance in Did's answer is correct and yours is incorrect. You're treating the data as a sample, but the question doesn't talk about a sample; it's not about estimating the variance of a population from a sample, but about calculating the variance of the data. – joriki Jan 06 '20 at 09:25
Here are the iterative formulas (with derivations) for the population ($N$ normalized) and sample ($N1$ normalized) standard deviations, which express the $\sigma_{n+1}$ ($s_{n+1}$ for sample) for the $n+1$ value set in terms of $\sigma_{n}$ ($s_{n}$ for sample), $\bar x_{n}$ of the $n$ value set plus the new value $x_{n+1}$ added to the set.
Essentially we need to find:
$$\bar x_{n+1} = f(n, \bar x_n, x_{n+1})$$ and $$\sigma_{n+1} = g(n, \sigma_n, \bar x_n, x_{n+1})$$
Derivation for the Average
For both cases, the average for $n\geqslant1$ is,
for $n$ values: $$ \bar x_n=\frac1n\sum_{k=1}^nx_k $$
for $n+1$ values: $$ \bar x_{n+1}=\frac1{n+1}\sum_{k=1}^{n+1}x_k = \frac1{n+1}(n\bar x_n + x_{n+1}) \leftarrow f(n, \bar x_n, x_{n+1}) $$
Derivation for the Standard Deviation
The standard deviation formulas for population and sample are:
\begin{aligned} \sigma_{n} &= \sqrt {\frac1{n} \sum_{k=1}^{n}(x_k  \bar x_{n})^2 } && \textit{for} \textbf{ population } \textit{Standard Deviation}\\ \\ s_{n} &= \sqrt {\frac1{n1} \sum_{k=1}^{n}(x_k  \bar x_{n})^2 } && \textit{for} \textbf{ sample } \textit{Standard Deviation } \\ \end{aligned}
To consolidate the derivations for both population and sample formulas we'll write the standard deviation using a generic factor $\alpha_{n}$ and replace it at the end to get the population and sample formulas.
with:
\begin{equation} \alpha_{n} = \begin{cases} n & \textit{for} \textbf{ population } \textit{Standard Deviation} \\ n1 & \textit{for} \textbf{ sample } \textit{Standard Deviation } \\ \end{cases} \end{equation}
the equation for the standard deviation for the $n$ values can be written as:
\begin{equation} \tag{1} \begin{aligned} \alpha_{n}\sigma^2_{n} & = \sum_{k=1}^{n}(x_k  \bar x_{n})^2 \\ & = \sum_{k=1}^{n}\big(x_k^2  2 x_k \bar x_n + (\bar x_{n})^2\big) \\ & = \sum_{k=1}^{n}x_k^2  2 \bar x_n \sum_{k=1}^{n}x_{k} + (\bar x_{n})^2\sum_{k=1}^{n}1 \\ & = \sum_{k=1}^{n}x_k^2  2 \bar x_n n \bar x_n + n (\bar x_{n})^2 \\ & = \sum_{k=1}^{n}x_k^2  n (\bar x_{n})^2 \end{aligned} \end{equation}
Thus, the same equation for $n+1$ values is:
\begin{equation} \begin{aligned} \alpha_{n+1}\sigma^2_{n+1} & = \sum_{k=1}^{n+1}(x_k\bar x_{n+1})^2 \\ & = \sum_{k=1}^{n+1}x_k^2  (n+1)(\bar x_{n+1})^2 \\ & = \sum_{k=1}^{n}x_k^2 + (x_{n+1})^2  (n+1)(\bar x_{n+1})^2 \\ & = \sum_{k=1}^{n}x_k^2 + (x_{n+1})^2  (n+1) \big(\frac1{n+1}(n\bar x_{n} + x_{n+1}) \big)^2 \\ & = \sum_{k=1}^{n}x_k^2 + (x_{n+1})^2  \frac1{n+1} \big(n^2(\bar x_{n})^2 + 2 n \bar x_{n} x_{n+1} + (x_{n+1})^2 \big) \\ \end{aligned} \end{equation}
from the equation $(1)$ we substitute $\sum_{k=1}^{n}x_k^2$ with $\alpha_{n}\sigma^2_{n} + n (\bar x_{n})^2$ and get:
\begin{equation} \begin{aligned} \alpha_{n+1}\sigma^2_{n+1} & = \alpha_{n}\sigma^2_{n} + n (\bar x_{n})^2 + (x_{n+1})^2  \frac1{n+1} \big(n^2(\bar x_{n})^2 + 2 n \bar x_{n} x_{n+1} + (x_{n+1})^2 \big) \\ \end{aligned} \end{equation}
arranging the terms and simplifying we get:
$$ \sigma_{n+1} = \sqrt { \Big( \sigma^2_{n} + \frac{n}{n+1} \frac1{\alpha_n} (\bar x_n  x_{n+1})^2 \Big) \frac{\alpha_{n}}{\alpha_{n+1}} } \leftarrow g(n, \sigma_n, \bar x_n, x_{n+1}) $$
Replacing the $\alpha$ values, the specific iterative formulas for population and sample standard deviations are:
\begin{equation} \begin{aligned} \sigma_{n+1} &= \sqrt{ \Big( \sigma^2_{n} + \frac{1}{n+1}(\bar x_n  x_{n+1})^2 \Big) \frac{n}{n+1} } &&\textit{population STD} \\ \\ s_{n+1} &= \sqrt{ \Big( s^2_{n} + \frac{n}{n^21}(\bar x_n  x_{n+1})^2 \Big) \frac{n1}{n} } &&\textit{sample STD} \\ \end{aligned} \end{equation}
 173
 6

1(+1) Very nice and elaborate answer. It would be nice if you also include the derivation of Eq. $(1)$ as most readers will find it nontrivial. – Hosein Rahnama Jan 13 '22 at 11:56

1

I ended up using this incremental approach:
function mean(array) {
var i = 1, j = 0, n = array.length, m = 0;
while (++i < n) if (a = array[i]) m += (a  m) / ++j;
return j ? m : undefined;
}
function variance(array, mean_value) {
if (!mean_value) return undefined;
var i = 1, j = 0, n = array.length, v = 0;
while (++i < n) {
a = Math.pow((array[i]  mean_value), 2)
v += (a  v) / ++j;
}
return v * (n/(n1));
}
 365
 1
 2
 12