What is the purpose of subtracting the mean from data when standardizing? And what is the purpose of dividing by the standard deviation?

No one likes meanness. – copper.hat Feb 28 '13 at 19:47

but, I need it! – user63036 Feb 28 '13 at 19:49

The purpose of subtracting the mean from a dataset is to obtain a dataset whose mean is zero. – Feb 28 '13 at 19:51

The idea is to allow different data sets to be comparable. Once you compute the mean, you then want to see how the data varies about the mean. Dividing by the standard deviation lets you compare the data distribution with a normal distribution (${\cal N}(0,1)$). – copper.hat Feb 28 '13 at 19:51

As an example, in manufacturing, many measurements end up being normally distributed (or log-normal, or following other 'standard' distributions). If you notice that the parameters of a data set (mean, $\sigma$, actual distribution) have suddenly changed, then something has probably gone wrong. The mean and $\sigma$ are simple measures that are often sufficient to characterize a lot of data. – copper.hat Feb 28 '13 at 19:57

Is this true even if $X$ and $Y$ themselves come from different non-normal distributions? Are they still comparable after standardizing to $Z$? – Parthiban Rajendran Nov 10 '18 at 16:56

For what it's worth on such an old question and its answers: there are of course cases in which it is useful to centre & standardize, for reasons such as those given in the answers; but also cases in which it isn't – you could be throwing away relevant information. Analyse your case and apply good judgement. – pglpm Aug 22 '21 at 07:10
2 Answers
Think of temperature measurements. The numerical value of the mean temperature depends on whether we use the Fahrenheit or Celsius scale, or some other. It's subject to our arbitrary choice of the zero mark on the scale. By subtracting the mean, we remove the influence of that choice. But the choice of unit is still visible in the data, because the notion of "$1$ degree change of temperature" is different on different scales. Division by $\sigma$ removes the units: we get a unitless quantity (a "$z$-score") which is independent of the temperature scale used. (Well, as long as the scale is linear and warmer means higher temperature.) Now it makes sense to compare our data to some standard distribution such as $f(x)=\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$ (which is a unitless quantity).
Shorter version: the purpose of subtracting the mean from data when standardizing is to standardize.
Also, what copper.hat said in comments.
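A small sketch of the temperature example above (the data values and the helper name are illustrative, not from the original): the same readings recorded in Celsius and in Fahrenheit produce identical $z$-scores, because both the zero point and the unit are removed by standardizing.

```python
import math

def zscores(xs):
    """Standardize a dataset: subtract the mean, divide by the (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]

# The same temperatures recorded on two scales.
celsius = [20.0, 25.0, 30.0, 35.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]  # 68, 77, 86, 95

z_c = zscores(celsius)
z_f = zscores(fahrenheit)

# The z-scores agree (up to floating-point rounding): both the unit and
# the zero point of the scale have been removed.
for a, b in zip(z_c, z_f):
    assert abs(a - b) < 1e-12
print(z_c)
```

The affine change of scale $F = \frac{9}{5}C + 32$ shifts the mean and rescales $\sigma$ by the same factors, so it cancels out of $(x - \bar x)/\sigma$.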
Another reason is accuracy. When computing the variance, if the mean is large, much accuracy can be lost.
For example, the formula for the variance is $\dfrac{1}{n} \sum_{i=1}^n (x_i-\bar x)^2 $ (you can write $\dfrac1{n-1}$ instead of $\dfrac1{n}$ if it makes you feel better). If the $x_i$ are all close, even if their mean is large, this will be quite small.
If you write this in the mathematically equivalent form $\left(\dfrac{1}{n} \sum_{i=1}^n x_i^2\right) - \left(\dfrac{1}{n} \sum_{i=1}^n x_i \right)^2 $, you will be subtracting two large quantities to get a small quantity. This is the standard recipe for catastrophic cancellation and loss of accuracy.
By the way, if you do a Google search for "online mean and variance", you get a number of useful links including this one from Wikipedia: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance.
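A quick numerical sketch of the cancellation described above (the data values are illustrative): with a small spread around a large mean, the two-pass formula that subtracts the mean first stays accurate in double precision, while the mathematically equivalent one-pass formula loses essentially all accuracy.

```python
# Data with a small spread around a large mean.
xs = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
n = len(xs)

# Two-pass formula: subtract the mean first, then square the deviations.
mean = sum(xs) / n
var_two_pass = sum((x - mean) ** 2 for x in xs) / n

# "Textbook" one-pass formula: mean of squares minus square of the mean.
# Both terms are about 1e18, so their difference of 22.5 is far below the
# rounding error of double precision at that magnitude.
var_one_pass = sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

print(var_two_pass)  # 22.5 — correct
print(var_one_pass)  # wildly wrong; can even come out negative
```

The deviations $\{-6, -3, 3, 6\}$ give a true variance of $90/4 = 22.5$; the one-pass version subtracts two numbers near $10^{18}$, where a double's rounding granularity is in the hundreds.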