Two-sample Kolmogorov-Smirnov Test in Python Scipy

Question

I can't figure out how to do a Two-sample KS test in Scipy.

After reading the scipy kstest documentation, I can see how to test whether a distribution is identical to the standard normal distribution:

from scipy.stats import kstest
import numpy as np

x = np.random.normal(0,1,1000)
test_stat = kstest(x, 'norm')
#>>> test_stat
#(0.021080234718821145, 0.76584491300591395)

This means that with a p-value of 0.76 we cannot reject the null hypothesis that the two distributions are identical.

However, I want to compare two distributions and see if I can reject the null hypothesis that they are identical, something like:

from scipy.stats import kstest
import numpy as np

x = np.random.normal(0,1,1000)
z = np.random.normal(1.1,0.9, 1000)

and test whether x and z are drawn from the same distribution.

I tried the naive:

test_stat = kstest(x, z)

and got the following error:

TypeError: 'numpy.ndarray' object is not callable

Is there a way to do a two-sample KS test in Python? If so, how should I do it?

Thank You in Advance

Could you post the line and traceback? – cval Jun 04 '12 at 16:27

score 132 · Accepted Answer · edited Jul 23 '19 at 12:13

You are using the one-sample KS test. You probably want the two-sample test ks_2samp:

>>> from scipy.stats import ks_2samp
>>> import numpy as np
>>> 
>>> np.random.seed(12345678)
>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> z = np.random.normal(1.1, 0.9, 1000)
>>> 
>>> ks_2samp(x, y)
Ks_2sampResult(statistic=0.022999999999999909, pvalue=0.95189016804849647)
>>> ks_2samp(x, z)
Ks_2sampResult(statistic=0.41800000000000004, pvalue=3.7081494119242173e-77)

The results can be interpreted as follows:

You can compare the statistic returned by ks_2samp to the critical value from a KS-test table for your sample sizes. If the statistic is higher than the critical value, you reject the null hypothesis that the two samples come from the same distribution.
Or you can compare the p-value to a significance level a, usually a=0.05 or 0.01 (you decide; the lower a is, the stronger the evidence needed to reject). If the p-value is lower than a, you reject the null hypothesis and conclude that the two distributions are likely different.
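Both decision rules can be sketched in a few lines. This is only an illustration: the helper names `critical_value` and `compare` are made up for this example, and the critical value uses the common large-sample approximation c(a) * sqrt((n + m) / (n * m)) with c(a) = sqrt(-ln(a / 2) / 2) instead of a lookup table, so it is only reasonable for fairly large samples.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(12345678)
x = rng.normal(0, 1, 1000)
y = rng.normal(0, 1, 1000)      # same distribution as x
z = rng.normal(1.1, 0.9, 1000)  # shifted and slightly narrower

alpha = 0.05  # significance level, chosen before looking at the data


def critical_value(n, m, alpha):
    """Large-sample approximation to the two-sided two-sample KS critical value.

    Hypothetical helper for illustration; not part of SciPy.
    """
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2))
    return c_alpha * np.sqrt((n + m) / (n * m))


def compare(a, b, alpha=alpha):
    """Run ks_2samp and apply both decision rules (hypothetical helper)."""
    result = ks_2samp(a, b)
    d_crit = critical_value(len(a), len(b), alpha)
    return {
        "statistic": result.statistic,
        "pvalue": result.pvalue,
        "critical_value": d_crit,
        # Rule 1: statistic above the critical value -> reject the null.
        "reject_by_statistic": result.statistic > d_crit,
        # Rule 2: p-value below the significance level -> reject the null.
        "reject_by_pvalue": result.pvalue < alpha,
    }


print(compare(x, y))
print(compare(x, z))
```

For large samples the two rules should usually agree, since the p-value and the critical value are derived from the same null distribution of the statistic.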

edited Jul 23 '19 at 12:13

Toby Speight

answered Jun 04 '12 at 16:32

DSM

score 1 · That's exactly what I was looking for. Thank You Very Much! – Akavall Jun 04 '12 at 16:35

score 2 · How do you interpret these results? Can you say the samples come from the same distribution just by looking at `statistic` and `p-value`? – FaCoffee Feb 24 '17 at 10:40

score 4 · @FaCoffee This is what the scipy docs say: "_If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same._" – user2738815 Mar 18 '17 at 08:29

score 6 · Answer 2 · edited May 02 '17 at 09:24

This is what the scipy docs say:

If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.

Failing to reject the null hypothesis does not mean we have confirmed it.
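A small sketch of why "cannot reject" is not the same as "confirmed", using made-up data: two samples of three points each that obviously do not overlap. With so few points the test has almost no power, so it cannot reject at a=0.05 no matter how different the samples are.

```python
import numpy as np
from scipy.stats import ks_2samp

# Two tiny, made-up samples that clearly do not overlap.
x = np.array([0.0, 1.0, 2.0])
y = np.array([100.0, 101.0, 102.0])

alpha = 0.05
result = ks_2samp(x, y)

# The empirical CDFs are as far apart as they can be.
print(result.statistic)

# With 3 points per sample there are only 20 possible orderings, and the most
# extreme outcome accounts for 2 of them, so an exact two-sided p-value
# cannot fall below 2/20 = 0.1.
print(result.pvalue)

# We fail to reject, yet the samples are plainly different.
print(result.pvalue < alpha)
```

A high p-value here says only that the data are too sparse to tell the samples apart, not that they come from the same distribution.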

edited May 02 '17 at 09:24

piet.t

answered May 02 '17 at 07:55

jun 小嘴兔

could you explain your answer in further detail? thanks in advance! – King Reload May 02 '17 at 08:19
@KingReload It means when the *p* value is very small, that says the probability of these two samples *Not* coming from the same distribution is very low. In another word, the probability of these two sample coming from same distribution is very high. But you can not be 100% sure about that hence *p* values are never zero. (Sometimes they show as 0, but actually, it's never zero). That's why it is said that *We failed to reject the null hypothesis* instead of *We are accepting the null hypothesis*. Accepting null hypothesis = *distributions of the two samples are the same* – MD Abid Hasan Feb 14 '18 at 22:26
score 3 · If the p-value is high, they very likely come from the same distribution; if the p-value is small, they likely don't. @MDAbidHasan has it backwards. Indeed, the documentation gives an example: ```For an identical distribution, we cannot reject the null hypothesis since the p-value is high, 41%: >>> rvs4 = stats.norm.rvs(size=n2, loc=0.0, scale=1.0) >>> stats.ks_2samp(rvs1, rvs4) (0.07999999999999996, 0.41126949729859719)``` – superhero Feb 23 '18 at 17:35
