Although I have read many posts about fitting distributions in Python, I am still confused about the usage of the floc and fscale parameters. For general information I mainly used this, this and this sources.
I know that a given distribution, say with density f(x), becomes a more general distribution when the loc and scale parameters are introduced. Its density g is described by the formula
g(x) = f((x - loc) / scale) / scale.
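The loc/scale relation can be checked numerically. The snippet below is a minimal sketch: the shape, loc and scale values are arbitrary numbers chosen for illustration, and it relies on the fact that a density (unlike a CDF) also picks up a 1/scale factor when it is shifted and scaled.

```python
import numpy as np
from scipy.stats import lognorm

# Arbitrary illustrative values (not taken from the data in this question)
s, loc, scale = 0.5, 2.0, 3.0
x = np.linspace(2.5, 20.0, 50)  # points inside the support (x > loc)

# Density evaluated with scipy's loc/scale keywords
shifted = lognorm.pdf(x, s, loc=loc, scale=scale)

# The same density built by hand from the standardized distribution
standard = lognorm.pdf((x - loc) / scale, s) / scale

identity_holds = np.allclose(shifted, standard)
print(identity_holds)
```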
In scipy, we have two choices. When fitting a distribution with distr.fit(x), loc and scale are free parameters that are estimated together with any shape parameters; starting values for the optimizer can be passed as the loc and scale keyword arguments. We can also force scipy to fit the 'original' distribution f(x) using distr.fit(x, floc=0, fscale=1).
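To make the difference between the two calls concrete, here is a small sketch on synthetic data. The gamma sample, its parameters and the seed are assumptions made only for this illustration; the point is that parameters passed as floc / fscale come back unchanged, while the remaining ones are estimated.

```python
import numpy as np
from scipy.stats import gamma

# Synthetic sample (illustrative only): gamma with shape 3, loc 0, scale 2
rng = np.random.default_rng(0)
data = gamma.rvs(3.0, loc=0.0, scale=2.0, size=500, random_state=rng)

# All three parameters (a, loc, scale) are estimated
free = gamma.fit(data)

# loc is held at 0; a and scale are estimated
fixed_loc = gamma.fit(data, floc=0)

# loc and scale are both held; only the shape a is estimated
fixed_both = gamma.fit(data, floc=0, fscale=1)

print("free      :", free)
print("floc=0    :", fixed_loc)
print("both fixed:", fixed_both)
```

Each call returns a tuple in the order (shape, loc, scale), so the fixed values can be read back at positions 1 and 2.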
My question is: is there any general advice on when to force scipy to fit the 'original' distribution instead of the 'parametrized' one?
Here is the example:
from scipy.stats import lognorm, fisk, gamma
from statsmodels.distributions.empirical_distribution import ECDF
import numpy as np
import matplotlib.pyplot as plt
# generate some data
x1 = [18. for i in range(36)]
x2 = [19. for i in range(17)]
x3 = [22. for i in range(44)]
x4 = [27. for i in range(63)]
x5 = [28.2 for i in range(8)]
x6 = [32. for i in range(104)]
x7 = [32.6 for i in range(29)]
x8 = [33. for i in range(85)]
x9 = [33.4 for i in range(27)]
x10 = [34.2 for i in range(49)]
x11 = [36. for i in range(99)]
x12 = [36.2 for i in range(35)]
x13 = [37. for i in range(98)]
x14 = [38. for i in range(25)]
x15 = [38.4 for i in range(39)]
x16 = [39. for i in range(25)]
x17 = [42. for i in range(54)]
# empirical distribution function
xp = x1 + x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+x16+x17
yp = ECDF(xp)
# fit lognormal distribution with parametrization
pars1 = lognorm.fit(xp)
# fit lognormal distribution with floc = 0
pars2 = lognorm.fit(xp, floc = 0)
#plot the result
X = np.linspace(min(xp), max(xp), 10000)
plt.plot(yp.x, yp.y, 'ro')
plt.plot(X, lognorm.cdf(X, pars1[0], pars1[1], pars1[2]), 'b-')
plt.plot(X, lognorm.cdf(X, pars2[0], pars2[1], pars2[2]), 'g-')
plt.show()
#fit the gamma distribution
pars1 = gamma.fit(xp)
pars2 = gamma.fit(xp, floc = 0)
#plot the result
X = np.linspace(min(xp), max(xp), 10000)
plt.plot(yp.x, yp.y, 'ro')
plt.plot(X, gamma.cdf(X, pars1[0], pars1[1], pars1[2]), 'b-')
plt.plot(X, gamma.cdf(X, pars2[0], pars2[1], pars2[2]), 'g-')
plt.show()
As you can see, floc = 0 improved the fit a lot in the lognorm case, while in the gamma case it did not change the fit at all.
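Besides comparing the curves visually, the two fits could also be compared by their log-likelihood. The following is only a sketch under stated assumptions: log_likelihood is a hypothetical helper written for this example (not a scipy function), and the data are a synthetic lognormal sample rather than the xp values above. Since the free fit contains the floc = 0 fit as a special case, a noticeably lower log-likelihood for the free fit would suggest that the optimizer did not reach the true maximum.

```python
import numpy as np
from scipy.stats import lognorm

def log_likelihood(dist, data, params):
    """Hypothetical helper: sum of log-densities of data under dist(*params)."""
    return float(np.sum(dist.logpdf(data, *params)))

# Synthetic sample (illustrative only): shape 0.25, loc 0, scale 30
rng = np.random.default_rng(42)
data = lognorm.rvs(0.25, loc=0.0, scale=30.0, size=600, random_state=rng)

pars_free = lognorm.fit(data)          # s, loc and scale all estimated
pars_floc = lognorm.fit(data, floc=0)  # loc held at 0

ll_free = log_likelihood(lognorm, data, pars_free)
ll_floc = log_likelihood(lognorm, data, pars_floc)

print("free loc :", pars_free, ll_free)
print("floc = 0 :", pars_floc, ll_floc)
```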
Sorry for the long demonstration. Here is my question again: is there any general advice on when to fix the parameters with floc = 0 and fscale = 1, and when to leave them free, passing loc = 0 and scale = 1 only as custom starting values?