
I want to use sklearn.mixture.GaussianMixture to store a Gaussian mixture model so that I can later use it to generate samples, or to evaluate it at a sample point using the score_samples method. Here is an example where the components have the following weights, means and covariances:

import numpy as np
weights = np.array([0.6322941277066596, 0.3677058722933399])
mu = np.array([[0.9148052872961359, 1.9792961751316835], 
               [-1.0917396392992502, -0.9304220945910037]])
sigma = np.array([[[2.267889129267119, 0.6553245618368836], 
                        [0.6553245618368835, 0.6571014653342457]], 
                       [[0.9516607767206848, -0.7445831474157608], 
                        [-0.7445831474157608, 1.006599716443763]]])

Then I initialised the mixture as follows:

from sklearn import mixture
gmix = mixture.GaussianMixture(n_components=2, covariance_type='full')
gmix.weights_ = weights   # mixture weights (n_components,) 
gmix.means_ = mu          # mixture means (n_components, 2) 
gmix.covariances_ = sigma  # mixture cov (n_components, 2, 2) 

Finally, I tried to generate a sample from these parameters, which resulted in an error:

x = gmix.sample(1000)
NotFittedError: This GaussianMixture instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

As I understand it, GaussianMixture is intended to fit a mixture of Gaussians to a sample, but is there a way to provide it with the final values and continue from there?

hashmuke
  • First you need to feed your data into the model to train it; only then can it generate random samples. See the [documentation of sample()](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture.sample) – Vivek Kumar Feb 22 '17 at 14:42
  • I don't have any initial data; what I have is each component's parameters. I am looking for a workaround or an alternative Python library. – hashmuke Feb 22 '17 at 14:49

3 Answers


It seems that it has a check that makes sure that the model has been trained. You could trick it by training the GMM on a very small data set before setting the parameters. Like this:

gmix = mixture.GaussianMixture(n_components=2, covariance_type='full')
gmix.fit(np.random.rand(10, 2))  # Fit on dummy data so it thinks it is trained
gmix.weights_ = weights   # mixture weights (n_components,) 
gmix.means_ = mu          # mixture means (n_components, 2) 
gmix.covariances_ = sigma  # mixture cov (n_components, 2, 2)
x = gmix.sample(1000)  # Should work now
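
One caveat with this trick, sketched below under the assumption that score_samples reads precisions_cholesky_ rather than covariances_ (which is how I understand the scikit-learn implementation): fitting on dummy data leaves a precisions_cholesky_ that belongs to the dummy fit, and overwriting covariances_ does not refresh it. The dummy data size, the seed, and the names `expected` and `stale` are illustrative choices, not part of the original answer.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

weights = np.array([0.6322941277066596, 0.3677058722933399])
mu = np.array([[0.9148052872961359, 1.9792961751316835],
               [-1.0917396392992502, -0.9304220945910037]])
sigma = np.array([[[2.267889129267119, 0.6553245618368836],
                   [0.6553245618368835, 0.6571014653342457]],
                  [[0.9516607767206848, -0.7445831474157608],
                   [-0.7445831474157608, 1.006599716443763]]])

# Dummy fit, then overwrite the parameters as in the answer above.
rng = np.random.default_rng(0)
gmix = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmix.fit(rng.random((50, 2)))
gmix.weights_ = weights
gmix.means_ = mu
gmix.covariances_ = sigma

# Factor P with P @ P.T == inv(sigma), built from the target covariances.
expected = np.linalg.inv(np.linalg.cholesky(sigma)).transpose((0, 2, 1))

# The attribute still reflects the dummy fit, not sigma.
stale = not np.allclose(gmix.precisions_cholesky_, expected)

# Overwrite it too, so that score_samples is consistent with sigma.
gmix.precisions_cholesky_ = expected
```

So sample() is fine after the dummy fit, but precisions_cholesky_ should be set as well before calling score_samples.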
J. P. Petersen

You rock, J.P.Petersen! After seeing your answer I compared the changes introduced by using the fit method. It seems the initial instantiation does not create all the attributes of gmix. Specifically, it is missing the following attributes:

covariances_
means_
weights_
converged_
lower_bound_
n_iter_
precisions_
precisions_cholesky_

The first three are created when the given inputs are assigned. Of the rest, the only attribute my application needs is precisions_cholesky_, a Cholesky factor of the inverse covariance (precision) matrices. scikit-learn builds it as the inverse transpose of the Cholesky factor of each covariance matrix, so as a minimum requirement I added it as follows:

gmix.precisions_cholesky_ = np.linalg.inv(np.linalg.cholesky(sigma)).transpose((0, 2, 1))
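
Putting the pieces together, here is a minimal end-to-end sketch that skips fit entirely. It assumes a reasonably recent scikit-learn, and it builds precisions_cholesky_ as the inverse transpose of the Cholesky factor of each covariance, which as far as I can tell is the convention scikit-learn uses internally (a factor P with P @ P.T equal to the precision matrix). The comparison against scipy.stats.multivariate_normal and the names `log_density` and `reference` are my own additions for checking the result.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

weights = np.array([0.6322941277066596, 0.3677058722933399])
mu = np.array([[0.9148052872961359, 1.9792961751316835],
               [-1.0917396392992502, -0.9304220945910037]])
sigma = np.array([[[2.267889129267119, 0.6553245618368836],
                   [0.6553245618368835, 0.6571014653342457]],
                  [[0.9516607767206848, -0.7445831474157608],
                   [-0.7445831474157608, 1.006599716443763]]])

# Build the model by assigning the fitted attributes directly (no fit call).
gmix = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmix.weights_ = weights
gmix.means_ = mu
gmix.covariances_ = sigma
# With sigma = L @ L.T, use P = inv(L).T so that P @ P.T == inv(sigma).
gmix.precisions_cholesky_ = np.linalg.inv(
    np.linalg.cholesky(sigma)
).transpose((0, 2, 1))

# Draw samples and evaluate the mixture log-density at those points.
X, labels = gmix.sample(1000)
log_density = gmix.score_samples(X)

# Independent reference: log sum_k w_k * N(x | mu_k, sigma_k) via SciPy.
reference = logsumexp(
    [np.log(w) + multivariate_normal(m, c).logpdf(X)
     for w, m, c in zip(weights, mu, sigma)],
    axis=0,
)
print(np.allclose(log_density, reference))
```

If the two log-densities disagree, the most likely culprit is the orientation of precisions_cholesky_: a factor with the transpose on the wrong side still has the right determinant but gives the wrong quadratic form.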
hashmuke

To understand what is happening, note that GaussianMixture first checks that it has been fitted:

self._check_is_fitted()

Which triggers the following check:

def _check_is_fitted(self):
    check_is_fitted(self, ['weights_', 'means_', 'precisions_cholesky_'])

And finally the last function call:

def check_is_fitted(estimator, attributes, msg=None, all_or_any=all):

which only checks that the estimator already has the attributes.
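
A small sketch of that behaviour, using the public check_is_fitted helper with the attribute list quoted above. Newer scikit-learn versions may call it without an explicit list, so treat the list as an assumption tied to the version shown here. The placeholder values and the `raised` flag are purely illustrative.

```python
from sklearn.exceptions import NotFittedError
from sklearn.mixture import GaussianMixture
from sklearn.utils.validation import check_is_fitted

# Attribute names taken from the _check_is_fitted snippet quoted above.
required = ["weights_", "means_", "precisions_cholesky_"]

gm = GaussianMixture(n_components=2, covariance_type="full")

# A fresh estimator has none of the attributes, so the check raises.
try:
    check_is_fitted(gm, required)
    raised = False
except NotFittedError:
    raised = True

# The check only tests for presence, so even None placeholders satisfy it.
for name in required:
    setattr(gm, name, None)
check_is_fitted(gm, required)  # no exception this time
```

Passing the check says nothing about whether the values are usable, which is why placeholder values can get past it and still fail later.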


So in short, the only thing you are missing to make it work (without having to fit it) is to set the precisions_cholesky_ attribute:

gmix.precisions_cholesky_ = 0

should do the trick (can't try it so not 100% sure :P)

However, if you want to play it safe and have a consistent solution in case scikit-learn updates its constraints, the solution of @J.P.Petersen is probably the best way to go.

Imanol Luengo
  • Yeah, that explains something. I initially tried to assign `gmix.precisions_cholesky_ = None`, and with that I was able to generate samples. However, this will not work if you are calling `score_samples`, which expects the value to be a numpy array with dimensions similar to the covariance. – hashmuke Feb 22 '17 at 19:19