MATLAB: How to choose initial component parameters with gmdistribution.fit

I am training a Gaussian mixture model with gmdistribution.fit. As far as I can tell, the initial component parameters are chosen at random. I would like to initialize them with the k-means method instead. Any idea how to do this?
The code for training my model is:
options = statset('MaxIter',500,'Display','final');
models(BrandId).gmm = gmdistribution.fit(data',3,'CovType',...
'diagonal','Options',options);

Best Answer

  • Once you have clustered your data with the k-means algorithm, you can definitely use the cluster centers as initial conditions for your Gaussian mixture clustering. The trick is that the initial conditions passed to the gmdistribution.fit function must be in the proper form: a structure with the fields mu, Sigma, and PComponents. More information on the function can be found in its documentation.
    The other trick is that the Gaussian mixture routine needs three initial conditions: the initial cluster means (which k-means provides), the initial cluster covariances, and the initial mixture weights. The last two do not come from the cluster centers, so you can set them to arbitrary starting values.
    To help get you started, here is some example code:
    % Arbitrary 1-d data vector
    dataLength = 5000;
    muData = [5 30];
    stdData = [4 10];
    dataVec = [muData(1) + stdData(1)*randn(dataLength/2,1); ...
    muData(2) + stdData(2)*randn(dataLength/2,1)];
    % K-means to initially cluster data
    % The second output of the kmeans function holds the cluster centers
    numberOfClusters = 2;
    [~,kMeansClusters] = kmeans(dataVec,numberOfClusters);
    % Fit GMM using the k-means centers as the initial conditions
    % We only have mean initial conditions from the k-means algorithm, so we
    % can specify some arbitrary initial variance and mixture weights.
    gmInitialVariance = 0.1;
    initialSigma = cat(3,gmInitialVariance,gmInitialVariance);
    % Initial weights are set at 50%
    initialWeights = [0.5 0.5];
    % Initial condition structure for the gmdistribution.fit function
    S.mu = kMeansClusters;
    S.Sigma = initialSigma;
    S.PComponents = initialWeights;
    gmmOfData = gmdistribution.fit(dataVec,numberOfClusters,'Start',S);
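The arbitrary variance and equal weights above can also be replaced by estimates taken from the k-means result itself. The following is a minimal sketch, assuming the first output of kmeans (the cluster index of each point) is kept. The names idx, centers, and members are illustrative and not part of the original answer.

```matlab
% Same kind of 1-D two-cluster data as in the example above
rng(1);  % fixed seed, only so the sketch is repeatable
dataVec = [5 + 4*randn(2500,1); 30 + 10*randn(2500,1)];
k = 2;

% Keep both outputs: per-point cluster index and cluster centers
[idx, centers] = kmeans(dataVec, k);

d = size(dataVec, 2);          % number of dimensions (1 here)
S.mu = centers;                % k-by-d matrix of initial means
S.Sigma = zeros(d, d, k);      % one full d-by-d covariance per component
S.PComponents = zeros(1, k);   % initial mixing proportions
for j = 1:k
    members = dataVec(idx == j, :);
    % Within-cluster covariance (a scalar variance for 1-D data)
    S.Sigma(:, :, j) = cov(members);
    % Fraction of all points assigned to cluster j
    S.PComponents(j) = size(members, 1) / size(dataVec, 1);
end

gmmOfData = gmdistribution.fit(dataVec, k, 'Start', S);
```

This assumes every cluster contains at least two points, so that the sample covariance is defined. Note also that when 'CovType' is set to 'diagonal', the Sigma field is expected to be 1-by-d-by-k and to hold only the per-dimension variances, so the loop would store var(members) instead of cov(members).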
    Hope this helps and good luck!
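For the setup in the question (three components, diagonal covariance), a shorter route may be to hand the k-means assignments directly to the 'Start' option, which, as far as I recall from the documentation, also accepts a vector with an initial component index for each observation. A hedged sketch follows. The data matrix here is a synthetic stand-in for the asker's data and is purely an assumption.

```matlab
% Hypothetical stand-in for the asker's data: d-by-n, one observation per column
rng(0);  % fixed seed, only so the sketch is repeatable
data = [randn(2, 200), randn(2, 200) + 5, randn(2, 200) - 5];

k = 3;
options = statset('MaxIter', 500, 'Display', 'final');

% Hard cluster assignments from k-means (observations must be in rows)
idx = kmeans(data', k);

% Use the assignments as the initial component index of each point;
% the initial means, variances, and weights then follow from that partition
gmm = gmdistribution.fit(data', k, 'CovType', 'diagonal', ...
    'Start', idx, 'Options', options);
```

In recent releases fitgmdist supersedes gmdistribution.fit and takes the same kind of 'Start' value, so the call should carry over with only the function name changed.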