2.3 Normal mixture models

Finite mixture models have received wide application, being used to model distributions where the measurements arise from separate groups but individual membership is unknown. As methods of density estimation, mixture models are more flexible than the simple normal-based models of Section 2.2, providing improved discrimination in some circumstances. Applications of mixture models include (see Section 2.5) textile flaw detection (where the data comprise measurements from a background 'noise' and a flaw), waveform classification (where the signal may comprise a sample from a waveform or noise) and target classification (in which the radar target density is approximated by a sum of simple component densities).

There are several issues associated with mixture models that are of interest. The most important for density estimation concerns the estimation of the model parameters. We may also be interested in how many components are present and whether there are any natural groupings in the data that may be identified. This is the problem of clustering (or unsupervised classification), to which we return in Chapter 10.

2.3.1 Maximum likelihood estimation via EM

A finite mixture model is a distribution of the form

    p(x) = \sum_{j=1}^{g} \pi_j p(x; \theta_j)

where g is the number of mixture components, π_j ≥ 0 are the mixing proportions (\sum_{j=1}^{g} \pi_j = 1) and p(x; θ_j), j = 1, ..., g, are the component density functions, which depend on a parameter vector θ_j. There are three sets of parameters to estimate: the values of π_j, the components of θ_j and the value of g. The component densities may be of different parametric forms and are specified using knowledge of the data generation process, if available. In the normal mixture model, p(x; θ_j) is the multivariate normal density, with θ_j = {μ_j, Σ_j}.

Given a set of n observations (x_1, ..., x_n), the likelihood function is

    L(\Phi) = \prod_{i=1}^{n} \sum_{j=1}^{g} \pi_j p(x_i | \theta_j)    (2.8)

where Φ denotes the set of parameters {π_1, ..., π_g, θ_1, ..., θ_g} and we now denote the dependence of the component densities on their parameters as p(x | θ_j). In general, it is not possible to solve ∂L/∂Φ = 0 explicitly for the parameters of the model, and iterative schemes must be employed. One approach to maximising the likelihood L(Φ) is to use a general class of iterative procedures known as EM (expectation-maximisation) algorithms, introduced in the context of missing data estimation by Dempster et al. (1977), though it had appeared in many forms previously.
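As a point of reference for what follows, the likelihood (2.8) can be evaluated directly on the log scale. The sketch below is illustrative only: the function name, the array layout and the use of NumPy/SciPy are assumptions rather than anything prescribed by the text.

import numpy as np
from scipy.stats import multivariate_normal

def mixture_log_likelihood(X, pis, mus, Sigmas):
    """Log of the likelihood (2.8): sum_i log sum_j pi_j N(x_i; mu_j, Sigma_j).

    X: (n, d) observations; pis: (g,) mixing proportions;
    mus: (g, d) component means; Sigmas: (g, d, d) covariance matrices.
    """
    X = np.atleast_2d(X)
    n = X.shape[0]
    g = len(pis)
    dens = np.empty((n, g))
    for j in range(g):
        # pi_j * p(x_i | theta_j) for every observation x_i
        dens[:, j] = pis[j] * multivariate_normal.pdf(X, mean=mus[j], cov=Sigmas[j])
    # Sum over components inside the log, then sum the logs over observations
    return np.sum(np.log(dens.sum(axis=1)))

In practice the per-component log densities would be combined with the log-sum-exp trick to avoid numerical underflow in high dimensions.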
The basic procedure is as follows. We suppose that we have a set of 'incomplete' data vectors {x} and we wish to maximise the likelihood L(Φ) = p({x} | Φ). Let {y} denote a typical 'complete' version of {x}; that is, each vector x_i is augmented by the 'missing' values so that y_i = (x_i^T, z_i^T)^T. There may be many possible vectors y_i in which we can embed x_i, though there may be a natural choice for some problems. In the finite mixture case, z_i is a class indicator vector, z_i = (z_{1i}, ..., z_{gi})^T, where z_{ji} = 1 if x_i belongs to the jth component and zero otherwise.

General EM procedure

Let the likelihood of {y} be g({y} | Φ), whose form we know explicitly, so that the likelihood p({x} | Φ) is obtained from g({y} | Φ) by integrating over all possible {y} in which the set {x} is embedded:

    L(\Phi) = p(\{x\} | \Phi) = \int \prod_{i=1}^{n} g(x_i, z_i | \Phi) \, dz

The EM procedure generates a sequence of estimates of Φ, {Φ^(m)}, from an initial estimate Φ^(0) and consists of two steps:

1. E-step: evaluate Q(Φ; Φ^(m)) = E[log(g({y} | Φ)) | {x}, Φ^(m)], that is,

       Q(\Phi; \Phi^{(m)}) = \int \sum_i \log(g(x_i, z_i | \Phi)) \, p(\{z\} | \{x\}, \Phi^{(m)}) \, dz_1 \ldots dz_n

   the expectation of the complete-data log-likelihood, conditional on the observed data, {x}, and the current value of the parameters, Φ^(m).

2. M-step: find Φ = Φ^(m+1) that maximises Q(Φ; Φ^(m)). Often the solution for the M-step may be obtained in closed form.

The likelihoods of interest satisfy

    L(\Phi^{(m+1)}) \ge L(\Phi^{(m)})

so they are monotonically increasing (see the exercises).

An illustration of the EM iterative scheme is shown in Figure 2.1. It is one of a class of iterative majorisation schemes (de Leeuw, 1977; de Leeuw and Heiser, 1977, 1980) that find a local maximum of a function f(Φ) by defining an auxiliary function, Q(Φ; Φ^(m)), that touches the function f at the point (Φ^(m), f(Φ^(m))) and lies everywhere else below it (strictly, iterative majorisation refers to a procedure for finding the minimum of a function by iteratively defining a majorising function). The auxiliary function is maximised, and the position of the maximum, Φ^(m+1), gives a value of the original function f that is greater than at the previous iteration. This process is repeated, with a new auxiliary function being defined that touches the curve at (Φ^(m+1), f(Φ^(m+1))), and continues until convergence. The shape as well as the position of the auxiliary function will change as the iteration proceeds. In the case of the EM algorithm, Q(Φ; Φ^(m)) differs from the log-likelihood at Φ by H(Φ; Φ^(m)) (see Dempster et al., 1977, for details):

    \log(L(\Phi)) = Q(\Phi; \Phi^{(m)}) - H(\Phi; \Phi^{(m)})

where

    H(\Phi; \Phi^{(m)}) = E[\log(g(\{y\} | \Phi) / p(\{x\} | \Phi)) | \{x\}, \Phi^{(m)}]
                        = \int \sum_i \log(p(z_i | \{x\}, \Phi)) \, p(z_i | \{x\}, \Phi^{(m)}) \, dz_1 \ldots dz_n    (2.9)

and in Figure 2.1, h_m = -H(Φ^(m); Φ^(m)).

[Figure 2.1 EM illustration: successive maximisation of the function Q(Φ; Φ^(m)) leads to increases in the log-likelihood.]

EM algorithm for mixtures

Let us now consider the application of the EM algorithm to mixture distributions. For fully labelled data, we define the complete data vector y to be the observation augmented by a class label; that is, y^T = (x^T, z^T), where z is the class indicator vector defined above.
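For the normal mixture, the EM iteration alternates an E-step, in which the expected component indicators (the responsibilities) are computed under the current parameters, with an M-step, in which π_j, μ_j and Σ_j are re-estimated as weighted sample statistics. The sketch below is illustrative only: the function name, variable names and stopping rule are assumptions, not taken from the text.

import numpy as np
from scipy.stats import multivariate_normal

def em_normal_mixture(X, pis, mus, Sigmas, n_iter=100, tol=1e-6):
    """Sketch of EM for a g-component normal mixture.

    X: (n, d) observations; pis: (g,) mixing proportions;
    mus: (g, d) means; Sigmas: (g, d, d) covariance matrices.
    """
    X = np.asarray(X, dtype=float)
    pis = np.asarray(pis, dtype=float).copy()
    mus = np.asarray(mus, dtype=float).copy()
    Sigmas = np.asarray(Sigmas, dtype=float).copy()
    n, _ = X.shape
    g = len(pis)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities w_ij = pi_j p(x_i|theta_j) / sum_k pi_k p(x_i|theta_k),
        # i.e. the expected class indicators z_ji given the current parameters
        dens = np.empty((n, g))
        for j in range(g):
            dens[:, j] = pis[j] * multivariate_normal.pdf(X, mean=mus[j], cov=Sigmas[j])
        ll = np.sum(np.log(dens.sum(axis=1)))        # log-likelihood (2.8) at current parameters
        w = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted re-estimates of the mixing proportions, means and covariances
        Nj = w.sum(axis=0)                           # effective number of points per component
        pis = Nj / n
        mus = (w.T @ X) / Nj[:, None]
        for j in range(g):
            diff = X - mus[j]
            Sigmas[j] = (w[:, j, None] * diff).T @ diff / Nj[j]
        if ll - prev_ll < tol:                       # EM guarantees ll is non-decreasing
            break
        prev_ll = ll
    return pis, mus, Sigmas

When the data are fully labelled, the responsibilities reduce to the indicator values z_{ji} and the M-step reduces to per-component sample proportions, means and covariances.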