

Journal of Southeast University (English Edition)  Vol. 29, No. 1, pp. 43-47  Mar. 2013  ISSN 1003-7985

Gaussian mixture model clustering with completed likelihood minimum message length criterion

Zeng Hong  Lu Wei  Song Aiguo
(School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China)
(College of Engineering, Nanjing Agricultural University, Nanjing 210031, China)

Abstract: An improved Gaussian mixture model (GMM)-based clustering method is proposed for the difficult case where the true distribution of data is against the assumed GMM. First, an improved model selection criterion, the completed likelihood minimum message length criterion, is derived. It can measure both the goodness-of-fit of the candidate GMM to the data and the goodness-of-partition of the data. Secondly, by utilizing the proposed criterion as the clustering objective function, an improved expectation-maximization (EM) algorithm is developed, which can avoid poor local optimal solutions compared to the standard EM algorithm for estimating the model parameters. The experimental results demonstrate that the proposed method can rectify the over-fitting tendency of representative GMM-based clustering approaches and can robustly provide more accurate clustering results.

Key words: Gaussian mixture model; non-Gaussian distribution; model selection; expectation-maximization algorithm; completed likelihood minimum message length criterion

doi: 10.3969/j.issn.1003-7985.2013.01.009

Received 2012-07-20.
Biography: Zeng Hong (1981-), male, doctor, lecturer, hzeng@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 61105048, 60972165), the Doctoral Fund of Ministry of Education of China (No. 20110092120034), the Natural Science Foundation of Jiangsu Province (No. BK2010240), the Technology Foundation for Selected Overseas Chinese Scholar, Ministry of Human Resources and Social Security of China (No. 67220010008), and the Open Fund of Jiangsu Province Key Laboratory for Remote Measuring and Control (No. YCCK2010051).
Citation: Zeng Hong, Lu Wei, Song Aiguo. Gaussian mixture model clustering with completed likelihood minimum message length criterion [J]. Journal of Southeast University (English Edition), 2013, 29(1): 43-47. [doi: 10.3969/j.issn.1003-7985.2013.01.009]

The Gaussian mixture model (GMM) is commonly used as a basis for cluster analysis. In general, GMM-based clustering involves two problems. One is the estimation of parameters for the mixture models. The other is the model order selection for determining the number of components. The expectation-maximization (EM) algorithm is often used to estimate the parameters of the mixture model which fits the observed data. Popular model selection criteria in the literature include the Bayesian information criterion (BIC), Akaike's information criterion (AIC), the integrated likelihood criterion (ILC), etc.

However, most previous studies generally assume the Gaussian components for the observed data in the mixture model. If the true model is not in the family of the assumed ones, the BIC criterion tends to overestimate the correct model size regardless of the separation of the components. In the meantime, because the EM algorithm is a local method, it is prone to falling into poor local optima in such a case, leading to meaningless estimation. In order to approximate such a distribution more accurately, the feature-weighted GMM, which explicitly takes the non-Gaussian distribution into account, is adopted in Refs.[3-8]. Nevertheless, the approaches in Refs.[3-8] assume that the data features are independent, which is often not the case for real applications. Based on the minimum message length (MML) criterion, Ref.[9] proposed an improved EM algorithm that can effectively avoid poor local optima. But we find that it still tends to select many more Gaussian components than necessary for fitting data with a uniform distribution, giving obscure evidence for the clustering structure of the data.

We propose a novel method to address the model selection and parameter estimation problems in GMM-based clustering when the true data distribution is against the assumed one. In particular, we derive an improved model selection criterion for mixture models with an explicit objective of clustering. Furthermore, with the proposed criterion as the cost function, an improved EM algorithm is developed for estimating parameters. Ultimately, the proposed method is not only able to rectify the over-fitting tendency of some representative model selection criteria, but also able to avoid poor local optima of the EM algorithm.

1 Completed Likelihood of the Gaussian Mixture Model

Suppose that a D-dimensional sample y follows a K-component mixture distribution; then the probability density function of y can be written as

p(y \mid \Theta) = \sum_{k=1}^{K} w_k \, p(y \mid \theta_k)    (1)

where w_k is the mixing probability for the k-th mixture component, with 0 \le w_k \le 1 and \sum_{k=1}^{K} w_k = 1; \theta_k is the internal parameters describing the k-th mixture component; and \Theta = \{\theta_1, \ldots, \theta_K; w_1, \ldots, w_K\} denotes the vector describing the complete set of parameters for the mixture model. p(\cdot \mid \theta_k) defines the k-th Gaussian density.

The GMM is typically an incomplete-data structure model. N independent and identically distributed samples of the incomplete data Y are denoted as Y = \{y_1, \ldots, y_N\}, and the complete data are Y_c = \{Y, Z\} = \{(y_1, z_1), \ldots, (y_N, z_N)\}, where the missing data are Z = \{z_1, \ldots, z_N\}, with z_n = [z_{n1}, \ldots, z_{nK}] being the binary label vector such that z_{nk} = 1 if and only if y_n belongs to the k-th mixture component and z_{nk} = 0 otherwise. Z is normally unknown, and it must be inferred from Y. The observed log-likelihood of \Theta for the incomplete data Y is

\log p(Y \mid \Theta) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} w_k \, p(y_n \mid \theta_k)    (2)

The completed log-likelihood of \Theta is

\log p(Y_c \mid \Theta) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \log\big(w_k \, p(y_n \mid \theta_k)\big)
                        = \log p(Y \mid \Theta) + \log p(Z \mid Y, \Theta)
                        = \sum_{n=1}^{N} \log \sum_{k=1}^{K} w_k \, p(y_n \mid \theta_k) + \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \log p_{nk}    (3)

where p_{nk} is the conditional probability of y_n belonging to the k-th component and can be computed as

p_{nk} = \frac{w_k \, p(y_n \mid \theta_k)}{\sum_{j=1}^{K} w_j \, p(y_n \mid \theta_j)}    (4)

In practice, the true parameter \Theta in Eqs.(2) and (3) is replaced using the maximum likelihood (ML) estimate \hat{\Theta}, and then the completed log-likelihood is rewritten as

\log p(Y_c \mid \hat{\Theta}) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \hat{w}_k \, p(y_n \mid \hat{\theta}_k) + \sum_{n=1}^{N} \sum_{k=1}^{K} \hat{z}_{nk} \log \hat{p}_{nk}    (5)

where

\hat{z}_{nk} = \begin{cases} 1, & \text{if } k = \arg\max_j \hat{w}_j \, p(y_n \mid \hat{\theta}_j) \\ 0, & \text{otherwise} \end{cases}    (6)
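For concreteness, the quantities in Eqs.(2) to (6) can be evaluated directly from a candidate parameter set. The following minimal sketch, assuming full-covariance Gaussian components and using illustrative function and variable names (it is not the authors' code), evaluates the conditional probabilities of Eq.(4), the hard assignments of Eq.(6) and the completed log-likelihood of Eq.(5):

```python
# A minimal sketch of Eqs. (2)-(6), assuming full-covariance Gaussian components.
# The function name and its arguments (weights, means, covs) are illustrative
# assumptions, not code from the paper.
import numpy as np
from scipy.stats import multivariate_normal


def completed_log_likelihood(Y, weights, means, covs):
    """Return the completed log-likelihood of Eq. (5) together with the
    conditional probabilities p_nk of Eq. (4) and the hard labels z_nk of Eq. (6)."""
    N, K = Y.shape[0], len(weights)
    # dens[n, k] = w_k * p(y_n | theta_k), the weighted component densities
    dens = np.column_stack([
        weights[k] * multivariate_normal.pdf(Y, mean=means[k], cov=covs[k])
        for k in range(K)
    ])
    mixture = dens.sum(axis=1)                       # mixture density at each y_n, Eq. (1)
    obs_loglik = np.log(mixture).sum()               # observed log-likelihood, Eq. (2)
    p_nk = dens / mixture[:, None]                   # conditional probabilities, Eq. (4)
    z_nk = np.zeros((N, K))
    z_nk[np.arange(N), p_nk.argmax(axis=1)] = 1.0    # hard assignments, Eq. (6)
    # completed log-likelihood, Eq. (5); the tiny constant only guards against log(0)
    comp_loglik = obs_loglik + np.sum(z_nk * np.log(p_nk + 1e-300))
    return comp_loglik, p_nk, z_nk
```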
2 Clustering with Completed Likelihood Minimum Message Length (CL-MML) Criterion

2.1 Completed likelihood minimum message length criterion

The MML criterion defines a goodness measure for a model with an inherent bias towards simple models. Based on the formulation of the MML criterion for a general density model in Ref.[9], the MML criterion for the GMM of the complete data Y_c can be written as follows:

MML(K) = -\log p(\Theta) - \log p(Y_c \mid \Theta) + \frac{1}{2} \log |I_c(\Theta)| + \frac{C}{2}\Big(1 + \log\frac{1}{12}\Big)    (7)

where \log p(Y_c \mid \Theta) is given in Eq.(5); I_c(\Theta) = -E[\partial^2 \log p(Y_c \mid \Theta)/\partial\Theta\,\partial\Theta^{\mathrm T}] is the expected Fisher information matrix associated with the complete data Y_c; |I_c(\Theta)| denotes its determinant; and C is the total number of free parameters in \Theta.

By differentiating \log p(Y_c \mid \Theta) in Eq.(5), I_c(\Theta) has a block-diagonal structure I_c(\Theta) = N \, \mathrm{block\text{-}diag}\{w_1 I^{(1)}(\theta_1), \ldots, w_K I^{(1)}(\theta_K), A\}, where I^{(1)}(\theta_k) is the Fisher matrix for a single observation produced by the k-th component, and A is the Fisher matrix of a multinomial distribution with |A| = (w_1 w_2 \cdots w_K)^{-1}. Since we have no knowledge about the parameters, we adopt the non-informative Jeffreys priors as in Ref.[9], i.e.,

p(\Theta) = p(w_1, \ldots, w_K) \prod_{k=1}^{K} p(\theta_k)    (8)

where p(\theta_k) \propto \sqrt{|I^{(1)}(\theta_k)|} and p(w_1, \ldots, w_K) \propto (w_1 w_2 \cdots w_K)^{-1/2}.

After substituting p(\Theta) and |I_c(\Theta)| into Eq.(7) and dropping the constant items, we obtain the explicit form of CL-MML for the GMM of the complete data as follows:

\mathrm{CL\text{-}MML}(K) = -\log p(Y \mid \hat{\Theta}) + \Big(-\sum_{n=1}^{N}\sum_{k=1}^{K} \hat{z}_{nk} \log \hat{p}_{nk}\Big) + \frac{M}{2} \sum_{k=1}^{K} \log(N \hat{w}_k) + \frac{K}{2}(1 + \log N)    (9)

where M is the number of parameters in each component. The first item on the right-hand side of Eq.(9) emphasizes the goodness-of-fit of the candidate GMM. The third and the fourth items control the complexity of the GMM. Compared to the standard MML for the GMM of incomplete data in Ref.[9], CL-MML has an extra non-negative penalty item, i.e., the second item on the right-hand side of Eq.(9). This item is essentially a measure of the ability of the K-component GMM to provide a relevant partition of the data Y. If the mixture components are well separated (i.e., \hat{p}_{nk} is close to 1 when \hat{z}_{nk} = 1), such an item will be close to 0. But if the mixture components are poorly separated, such an item will have a large value, implying that such an unreasonable partition cannot discover the clustering structure of the data. By minimizing this item, CL-MML prefers a smaller K compared to the MML on the same data set. In other words, CL-MML is expected to be able to rectify the over-fitting tendency of the MML, favoring mixtures which lead to a clustering result of the data with the greatest evidence.
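A minimal sketch of how Eq.(9) can be evaluated for a fitted candidate model is given below; it assumes full-covariance Gaussian components, so that M = D + D(D+1)/2, and uses illustrative function and variable names.

```python
# A minimal sketch of the CL-MML score of Eq. (9), assuming full-covariance
# Gaussian components so that M = D + D(D+1)/2; the function and variable
# names are illustrative assumptions.
import numpy as np
from scipy.stats import multivariate_normal


def cl_mml(Y, weights, means, covs):
    """Score a fitted K-component GMM with Eq. (9); a smaller value is better."""
    N, D = Y.shape
    K = len(weights)
    M = D + D * (D + 1) / 2.0                        # parameters per component (assumption)
    dens = np.column_stack([
        weights[k] * multivariate_normal.pdf(Y, mean=means[k], cov=covs[k])
        for k in range(K)
    ])
    p_nk = dens / dens.sum(axis=1, keepdims=True)    # Eq. (4)
    z_nk = np.zeros((N, K))
    z_nk[np.arange(N), p_nk.argmax(axis=1)] = 1.0    # Eq. (6)
    fit_term = -np.log(dens.sum(axis=1)).sum()                 # goodness of fit
    partition_term = -np.sum(z_nk * np.log(p_nk + 1e-300))     # goodness of partition
    complexity = 0.5 * M * np.sum(np.log(N * np.asarray(weights))) \
        + 0.5 * K * (1.0 + np.log(N))                # complexity penalties (weights > 0 assumed)
    return fit_term + partition_term + complexity
```

Since a smaller message length is better, candidate mixtures with different K can be compared by evaluating this score on the same data.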
2.2 Estimation of GMM parameters

For the GMM, each component follows the Gaussian distribution, i.e., p(y \mid \theta_k) = G(y \mid \mu_k, \Sigma_k), where \mu_k and \Sigma_k are the mean and the covariance matrix of the k-th Gaussian component. For a fixed model order K, we estimate the GMM parameters \Theta by an improved EM algorithm, with CL-MML in Eq.(9) as the cost function. The proposed EM algorithm alternately applies the following two steps in the t-th iteration until convergence:

E-step: Compute the conditional expectation:

p_{nk}^{(t)} = \frac{\hat{w}_k^{(t)} \, p(y_n \mid \hat{\theta}_k^{(t)})}{\sum_{j=1}^{K} \hat{w}_j^{(t)} \, p(y_n \mid \hat{\theta}_j^{(t)})}    (10)

M-step: Update the parameters of the GMM by

\hat{w}_k^{(t+1)} = \frac{\max\Big\{0, \big(\sum_{n=1}^{N} p_{nk}^{(t)}\big) - \frac{M}{2}\Big\}}{\sum_{j=1}^{K} \max\Big\{0, \big(\sum_{n=1}^{N} p_{nj}^{(t)}\big) - \frac{M}{2}\Big\}}    (11)

\hat{\mu}_k^{(t+1)} = \frac{\sum_{n=1}^{N} p_{nk}^{(t)} y_n}{\sum_{n=1}^{N} p_{nk}^{(t)}}    (12)

\hat{\Sigma}_k^{(t+1)} = \frac{\sum_{n=1}^{N} p_{nk}^{(t)} \big(y_n - \hat{\mu}_k^{(t+1)}\big)\big(y_n - \hat{\mu}_k^{(t+1)}\big)^{\mathrm T}}{\sum_{n=1}^{N} p_{nk}^{(t)}}    (13)

In Eq.(11), \sum_{n=1}^{N} p_{nk}^{(t)} can be viewed as the evidence for the k-th component from the data points. Then, according to Eq.(11), when one of the components becomes too weak, namely it is not supported by the data, it will be driven into extinction. Such a modification to the standard EM can be expected to suppress spurious solutions.
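A minimal sketch of one such iteration is given below; the full-covariance assumption (M = D + D(D+1)/2), the small ridge added to each covariance for numerical stability, and all function and variable names are illustrative choices rather than part of the algorithm description above.

```python
# A minimal sketch of one iteration of Eqs. (10)-(13) with the weak-component
# annihilation of Eq. (11). The full-covariance assumption, the covariance
# ridge and all names are illustrative assumptions, not the authors' code.
import numpy as np
from scipy.stats import multivariate_normal


def em_step(Y, weights, means, covs, reg=1e-6):
    """One E-step (Eq. (10)) and M-step (Eqs. (11)-(13)); annihilated components are dropped."""
    N, D = Y.shape
    K = len(weights)
    M = D + D * (D + 1) / 2.0
    # E-step: conditional expectations p_nk, Eq. (10)
    dens = np.column_stack([
        weights[k] * multivariate_normal.pdf(Y, mean=means[k], cov=covs[k])
        for k in range(K)
    ])
    p_nk = dens / dens.sum(axis=1, keepdims=True)
    # M-step: mixing probabilities with the annihilation rule of Eq. (11)
    evidence = p_nk.sum(axis=0)                      # sum_n p_nk, the "evidence" for component k
    new_weights = np.maximum(0.0, evidence - M / 2.0)
    new_weights /= new_weights.sum()
    # M-step: means and covariances of the surviving components, Eqs. (12)-(13)
    new_means, new_covs = [], []
    for k in range(K):
        if new_weights[k] == 0.0:                    # component not supported by the data: drop it
            continue
        mu = p_nk[:, k] @ Y / evidence[k]                                          # Eq. (12)
        diff = Y - mu
        cov = (p_nk[:, k, None] * diff).T @ diff / evidence[k] + reg * np.eye(D)   # Eq. (13), plus ridge
        new_means.append(mu)
        new_covs.append(cov)
    return new_weights[new_weights > 0.0], new_means, new_covs
```

In practice the step is repeated until the CL-MML score of Eq.(9) stops decreasing, with annihilated components removed from the mixture before the next iteration.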
3 Experiments

We present experimental results to illustrate the effectiveness of CL-MML for GMM-based clustering (denoted as GMM+CL-MML), compared to that of BIC (denoted as GMM+BIC) and MML (denoted as GMM+MML), as well as the method utilizing the feature-weighted GMM and the integrated likelihood criterion (FWGMM+ILC) for clustering.

3.1 Synthetic data

We consider a synthetic 2D data set where data from each cluster follow the uniform random distribution

u(y_1, y_2 \mid r) = \begin{cases} \dfrac{1}{(r_2 - r_1)(r_4 - r_3)}, & r_1 \le y_1 \le r_2;\ r_3 \le y_2 \le r_4 \\ 0, & \text{otherwise} \end{cases}

where r = \{r_1, r_2, r_3, r_4\} are the parameters of the distribution. 1000 data points are generated using a 5-component uniform mixture model. Its parameters are as follows:

w_1 = 0.1, w_2 = w_4 = w_5 = 0.2, w_3 = 0.3
r_1 = \{-1.89, 4.07, 4.89, 7.94\}
r_2 = \{1.11, 5.11, 2.47, 3.53\}
r_3 = \{5.17, 6.53, 2.77, 5.77\}
r_4 = \{4.31, 6.49, 6.29, 6.71\}
r_5 = \{5.58, 8.42, -0.77, 2.23\}

The Gaussian components are adopted to fit such a uniform mixture data set, for which the true distribution models are very different from the assumed ones. The models with the number of components K varying from 1 to K_max, a number that is considered to be safely larger than the true number (i.e., 5), are evaluated. K_max is set to be 30 in this case. We evaluate these methods by the accuracy in estimating the model order and structure. Tab.1 illustrates the number of times that each order is selected over 50 trials. Fig.1 shows typical clustering results by these four methods.

It can be observed that for such a data set, the GMM+BIC approach not only fails to yield a good estimation of the model order (see Tab.1), but also leads to a meaningless mixture model by the standard EM (see Fig.1(a)). Although the MML criterion generates a GMM which fits the data well, it suffers from severe over-fitting, as shown in Fig.1(b) and Tab.1. Since the features are assumed to be independent in FWGMM, it also tends to select more components in order to approximate the distribution of the data accurately (see Fig.1(c) and Tab.1).

Tab.1 Number of times for selected model orders over 50 trials on synthetic data

In contrast, due to the introduction of an extra penalty to the MML criterion, the proposed CL-MML criterion-based GMM clustering favors much fewer but more "powerful" components, which successfully detect the clusters. The clustering result in a typical successful trial of CL-MML is shown in Fig.1(d).

Fig.1 Typical clustering results of different methods on the synthetic data. (a) GMM+BIC; (b) GMM+MML; (c) FWGMM+ILC; (d) GMM+CL-MML
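For illustration, one realization of such a data set can be drawn with a few lines of NumPy; the random seed and variable names below are arbitrary choices (the results above are reported over 50 trials).

```python
# A short sketch that draws one realization of the 5-component uniform
# mixture described in Section 3.1; the seed and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.1, 0.2, 0.3, 0.2, 0.2])        # w_1, ..., w_5
rects = np.array([                                   # each row: [r_1, r_2, r_3, r_4]
    [-1.89, 4.07, 4.89, 7.94],
    [1.11, 5.11, 2.47, 3.53],
    [5.17, 6.53, 2.77, 5.77],
    [4.31, 6.49, 6.29, 6.71],
    [5.58, 8.42, -0.77, 2.23],
])

labels = rng.choice(len(weights), size=1000, p=weights)   # mixture component of each point
Y = np.column_stack([
    rng.uniform(rects[labels, 0], rects[labels, 1]),      # y_1 ~ U[r_1, r_2]
    rng.uniform(rects[labels, 2], rects[labels, 3]),      # y_2 ~ U[r_3, r_4]
])
```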
3.2 Real-world data

We also measure the performance on four real-world data sets from the UCI repository. The number of classes, the number of samples and the dimensionality of each data set are summarized in Tab.2. For each data set, we randomly split the data 50 times into training and test sets. Training sets are created from 50% of the overall data points. We do not use any label in the training stage. K_max is still set to be 30. After model learning, we label each component by majority vote using the class labels provided for the test data, and we measure the test set classification accuracy as the matching degree between such obtained labels and the original true labels. The means and the standard deviations of the classification accuracy, as well as the number of components for each data set, over 50 trials are summarized in Tab.2. The best results are marked in bold.

Tab.2 Comparison of different clustering approaches on real data sets

Several trends are apparent. First, the numbers of components determined by the proposed method are generally less than those determined by the compared counterparts. This may be due to the reason that the distribution of a real data set often does not strictly follow the Gaussian mixture model, and most GMM-based clustering approaches tend to generate more components than necessary in order to better fit the data. However, it is found that CL-MML can rectify the over-fitting tendency of the compared methods under such circumstances. This can be explained by the reason that it takes the separation among components into account. Secondly, the proposed method yields the most accurate results among all the approaches on these four data sets. This justifies that the proposed approach can estimate the GMM parameters more properly than the compared ones.

4 Conclusion

By taking the capability of the candidate GMM to partition the data into account, an improved GMM-based clustering approach is developed, in particular for the difficult scenario where the true distribution of data is against the assumed GMM. The experimental results show that the proposed method is not only able to rectify the over-fitting tendency of the compared methods for performing the model selection, but also able to obtain higher clustering accuracy compared to the existing methods.

References

[1] Zeng H, Cheung Y M. Feature selection and kernel learning for local learning based clustering [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8): 1532-1547.
[2] Jain A K. Data clustering: 50 years beyond K-means [J]. Pattern Recognition Letters, 2010, 31(8): 651-666.
[3] Bouguila N, Almakadmeh K, Boutemedjet S. A finite mixture model for simultaneous high-dimensional clustering, localized feature selection and outlier rejection [J]. Expert Systems with Applications, 2012, 39(7): 6641-6656.
[4] Law M H C, Figueiredo M A T, Jain A K. Simultaneous feature selection and clustering using mixture models [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(9): 1154-1166.
[5] Markley S C, Miller D J. Joint parsimonious modeling and model order selection for multivariate Gaussian mixtures [J]. IEEE Journal of Selected Topics in Signal Processing, 2010, 4(3): 548-559.
[6] Li Y, Dong M, Hua J. Localized feature selection for clustering [J]. Pattern Recognition Letters, 2008, 29(1): 10-18.
[7] Allili M S, Ziou D, Bouguila N, et al. Image and video segmentation by combining unsupervised generalized Gaussian mixture modeling and feature selection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2010, 20(10): 1373-1377.
[8] Fan W, Bouguila N, Ziou D. Unsupervised hybrid feature extraction selection for high-dimensional non-Gaussian data clustering with variational inference [J]. IEEE Transactions on Knowledge and Data Engineering, 2012, in press.
[9] Figueiredo M A T, Jain A K. Unsupervised learning of finite mixture models [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(3): 381-396.
[10] Wallace C S, Dowe D L. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions [J]. Statistics and Computing, 2000, 10(1): 73-83.

