Chapter 5
EM

Geoffrey J. McLachlan and Shu-Kay Ng

Contents
5.1 Introduction
5.2 Algorithm Description
5.3 Software Implementation
5.4 Illustrative Examples
    5.4.1 Example 5.1: Multivariate Normal Mixtures
    5.4.2 Example 5.2: Mixtures of Factor Analyzers
5.5 Advanced Topics
5.6 Exercises
References

Abstract
The expectation-maximization (EM) algorithm is a broadly applicable approach to the iterative computation of maximum likelihood (ML) estimates, useful in a variety of incomplete-data problems. In particular, the EM algorithm simplifies considerably the problem of fitting finite mixture models by ML, where mixture models are used to model heterogeneity in cluster analysis and pattern recognition contexts. The EM algorithm has a number of appealing properties, including its numerical stability, simplicity of implementation, and reliable global convergence. There are also extensions of the EM algorithm to tackle complex problems in various data mining applications; it is highly desirable, however, that its simplicity and stability be preserved in such extensions.

5.1 Introduction

The expectation-maximization (EM) algorithm has been of considerable interest in recent years in the development of algorithms in various application areas such as data mining, machine learning, and pattern recognition [20, 27, 28]. The seminal paper of Dempster et al. [8] on the EM algorithm greatly stimulated interest in the use of finite mixture distributions to model heterogeneous data. This is because the fitting of mixture models by maximum likelihood (ML) is a classic example of a problem that is simplified considerably by the EM's conceptual unification of ML estimation from data that can be viewed as being incomplete [20]. Maximum likelihood estimation and likelihood-based inference are of central importance in statistical theory and data analysis, and ML estimation is a general-purpose method with attractive properties [6, 13, 31]. Finite mixture distributions provide a flexible and mathematically based approach to the modeling and clustering of data observed on random phenomena. We focus here on the use of the EM algorithm for the fitting of finite mixture models via the ML approach.

With the mixture model-based approach to clustering, the observed p-dimensional data y_1, ..., y_n are assumed to have come from a mixture of an initially specified number g of component densities in some unknown proportions \pi_1, ..., \pi_g, which sum to 1. The mixture density of y_j is expressed as

    f(y_j; \Psi) = \sum_{i=1}^{g} \pi_i f_i(y_j; \theta_i)    (j = 1, ..., n)    (5.1)

where the component density f_i(y_j; \theta_i) is specified up to a vector \theta_i of unknown parameters (i = 1, ..., g). The vector of all the unknown parameters is given by

    \Psi = (\pi_1, ..., \pi_{g-1}, \theta_1^T, ..., \theta_g^T)^T

where the superscript T denotes vector transpose. The parameter vector \Psi can be estimated by ML. The objective is to maximize the likelihood L(\Psi), or equivalently the log likelihood log L(\Psi), as a function of \Psi over the parameter space. That is, the ML estimate of \Psi, \hat{\Psi}, is given by an appropriate root of the log likelihood equation,

    \partial log L(\Psi) / \partial \Psi = 0    (5.2)

where

    log L(\Psi) = \sum_{j=1}^{n} log f(y_j; \Psi)

is the log likelihood function for \Psi formed under the assumption of independent data y_1, ..., y_n.
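To make Equations (5.1) and (5.2) concrete, the following minimal sketch evaluates the mixture log likelihood log L(\Psi) when the component densities f_i are multivariate normal (the case taken up in Example 5.1). The function name mixture_log_likelihood and the use of NumPy/SciPy are illustrative assumptions for this sketch, not part of the chapter's accompanying software.

    import numpy as np
    from scipy.stats import multivariate_normal

    def mixture_log_likelihood(y, pi, means, covs):
        """log L(Psi) = sum_j log { sum_i pi_i f_i(y_j; theta_i) }.

        y     : (n, p) array of observations y_1, ..., y_n
        pi    : (g,) mixing proportions pi_1, ..., pi_g, summing to 1
        means : list of g component mean vectors; together with covs these
        covs  : list of g covariance matrices   play the role of theta_1..theta_g
        """
        g = len(pi)
        # Column i holds pi_i * f_i(y_j; theta_i) for j = 1, ..., n
        weighted = np.column_stack([
            pi[i] * multivariate_normal.pdf(y, mean=means[i], cov=covs[i])
            for i in range(g)
        ])
        # The mixture density f(y_j; Psi) of Equation (5.1) is the row sum;
        # summing its logs over j gives the log likelihood of Equation (5.2)
        return float(np.sum(np.log(weighted.sum(axis=1))))

Monitoring this quantity across iterations is the usual way to track the progress of an iterative fitting scheme such as the EM algorithm described below.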
The aim of ML estimation [13] is to determine an estimate \hat{\Psi} for each n, so that it defines a sequence of roots of Equation (5.2) that is consistent and asymptotically efficient. Such a sequence is known to exist under suitable regularity conditions [7]. With probability tending to one, these roots correspond to local maxima in the interior of the parameter space. For estimation models in general, the likelihood usually has a global maximum in the interior of the parameter space. Then typically a sequence of roots of Equation (5.2) with the desired asymptotic properties is provided by taking \hat{\Psi} for each n to be the root that globally maximizes L(\Psi); in this case, \hat{\Psi} is the MLE [18]. We shall henceforth refer to \hat{\Psi} as the MLE, even in situations where it may not globally maximize the likelihood. Indeed, in the example on mixture models to be presented in Section 5.4.1, the likelihood is unbounded. However, there may still exist under the usual regularity conditions a sequence of roots of Equation (5.2) with the properties of consistency, efficiency, and asymptotic normality [16].

5.2 Algorithm Description

The EM algorithm is an iterative algorithm; each iteration consists of two steps, the Expectation step (E-step) and the Maximization step (M-step). A brief history of the EM algorithm can be found in [18]. Within the incomplete-data framework of the EM algorithm, we let y = (y_1^T, ..., y_n^T)^T denote the vector containing the observed data and we let z denote the vector containing the unobservable (missing) data. The complete-data vector is declared to be

    x = (y^T, z^T)^T

The EM algorithm approaches the problem of solving the "incomplete-data" log likelihood Equation (5.2) indirectly by proceeding iteratively in terms of the "complete-data" log likelihood, log L_c(\Psi). Because log L_c(\Psi) depends explicitly on the unobservable data z, the E-step replaces it by the so-called Q-function, its conditional expectation given y, using the current fit for \Psi. More specifically, on the (k+1)th iteration of the EM algorithm, the E-step computes

    Q(\Psi; \Psi^{(k)}) = E_{\Psi^{(k)}} { log L_c(\Psi) | y }

where \Psi^{(k)} is the current fit for \Psi.
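As a concrete illustration of the E- and M-steps just described, here is a minimal sketch of one EM iteration for a g-component multivariate normal mixture, the model of Example 5.1. For this model the E-step reduces to computing the posterior probabilities tau[j, i] that y_j belongs to component i under the current fit \Psi^{(k)}, and the M-step maximizes the resulting Q-function in closed form via the standard weighted updates shown. The function and variable names are hypothetical choices for this sketch; the chapter's own software implementation is the subject of Section 5.3.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_iteration(y, pi, means, covs):
        """One EM iteration Psi^(k) -> Psi^(k+1) for a normal mixture (a sketch)."""
        n, g = y.shape[0], len(pi)

        # E-step: posterior probabilities under the current fit,
        # tau[j, i] = pi_i f_i(y_j) / f(y_j; Psi), which determine the Q-function
        weighted = np.column_stack([
            pi[i] * multivariate_normal.pdf(y, mean=means[i], cov=covs[i])
            for i in range(g)
        ])
        tau = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: closed-form maximization of the Q-function for normal components
        n_i = tau.sum(axis=0)                  # effective sizes of the g components
        pi_new = n_i / n
        means_new = [tau[:, i] @ y / n_i[i] for i in range(g)]
        covs_new = []
        for i in range(g):
            d = y - means_new[i]               # deviations from the updated mean
            covs_new.append((tau[:, i][:, None] * d).T @ d / n_i[i])
        return pi_new, means_new, covs_new

Iterating this map from a starting value \Psi^{(0)} and stopping when the increase in log L(\Psi) falls below a tolerance gives the full algorithm; by a well-known property of EM, the log likelihood is not decreased from one iteration to the next, which underlies the numerical stability noted in the Abstract.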