Chapter 5
EM

Geoffrey J. McLachlan and Shu-Kay Ng

Contents
5.1 Introduction
5.2 Algorithm Description
5.3 Software Implementation
5.4 Illustrative Examples
    5.4.1 Example 5.1: Multivariate Normal Mixtures
    5.4.2 Example 5.2: Mixtures of Factor Analyzers
5.5 Advanced Topics
5.6 Exercises
References

Abstract
The expectation-maximization (EM) algorithm is a broadly applicable approach to the iterative computation of maximum likelihood (ML) estimates, useful in a variety of incomplete-data problems. In particular, the EM algorithm simplifies considerably the problem of fitting finite mixture models by ML, where mixture models are used to model heterogeneity in cluster analysis and pattern recognition contexts. The EM algorithm has a number of appealing properties, including its numerical stability, simplicity of implementation, and reliable global convergence. There are also extensions of the EM algorithm to tackle complex problems in various data mining applications; it is highly desirable, however, that its simplicity and stability be preserved in such extensions.

5.1 Introduction

The expectation-maximization (EM) algorithm has been of considerable interest in recent years in the development of algorithms in various application areas such as data mining, machine learning, and pattern recognition [20, 27, 28]. The seminal paper of Dempster et al. [8] on the EM algorithm greatly stimulated interest in the use of finite mixture distributions to model heterogeneous data. This is because the fitting of mixture models by maximum likelihood (ML) is a classic example of a problem that is simplified considerably by the EM's conceptual unification of ML estimation from data that can be viewed as being incomplete [20]. Maximum likelihood estimation and likelihood-based inference are of central importance in statistical theory and data analysis, and ML estimation is a general-purpose method with attractive properties [6, 13, 31]. Finite mixture distributions provide a flexible and mathematically based approach to the modeling and clustering of data observed on random phenomena. We focus here on the use of the EM algorithm for the fitting of finite mixture models via the ML approach.

With the mixture model-based approach to clustering, the observed p-dimensional data y_1, ..., y_n are assumed to have come from a mixture of an initially specified number g of component densities in some unknown proportions \pi_1, ..., \pi_g, which sum to 1. The mixture density of y_j is expressed as

    f(y_j; \Psi) = \sum_{i=1}^{g} \pi_i f_i(y_j; \theta_i)    (j = 1, ..., n)    (5.1)

where the component density f_i(y_j; \theta_i) is specified up to a vector \theta_i of unknown parameters (i = 1, ..., g). The vector of all the unknown parameters is given by

    \Psi = (\pi_1, ..., \pi_{g-1}, \theta_1^T, ..., \theta_g^T)^T

where the superscript T denotes vector transpose. The parameter vector \Psi can be estimated by ML. The objective is to maximize the likelihood L(\Psi), or equivalently the log likelihood log L(\Psi), as a function of \Psi over the parameter space. That is, the ML estimate of \Psi, \hat{\Psi}, is given by an appropriate root of the log likelihood equation,

    \partial log L(\Psi) / \partial \Psi = 0    (5.2)

where

    log L(\Psi) = \sum_{j=1}^{n} log f(y_j; \Psi)

is the log likelihood function for \Psi formed under the assumption of independent data y_1, ..., y_n.
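To make Equations (5.1) and (5.2) concrete, the following minimal sketch evaluates the mixture log likelihood log L(\Psi) when the component densities f_i are multivariate normal (the case taken up in Example 5.1). The function name mixture_log_likelihood and the use of NumPy/SciPy are illustrative assumptions for this sketch, not part of the chapter's accompanying software.

    import numpy as np
    from scipy.stats import multivariate_normal

    def mixture_log_likelihood(y, pi, means, covs):
        """log L(Psi) = sum_j log { sum_i pi_i f_i(y_j; theta_i) }.

        y     : (n, p) array of observations y_1, ..., y_n
        pi    : (g,) mixing proportions pi_1, ..., pi_g, summing to 1
        means : list of g component mean vectors; together with covs these
        covs  : list of g covariance matrices   play the role of theta_1..theta_g
        """
        g = len(pi)
        # Column i holds pi_i * f_i(y_j; theta_i) for j = 1, ..., n
        weighted = np.column_stack([
            pi[i] * multivariate_normal.pdf(y, mean=means[i], cov=covs[i])
            for i in range(g)
        ])
        # The mixture density f(y_j; Psi) of Equation (5.1) is the row sum;
        # summing its logs over j gives the log likelihood of Equation (5.2)
        return float(np.sum(np.log(weighted.sum(axis=1))))

Monitoring this quantity across iterations is the usual way to track the progress of an iterative fitting scheme such as the EM algorithm described below.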
The aim of ML estimation [13] is to determine an estimate \hat{\Psi} for each n, so that it defines a sequence of roots of Equation (5.2) that is consistent and asymptotically efficient. Such a sequence is known to exist under suitable regularity conditions [7]. With probability tending to one, these roots correspond to local maxima in the interior of the parameter space. For estimation models in general, the likelihood usually has a global maximum in the interior of the parameter space. Then typically a sequence of roots of Equation (5.2) with the desired asymptotic properties is provided by taking \hat{\Psi} for each n to be the root that globally maximizes L(\Psi); in this case, \hat{\Psi} is the MLE [18]. We shall henceforth refer to \hat{\Psi} as the MLE, even in situations where it may not globally maximize the likelihood. Indeed, in the example on mixture models to be presented in Section 5.4.1, the likelihood is unbounded. However, there may still exist under the usual regularity conditions a sequence of roots of Equation (5.2) with the properties of consistency, efficiency, and asymptotic normality [16].

5.2 Algorithm Description

The EM algorithm is an iterative algorithm; each iteration consists of two steps, the Expectation step (E-step) and the Maximization step (M-step). A brief history of the EM algorithm can be found in [18]. Within the incomplete-data framework of the EM algorithm, we let y = (y_1^T, ..., y_n^T)^T denote the vector containing the observed data and we let z denote the vector containing the unobservable (missing) data. The complete-data vector is declared to be

    x = (y^T, z^T)^T

The EM algorithm approaches the problem of solving the "incomplete-data" log likelihood Equation (5.2) indirectly by proceeding iteratively in terms of the "complete-data" log likelihood, log L_c(\Psi). Because log L_c(\Psi) depends explicitly on the unobservable data z, the E-step replaces it by the so-called Q-function, its conditional expectation given y, using the current fit for \Psi. More specifically, on the (k+1)th iteration of the EM algorithm, the E-step computes

    Q(\Psi; \Psi^{(k)}) = E_{\Psi^{(k)}} { log L_c(\Psi) | y }

where \Psi^{(k)} is the current fit for \Psi.
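As a concrete illustration of the E- and M-steps just described, here is a minimal sketch of one EM iteration for a g-component multivariate normal mixture, the model of Example 5.1. For this model the E-step reduces to computing the posterior probabilities tau[j, i] that y_j belongs to component i under the current fit \Psi^{(k)}, and the M-step maximizes the resulting Q-function in closed form via the standard weighted updates shown. The function and variable names are hypothetical choices for this sketch; the chapter's own software implementation is the subject of Section 5.3.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_iteration(y, pi, means, covs):
        """One EM iteration Psi^(k) -> Psi^(k+1) for a normal mixture (a sketch)."""
        n, g = y.shape[0], len(pi)

        # E-step: posterior probabilities under the current fit,
        # tau[j, i] = pi_i f_i(y_j) / f(y_j; Psi), which determine the Q-function
        weighted = np.column_stack([
            pi[i] * multivariate_normal.pdf(y, mean=means[i], cov=covs[i])
            for i in range(g)
        ])
        tau = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: closed-form maximization of the Q-function for normal components
        n_i = tau.sum(axis=0)                  # effective sizes of the g components
        pi_new = n_i / n
        means_new = [tau[:, i] @ y / n_i[i] for i in range(g)]
        covs_new = []
        for i in range(g):
            d = y - means_new[i]               # deviations from the updated mean
            covs_new.append((tau[:, i][:, None] * d).T @ d / n_i[i])
        return pi_new, means_new, covs_new

Iterating this map from a starting value \Psi^{(0)} and stopping when the increase in log L(\Psi) falls below a tolerance gives the full algorithm; by a well-known property of EM, the log likelihood is not decreased from one iteration to the next, which underlies the numerical stability noted in the Abstract.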