1AnIntroductiontotheExpectation-Maximization(EM)AlgorithmAnIntroductiontotheExpectationAnIntroductiontotheExpectation--Maximization(EM)AlgorithmMaximization(EM)AlgorithmCSE802,Spring2006DepartmentofComputerScienceandEngineeringMichiganStateUniversity2OverviewOverviewOverview•Theproblemofmissingdata•AmixtureofGaussian•TheEMAlgorithm–TheQ-function,theE-stepandtheM-step•Someexamples•Summary3MissingDataMissingDataMissingData•Datacollectionandfeaturegenerationarenotperfect.Objectsinapatternrecognitionapplicationmayhavemissingfeatures–Intheclassificationproblemofseabassversussalmon,thelightnessfeaturecanstillbeestimatedifpartofthefishisoccluded.Thisisnotsoforthewidthfeature–Featuresareextractedbymultiplesensors.Oneormultiplesensorsmaybemalfunctioningwhentheobjectispresented–Arespondentfailstoanswerallthequestionsinaquestionnaire–Someparticipantsquitduringthemiddleofastudy,wheremeasurementsaretakenatdifferenttimepoints•Anexampleofmissingdata:“dermatology”fromUCIMissingvalues4MissingDataMissingDataMissingData•Differenttypesofmissingdata–Missingcompletelyatrandom(MCAR)–Missingatrandom(MAR):thefactthatafeatureismissingisrandom,afterconditionedonanotherfeatureExample:peoplewhoaredepressedmightbelessinclinedtoreporttheirincome,andthusreportedincomewillberelatedtodepression.However,if,withindepressedpatientstheprobabilityofreportedincomewasunrelatedtoincomelevel,thenthedatawouldbeconsideredMAR•IfnotMARnorMCAR,themissingdataphenomenonneedstobemodeledexplicitly–Example:ifauserfailstoprovidehis/herratingonamovie,itismorelikelythathe/shedislikesthemoviebecausehe/shehasnotwatchedit5StrategyofCopingWithMissingDataStrategyofCopingWithMissingDataStrategyofCopingWithMissingData•Discardingcases–Ignorepatternswithmissingfeatures–Ignorefeatureswithmissingvalues–Easytoimplement,butwastefulofdata.MaybiastheparameterestimateifnotMCAR•Maximumlikelihood(theEMalgorithm)–Undersomeassumptionofhowthedataaregenerated,estimatetheparametersthatmaximizethelikelihoodoftheobserveddata(non-missingfeatures)•MultipleImputation–Foreachobjectwithmissingfeatures,generate(impute)multipleinstantiationsofthemissingfeaturesbasedonthevaluesofthenon-missingfeatures6MixtureOfGaussianMixtureOfGaussianMixtureOfGaussian•InDHSchapters2and3,theclassconditionaldensitiesareassumedtobeparametric,typicallyGaussian–Whatifthisassumptionisviolated?•Onesolutionistousenon-parametricdensityestimate(Ch4)•Anintermediateapproach:amixtureofGaussian024681000.10.2-10-5051000.020.040.060.080.1-10-5051000.050.10.150.20.25-10-5051000.050.10.150.20.257GaussianMixtureGaussianMixtureGaussianMixture•Letxdenotethefeaturevectorofanobject•AGaussianmixtureconsistsofkdifferentGaussiandistributions,kbeingspecifiedbytheuser–EachoftheGaussianisknownas“componentdistribution”orsimply“component”–Thej-thGaussianhasmeanμjandcovariancematrixCj–Inpractice,kcanbeunknown•TwodifferentwaystothinkaboutaGaussianmixture–ThepdfisasumofmultipleGaussianpdfs–Twostage-datagenerationprocess:Firstacomponentisrandomlyselected.Theprobabilitythatthej-thcomponentisselectedisαjThecomponentselectedisreferredtoas“componentlabel”Thedataxisthengeneratedaccordingto8GaussianMixtureGaussianMixtureGaussianMixture•Letf(x,μj,Cj)denotetheGaussianpdf•Letzbetherandomvariablecorrespondingtotheidentityofthecomponentselected(componentlabel)–Theevent“z=j”meansthatthej-thcomponentisselectedatthefirststageofthedatagenerationprocess–p(z=j)=αj•ThepdfofaGaussianmixtureisaweightedsumofdifferentGaussianpdf9GaussianMixtureandMissingDataGaussianMixtureandMissingDataGaussianMixtureandMissingData•SupposewewanttofitaGaussianmixtureofkcomponentstothetrainingdataset{x1,…,xn}•Theparametersareαj,μjandCj•Thelog-likelihoodisgivenby•Theparameterscanbefoundbymaximizing–Unfortunately,theMLEdoesnothaveaclosedformsolution•Letzibethecomponentlabelofthedataxi•AsimplealgorithmtofindtheMLEviewsalltheziasmissingdata.Afterall,theziarenotobserved(unknown)!•Thisfallsunderthecategoryof“missingcompletelyatrandom”,becausealltheziaremissing.So,themissingphenomenonisnotrelatedtothevalueofxi10TheEMAlgorithmTheEMAlgorithmTheEMAlgorithm•TheEMalgorithmisaniterativealgorithmthatfindstheparameterswhichmaximizethelog-likelihoodwhentherearemissingdata•Completedata:thecombinationofthemissingdataandtheobserveddata–ForthecaseofGaussianmixture,thecompletedataconsistofthepairs(xi,zi),i=1,...,n•TheEMalgorithmisbestusedinsituationwherethelog-likelihoodoftheobserveddataisdifficulttooptimize,whereasthelog-likelihoodofthecompletedataiseasytooptimize–InthecaseofaGaussianmixture,thelog-likelihoodofcompletedata(xiandzi)ismaximizedeasilybytakingaweightedsumandaweightedscattermatrixofthedata•Notethatthelog-likelihoodoftheobserveddataisobtainedbymarginalizing(summingout)thehiddendata11TheEMAlgorithmTheEMAlgorithmTheEMAlgorithm•LetXandZdenotethevectorofallobserveddataandhiddendata,respectively•Letθtbetheparameterestimateatthet-thiteration•DefinetheQ-function(afunctionofθ)•Intuitively,theQ-functionisthe“expectedvalueofthecompletedatalog-likelihood”–FillinallthepossiblevaluesforthemissingdataZ.