Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables


David Maxwell Chickering (dmax@cs.ucla.edu)
David Heckerman (heckerma@microsoft.com)

March 1996 (Revised April 1997)

Technical Report MSR-TR-96-08

Microsoft Research
Advanced Technology Division
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052

Abstract

We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BIC/MDL, are accurate for model selection, (3) among the accurate approximations, the Cheeseman-Stutz and Diagonal approximations are the most computationally efficient, (4) all of the approximations, with the exception of BIC/MDL, can be sensitive to the prior distribution over model parameters, and (5) the Cheeseman-Stutz approximation can be more accurate than the other approximations, including the Laplace approximation, in situations where the parameters in the maximum a posteriori (MAP) configuration are near a boundary.

Keywords: Bayesian model averaging, model selection, multinomial mixtures, clustering, unsupervised learning, Laplace approximation

1 Introduction

There is growing interest in methods for learning graphical models from data. We consider Bayesian methods such as those reviewed in Heckerman (1995) and Buntine (1996). A key step in the Bayesian approach to learning graphical models is the computation of the marginal likelihood of a data set given a model. This quantity is the ordinary likelihood (a function of the data and the model parameters) averaged over the parameters with respect to their prior distribution. Given a complete data set, that is, one in which each sample contains observations for every variable in the model, the marginal likelihood can be computed exactly in closed form under certain assumptions (e.g., Cooper and Herskovits, 1992; Heckerman et al., 1995). In contrast, when observations are missing, including situations where some variables are hidden (i.e., never observed), the exact determination of the marginal likelihood is typically intractable (e.g., Cooper and Herskovits, 1992). Consequently, approximate techniques for computing the marginal likelihood are used.

One class of approximations that has received a great deal of attention in the statistics community is based on Monte-Carlo techniques. In theory, these approximations are known to converge to an accurate result. In practice, however, the amount of computer time needed for convergence can be enormous. An alternative class of approximations is based on the large-sample properties of probability distributions. This class also can be accurate under certain assumptions, and is typically more efficient than Monte-Carlo techniques.

One large-sample approximation, known as a Laplace approximation, is widely used by Bayesian statisticians (Haughton, 1988; Kass et al., 1988; Kass and Raftery, 1995). Although this approximation is efficient relative to Monte-Carlo methods, it has a computational complexity of O(d²N) (or greater), where d is the dimension of the model and N is the sample size of the data. Consequently, the Laplace approximation can be a computational burden for large models.
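To make the preceding discussion concrete, the quantities involved can be written out. The forms below are the standard ones from the literature cited above (e.g., Kass and Raftery, 1995); the notation, with \(\theta\) the parameters of model \(m\), \(\tilde\theta\) their MAP configuration, \(d\) the model dimension, and \(A\) the negative Hessian, is chosen for this sketch rather than quoted from the report. The marginal likelihood is the likelihood averaged over the parameter prior:

\[
p(D \mid m) \;=\; \int p(D \mid \theta, m)\, p(\theta \mid m)\, d\theta .
\]

One simple Monte-Carlo scheme (not necessarily the gold standard used in the experiments reported here) draws \(\theta^{(1)}, \ldots, \theta^{(S)}\) from the prior and uses the unbiased but often slowly converging estimate

\[
p(D \mid m) \;\approx\; \frac{1}{S} \sum_{s=1}^{S} p(D \mid \theta^{(s)}, m) .
\]

The Laplace approximation instead expands \(\log\bigl[\, p(D \mid \theta, m)\, p(\theta \mid m) \,\bigr]\) to second order around its maximum \(\tilde\theta\) and integrates the resulting Gaussian, giving

\[
\log p(D \mid m) \;\approx\; \log p(D \mid \tilde\theta, m) + \log p(\tilde\theta \mid m) + \frac{d}{2}\log 2\pi - \frac{1}{2}\log \lvert A \rvert ,
\]

where \(A\) is the negative Hessian of \(\log\bigl[\, p(D \mid \theta, m)\, p(\theta \mid m) \,\bigr]\) evaluated at \(\tilde\theta\). Accumulating the \(d \times d\) matrix \(A\) over \(N\) samples is the source of the O(d²N) cost noted above.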
In this paper, we examine other large-sample approximations that are more efficient than the Laplace approximation. These approximations include the Bayesian Information Criterion (BIC) (Schwarz, 1978), which is equivalent to Rissanen's (1987) Minimum-Description-Length (MDL) measure, diagonal and block-diagonal approximations for the Hessian term in the Laplace approximation (Becker and LeCun, 1988; Buntine and Weigend, 1994), and an approximation suggested by Cheeseman and Stutz (1995); these criteria are sketched below.

Researchers have investigated the accuracy and efficiency of some of these approximations. For example, both theoretical and empirical studies have shown that the Laplace approximation is more accurate than the BIC/MDL approximation (see, e.g., Draper, 1993, and Raftery, 1994). Also, Becker and LeCun (1989) and MacKay (1992b) report successful and unsuccessful applications of the diagonal approximation, respectively, in the context of parameter learning for probabilistic neural-network models.

In this paper, we fill in some of the gaps left by previous studies. We examine empirically the accuracy and efficiency of all of these approximations, comparing them to a Monte-Carlo gold standard. We do so using simple Bayesian networks for discrete variables that contain a single hidden variable. To our knowledge, this empirical study is the first to compare these approximations with a Monte-Carlo standard in the context of hidden-variable Bayesian networks, and the first to examine the accuracy of the Cheeseman-Stutz approximation.

Our study is motivated by the need for accurate and efficient methods for exploratory data analysis. One exploration task is clustering. For example, suppose we have repeated observations for the disc...
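As a rough guide to how the cheaper criteria relate to the Laplace form sketched earlier, the following are their commonly cited expressions; again, the notation is this sketch's, not the report's. BIC/MDL keeps only the terms of the Laplace expression that grow with the sample size \(N\):

\[
\log p(D \mid m) \;\approx\; \log p(D \mid \tilde\theta, m) - \frac{d}{2}\log N ,
\]

which avoids computing the Hessian \(A\) altogether (asymptotically it makes no difference whether \(\tilde\theta\) is the MAP or maximum-likelihood configuration). The diagonal approximation retains the full Laplace form but ignores the off-diagonal elements of the observed information matrix, so that

\[
\log \lvert A \rvert \;\approx\; \sum_{i=1}^{d} \log A_{ii} ,
\]

reducing the Hessian computation from O(d²N) to O(dN); block-diagonal variants keep selected off-diagonal blocks as a compromise between accuracy and cost.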
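The Cheeseman-Stutz approximation exploits the fact, noted in the introduction, that the marginal likelihood has a closed form when the data are complete. The expression below is the commonly cited form of the approximation; here \(D'\) is an assumed notation for the data set completed with the expected sufficient statistics of the hidden variable computed at \(\tilde\theta\) (e.g., from a run of EM), and the report's own presentation should be taken as definitive:

\[
p(D \mid m) \;\approx\; p(D' \mid m)\, \frac{p(D \mid \tilde\theta, m)}{p(D' \mid \tilde\theta, m)} ,
\]

where \(p(D' \mid m)\) is evaluated in closed form. The ratio corrects the complete-data marginal likelihood for the mismatch, at the MAP configuration, between the likelihood of the actual data and that of the completed data.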
