Learning Bayesian Networks From Data An Efficient

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

1LearningBayesianNetworksfromData:AnEfficientApproachBasedonInformationTheoryJieChengDept.ofComputingScienceUniversityofAlbertaAlberta,T6G2H1Email:jcheng@cs.ualberta.caDavidBell,WeiruLiuFacultyofInformatics,UniversityofUlster,UKBT370QBEmail:{w.liu,da.bell}@ulst.ac.ukAbstractThispaperaddressestheproblemoflearningBayesiannetworkstructuresfromdatabyusinganinformationtheoreticdependencyanalysisapproach.Basedonourthree-phaseconstructionmechanism,twoefficientalgorithmshavebeendeveloped.Oneofouralgorithmsdealswithaspecialcasewherethenodeorderingisgiven,thealgorithmonlyrequire)(2NOCItestsandiscorrectgiventhattheunderlyingmodelisDAG-Faithful[Spirteset.al.,1996].Theotheralgorithmdealswiththegeneralcaseandrequires)(4NOconditionalindependence(CI)tests.ItiscorrectgiventhattheunderlyingmodelismonotoneDAG-Faithful(seeSection4.4).AsystembasedonthesealgorithmshasbeendevelopedanddistributedthroughtheInternet.Theempiricalresultsshowthatourapproachisefficientandreliable.1IntroductionTheBayesiannetworkisapowerfulknowledgerepresentationandreasoningtoolunderconditionsofuncertainty.ABayesiannetworkisadirectedacyclicgraph(DAG)withaprobabilitytableforeachnode.ThenodesinaBayesiannetworkrepresentpropositionalvariablesinadomain,andthearcsbetweennodesrepresentthedependencyrelationshipsamongthevariables.OnconstructingBayesiannetworksfromdatabases,weusenodestorepresentdatabaseattributes.Inrecentyears,manyBayesiannetworkstructurelearningalgorithmshavebeendeveloped.Thesealgorithmsgenerallyfallintotwogroups,search&scoringbasedalgorithmsanddependencyanalysisbasedalgorithms.AnoverviewofthesealgorithmsispresentedinSection6.Althoughsomeofthesealgorithmscangivegoodresultsonsomebenchmarkdatasets,therearestillseveralproblems:•Nodeorderingrequirement.Alotofpreviousworkassumesthatnodeorderingisavailable.Unfortunately,inmanytimesthisisnotthecase.•Lackofefficiency.Somerecentalgorithmsdonotneednodeordering,buttheyaregenerallynotveryefficient.AllpracticabledependencyanalysisbasedalgorithmsrequireexponentialnumbersofCItests.•Lackofpubliclyavailablelearningtools.Althoughtherearemanyalgorithmsforthistask,onlyafewBayesiannetworklearningsystemsarepubliclyavailableandonlyoneofthem(TETRADII[Spirtes,et.al.,1996])canbeappliedtoreal-worlddataminingapplicationswherethedatasetsoftenhavehundredsofvariablesandmillionsofrecords.ThismotivatesustodoresearchintheareaofBayesiannetworkstructurelearning.Wedevelopedtwoalgorithmsforthistask,AlgorithmAandAlgorithmB.AlgorithmAdealswithaspecialcasewherethenodeorderingisgiven,whichrequires)(2NOCItestsandiscorrectgiventhattheunderlyingmodelisDAGfaithful.AlgorithmBdealswiththegeneralcaseandrequires)(4NOCItests.ItiscorrectgiventhattheunderlyingmodelismonotoneDAGfaithful.Basedonthesetwoalgorithms,wehavedevelopedaBayesiannetworklearningsystem,calledBayesianNetworkPowerConstructor.ThesystemhasbeenavailableontheInternetsinceOctober1997andhasalreadyenjoyedoveronethousanddownloads.Duetotheveryencouragingexperimentalresultsandpositivefeedbackfromotherusers,weplantoexpandourworktoallowcontinuousandhiddenvariablesintheunderlyingmodels.Wealsoplantodevelopacommercialversionofoursystemandintegrateittolargedatamining,knowledgebaseanddecisionsupportsystems.2Theremainderofthepaperisorganizedasfollows.Section2introducesBayesiannetworklearningfromaninformationtheoreticperspective.InSection3wepresentaBayesiannetworklearningalgorithm(AlgorithmA)foraspecialcasewhennodeorderingisgiven.Thecorrectnessproofandcomplexityanalysisarealsogiven.Section4presentsanextensionofAlgorithmA(calledAlgorithmB),whichdoesnotrequirenodeordering.Then,wegiveitscorrectnessproofandcomplexityanalysis.InSection5,wefirstintroduceourBayesiannetworklearningsystem(BNPowerConstructor),whichimplementsbothofouralgorithms.Then,weanalyzetheexperimentalresultsofbothalgorithmsonreal-worlddatasets.Section6surveysthealgorithmsofBayesiannetworklearningalgorithms.Finally,weconcludeourworkandproposesomefutureresearchdirectionsinSection7.2LearningBayesianNetworksUsingInformationTheoryInthissection,wefirstgivesomebasicconceptsrelatedwithBayesiannetworklearning.Thenweintroduceavitalconceptusedinourapproach-d-separation,fromaninformationtheoreticperspective.2.1BasicConceptsDefinition2.1.AdirectedgraphGcanbedefinedasanorderedpairthatconsistsofafinitesetVofnodesandanirreflexiveadjacencyrelationEonV.ThegraphGisdenotedas),(EV.ForeachEyx∈),(wesaythatthereisanarc(directededge)fromnodextonodey.Inthegraph,thisisdenotedbyanarrowfromxtoyandxandyarecalledthestartpointandtheendpointofthearrowrespectively.Wealsosaythatnodexandnodeyareadjacentorxandyareneighborsofeachother.xisalsocalledaparentofyandyiscalledachildofx.Byusingtheconceptsofparentandchildrecursively,wecanalsodefinetheconceptofancestoranddescendent.Wealsocallanodethatdoesnothaveanyparentarootnode.ByirreflexiveadjacencyrelationwemeanthatforanyVx∈,Exx∉),(,i.e.,anarccannothaveanodeasbothitsstartpointandendpoint.Definition2.2.InBayesiannetworklearning,weoftenneedtofindapaththatconnectstwonodeswithoutconsideringthedirectionalityoftheedgesonthepath.Todistin

1 / 41
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功