MachineLearning,9,309-347(1992)©1992KluwerAcademicPublishers,Boston.ManufacturedinTheNetheriands.ABayesianMethodfortheInductionofProbabilisticNetworksfromDataGREGORYECOOPERGFC@MED.PITT.EDUSectionofMedicalInformatics,DepartmentofMedicine,UniversityofPittsburgh,B50ALothropHall,Pittsburgh,PA15261EDWARDHERSKOVITSEHH@SUMEX-AIM.STANFORD.EDUNoeticSystems,Incorporated,2504MarylandAvenue,Baltimore,MD21218Editor:TomDietterichAbstract.ThispaperpresentsaBayesianmethodforconstructingprobabilisticnetworksfromdatabases.Inpar-ticular,wefocusonconstructingBayesianbeliefnetworks.Potentialapplicationsincludecomputer-assistedhypoth-esistesting,automatedscientificdiscovery,andautomatedconstructionofprobabifisticexpertsystems.Weextendthebasicmethodtohandlemissingdataandhidden(latent)variables.Weshowhowtoperformprobahilisticinferencebyaveragingovertheinferencesofmultiplebeliefnetworks.Resultsarepresentedofapreliminaryevaluationofanalgorithmforconstructingabeliefnetworkfromadatabaseofcases.Finally,werelatethemethodsinthispapertopreviouswork,andwediscussopenproblems.Keywords.probabilisticnetworks,Bayesianbeliefnetworks,machinelearning,induction1.IntroductionInthispaper,wepresentaBayesianmethodforconstructingaprobabilisticnetworkfromadatabaseofrecords,whichwecallcases.Onceconstructed,suchanetworkcanprovideinsightintoprobabilisticdependenciesthatexistamongthevariablesinthedatabase.Oneapplicationistheautomateddiscoveryofdependencyrelationships.Thecomputerprogramsearchesforaprobabilistic-networkstructurethathasahighposteriorprobabilitygiventhedatabase,andoutputsthestructureanditsprobability.Arelatedtaskiscomputer-assistedhypothesistesting:Theuserentersahypotheticalstructureofthedependencyrelationshipsamongasetofvariables,andtheprogramcalculatestheprobabilityofthestructuregivenadatabaseofcasesonthevariables.Wecanalsoconstructanetworkanduseitforcomputer-baseddiagnosis.Forexample,supposewehaveadatabaseinwhichacasecontainsdataaboutthebehaviorofsomesys-tem(i.e.,findings).Supposefurtherthatacasecontainsdataaboutwhetherthisparticularbehaviorfollowsfrompropersystemoperation,oralternatively,iscausedbyoneofseveralpossiblefaults.Assumethatthedatabasecontainsmanysuchcasesfrompreviousepisodesofproperandfaultybehavior.Themethodthatwepresentinthispapercanbeusedtoconstructfromthedatabaseaprobabilisticnetworkthatcapturestheprobabilisticdependen-ciesamongfindingsandfaults.Suchanetworkthencanbeappliedtoclassifyfuturecasesofsystembehaviorbyassigningaposteriorprobabilitytoeachofthepossiblefaultsandtotheeventpropersystemoperation.Inthispaper,wealsoshalldiscussdiagnosticinfer-encethatisbasedoncombiningtheinferencesofmultiplealternativenetworks.310G.F.COOPERANDE.HERSKOVITSTable1.Adatabaseexample.Thetermcaseinthefirstcolumndenotesasingletraininginstance(record)inthedatabase--asforexample,apatientcase.Forbrevity,inthetextwesome-timesuse0todenoteabsentand1todenotepresent.VariablevaluesforeachcaseCasexlx2x31presentabsentabsent2presentpresentpresent3absentabsentpresent4presentpresentpresent5absentabsentabsent6absentpresentpresent7presentpresentpresent8absentabsentabsent9presentpresentpresent10absentabsentabsentLetusconsiderasimpleexampleofthetasksjustdescribed.Supposethefictitiousdata-baseofcasesintable1isthetrainingset.Supposefurtherthatx1representsafaultinthesystem,andthatx2andx3representtwofindings.Giventhedatabase,whatarethequali-tativedependencyrelationshipsamongthevariables?Forexample,doxlandx3influenceeachotherdirectly,ordotheydosoonlythroughx2?Whatistheprobabilitythatx3willbepresentifXlispresent?Clearly,therearenocategoricallycorrectanswerstoeachofthesequestions.Theanswersdependonanumberoffactors,suchasthemodelthatweusetorepresentthedata,andourpriorknowledgeaboutthedatainthedatabaseandtherelationshipsamongthevariables.Inthispaper,wedonotattempttoconsiderallsuchfactorsintheirfullgenerality.Rather,wespecializethegeneraltaskbypresentingoneparticularframeworkforconstructingprob-abilisticnetworksfromdatabases(as,forexample,thedatabaseintable1)suchthatthesenetworkscanbeusedforprobabilisticinference(as,forexample,thecalculationofP(x3=presentIx1=present)).Inparticular,wefocusonusingaBayesianbeliefnetworkasamodelofprobabilisticdependency.Ourprimarygoalistoconstructsuchanetwork(ornetworks),givenadatabaseandasetofexplicitassumptionsaboutourpriorprobabilisticknowledgeofthedomain.ABayesianbelief-networkstructureBsisadirectedacyclicgraphinwhichnodesrepre-sentdomainvariablesandarcsbetweennodesrepresentprobabilisticdependencies(Cooper,1989;Horvitz,Breese,&Henrion,1988;Lauritzen&Spiegelhalter,1988;Neapolitan,1990;Pearl,1986;Pearl,1988;Shachter,1988).AvariableinaBayesianbelief-networkstructuremaybecontinuous(Shachter&Kenley,1989)ordiscrete.Inthispaper,weshallfocusourdiscussionondiscretevariables.Figure1showsanexampleofabelief-networkstructurecontainingthreevariables.Inthisfigure,wehavedrawnanarcfromx1tox2toindicatethatthesetwovariablesareprobabilisticallydependent.Similarly,thearcfromx2tox3indicatesaprobabilisticdependencybetweenthesetwovariables.Theabsenceofanarcfromx1tox3