Learning with Bayesian Networks
David Heckerman
Presented by Colin Rickert

Bayesian networks represent an advanced form of general Bayesian probability.
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest [1].
The model has several advantages for data analysis over rule-based decision trees [1].

Outline
1. Bayesian vs. classical probability methods
2. Advantages of Bayesian techniques
3. The coin toss prediction model from a Bayesian perspective
4. Constructing a Bayesian network with prior knowledge
5. Optimizing a Bayesian network with observed knowledge (data)
6. Exam questions

Bayesian vs. the Classical Approach
The Bayesian probability of an event x represents a person’s degree of belief or confidence in that event’s occurrence, based on prior and observed facts.
Classical probability refers to the true or actual probability of the event and is not concerned with observed behavior.
The Bayesian approach restricts its prediction to the next, (N+1)th, occurrence of an event given the N previously observed events.
The classical approach is to predict the likelihood of any given event regardless of the number of occurrences.

Example
Imagine a coin with irregular surfaces such that the probabilities of landing heads and tails are not equal.
The classical approach would be to analyze the surfaces to create a physical model of how the coin is likely to land on any given throw.
The Bayesian approach simply restricts attention to predicting the next toss based on previous tosses.

Advantages of Bayesian Techniques
How do Bayesian techniques compare to other learning models?
1. Bayesian networks can readily handle incomplete data sets.
2. Bayesian networks allow one to learn about causal relationships.
3. Bayesian networks readily facilitate use of prior knowledge.
4. Bayesian methods provide an efficient method for preventing the overfitting of data (there is no need for pre-processing).

Handling of Incomplete Data
Imagine a data sample where two attribute values are strongly anti-correlated.
With decision trees, both values must be present to avoid confusing the learning model.
Bayesian networks need only one of the values to be present and can infer the absence of the other.
Imagine two variables, one for gun-owner and the other for peace activist. The data should indicate that you do not need to check both values.

Learning about Causal Relationships
We can use observed knowledge to determine the validity of the acyclic graph that represents the Bayesian network.
For instance, is running a cause of knee damage?
Prior knowledge may indicate that this is the case.
Observed knowledge may strengthen or weaken this argument.

Use of Prior Knowledge and Observed Behavior
Prior knowledge is relatively straightforward to encode: construct “causal” edges between any two factors that are believed to be correlated.
Causal networks represent prior knowledge, whereas the weights of the directed edges can be updated in a posterior manner based on new data.

Avoidance of Overfitting Data
Contradictions do not need to be removed from the data.
Data can be “smoothed” such that all available data can be used.

The “Irregular” Coin Toss from a Bayesian Perspective
Start with the set of probabilities $\theta = \{\theta_1, \ldots, \theta_n\}$ for our hypothesis.
For the coin toss we have only one $\theta$, representing our belief that we will toss a “heads”, and $1 - \theta$ for tails.
Predict the outcome of the next, $(N+1)$th, flip based on the previous $N$ flips: $D = \{X_1 = x_1, \ldots, X_N = x_N\}$.
We want to know the probability that $X_{N+1} = x_{N+1} = \text{heads}$.
$\xi$ represents the information we have observed thus far (i.e. $\xi = \{D\}$).

Bayesian Probabilities
Posterior probability, $p(\theta \mid D, \xi)$: probability of a particular value of $\theta$ given that $D$ has been observed (our final value of $\theta$). In this case $\xi = \{D\}$.
Prior probability, $p(\theta \mid \xi)$: prior probability of a particular value of $\theta$ given no observed data (our previous “belief”).
Observed probability or “likelihood”, $p(D \mid \theta, \xi)$: likelihood of the sequence of coin tosses $D$ being observed given that $\theta$ is a particular value. In this case $\xi = \{\theta\}$.
$p(D \mid \xi)$: raw probability of $D$.

Bayesian Formulas for Weighted Coin Toss (Irregular Coin)
$$p(\theta \mid D, \xi) = \frac{p(\theta \mid \xi)\, p(D \mid \theta, \xi)}{p(D \mid \xi)}$$
where
$$p(D \mid \xi) = \int p(D \mid \theta, \xi)\, p(\theta \mid \xi)\, d\theta$$
* Only $p(D \mid \theta, \xi)$ and $p(\theta \mid \xi)$ need to be calculated; the rest can be derived.

Integration
To find the probability that $X_{N+1} = \text{heads}$, we must integrate over all possible values of $\theta$ to find the average value of $\theta$, which yields:
$$p(X_{N+1} = \text{heads} \mid D, \xi) = \int \theta\, p(\theta \mid D, \xi)\, d\theta$$

Expansion of Terms
1. Expand the observed probability $p(D \mid \theta, \xi)$, where $h$ and $t$ are the numbers of heads and tails in $D$ (so $h + t = N$):
$$p(D \mid \theta, \xi) = \theta^{h} (1 - \theta)^{t}$$
2. Expand the prior probability $p(\theta \mid \xi)$, where $\alpha_h, \alpha_t > 0$ are the parameters of the prior and $\alpha = \alpha_h + \alpha_t$:
$$p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid \alpha_h, \alpha_t) = \frac{\Gamma(\alpha)}{\Gamma(\alpha_h)\,\Gamma(\alpha_t)}\, \theta^{\alpha_h - 1} (1 - \theta)^{\alpha_t - 1}$$
* The “Beta” function typically yields a bell-shaped curve that integrates to 1, which makes it a typical probability distribution. It can be viewed as our expectation of the shape of the curve.

Beta Function and Integration
Combine the product of both functions to yield:
$$p(\theta \mid D, \xi) = \mathrm{Beta}(\theta \mid \alpha_h + h, \alpha_t + t) = \frac{\Gamma(\alpha + N)}{\Gamma(\alpha_h + h)\,\Gamma(\alpha_t + t)}\, \theta^{\alpha_h + h - 1} (1 - \theta)^{\alpha_t + t - 1}$$
Integrating gives the desired result:
$$p(X_{N+1} = \text{heads} \mid D, \xi) = \int \theta\, \mathrm{Beta}(\theta \mid \alpha_h + h, \alpha_t + t)\, d\theta = \frac{\alpha_h + h}{\alpha + N}$$

Key Points
Multiply the result of the beta function (prior probability) by the result of the coin toss function for $\theta$ (observed probability). The result is our confidence in this value of $\theta$.
Integrating the product of the two with respect to $\theta$ over all values $0 \le \theta \le 1$ is necessary to yield the average value that best fits the observed facts plus prior knowledge.

Bayesian Networks
1. Construct prior knowledge from a graph of causal relationships among variables.
2. Update the weights of the edges to reflect confidence in each causal link based on observed data (i.e. posterior knowledge).

Example Network
Consider a credit fraud network designed to determine the probability of credit fraud based on certain events.
Variables include:
Fraud (f): whether fraud occurred or not
Gas (g): whether gas was purchased within 24 hours
Jewelry (J): whether jewelry was purchased in the last 24 hours
Age (a): age of card holder
Sex (s): sex of card holder
The task of determining which variables to include is not trivial and involves decision analysis.

Construct Graph
Base