MachineLearningandGraphicalModels王立威北京大学信息科学技术学院1wanglw@cis.pku.edu.cn(LectureI)OutlineAbriefoverviewofMachineLearningGraphicalModels•Representation•Inference•Learning2DefinitionofMachineLearning:•Learningfromexperiences.“AcomputerprogramissaidtolearnfromexperienceEwithrespecttosomeclassoftasksTandperformancemeasureP,ifitsperformanceattasksinT,asmeasuredbyP,improveswithexperienceE.”-TomMitchell3“Classical”MachineLearningTasks:•Classification:spamfilter,facerecognition,…•RegressionHook’slaw,Kepler’slaw,…•RankingSearchengine•Probability(Distribution)Estimation4}1,1{:nfRRRnf:RRnf:“Classical”MachineLearningAlgorithms•ClassificationSVMBoostingRandomForestBagging(Deep)NeuralNetworks•RegressionLassoBoosting5SupportVectorMachines(SVMs)SVM:thelargemarginclassifierSVM:hingelossminimization+regularization22ll/BoostingBoosting:(implicit)largemarginclassifierBoosting:explossminimization(+regularization)ll/1“Classical”MachineLearningTheories•VCtheoryCapacityofthehypothesisspace•PAC-theory•MargintheoryConfidence•EmpiricalProcessesCapacity•PAC-BayestheoryPACinBayesframework•RegularizationCapacity,smoothness8MLtheories:QuantificationofOccam’sRazorForceLengthHook’slawComparisonof“Classical”MachineLearningTheories•Regularization:BayesianoptimalityOnlyasymptotic(convergence,rate,non-uniform)•VC/PAC,Margin,PAC-Bayes,…Relativeoptimality(optimalinahypothesisspace)Non-asymptotic(finitesamplebounds)10Limitationsofthe“Classical”ML•RepresentationEuclideanrepresentationforinput.Simplerepresentationforoutput.HowtorepresentSTRUCTURESindata?11OutlineAbriefoverviewofMachineLearningGraphicalModels•Representation•Inference•Learning12ChapterI:Representation13ProbabilisticGraphicalModels:WhatandWhy•PGMs:Amodelforjointprobabilitydistributionoverrandomvariables.Representdependenciesandindependenciesbetweentherandomvariables.•Whyisprobabilitydistributionimportant?Genesanddiseases,andeverything•WhyPGMwasinventedbycomputerscientist,whynotthestatisticians?14TwotypesofPGMs•Directedgraph:BayesianNetworks(BNs).•Undirectedgraph:MarkovRandomFields(MRFs)15BayesianNetworks(BNs)16(Intuitively)HowBNsRepresentJointpdfs:17Example1:ACB)|()|()()(BCPABPAPABCPGivenB,CandAareindependentNote:Dependencyvs.Causality(Intuitively)HowBNsRepresentJointpdfs:18Example2:ACB)|()|()()(ACPABPAPABCPGivenA,BandCareindependent(Intuitively)HowBNsRepresentJointpdfs:19Example3:ACB)|()()()(BCAPCPBPABCPBandCareindependent;ButgivenA,BandCareNOTindependent(Intuitively)HowBNsRepresentJointpdfs:20Example4:ACB)|()|()|()|()|()()()(EGPDFPCEPADPABCPBPAPABCDEFGPEDFGLearning:Findafactorizationruleaccordingtopreviousexamples.21ACB)|()|()|()|()|()()()(EGPDFPCEPADPABCPBPAPABCDEFGPEDFGFactorization:22ACBEDFGBNmustbeDAGThegraphmustbeacyclic!Definition(FactorizeaccordingtoaDAG):AprobabilitydistributionPissaidtobefactorizedaccordingtoadirectedacyclicgraphGif23Definition(BayesianNetwork):ABayesiannetworkisapair(P,G)ofaprobabilitydistributionandaDAG,wherePisfactorizedaccordingtoG,andallthe“local”conditionalprobabilitydistributionsaregiven.24Giventhefactorization,whichvariablesareindependentofC,givenC’sParentsAandB?25ACBEDFGFD,)|()|()|()|()|()()()(EGPDFPCEPADPABCPBPAPABCDEFGP26ACBEDFG)|()|()|()|()|()()()(EGPDFPCEPADPABCPBPAPABCDEFGP)|()|()|(ABDPABCPABCDP)|()|()|(ABFPABCPABCFPQuestion:LetCbeanode(randomvariable)inaBN.Whichnodes(randomvariables)areindependentofC,givenC’sParents?27Theorem(LocalMarkovpropertyforBN):ForanynodeC(randomvariable)inaBN,allnodesthatarenotdescendentsofCareindependentofC,givenC’sparents.Sparsevs.Dense28ACBDACBDVSWhatisthejointpdfoftherightBN?IsthereanyindependenceintherightBN?Question:GivenaBN=(P,G),canyoudetermine,forthreearbitrarysetsofrandomvariablesX={…},Y={…},andZ={…},whetherthefollowingconditionalindependencyhold?29Definition(activetrailinBN)Letx,ybetwonodesandZbeasetofnodes.ApathbetweenxandyaresaidtobeanactivetrialgivenZ,ifthefollowingsaretrue:1)Letbethepath,thenmoroneofitsdescendantsisnotinZ.Thatis,wheneverthereisa“v-structure”inthepath,themiddlenodeoroneofitsdescendantsisinZ;2)NoothernodealongthepathisinZ.30yymxx...''...Definition(D-separationinBN)LetX,Y,Zbethreesetsofnodes.XandYaresaidtobeD-separatedbyZifforeverynodexinXandeveryyinY,andeverypathbetweenxandy,thepathisnotanactivetrialgivenZ.Theorem(Informal)TheindependenciesinaBNareexactlythosecharacterizedbyD-separation.31TheoremForanyBN(P,G),andarbitrarysetsofnodesX,Y,Z.IfXandYareD-separatedbyZinG,then32P(Y|Z)P(X|Z)P(X,Y|Z)TheoremForanyDAGG,andanysetsofnodesX,Y,Z.IfXandYarenotD-separatedbyZinG,thentheremustexistaprobabilitydistributionPwhichfactorizeaccordingtoG,but33P(Y|Z)P(X|Z)P(X,Y|Z)34IsthereaBNthathaspreciselytheseindependencies?NoteverydistributioncanberepresentedbyaBNsatisfyingexactlyalltheindependencies!TheRepresentationLimitofBNTosumup:•BNrepresentsjointprobabilitydistributionsthatcanfactorizeaccordingto:•ThelocalindependenciesinBNarecharacterizedbyparentsandnon-descendants.•Theglobali