StructuralExtensiontoLogisticRegression:DiscriminativeParameterLearningofBeliefNetClassiersRussellGreiner(greiner@cs.ualberta.ca)DeptofComputingScience,UniversityofAlberta,Edmonton,ABT6G2H1CanadaXiaoyuanSu(xsu1@umsis.miami.edu)Electrical&ComputerEngineering,UniversityofMiami,CoralGables,FL33124,USABinShen(bshen@cs.ualberta.ca)DeptofComputingScience,UniversityofAlberta,Edmonton,ABT6G2H1CanadaWeiZhou(w2zhou@math.uwaterloo.ca)SchoolofComputerScience,UniversityofWaterloo,Waterloo,ONN2L3G1,CanadaAbstract.Bayesianbeliefnets(BNs)areoftenusedforclassicationtaskstypicallytoreturnthemostlikelyclasslabelforeachspeciedinstance.ManyBN-learners,however,attempttondtheBNthatmaximizesadifferentobjectivefunctionviz.,likelihood,ratherthanclassicationaccuracytypicallybyrstlearninganappropriategraphicalstructure,thenndingtheparametersforthatstructurethatmaximizethelikelihoodofthedata.Astheseparametersmaynotmaximizetheclassicationaccuracy,discriminativeparameterlearnersfollowthealternativeapproachofseekingtheparametersthatmaximizeconditionallikelihood(CL),overthedistributionofinstancestheBNwillhavetoclassify.Thispaperrstformallyspeciesthistask,showshowitextendsstandardlogisticregression,andanalyzesitsinherentsampleandcomputationalcomplexity.Wethenpresentageneralalgorithmforthistask,ELR,thatappliestoarbitraryBNstructuresandthatworkseffectivelyevenwhengivenincompletetrainingdata.Unfortunately,ELRisnotguaranteedtondtheparametersthatoptimizeconditionallikelihood;moreover,eventheoptimal-CLparametersneednothaveminimalclassicationerror.ThispaperthereforepresentsempiricalevidencethatELRproduceseffectiveclassiers,oftensuperiortotheonesproducedbythestandardgenerativealgorithms,especiallyincommonsituationswherethegivenBN-structureisincorrect.Keywords:(Bayesian)beliefnets,Logisticregression,Classication,PAC-learning,Computational/samplecomplexity1.IntroductionManytasksincludingfaultdiagnosis,patternrecognitionandforecast-ingcanbeviewedasclassication,aseachrequiresassigningtheclass(label)toagiveninstance,whichisspeciedbyasetofattributes.Anincreasingnumberofprojectsareusing(Bayesian)beliefnets(BN)torepresenttheunderlyingdistribution,andhencethestochasticmappingfromevidencetoresponse.Thispaperextendstheearlierresultsthatappearin[27]and[49].Thise-mailaddressisavailableforallproblemsandquestions.©2005KluwerAcademicPublishers.PrintedintheNetherlands.elr.tex;15/02/2005;9:44;p.12Whenthisdistributionisnotknownapriori,wecantrytolearnthemodel.OurgoalisanaccurateBNi.e.,onethatreturnsthecorrectanswerasoftenaspossible.Whileaperfectmodelofthedistributionwillperformoptimallyforanypossiblequery,learnerswithlimitedtrainingdataareunlikelytoproducesuchamodel;moreover,optimalitymaybeimpossibleforlearn-ersconstrainedtoarestrictedrangeofpossibledistributionsthatexcludesthecorrectone(e.g.,whenonlyconsideringparameterizationsofagivenBN-structure).Here,itmakessensetondtheparametersthatdowellwithrespecttothequeriesposed.Thisdiscriminativelearningtaskdiffersfromthegenera-tivelearningthatisusedtolearnanoverallmodelofthedistribution[47].Followingstandardpractice,ourdiscriminativelearnerwillseektheparame-tersthatmaximizethelogconditionallikelihood(LCL)overthedata,ratherthansimplelikehoodthatis,giventhedataS=fhci;eiig(eachclasslabelC=ciassociatedwithevidenceE=ei),adiscriminativelearnerwilltrytondparametersQthatmaximizedLCL(S)(Q)=1jSjåhci;eii2SlogPQ(cijei)(1)ratherthantheonesthatmaximizeåhci;eii2SlogPQ(ci;ei)[47].OptimizingtheLCLoftherootnode(giventheotherattributes)ofanaïve-bayesstructurecanbeformulatedasastandardlogisticregressionprob-lem[39,32].Generalbeliefnetsextendnaïve-bayes-structuresbypermittingadditionaldependenciesamongtheattributes.ThispaperprovidesageneraldiscriminativelearningtoolELRthatcanlearntheparametersforanarbitrarystructure,completingtheanalogyNaïve-bayes:GeneralBeliefNet::LogisticRegression:ELR:(2)Moreover,whilemostalgorithmsforlearninglogisticregressionfunctionsrequirecompletetrainingdata,theELRalgorithmcanacceptincompletedata.Wealsopresentempiricalevidence,fromalargenumberofdatasets,todemon-stratethatELRworkseffectively.Section2providesthefoundations,overviewingbeliefnetsthendeningourtask:discriminativelylearningtheparameters(foraxedbeliefnetstruc-ture,G)thatmaximizeLCL.Section3formallyanalysesthistask,providingbothsampleandcomputationalcomplexity,andnotinghowtheseresultscomparewithcorrespondingresultsforgenerativelearning.SeeingthatourtaskisNP-hardingeneral,Section4presentsagradient-descentdiscrimi-nativeparameterlearningalgorithmforgeneralBNs,ELR.Section5reportsempiricalresultsthatdemonstratethatourELRproducesaclassierthatisoftensuperiortoonesproducedbystandardlearningalgorithms(whichmax-imizelikelihood),overavarietyofsituations,involvingbothcompleteandelr.tex;15/02/2005;9:44;p.2StructuralExtensiontoLogisticRegression3incompletedata.Section6providesabriefsurveyoftherelevantliterature.Thewebpage[25]providestheproofsofallofourtheoreticclaims(extendingtheproofsketchintheAppendix),aswellasmoreinformationabou