Tsinghua University, Artificial Intelligence: NN-SVM_542109004


Support Vector Machines
(Lecture slides, 2015/4/20)

Content
- Introduction
- Optimal hyperplane for linearly separable patterns
- Optimal hyperplane for nonseparable patterns
- Building an SVM
- Example
- Applications

Introduction: system risk
- Error: the training error is measured on the training set; the generalization error is measured on the test set.
- Risk minimization: empirical-risk (training-error) minimization vs. structural risk minimization.

Introduction
The SVM (Vapnik, 1992) is a kind of universal feedforward network that can be used for pattern classification and nonlinear regression. The main idea of the SVM is to construct a hyperplane as the decision surface in such a way that the margin of separation between positive and negative examples is maximized. Precisely, the SVM is an implementation of the method of structural risk minimization: the error rate on a test set is bounded by the sum of the training-error rate and a term that depends on the Vapnik-Chervonenkis (VC) dimension. Central to the construction of the SVM learning algorithm is the inner-product kernel between a "support vector" x_i and a vector x drawn from the input space.

Introduction (cont.)
The SVM can be used to construct three types of learning machines:
- polynomial learning machines,
- radial-basis function networks,
- two-layer perceptrons (i.e., with a single hidden layer).
The SVM implements the learning process using a given set of training data, automatically determining the required number of hidden units. Whereas the back-propagation algorithm is devised specifically to train a multilayer perceptron, the SVM learning algorithm is of a more generic nature because it has wider applicability.

Optimal hyperplane for linearly separable patterns

Linear classifiers
A linear classifier labels points (denoted +1 or -1) with the decision rule f(x, w, b) = sign(w^T x + b): points with w^T x + b > 0 fall on the +1 side, and points with w^T x + b < 0 fall on the -1 side. For a linearly separable data set, many separating hyperplanes classify all the training data correctly; any of these would be fine, but which is best? A poorly placed boundary may leave points misclassified to the +1 class.

Classifier margin
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data point.
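The decision rule f(x, w, b) = sign(w^T x + b) and the margin just defined can be sketched in a few lines of plain Python. This is a toy illustration: the data points, w, and b below are made up, and `predict` and `margin_width` are hypothetical helper names.

```python
# Toy sketch of a linear classifier f(x, w, b) = sign(w^T x + b)
# and its geometric margin.  Data and parameters are illustrative only.
import math

def predict(x, w, b):
    """Classify a point as +1 or -1 by the side of the hyperplane w^T x + b = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

def margin_width(points, w, b):
    """Width the boundary could grow before hitting a point: twice the
    distance |w^T x + b| / ||w|| of the closest point to the hyperplane."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    closest = min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in points)
    return 2 * closest / norm_w

# Two separable clusters in the plane and a candidate hyperplane x1 = 1.
pos = [(2.0, 0.0), (3.0, 1.0)]   # labelled +1
neg = [(0.0, 0.0), (-1.0, 1.0)]  # labelled -1
w, b = (1.0, 0.0), -1.0

print([predict(x, w, b) for x in pos + neg])  # -> [1, 1, -1, -1]
print(margin_width(pos + neg, w, b))          # -> 2.0
```

Shifting or tilting the hyperplane away from the midpoint between the clusters shrinks the value returned by `margin_width`, which is the geometric quantity the SVM maximizes.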
Maximum margin
The maximum-margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM). Support vectors are those data points that the margin pushes up against.
1. Maximizing the margin is good according to intuition.
2. It implies that only the support vectors are important; the other training examples are ignorable.
3. Empirically it works very well.

Linear SVM, mathematically
Training sample: {(x_i, y_i)}, i = 1, ..., N.
Assume the patterns (classes) represented by the subsets y_i = +1 and y_i = -1 are linearly separable.
Hyperplane: w^T x + b = 0.
For all patterns:
  w^T x_i + b >= 0  for y_i = +1,
  w^T x_i + b < 0   for y_i = -1.
Decision function: f(x) = sgn(w^T x + b).
Margin of separation ρ: determined by the data point closest to the hyperplane.

Linear SVM, mathematically (cont.)
Goal: find the particular hyperplane for which the margin of separation is maximized (the optimal hyperplane).
[Figure: illustration of the idea of an optimal hyperplane for linearly separable patterns.]

Optimal hyperplane
Represent the multidimensional linear decision surface in the input space with the optimal values w_0 and b_0:
  w_0^T x + b_0 = 0.
Discriminant function: g(x) = w_0^T x + b_0.
Let r be the algebraic distance from x to the optimal hyperplane; r is positive if x is on the positive side of the hyperplane, and negative otherwise. Writing x_p for the projection of x onto the hyperplane,
  x = x_p + r w_0 / ||w_0||.

Optimal hyperplane (cont.)
By definition g(x_p) = 0, since x_p lies on the hyperplane. Hence
  g(x) = w_0^T x + b_0 = r ||w_0||,  so  r = g(x) / ||w_0||.
[Figure: geometric interpretation of algebraic distances of points to the optimal hyperplane for a two-dimensional case.]

Optimal hyperplane (cont.)
For a support vector x^(s), g(x^(s)) = +1 or -1, where y^(s) = +1 or -1. By the definition of a support vector,
  r = g(x^(s)) / ||w_0|| = 1/||w_0||   if y^(s) = +1,
  r = g(x^(s)) / ||w_0|| = -1/||w_0||  if y^(s) = -1.
Then the margin of separation is
  ρ = 2r = 2 / ||w_0||.
Conclusion: maximizing the margin is equivalent to minimizing ||w||.

Finding the optimal hyperplane
Goal: develop a computationally efficient procedure that uses the training sample {(x_i, y_i)}, i = 1, ..., N, to find the optimal hyperplane, subject to the constraints
  y_i (w^T x_i + b) >= 1,  i = 1, 2, ..., N,
where the weight vector w minimizes the cost function
  Φ(w) = (1/2) w^T w.
Using the method of Lagrange multipliers (Bertsekas, 1995), construct the Lagrangian function
  J(w, b, α) = (1/2) w^T w - Σ_{i=1..N} α_i [ y_i (w^T x_i + b) - 1 ],
where the auxiliary nonnegative variables α_i are called Lagrange multipliers.
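The primal formulation above can be checked numerically on a toy set: a hyperplane is feasible when every sample satisfies y_i (w^T x_i + b) >= 1, and among feasible hyperplanes the one with the smaller ||w|| has the larger margin ρ = 2/||w||. This is a sketch with made-up data; `feasible` and `rho` are hypothetical helper names.

```python
# Sketch of the primal constraints y_i (w^T x_i + b) >= 1 and the
# margin rho = 2 / ||w||.  Data and candidate hyperplanes are illustrative.
import math

def feasible(samples, w, b):
    """True if every (x_i, y_i) satisfies y_i * (w^T x_i + b) >= 1."""
    return all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1
               for x, y in samples)

def rho(w):
    """Margin of separation rho = 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

samples = [((2.0, 0.0), +1), ((3.0, 1.0), +1),
           ((0.0, 0.0), -1), ((-1.0, 1.0), -1)]

# Two feasible hyperplanes; the one with the smaller ||w|| has the larger margin.
w1, b1 = (1.0, 0.0), -1.0   # optimal for this toy set
w2, b2 = (2.0, 0.0), -2.0   # also feasible, but ||w|| is twice as large

print(feasible(samples, w1, b1), rho(w1))  # -> True 2.0
print(feasible(samples, w2, b2), rho(w2))  # -> True 1.0
```

Both hyperplanes separate the data, so minimizing (1/2) w^T w over the feasible set is exactly what singles out the maximum-margin solution.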
Finding the optimal hyperplane (cont.)
Conditions:
  ∂J(w, b, α)/∂w = 0,  ∂J(w, b, α)/∂b = 0.
Results:
  w = Σ_{i=1..N} α_i y_i x_i,  Σ_{i=1..N} α_i y_i = 0.

Finding the optimal hyperplane (cont.)
Expanding the Lagrangian,
  J(w, b, α) = (1/2) w^T w - Σ_i α_i y_i w^T x_i - b Σ_i α_i y_i + Σ_i α_i,
and substituting the results above (which give w^T w = Σ_i α_i y_i w^T x_i = Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j and eliminate the b term), we obtain the dual objective function
  Q(α) = Σ_{i=1..N} α_i - (1/2) Σ_{i=1..N} Σ_{j=1..N} α_i α_j y_i y_j x_i^T x_j,
subject to
  Σ_{i=1..N} α_i y_i = 0,  α_i >= 0 for i = 1, 2, ..., N.

Finding the optimal hyperplane (cont.)
Only a few of the α_i take nonzero values: by the complementarity condition
  α_i [ y_i (w^T x_i + b) - 1 ] = 0,  i = 1, 2, ..., N,
the nonzero α_i correspond exactly to the support vectors. The decision function for classification is
  f(x) = sgn( Σ_{α_i ≠ 0} α_i y_i x_i^T x + b_0 ).

Overview
Training: given the training set {(x_i, y_i)}, i = 1, ..., N, maximize the objective function Q(α), subject to Σ_{i=1..N} α_i y_i = 0 and α_i >= 0.
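The whole training recipe above (maximize Q(α), recover w = Σ α_i y_i x_i, then b from a support vector, then classify with f(x) = sgn(w^T x + b)) can be walked through on a two-point toy problem. This is only a sketch: the data are made up, and the grid search over α stands in for a real quadratic-programming solver (practical implementations use methods such as SMO).

```python
# Toy sketch: solve the SVM dual for two points by a 1-D search, then
# recover w = sum_i alpha_i y_i x_i and b, and classify with sgn(w^T x + b).
# Data are illustrative; real solvers use quadratic programming.

xs = [(2.0, 0.0), (0.0, 0.0)]
ys = [+1, -1]
dot = lambda u, v: sum(a * b for a, b in zip(u, v))

def Q(alpha):
    """Dual objective Q(a) = sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j x_i^T x_j.
    With one +1 and one -1 point, the constraint sum_i a_i y_i = 0 forces
    a_1 = a_2, so a single scalar alpha parametrizes the feasible set."""
    a = [alpha, alpha]
    return (sum(a)
            - 0.5 * sum(a[i] * a[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
                        for i in range(2) for j in range(2)))

# Grid search over alpha >= 0 stands in for a QP solver.
alpha = max((k / 1000.0 for k in range(2001)), key=Q)

# Recover w and b, then the decision function f(x) = sgn(w^T x + b).
w = [sum(alpha * ys[i] * xs[i][d] for i in range(2)) for d in range(2)]
b = ys[0] - dot(w, xs[0])   # from y_s (w^T x_s + b) = 1 at a support vector
f = lambda x: 1 if dot(w, x) + b > 0 else -1

print(alpha, w, b)                    # -> 0.5 [1.0, 0.0] -1.0
print(f((3.0, 0.0)), f((-1.0, 0.0)))  # -> 1 -1
```

Both training points end up with nonzero α, i.e. both are support vectors here, and the recovered hyperplane x1 = 1 sits midway between them with margin 2/||w|| = 2.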
