Data Mining and Knowledge Discovery, 2, 121–167 (1998)
© 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

A Tutorial on Support Vector Machines for Pattern Recognition

CHRISTOPHER J. C. BURGES
burges@lucent.com
Bell Laboratories, Lucent Technologies

Editor: Usama Fayyad

Abstract. The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.

Keywords: support vector machines, statistical learning theory, VC dimension, pattern recognition

1. Introduction

The purpose of this paper is to provide an introductory yet extensive tutorial on the basic ideas behind Support Vector Machines (SVMs). The books (Vapnik, 1995; Vapnik, 1998) contain excellent descriptions of SVMs, but they leave room for an account whose purpose from the start is to teach. Although the subject can be said to have started in the late seventies (Vapnik, 1979), it is only now receiving increasing attention, and so the time appears suitable for an introductory review. The tutorial dwells entirely on the pattern recognition problem. Many of the ideas there carry directly over to the cases of regression estimation and linear operator inversion, but space constraints precluded the exploration of these topics here.

The tutorial contains some new material. All of the proofs are my own versions, where I have placed a strong emphasis on their being both clear and self-contained, to make the material as accessible as possible. This was done at the expense of some elegance and generality: however, generality is usually easily added once the basic ideas are clear. The longer proofs are collected in the Appendix.

By way of motivation, and to alert the reader to some of the literature, we summarize some recent applications and extensions of support vector machines. For the pattern recognition case, SVMs have been used for isolated handwritten digit recognition (Cortes and Vapnik, 1995; Schölkopf, Burges and Vapnik, 1995; Schölkopf, Burges and Vapnik, 1996; Burges and Schölkopf, 1997), object recognition (Blanz et al., 1996), speaker identification (Schmidt, 1996), charmed quark detection¹, face detection in images (Osuna, Freund and Girosi, 1997a), and text categorization (Joachims, 1997). For the regression estimation case, SVMs have been compared on benchmark time series prediction tests (Müller et al., 1997; Mukherjee, Osuna and Girosi, 1997), the Boston housing problem (Drucker et al., 1997), and (on artificial data) on the PET operator inversion problem (Vapnik, Golowich and Smola, 1996). In most of these cases, SVM generalization performance (i.e. error rates on test sets) either matches or is significantly better than that of competing methods. The use of SVMs for density estimation (Weston et al., 1997) and ANOVA decomposition (Stitson et al., 1997) has also been studied. Regarding extensions, the basic SVMs contain no prior knowledge of the problem (for example, a large class of SVMs for the image recognition problem
would give the same results if the pixels were first permuted randomly (with each image suffering the same permutation), an act of vandalism that would leave the best performing neural networks severely handicapped) and much work has been done on incorporating prior knowledge into SVMs (Schölkopf, Burges and Vapnik, 1996; Schölkopf et al., 1998a; Burges, 1998). Although SVMs have good generalization performance, they can be abysmally slow in test phase, a problem addressed in (Burges, 1996; Osuna and Girosi, 1998). Recent work has generalized the basic ideas (Smola, Schölkopf and Müller, 1998a; Smola and Schölkopf, 1998), shown connections to regularization theory (Smola, Schölkopf and Müller, 1998b; Girosi, 1998; Wahba, 1998), and shown how SVM ideas can be incorporated in a wide range of other algorithms (Schölkopf, Smola and Müller, 1998b; Schölkopf et al., 1998c). The reader may also find the thesis of (Schölkopf, 1997) helpful.

The problem which drove the initial development of SVMs occurs in several guises - the bias variance tradeoff (Geman and Bienenstock, 1992), capacity control (Guyon et al., 1992), overfitting (Montgomery and Peck, 1992) - but the basic idea is the same. Roughly speaking, for a given learning task, with a given finite amount of training data, the best generalization performance will be achieved if the right balance is struck between the accuracy attained on that particular training set, and the "capacity" of the machine, that is, the ability of the machine to learn any training set without error. A machine with too much capacity is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything she has seen before.
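This tradeoff can be made quantitative. As a brief preview of the theory reviewed below (a sketch only, using notation that the tutorial introduces later), one of Vapnik's bounds states that, with probability $1 - \eta$, a machine with VC dimension $h$ trained on $l$ examples satisfies

$$
R(\alpha) \;\le\; R_{emp}(\alpha) \;+\; \sqrt{\frac{h\left(\log(2l/h) + 1\right) - \log(\eta/4)}{l}},
$$

where $R(\alpha)$ is the expected (test) risk and $R_{emp}(\alpha)$ the empirical (training) risk. The second term on the right grows with the capacity $h$, so minimizing the bound requires balancing training accuracy against capacity; this is the idea behind structural risk minimization, discussed in what follows.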