Data Mining and Knowledge Discovery, 2, 121–167 (1998)
© 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

A Tutorial on Support Vector Machines for Pattern Recognition

CHRISTOPHER J. C. BURGES
burges@lucent.com
Bell Laboratories, Lucent Technologies

Editor: Usama Fayyad

Abstract. The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.

Keywords: support vector machines, statistical learning theory, VC dimension, pattern recognition

1. Introduction

The purpose of this paper is to provide an introductory yet extensive tutorial on the basic ideas behind Support Vector Machines (SVMs). The books (Vapnik, 1995; Vapnik, 1998) contain excellent descriptions of SVMs, but they leave room for an account whose purpose from the start is to teach. Although the subject can be said to have started in the late seventies (Vapnik, 1979), it is only now receiving increasing attention, and so the time appears suitable for an introductory review. The tutorial dwells entirely on the pattern recognition problem. Many of the ideas there carry directly over to the cases of regression estimation and linear operator inversion, but space constraints precluded the exploration of these topics here.

The tutorial contains some new material. All of the proofs are my own versions, where I have placed a strong emphasis on their being both clear and self-contained, to make the material as accessible as possible. This was done at the expense of some elegance and generality: however, generality is usually easily added once the basic ideas are clear. The longer proofs are collected in the Appendix.

By way of motivation, and to alert the reader to some of the literature, we summarize some recent applications and extensions of support vector machines. For the pattern recognition case, SVMs have been used for isolated handwritten digit recognition (Cortes and Vapnik, 1995; Schölkopf, Burges and Vapnik, 1995; Schölkopf, Burges and Vapnik, 1996; Burges and Schölkopf, 1997), object recognition (Blanz et al., 1996), speaker identification (Schmidt, 1996), charmed quark detection¹, face detection in images (Osuna, Freund and Girosi, 1997a), and text categorization (Joachims, 1997). For the regression estimation case, SVMs have been compared on benchmark time series prediction tests (Müller et al., 1997; Mukherjee, Osuna and Girosi, 1997), the Boston housing problem (Drucker et al., 1997), and (on artificial data) on the PET operator inversion problem (Vapnik, Golowich and Smola, 1996). In most of these cases, SVM generalization performance (i.e. error rates on test sets) either matches or is significantly better than that of competing methods. The use of SVMs for density estimation (Weston et al., 1997) and ANOVA decomposition (Stitson et al., 1997) has also been studied. Regarding extensions, the basic SVMs contain no prior knowledge of the problem (for example, a large class of SVMs for the image recognition problem
would give the same results if the pixels were first permuted randomly (with each image suffering the same permutation), an act of vandalism that would leave the best performing neural networks severely handicapped) and much work has been done on incorporating prior knowledge into SVMs (Schölkopf, Burges and Vapnik, 1996; Schölkopf et al., 1998a; Burges, 1998). Although SVMs have good generalization performance, they can be abysmally slow in test phase, a problem addressed in (Burges, 1996; Osuna and Girosi, 1998). Recent work has generalized the basic ideas (Smola, Schölkopf and Müller, 1998a; Smola and Schölkopf, 1998), shown connections to regularization theory (Smola, Schölkopf and Müller, 1998b; Girosi, 1998; Wahba, 1998), and shown how SVM ideas can be incorporated in a wide range of other algorithms (Schölkopf, Smola and Müller, 1998b; Schölkopf et al., 1998c). The reader may also find the thesis of (Schölkopf, 1997) helpful.

The problem which drove the initial development of SVMs occurs in several guises - the bias variance tradeoff (Geman and Bienenstock, 1992), capacity control (Guyon et al., 1992), overfitting (Montgomery and Peck, 1992) - but the basic idea is the same. Roughly speaking, for a given learning task, with a given finite amount of training data, the best generalization performance will be achieved if the right balance is struck between the accuracy attained on that particular training set, and the "capacity" of the machine, that is, the ability of the machine to learn any training set without error. A machine with too much capacity is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything she has seen before.
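This tradeoff can be made quantitative. As a brief preview of the theory reviewed below (a sketch only, using notation that the tutorial introduces later), one of Vapnik's bounds states that, with probability $1 - \eta$, a machine with VC dimension $h$ trained on $l$ examples satisfies

$$
R(\alpha) \;\le\; R_{emp}(\alpha) \;+\; \sqrt{\frac{h\left(\log(2l/h) + 1\right) - \log(\eta/4)}{l}},
$$

where $R(\alpha)$ is the expected (test) risk and $R_{emp}(\alpha)$ the empirical (training) risk. The second term on the right grows with the capacity $h$, so minimizing the bound requires balancing training accuracy against capacity; this is the idea behind structural risk minimization, discussed in what follows.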