Machine Learning: Proceedings of the Thirteenth International Conference, 1996.

Experiments with a New Boosting Algorithm

Yoav Freund    Robert E. Schapire
AT&T Laboratories
600 Mountain Avenue
Murray Hill, NJ 07974-0636
{yoav, schapire}@research.att.com

Abstract. In an earlier paper, we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a "pseudo-loss," which is a method for forcing a learning algorithm of multi-label concepts to concentrate on the labels that are hardest to discriminate. In this paper, we describe experiments we carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems.

We performed two sets of experiments. The first set compared boosting to Breiman's "bagging" method when used to aggregate various classifiers (including decision trees and single attribute-value tests). We compared the performance of the two methods on a collection of machine-learning benchmarks. In the second set of experiments, we studied in more detail the performance of boosting using a nearest-neighbor classifier on an OCR problem.

1 INTRODUCTION

"Boosting" is a general method for improving the performance of any learning algorithm. In theory, boosting can be used to significantly reduce the error of any "weak" learning algorithm that consistently generates classifiers which need only be a little bit better than random guessing. Despite the potential benefits of boosting promised by the theoretical results, the true practical value of boosting can only be assessed by testing the method on real machine learning problems. In this paper, we present such an experimental assessment of a new boosting algorithm called AdaBoost.

Boosting works by repeatedly running a given weak¹ learning algorithm on various distributions over the training data, and then combining the classifiers produced by the weak learner into a single composite classifier. The first provably effective boosting algorithms were presented by Schapire [20] and Freund [9]. More recently, we described and analyzed AdaBoost, and we argued that this new boosting algorithm has certain properties which make it more practical and easier to implement than its predecessors [10]. This algorithm, which we used in all our experiments, is described in detail in Section 2.

Home page: "". Expected to change to "~uid" sometime in the near future (for uid ∈ {yoav, schapire}).
¹We use the term "weak" learning algorithm, even though, in practice, boosting might be combined with a quite strong learning algorithm such as C4.5.
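To make the loop just described concrete, the following is a minimal sketch, not the paper's own pseudocode, assuming binary labels in {-1, +1} and an invented weak_learner callable that accepts a weight for each training example. It uses the now-standard exponential form of the weight update, which for two classes is equivalent to the multiplicative update of the algorithm given in Section 2.

    import math

    def adaboost(examples, labels, weak_learner, rounds):
        # labels[i] is -1 or +1; weak_learner(examples, labels, weights)
        # must return a classifier h with h(x) in {-1, +1}.
        n = len(examples)
        weights = [1.0 / n] * n                # uniform initial distribution
        hypotheses, alphas = [], []
        for _ in range(rounds):
            h = weak_learner(examples, labels, weights)
            # weighted error of this round's weak hypothesis
            err = sum(w for w, x, y in zip(weights, examples, labels)
                      if h(x) != y)
            if err >= 0.5:                     # no better than random guessing
                break
            err = max(err, 1e-12)              # guard against a perfect hypothesis
            alpha = 0.5 * math.log((1.0 - err) / err)
            hypotheses.append(h)
            alphas.append(alpha)
            # raise the weight of misclassified examples, lower the rest,
            # then renormalize so the weights again form a distribution
            weights = [w * math.exp(-alpha * y * h(x))
                       for w, x, y in zip(weights, examples, labels)]
            total = sum(weights)
            weights = [w / total for w in weights]
        def classify(x):                       # weighted majority vote
            vote = sum(a * h(x) for a, h in zip(alphas, hypotheses))
            return 1 if vote >= 0 else -1
        return classify

Any rule that beats random guessing on the weighted sample, for instance the single-attribute tests used in Section 3, could serve as weak_learner here.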
This paper describes two distinct sets of experiments. In the first set of experiments, described in Section 3, we compared boosting to "bagging," a method described by Breiman [1] which works in the same general fashion (i.e., by repeatedly rerunning a given weak learning algorithm, and combining the computed classifiers), but which constructs each distribution in a simpler manner. (Details are given below; a code sketch in the same style as the one above appears at the end of this section.) We compared boosting with bagging because both methods work by combining many classifiers. This comparison allows us to separate out the effect of modifying the distribution on each round (which is done differently by each algorithm) from the effect of voting multiple classifiers (which is done the same way by each).

In our experiments, we compared boosting to bagging using a number of different weak learning algorithms of varying levels of sophistication. These include: (1) an algorithm that searches for very simple prediction rules which test on a single attribute (similar to Holte's very simple classification rules [14]); (2) an algorithm that searches for a single good decision rule that tests on a conjunction of attribute tests (similar in flavor to the rule-formation part of Cohen's RIPPER algorithm [3] and Fürnkranz and Widmer's IREP algorithm [11]); and (3) Quinlan's C4.5 decision-tree algorithm [18]. We tested these algorithms on a collection of 27 benchmark learning problems taken from the UCI repository.

The main conclusion of our experiments is that boosting performs significantly and uniformly better than bagging when the weak learning algorithm generates fairly simple classifiers (algorithms (1) and (2) above). When combined with C4.5, boosting still seems to outperform bagging slightly, but the results are less compelling.

We also found that boosting can be used with very simple rules (algorithm (1)) to construct classifiers that are quite good relative, say, to C4.5. Kearns and Mansour [16] argue that C4.5 can itself be viewed as a kind of boosting algorithm, so a comparison of AdaBoost and C4.5 can be seen as a comparison of two competing boosting algorithms. See the paper by Dietterich, Kearns and Mansour [4] for more detail on this point.

In the second set of experiments, we tested the performance of boosting on a nearest-neighbor classifier for handwritten digit recognition. In this case the weak learning algorithm is very simple, which lets us gain some insight into the interaction between the boosting algorithm and the nearest-neighbor classifier. We show that the boosting algorithm is an effective way of finding a small subset of prototypes that performs almost as well as the complete set. We also show that it compares favorably to the standard method of Condensed Nearest Neighbor [13] in terms of its test error.

There seem to be two separate reasons for the improvement in performance that is achieved by boosting. The first and better understood effect of boosting is that it generates a hypothesis whose error on the training set is small by combining many hypotheses whose error may be large (but still better than random guessing).
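The bagging procedure referred to at the start of this section can be sketched in the same style. This again is our own illustrative rendering, not Breiman's pseudocode: each round trains the (here unweighted) weak_learner on a bootstrap sample, that is, a distribution built simply by resampling the training set uniformly with replacement, and the composite classifier is an unweighted plurality vote.

    import random
    from collections import Counter

    def bagging(examples, labels, weak_learner, rounds):
        # weak_learner(examples, labels) returns a classifier h; unlike in
        # boosting, it never sees per-example weights.
        n = len(examples)
        hypotheses = []
        for _ in range(rounds):
            # bootstrap sample: n draws, uniform, with replacement
            idx = [random.randrange(n) for _ in range(n)]
            hypotheses.append(weak_learner([examples[i] for i in idx],
                                           [labels[i] for i in idx]))
        def classify(x):                       # unweighted plurality vote
            return Counter(h(x) for h in hypotheses).most_common(1)[0][0]
        return classify

The only difference from the boosting sketch above is how each round's distribution over the training data is formed, which is exactly the effect the comparison in Section 3 is designed to isolate.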