Classifier Selection for Majority Voting


Dymitr Ruta∗ and Bogdan Gabrys†

∗ Computational Intelligence Group, BTExact Technologies, Orion Building 1st floor, pp12, Adastral Park, Martlesham Heath, Ipswich IP5 3RE, UK, dymitr.ruta@bt.com
† Computational Intelligence Research Group, Bournemouth University, School of Design, Engineering & Computing, Poole House, Talbot Campus, Fern Barrow Poole BH12 5BB, United Kingdom, bgabrys@bournemouth.ac.uk

ABSTRACT

Individual classification models have recently been challenged by combined pattern recognition systems, which often show better performance. In such systems the optimal set of classifiers is first selected and then combined by a specific fusion method. For a small number of classifiers optimal ensembles can be found exhaustively, but the burden of exponential complexity of such a search limits its practical applicability for larger systems. As a result, simpler search algorithms and/or selection criteria are needed to reduce the complexity. This work provides a revision of the classifier selection methodology and evaluates the practical applicability of diversity measures in the context of combining classifiers by majority voting. A number of search algorithms are proposed and adjusted to work properly with a number of selection criteria including majority voting error and various diversity measures. Extensive experiments carried out with 15 classifiers on 27 datasets indicate the inappropriateness of diversity measures used as selection criteria in favour of the direct combiner error based search. Furthermore, the results prompted a novel design of multiple classifier systems in which selection and fusion are recurrently applied to a population of best combinations of classifiers rather than the individual best. The improvement of the generalisation performance of such a system is demonstrated experimentally.

KEYWORDS

Classifier Fusion, Classifier Selection, Diversity, Search Algorithms, Majority Voting, Generalisation

1 Introduction

Given a large pool of different classifiers there are a number of possible combining strategies to follow and it is usually not clear which one may be optimal for a particular problem. The simplest strategy could be to select the single best-performing classifier on the training data and apply it to the previously unseen patterns [26]. Such an approach, although the simplest, does not guarantee the optimal performance [28]. Moreover, there is a possibility that at least some subsets of classifiers could jointly outperform the best classifier if suitably combined. To ensure the optimal performance, a multiple classifier design should be able to select the subset of classifiers that is optimal in the sense that it produces the highest possible performance for a particular combiner. On one hand, it is clear that combining the same classifiers does not contribute to anything but the increased complexity of a system. On the other hand, different but much worse performing classifiers are unlikely to bring any benefits in combined performance.

It is believed that the optimal combinations of classifiers should have good individual performances and at the same time a sufficient level of diversity [35]. In many recent works it has been shown, however, that neither individual performances [27], [40] nor diversity [37], [29] on their own provide a reliable diagnostic tool able to detect when a combiner outperforms the individual best classifier. As noted by Rogova [27], individual classifier performances do not relate well to combined performance as they miss the important information about the team strength of the classifiers. In turn, diversity, due to problems with measuring and even perceiving it, also does not provide a reliable selection criterion that would be well correlated with combiner performance [31]. Some attempts at including both components jointly guiding selection proved to be highly complex while offering only relatively small improvements [40], [30]. A little more successful have been selection attempts based on specific similarity measures devised in conjunction with the combiner for which the classifiers are selected. The fault majority presented in [29] and the similarity S3h measure presented in [16] are just two examples that have shown high correlation with majority voting performance. Unlike general statistically driven diversity measures, measures exploiting the combiner definition take into account information of what makes a particular combiner work, and selection guided by such a combiner naturally has greater chances of being successful.

All these findings point to the combined performance as a relevant selection criterion. Effectively the most reliable strategy seems to be evaluation of as many different designs as possible and subsequent selection of the best performing model. A difficulty, however, is that such a wide open scale of evaluation is computationally intractable. To realise this, it is sufficient to note that, assuming a chosen combiner, evaluation of all subsets from an ensemble of redundant classifiers is a process growing exponentially with the number of classifiers. On top of that, for large numbers of classifiers the performance based search space becomes increasingly flat, which makes selection even more difficult [40].

In the light of such difficulties, a modular decomposition model of combining seems advisable, particularly if only one locally best classifier is to be selected for a particular subtask or local input subspace. A number of dynamic selection models [7], [8], [10] or cluster and select based approaches [18], [24] illustrate that advantage and in some cases show even substantial improvement compared with the individual best classifier. However, by analogy to redundant combining, in general, improvement may be also sought in combining many classifiers within each subtask or input subspace, which com
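
As an illustration of the exhaustive search mentioned in the introduction, the following minimal Python sketch evaluates every non-empty subset of a classifier pool by its majority voting error and keeps the best one. It is not the authors' implementation. The function names (majority_vote_error, exhaustive_selection), the convention that a tie counts as an error, and the toy predictions are all assumptions made for this example. The number of evaluated subsets is 2^M - 1 for M classifiers, which is the exponential growth the text refers to.

```python
from itertools import combinations


def majority_vote_error(votes, labels):
    """Fraction of samples misclassified by a majority vote.

    votes: list of per-classifier prediction lists, each as long as labels.
    Assumption: a sample is correct only if strictly more than half of the
    classifiers predict its true label, so ties count as errors.
    """
    n_classifiers = len(votes)
    errors = 0
    for i, true_label in enumerate(labels):
        n_correct = sum(1 for preds in votes if preds[i] == true_label)
        if 2 * n_correct <= n_classifiers:
            errors += 1
    return errors / len(labels)


def exhaustive_selection(predictions, labels):
    """Evaluate all non-empty subsets and return the one with lowest error.

    predictions: dict mapping a classifier name to its prediction list.
    Returns (best_subset, best_error, number_of_subsets_evaluated).
    """
    names = sorted(predictions)
    best_subset, best_error = None, float("inf")
    evaluated = 0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            evaluated += 1
            error = majority_vote_error([predictions[n] for n in subset], labels)
            if error < best_error:  # strict: earlier (smaller) subsets win ties
                best_subset, best_error = subset, error
    return best_subset, best_error, evaluated


# Toy data (invented for illustration): three classifiers, each wrong on a
# different pair of samples, so each has an individual error of 2/6.
labels = [0, 1, 1, 0, 1, 0]
predictions = {
    "A": [1, 0, 1, 0, 1, 0],  # wrong on samples 0 and 1
    "B": [0, 1, 0, 1, 1, 0],  # wrong on samples 2 and 3
    "C": [0, 1, 1, 0, 0, 1],  # wrong on samples 4 and 5
}

best_subset, best_error, evaluated = exhaustive_selection(predictions, labels)
```

On this toy pool the full ensemble of three classifiers reaches zero majority voting error even though every individual classifier errs on a third of the samples, which is the situation where selection by individual performance alone would miss the best design. With 15 classifiers, as in the experiments described in the abstract, the same loop would visit 2^15 - 1 = 32767 subsets.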
