Expected error analysis for model selection

Tobias Scheffer¹ and Thorsten Joachims²

July 13, 1999

¹ Otto von Guericke University, FIN/IWS, Universitätsplatz 2, 39106 Magdeburg, Germany, scheffer@iws.cs.uni-magdeburg.de, phone +49 391 6711399
² Universität Dortmund, LS VIII / Computer Science, 44221 Dortmund, thorsten@ls8.cs.uni-dortmund.de

Abstract

In order to select a good hypothesis language (or model) from a collection of possible models, one has to assess the generalization performance of the hypothesis which is returned by a learner that is bound to use some particular model. This paper deals with a new and very efficient way of assessing this generalization performance. We present a new analysis which characterizes the expected generalization error of the hypothesis with least training error in terms of the distribution of error rates of the hypotheses in the model. This distribution can be estimated very efficiently from the data, which immediately leads to an efficient model selection algorithm. The analysis predicts learning curves with very high precision and thus contributes to a better understanding of why and when over-fitting occurs. We present empirical studies (controlled experiments on Boolean decision trees and a large-scale text categorization problem) which show that the model selection algorithm leads to error rates which are often as low as those obtained by 10-fold cross validation (sometimes even superior). However, the algorithm is much more efficient (because the learner does not have to be invoked at all) and thus solves model selection problems with as many as a thousand relevant attributes and 12,000 examples.

A short version of this paper appeared at the International Conference on Machine Learning, 1999.

1 Introduction

In the setting of classification learning which we study in this paper, the task of a learner is to approximate a joint distribution on instances and class labels as well as possible. A hypothesis is a mapping from instances to class labels; the (generalization, or true) error rate of a hypothesis h is the chance of drawing a pair of an instance x and a class label y (when drawing according to the sought target distribution) such that the hypothesis conjectures a class label h(x) which is distinct from the "correct" class label y. This error rate (also referred to as the zero-one loss) is the quantity which we wish to minimize. Unfortunately, however, we cannot determine the error rate because the target distribution is not known to the learner. Instead, the learner is able to perceive a sample (i.e., a set of pairs (x_i, y_i) of fixed size) which is drawn according to the target distribution and which allows us to define the empirical error rate of a hypothesis, which is the frequency of misclassifications with respect to the sample. The learner is provided a sample and is constrained to a model (a set of potentially available hypotheses) and can minimize the empirical error rate within that model. One can think of the model as a parametric scheme for a hypothesis, while an individual hypothesis is a fully parameterized model. A model might, for instance, consist of all decision trees of depth three, or of all back-propagation networks with a certain fixed architecture. For the latter example, the back-propagation algorithm would be a learner that minimizes the empirical error rate within that model.

The choice of the model to which we constrain the learner has a very strong impact on the error rate of the hypothesis that the learner will deliver. For example, if we restrict our learner to decision trees of depth one then, for most learning problems, we can expect even the best hypothesis in that model to incur both a high empirical error rate and a high true error rate. On the other hand, from our understanding of PAC- and VC-style error bounds we know that a low empirical error rate does not imply that the true error rate is also low when the model is very large (or complex, respectively). When we consider very many distinct hypotheses, the chance that at least one of them happens to incur a low empirical error rate (although its true error rate is high) grows rapidly.
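To fix notation for the true and the empirical error rate discussed above, the following restates the definitions; the symbols D (target distribution), S (sample of size m), and H (model) are our shorthand rather than notation taken from the paper:

\[
  \mathrm{err}(h) \;=\; \Pr_{(x,y)\sim D}\bigl[h(x)\neq y\bigr],
  \qquad
  \widehat{\mathrm{err}}_S(h) \;=\; \frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\bigl[h(x_i)\neq y_i\bigr],
  \qquad
  \hat h \;=\; \operatorname*{argmin}_{h\in H}\,\widehat{\mathrm{err}}_S(h),
\]

where \(\hat h\) is the hypothesis with least training error that the learner returns for the chosen model H.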
Therefore, in the worst case, the error rate of even an apparently good hypothesis might be large. The problem of selecting a model that will lead to a low error rate of the resulting hypothesis is referred to as model selection, and it is intimately related to the problem of estimating the error rate of a hypothesis. For our back-propagation example, one possible model selection problem would be to determine the number of hidden units that leads to optimal generalization. For decision trees, possible model selection problems would be to determine the subset of the available attributes, or the depth or structure of a tree, that imposes an optimal generalization performance.

Three distinct classes of approaches to the model selection problem can be distinguished. (For a more detailed discussion of these approaches, we refer the reader to Section 6.1.) Hold-out testing, or cross validation algorithms (e.g., Mosier, 1951; Toussaint, 1974; Kohavi & John, 1997) use independent samples that have not been used for training to compare the apparently best hypotheses of each considered model. Cross validation has proven to be a very general and reliable model selection algorithm, but requires repeated invocations of the learner for each model which may, for large-scale applications, require a prohibitively large amount of computation. By contrast, complexity penalization algorithms (e.g., Cun et al., 1989; Mingers, 1989; Vapnik, 1998) try to estimate the true error rate based on only the empirical error rate and some complexity measure of the model. Unfortunately, this information does not suffice to actually determine the generalization error rate, and therefore complexity penalization algorithms have to conjecture how the error rate might grow with the model …
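As a concrete reference point, here is a minimal sketch of the hold-out / k-fold cross-validation baseline described above (not the expected-error analysis proposed in this paper). The learner interface (objects with fit and predict), the candidate model factories, and all names are placeholder assumptions.

import numpy as np

def zero_one_error(y_true, y_pred):
    # Empirical error rate: fraction of misclassified examples (zero-one loss).
    return float(np.mean(y_true != y_pred))

def cross_val_error(make_learner, X, y, k=10, seed=0):
    # Average held-out error of a learner (constrained to one model) over k folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        learner = make_learner()             # fresh learner for this fold
        learner.fit(X[train], y[train])      # minimize empirical error on the training part
        errors.append(zero_one_error(y[test], learner.predict(X[test])))
    return float(np.mean(errors))

def select_model(candidates, X, y, k=10):
    # candidates: dict mapping a model name to a zero-argument learner factory.
    # Returns the name with the lowest cross-validated error estimate, plus all scores.
    scores = {name: cross_val_error(make, X, y, k) for name, make in candidates.items()}
    return min(scores, key=scores.get), scores

The sketch makes the cost explicit: the learner is re-trained k times for every candidate model, which is exactly the repeated-invocation expense that the expected-error approach is designed to avoid.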
