PAC-Bayesian Bounds for Randomized Empirical Risk

神佛皆杀
1 ℃
2020-07-07

整理文档很辛苦，赏杯茶钱您下走！

还剩 ... 页未读，继续阅读 >>

免费阅读已结束，点击下载阅读编辑剩下 ... 页

阅读已结束，您可以下载文档离线阅读编辑

资源描述

arXiv:0712.1698v1[stat.ML]11Dec2007PAC-BAYESIANBOUNDSFORRANDOMIZEDEMPIRICALRISKMINIMIZERSPIERREALQUIERAbstrat.TheaimofthispaperistogeneralizethePAC-BayesiantheoremsprovedbyCatoni[6,8℄inthelassiationsettingtomoregeneralproblemsofstatistialinferene.Weshowhowtoontrolthedeviationsoftheriskofrandomizedestimators.Apartiularattentionispaidtorandomizedestima-torsdrawninasmallneighborhoodoflassialestimators,whosestudyleadstoontroltheriskofthelatter.Theseresultsallowtoboundtheriskofverygeneralestimationproedures,aswellastoperformmodelseletion.Contents1.Introdution21.1.Generalnotations21.2.StatistiallearningtheoryandPAC-Bayesianpointofview31.3.Trunationoftherisk51.4.Maintools51.5.AbasiPAC-BayesianTheorem71.6.Mainresultsofthepaper82.EmpirialboundfortheloalizedomplexityandloalizedPAC-Bayesiantheorems82.1.Mutualinformationbetweenthesampleandtheparameter82.2.Empirialboundoftheloalizedomplexity92.3.LoalizedPAC-Bayesiantheorems102.4.Choieoftheparameters112.5.Introdutionoftheomplexityfuntion113.Appliation:modelseletion123.1.Seletionalgorithm123.2.Empirialboundontheriskoftheseletedestimator123.3.Theoretialbound134.Conlusion145.Proofs155.1.ProofofLemma1.1155.2.ProofofTheorem1.5155.3.ProofofTheorems2.1and2.2165.4.ProofofTheorem3.216Appendix:boundingtheeetoftrunation28Referenes28Date:February2,2008.2000MathematisSubjetClassiation.Primary62G08;Seondary62H30,68T05,68T10.Keywordsandphrases.Regressionestimation,lassiation,adaptativeinferene,statistiallearning,randomizedestimator,empirialriskminimization,empirialbound.IWouldliketothankProfessorOlivierCatoniforhiskindhelpandhisusefulremarks.12P.ALQUIER1.IntrodutionTheaimofthispaperistoperformstatistialinferenewithobservationsinapossiblylargedimensionalspae.Letusrstintroduethenotations.1.1.Generalnotations.LetN∈N∗bethenumberofobservations.Let(Z,B)beameasurablespaeandP1,...,PNbeNprobabilitymeasuresonthisspae,unknowntothestatistiian.Weassumethat(Z1,...,ZN)istheanonialproessonZN,B⊗N,P1⊗...⊗PN.Denition1.1.LetusputP=P1⊗...⊗PN,andP=1NNXi=1δZi.WewanttoperformstatistialinfereneonageneralparameterspaeΘ,withrespettosomelossfuntionℓθ:Z→R,θ∈Θ.Denition1.2(Riskfuntions).Weintrodue,foranyθ∈Θ,r(θ)=P(ℓθ)=1NNXi=1ℓθ(Zi),theempirialriskfuntion,andR(θ)=P(ℓθ)=1NNXi=1Pi(ℓθ),theriskfuntion.Wenowdesribethreelassialproblemsinstatististhattthegeneralontextdesribedabove.Example1.1(Classiation).WeassumethatZ=X×YwhereXisasetofobjetsandYanitesetofpossiblelabelsfortheseobjets.Considerasetoflassiationfuntions{fθ:X→Y,θ∈Θ}whihassigntoeahobjetalabel.Letusput,foranyz=(x,y)∈Z,ℓθ(z)=ψ(fθ(x),y)whereψissomesymmetridisrepanymeasure.Themostusualaseistousethe0-1lossfuntionψ(y,y′)=δy(y′).Ifmoreover|Y|=2weandeidethatY={−1,+1}andsetψ(y,y′)=1R∗+(yy′).However,inmanypratialsituations,algorithmionsiderationsleadtouseaonvexupperboundofthislossfuntion,likeψ(y,y′)=(1−yy′)+=max(1−yy′,0),thehingeloss,ψ(y,y′)=exp(−yy′),theexponentialloss,ψ(y,y′)=(1−yy′)2,theleastsquareloss.Forexample,CortesandVapnik[10℄generalizedtheSVMtehniquetonon-separabledatausingthehingeloss,whileShapire,Freund,BartlettandLee[19℄gaveasta-tistialinterpretationofboostingalgorithmthankstotheexponentialloss.SeeZhang[22℄foraompletestudyoftheperformaneoflassiationmethodsusingtheselossfuntions.PAC-BAYESIANBOUNDSFORRANDOMIZEDEMPIRICALRISKMINIMIZERS3Example1.2(Regressionestimation).TheontextisthesameexeptthatthelabelsetYisinnite,inmostaseitisRoranintervalofR.Here,themostusualaseistheregressionwithquadratiloss,withψ(y,y′)=(y−y′)2,however,mostgeneralasesanbestudiedlikethelplossψ(y,y′)=(y−y′)pforsomep≥1.Example1.3(Densityestimation).Here,weassumethatP1=...=PN=PandonsequentlythatP=P⊗N,andwewanttoestimatethedensityf=dP/dμofPwithrespettoaknownmeasureμ.Weassumethatwearegivenasetofprobabilitymeasures{Qθ,θ∈Θ}withdensitiesqθ=dQθ/dμandweusethelossfuntionℓθ(z)=−log[qθ(z)].Indeedinthisase,weanwriteundersuitablehypothesesR(θ)=P(−log◦qθ)=P−log◦dQθdμ=Plog◦dPdQθ+Plog◦dμdP=K(P,Qθ)−P(log◦f),showingthattheriskistheKullbak-LeiblerdivergenebetweenPandQθuptoaonstant(thedenitionofKisremindedinthispaper,seeDenition1.8page5).IneahasetheobjetiveistoestimateargminRonthebasisoftheobservationsZ1,...,ZN-presumablyusinginsomewayoranotherthevalueoftheempirialrisk.WehavetonotiethatwhenthespaeΘislargeoromplex(forexampleavetorspaewithlargedimension),argminRandargminranbeverydierent.ThisdoesnothappenifΘissimple(forexampleavetorspaewithsmalldimension),butsuhaaseislessinterestingaswehavetoeliminatealotofdimensionsinΘbeforeproeedingtostatistialinferenewithnoguaranteesthatthesediretionsarenotrelevant.1.2.StatistiallearningtheoryandPAC-Bayesianpointofview.ThelearningtheorypointofviewintroduedbyVapnikandCervonenkis([9℄,seeVap-nik[21℄forapresentationofthemainresultsinEnglish)givesasettingthatprovedtobeadaptedtodealwithestimationproblemsinlargedimension.Thispointofviewreeivedanimportantinterestoverthepastfewyears,seeforexamplethewell-knownbooksofDevroye,GyrandLugosi[11℄,Friedman,HastieandTib-shirani[12℄ormorereentlythepaperbyBouheron,BousquetandLugosi[5℄andthereferenestherein,forastateoftheart.The