arXiv:0712.1698v1[stat.ML]11Dec2007PAC-BAYESIANBOUNDSFORRANDOMIZEDEMPIRICALRISKMINIMIZERSPIERREALQUIERAbstrat.TheaimofthispaperistogeneralizethePAC-BayesiantheoremsprovedbyCatoni[6,8℄inthelassi ationsettingtomoregeneralproblemsofstatistialinferene.Weshowhowtoontrolthedeviationsoftheriskofrandomizedestimators.Apartiularattentionispaidtorandomizedestima-torsdrawninasmallneighborhoodoflassialestimators,whosestudyleadstoontroltheriskofthelatter.Theseresultsallowtoboundtheriskofverygeneralestimationproedures,aswellastoperformmodelseletion.Contents1.Introdution21.1.Generalnotations21.2.StatistiallearningtheoryandPAC-Bayesianpointofview31.3.Trunationoftherisk51.4.Maintools51.5.AbasiPAC-BayesianTheorem71.6.Mainresultsofthepaper82.EmpirialboundfortheloalizedomplexityandloalizedPAC-Bayesiantheorems82.1.Mutualinformationbetweenthesampleandtheparameter82.2.Empirialboundoftheloalizedomplexity92.3.LoalizedPAC-Bayesiantheorems102.4.Choieoftheparameters112.5.Introdutionoftheomplexityfuntion113.Appliation:modelseletion123.1.Seletionalgorithm123.2.Empirialboundontheriskoftheseletedestimator123.3.Theoretialbound134.Conlusion145.Proofs155.1.ProofofLemma1.1155.2.ProofofTheorem1.5155.3.ProofofTheorems2.1and2.2165.4.ProofofTheorem3.216Appendix:boundingthee etoftrunation28Referenes28Date:February2,2008.2000MathematisSubjetClassi ation.Primary62G08;Seondary62H30,68T05,68T10.Keywordsandphrases.Regressionestimation,lassi ation,adaptativeinferene,statistiallearning,randomizedestimator,empirialriskminimization,empirialbound.IWouldliketothankProfessorOlivierCatoniforhiskindhelpandhisusefulremarks.12P.ALQUIER1.IntrodutionTheaimofthispaperistoperformstatistialinferenewithobservationsinapossiblylargedimensionalspae.Letus rstintroduethenotations.1.1.Generalnotations.LetN∈N∗bethenumberofobservations.Let(Z,B)beameasurablespaeandP1,...,PNbeNprobabilitymeasuresonthisspae,unknowntothestatistiian.Weassumethat(Z1,...,ZN)istheanonialproesson ZN,B⊗N,P1⊗...⊗PN.De nition1.1.LetusputP=P1⊗...⊗PN,andP=1NNXi=1δZi.WewanttoperformstatistialinfereneonageneralparameterspaeΘ,withrespettosomelossfuntionℓθ:Z→R,θ∈Θ.De nition1.2(Riskfuntions).Weintrodue,foranyθ∈Θ,r(θ)=P(ℓθ)=1NNXi=1ℓθ(Zi),theempirialriskfuntion,andR(θ)=P(ℓθ)=1NNXi=1Pi(ℓθ),theriskfuntion.Wenowdesribethreelassialproblemsinstatististhat tthegeneralontextdesribedabove.Example1.1(Classi ation).WeassumethatZ=X×YwhereXisasetofobjetsandYa nitesetofpossiblelabelsfortheseobjets.Considerasetoflassi ationfuntions{fθ:X→Y,θ∈Θ}whihassigntoeahobjetalabel.Letusput,foranyz=(x,y)∈Z,ℓθ(z)=ψ(fθ(x),y)whereψissomesymmetridisrepanymeasure.Themostusualaseistousethe0-1lossfuntionψ(y,y′)=δy(y′).Ifmoreover|Y|=2weandeidethatY={−1,+1}andsetψ(y,y′)=1R∗+(yy′).However,inmanypratialsituations,algorithmionsiderationsleadtouseaonvexupperboundofthislossfuntion,likeψ(y,y′)=(1−yy′)+=max(1−yy′,0),thehingeloss,ψ(y,y′)=exp(−yy′),theexponentialloss,ψ(y,y′)=(1−yy′)2,theleastsquareloss.Forexample,CortesandVapnik[10℄generalizedtheSVMtehniquetonon-separabledatausingthehingeloss,whileShapire,Freund,BartlettandLee[19℄gaveasta-tistialinterpretationofboostingalgorithmthankstotheexponentialloss.SeeZhang[22℄foraompletestudyoftheperformaneoflassi ationmethodsusingtheselossfuntions.PAC-BAYESIANBOUNDSFORRANDOMIZEDEMPIRICALRISKMINIMIZERS3Example1.2(Regressionestimation).TheontextisthesameexeptthatthelabelsetYisin nite,inmostaseitisRoranintervalofR.Here,themostusualaseistheregressionwithquadratiloss,withψ(y,y′)=(y−y′)2,however,mostgeneralasesanbestudiedlikethelplossψ(y,y′)=(y−y′)pforsomep≥1.Example1.3(Densityestimation).Here,weassumethatP1=...=PN=PandonsequentlythatP=P⊗N,andwewanttoestimatethedensityf=dP/dμofPwithrespettoaknownmeasureμ.Weassumethatwearegivenasetofprobabilitymeasures{Qθ,θ∈Θ}withdensitiesqθ=dQθ/dμandweusethelossfuntionℓθ(z)=−log[qθ(z)].Indeedinthisase,weanwriteundersuitablehypothesesR(θ)=P(−log◦qθ)=P−log◦dQθdμ=Plog◦dPdQθ+Plog◦dμdP=K(P,Qθ)−P(log◦f),showingthattheriskistheKullbak-LeiblerdivergenebetweenPandQθuptoaonstant(thede nitionofKisremindedinthispaper,seeDe nition1.8page5).IneahasetheobjetiveistoestimateargminRonthebasisoftheobservationsZ1,...,ZN-presumablyusinginsomewayoranotherthevalueoftheempirialrisk.WehavetonotiethatwhenthespaeΘislargeoromplex(forexampleavetorspaewithlargedimension),argminRandargminranbeverydi erent.ThisdoesnothappenifΘissimple(forexampleavetorspaewithsmalldimension),butsuhaaseislessinterestingaswehavetoeliminatealotofdimensionsinΘbeforeproeedingtostatistialinferenewithnoguaranteesthatthesediretionsarenotrelevant.1.2.StatistiallearningtheoryandPAC-Bayesianpointofview.ThelearningtheorypointofviewintroduedbyVapnikandCervonenkis([9℄,seeVap-nik[21℄forapresentationofthemainresultsinEnglish)givesasettingthatprovedtobeadaptedtodealwithestimationproblemsinlargedimension.Thispointofviewreeivedanimportantinterestoverthepastfewyears,seeforexamplethewell-knownbooksofDevroye,Gy r andLugosi[11℄,Friedman,HastieandTib-shirani[12℄ormorereentlythepaperbyBouheron,BousquetandLugosi[5℄andthereferenestherein,forastateoftheart.The