BAYESIAN STATISTICS 8, pp. 1–24.
J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West (Eds.)
© Oxford University Press, 2007

Nonparametric Function Estimation Using Overcomplete Dictionaries

Merlise A. Clyde and Robert L. Wolpert
Duke University, U.S.A.
clyde@stat.duke.edu rlw@stat.duke.edu

Summary

We consider the problem of estimating an unknown function based on noisy data using nonparametric regression. One approach to this estimation problem is to represent the function in a series expansion using a linear combination of basis functions. Overcomplete dictionaries provide a larger, but redundant, collection of generating elements than a basis; however, coefficients in the expansion are no longer unique. Despite the non-uniqueness, this has the potential to lead to sparser representations by using fewer non-zero coefficients. Compound Poisson random fields and their generalization to Lévy random fields are ideally suited for construction of priors on functions using these overcomplete representations for the general nonparametric regression problem, and provide a natural limiting generalization of priors for the finite dimensional version of the regression problem. While expressions for posterior modes or posterior distributions of quantities of interest are not available in closed form, the prior construction using Lévy random fields permits tractable posterior simulation via a reversible jump Markov chain Monte Carlo algorithm. Efficient computation is possible because updates based on adding/deleting or updating single dictionary elements bypass the need to invert large matrices. Furthermore, because dictionary elements are only computed as needed, memory requirements scale linearly with the sample size. In comparison with other methods, the Lévy random field priors provide excellent performance in terms of both mean squared error and coverage for out-of-sample predictions.

Keywords and Phrases: Gaussian Random Field; Infinitely Divisible; Kernel Regression; Lévy random field; Nonparametric Regression; Relevance Vector Machine; Reversible Jump Markov chain Monte Carlo; Spatial-Temporal Models; Splines; Support Vector Machine; Wavelets.

Merlise Clyde is Associate Professor of Statistics at Duke University, Durham, North Carolina, USA. Robert Wolpert is Professor of Statistics at Duke University. The authors would like to thank Jen-hwa Chu, Leanna House, and Chong Tu for their contributions. This material is based upon work supported by the National Science Foundation under Grant Numbers DMS-0342172, DMS-0422400 and DMS-0406115. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

1. INTRODUCTION

The canonical setup for the nonparametric regression problem consists of having $n$ measurements $Y = (Y_1, \ldots, Y_n)^T$ of an unknown real-valued function $f(x)$ defined on some space $\mathcal{X}$,

$$Y_i = f(x_i) + \epsilon_i \qquad (1)$$

observed at points $x_i \in \mathcal{X}$. In the regression formulation the errors, $\epsilon_i$, will typically represent white noise, $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$, but the nonparametric model may be extended to other exponential family models where $g(E[Y_i]) = f(x_i)$ for some link function $g$, as in generalized additive models. The function $f(\cdot)$ will often be regarded as an element of some separable Hilbert space $\mathcal{H}$ of real-valued functions on a compact space $\mathcal{X}$.

For Bayesian inference regarding the unknown mean function $f$, we must first place a prior distribution on $f$. If we are to model $f$ nonparametrically, then we should place a prior distribution over the infinite dimensional space $\mathcal{H}$ of possible functions. However, in practice it is common to place a prior on the finite dimensional vector $f_n \equiv (f(x_1), \ldots, f(x_n))^T$ for $f_n \in \mathbb{R}^n$, for example, by expressing $f_n$ at the observed points $x_i$ in terms of a finite dimensional basis and placing prior distributions only on the coefficients or coordinates of $f_n$ with respect to the basis. While a class of priors in the finite dimensional version may lead to reasonable behaviour of posteriors with modest sample sizes, one would hope that the finite dimensional prior remains sensible in the infinite dimensional limit and as the sample size $n$ increases.

In this paper, we promote the use of Lévy random field priors for stochastic expansions of $f$ and show how Lévy random fields provide a natural limiting extension of certain finite dimensional prior distributions. We begin in Section 2 by reviewing some of the popular choices of priors in the finite dimensional version of the problem. In Section 3, we present priors for
stochastic expansions of $f$ using Lévy random fields and show how these priors arise as natural limits of certain prior distributions on finite dimensional spaces. The connection between Lévy random fields and Poisson random fields provides the key to tractable computation using (reversible jump) Markov chain Monte Carlo sampling for the stochastic expansions. In Section 4 we describe the resulting hierarchical model and discuss prior specifications. In Section 5 we discuss how the Lévy random field priors lead to penalized likelihoods and contrast these expressions with other model selection criteria. We highlight some of our applications of Lévy random fields in Section 6. For many problems, Lévy random fields provide an attractive alternative to Gaussian random field priors. We conclude by discussing some areas for future research.

2. PRIOR DISTRIBUTIONS ON FUNCTIONS

When it comes to placing prior distributions over nonparametric functions of explanatory variable(s) $x$, Gaussian process (or random field) priors are perhaps the most accessible. If the function $f$ has a Gaussian Process (GP) prior with mean $\mu$ and covariance function $\Sigma(\cdot, \cdot; \theta)$ (a positive definite function on $\mathcal{X} \times \mathcal{X}$), f(·) ∼ GP(μ,