arXiv:physics/9701026v2 [physics.data-an] 28 Jan 1997

Technical Report No. 9702, Department of Statistics, University of Toronto

Monte Carlo Implementation of Gaussian Process Models
for Bayesian Regression and Classification

Radford M. Neal
Department of Statistics and Department of Computer Science
University of Toronto, Toronto, Ontario, Canada
radford@stat.utoronto.ca

20 January 1997

Abstract. Gaussian processes are a natural way of defining prior distributions over functions of one or more input variables. In a simple nonparametric regression problem, where such a function gives the mean of a Gaussian distribution for an observed response, a Gaussian process model can easily be implemented using matrix computations that are feasible for datasets of up to about a thousand cases. Hyperparameters that define the covariance function of the Gaussian process can be sampled using Markov chain methods. Regression models where the noise has a t distribution, and logistic or probit models for classification applications, can be implemented by sampling as well for latent values underlying the observations. Software is now available that implements these methods using covariance functions with hierarchical parameterizations. Models defined in this way can discover high-level properties of the data, such as which inputs are relevant to predicting the response.

1  Introduction

A nonparametric Bayesian regression model must be based on a prior distribution over the infinite-dimensional space of possible regression functions. It has been known for many years that such priors over functions can be defined using Gaussian processes (O'Hagan 1978), and essentially the same model has long been used in spatial statistics under the name of "kriging". Gaussian processes seem to have been largely ignored as general-purpose regression models, however, apart from the special case of smoothing splines (Wahba 1978), and some applications to modeling noise-free data from computer experiments (eg, Sacks, Welch, Mitchell, and Wynn 1989). Recently, I have shown that many Bayesian regression models based on neural networks converge to Gaussian processes in the limit of an infinite network (Neal 1996). This has motivated examination of Gaussian process models for the high-dimensional applications to which neural networks are typically applied (Williams and Rasmussen 1996). The empirical work of Rasmussen (1996) has demonstrated that Gaussian process models have better predictive performance than several other nonparametric regression methods over a range of tasks with varying characteristics. The conceptual simplicity, flexibility, and good performance of Gaussian process models should make them very attractive for a wide range of problems.

One reason for the previous neglect of Gaussian process regression may be that in a straightforward implementation it involves matrix operations whose time requirements grow as the cube of the number of cases, and whose space requirements grow as the square of the number of cases. Twenty years ago, this may have limited use of such models to datasets with less than about a hundred cases, but with modern computers, it is feasible to apply Gaussian process models to datasets with a thousand or more cases. It may also be possible to reduce the time requirements using more sophisticated algorithms (Gibbs and MacKay 1997a).

The characteristics of a Gaussian process model can easily be controlled by writing the covariance function in terms of "hyperparameters". One approach to adapting these hyperparameters to the observed data is to estimate them by maximum likelihood (or maximum penalized likelihood), as has long been done in the context of spatial statistics (eg, Mardia and Marshall 1984). In a fully Bayesian approach, the hyperparameters are given prior distributions. Predictions are then made by averaging over the posterior distribution for the hyperparameters, which can be done using Markov chain Monte Carlo methods. These two approaches often give similar results (Williams and Rasmussen 1996, Rasmussen 1996), but the fully Bayesian approach may be more robust when the models are elaborate.
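To make the preceding discussion concrete, the standard predictive formulas for Gaussian process regression with a zero-mean prior can be sketched as follows (the notation here is chosen for convenience, and is not necessarily that used in later sections of this report). Let C(x, x'; θ) be the covariance function with hyperparameters θ, let K be the n by n matrix of covariances among the n training cases (with any noise variance added to its diagonal), let y be the vector of observed responses, and let k_* be the vector of covariances between a new input x_* and the training inputs. Conditional on θ, the predictive distribution for the function value at x_* is Gaussian with

\[
  \mathrm{E}[f_* \mid y,\, \theta] \;=\; k_*^T K^{-1} y,
  \qquad
  \mathrm{Var}[f_* \mid y,\, \theta] \;=\; C(x_*, x_*;\, \theta) \;-\; k_*^T K^{-1} k_* .
\]

Factoring K (for example, by Cholesky decomposition) is the source of the cubic time and quadratic space requirements mentioned above. In the fully Bayesian approach, these conditional predictions are averaged over the posterior distribution for θ; with Markov chain Monte Carlo draws θ^{(1)}, ..., θ^{(S)} from this posterior, the predictive mean is approximated by

\[
  \mathrm{E}[f_* \mid y] \;\approx\; \frac{1}{S} \sum_{s=1}^{S}
    k_*\!\left(\theta^{(s)}\right)^{\!T} K\!\left(\theta^{(s)}\right)^{-1} y .
\]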
Applying Gaussian process models to classification problems presents new computational problems, since the joint distribution of all quantities is no longer Gaussian. Approximate methods of Bayesian inference for such models have been proposed by Barber and Williams (1997) and by Gibbs and MacKay (1997b). A general approach to exactly handling classification and other generalized models (eg, for a Poisson response) is to use a Markov chain Monte Carlo scheme in which unobserved "latent values" associated with each case are explicitly represented. This paper applies this approach to classification using logistic or probit models, and to regression models in which the noise follows a t distribution.

I have written software in C for Unix systems that implements Gaussian process methods for regression and classification, within the same framework as is used by my Bayesian neural network software. This software is freely available for research and educational use.¹ The covariance functions supported may consist of several parts, and may be specified in terms of hyperparameters, as described in detail in Section 3. These covariance functions provide functionality similar to that of the neural network models. The software implements full Bayesian inference for these hierarchical models using matrix computations and Markov chain sampling methods, as described in Sections 4 and 5. In Sections 6 and 7, I demonstrate the use of the software on a three-way classification problem, using a model that can identify which of the inputs are relevant to predicting the class, and on a regression problem with outliers. I conclude by discussing some areas for future research. First, however, I will

¹Foll