Least Angle Regression

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
Statistics Department, Stanford University
January 9, 2003

Abstract

The purpose of model selection algorithms such as All Subsets, Forward Selection, and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression ("LARS"), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived. (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of Ordinary Least Squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a $C_p$ estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as Ordinary Least Squares applied to the full set of covariates.

1. Introduction

Automatic model-building algorithms are familiar, and sometimes notorious, in the linear model literature: Forward Selection, Backward Elimination, All Subsets regression, and various combinations are used to automatically produce "good" linear models for predicting a response $y$ on the basis of some measured covariates $x_1, x_2, \dots, x_m$. Goodness is often defined in terms of prediction accuracy, but parsimony is another important criterion: simpler models are preferred for the sake of scientific insight into the $x$-$y$ relationship. Two promising recent model-building algorithms, the Lasso and Forward Stagewise linear regression, will be discussed here, and motivated in terms of a computationally simpler method called Least Angle Regression.

Least Angle Regression ("LARS") relates to the classic model-selection method known as Forward Selection, or "forward stepwise regression", described in Section 8.5 of Weisberg (1980): given a collection of possible predictors, we select the one having largest absolute correlation with the response $y$, say $x_{j_1}$, and perform simple linear regression of $y$ on $x_{j_1}$. This leaves a residual vector orthogonal to $x_{j_1}$, now considered to be the response. We project the other predictors orthogonally to $x_{j_1}$ and repeat the selection process. After $k$ steps this results in a set of predictors $x_{j_1}, x_{j_2}, \dots, x_{j_k}$ that are then used in the usual way to construct a $k$-parameter linear model. Forward Selection is an aggressive fitting technique that can be overly greedy, perhaps eliminating at the second step useful predictors that happen to be correlated with $x_{j_1}$.

Forward Stagewise, as described below, is a much more cautious version of Forward Selection, which may take thousands of tiny steps as it moves toward a final model. It turns out, and this was the original motivation for the LARS algorithm, that a simple formula allows Forward Stagewise to be implemented using fairly large steps, though not as large as classic Forward Selection, greatly reducing the computational burden. The geometry of the algorithm, described in Section 2, suggests the name "Least Angle Regression". It then happens that this same geometry applies to another, seemingly quite different selection method called the Lasso (Tibshirani 1996).
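The stepwise recipe just described translates directly into code. The short sketch below is not part of the original paper: it is a minimal NumPy illustration (the helper name forward_selection and its details are assumptions of this sketch) that, at each step, picks the predictor most correlated with the current residual, regresses the residual on it, and projects the remaining predictors orthogonally to it.

    import numpy as np

    def forward_selection(X, y, k):
        # Classic Forward Selection ("forward stepwise regression"):
        # greedily add the predictor most correlated with the current residual.
        X = np.asarray(X, dtype=float)
        X = X - X.mean(axis=0)              # centre the predictors
        r = np.asarray(y, dtype=float)
        r = r - r.mean()                    # centred response = initial residual
        active, inactive = [], list(range(X.shape[1]))
        for _ in range(min(k, X.shape[1])):
            # largest absolute correlation with the current residual
            corr = [abs(X[:, j] @ r) / (np.linalg.norm(X[:, j]) * np.linalg.norm(r) + 1e-12)
                    for j in inactive]
            j = inactive[int(np.argmax(corr))]
            active.append(j)
            inactive.remove(j)
            xj = X[:, j]
            # simple linear regression of the residual on x_j; the new residual
            # is orthogonal to x_j
            r = r - (xj @ r) / (xj @ xj) * xj
            # project the remaining predictors orthogonally to x_j
            for l in inactive:
                X[:, l] = X[:, l] - (xj @ X[:, l]) / (xj @ xj) * xj
        return active                       # indices j_1, ..., j_k in order of entry

Forward Stagewise, by contrast, replaces the full regression on the chosen predictor with a tiny increment of its coefficient, which is why it may take thousands of such small steps before reaching a final model.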
The LARS/Lasso/Stagewise connection is conceptually as well as computationally useful. The Lasso is described next, in terms of the main example used in this paper. Table 1 shows a small part of the data for our main example.

            AGE  SEX   BMI    BP   ···   Serum Measurements   ···   Response
 Patient     x1   x2    x3    x4    x5      x6    x7   x8   x9  x10        y
    1        59    2  32.1   101   157    93.2    38    4  4.9   87      151
    2        48    1  21.6    87   183   103.2    70    3  3.9   69       75
    3        72    2  30.5    93   156    93.6    41    4  4.7   85      141
    4        24    1  25.3    84   198   131.4    40    5  4.9   89      206
    5        50    1  23.0   101   192   125.4    52    4  4.3   80      135
    6        23    1  22.6    89   139    64.8    61    2  4.2   68       97
   ...      ...  ...   ...   ...   ...     ...   ...  ...  ...  ...      ...
  441        36    1  30.0    95   201   125.2    42    5  5.1   85      220
  442        36    2  19.6    71   250   133.2    97    3  4.6   92       57

Table 1. Diabetes study: 442 diabetes patients were measured on 10 baseline variables. A prediction model was desired for the response variable, a measure of disease progression one year after baseline.

Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of $n = 442$ diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline. The statisticians were asked to construct a model that predicted response $y$ from covariates $x_1, x_2, \dots, x_{10}$. Two hopes were evident here, that the model would produce accurate baseline predictions of response for future patients, and also that the form of the model would suggest which covariates were important factors in disease progression.

The Lasso is a constrained version of ordinary least squares (OLS). Let $x_1, x_2, \dots, x_m$ be $n$-vectors representing the covariates, $m = 10$ and $n = 442$ in the diabetes study, and $y$ the vector of responses for the $n$ cases. By location and scale transformations we can always assume that the covariates have been standardized to have mean 0 and unit length, and that the response has mean 0.
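For concreteness, the constraint referred to in the abstract ("constrains the sum of the absolute regression coefficients") can be written out. The display below is the standard Lasso formulation of Tibshirani (1996); the bound $t \ge 0$ and the notation $\widehat{\beta}(t)$ are supplied here for illustration rather than taken from the excerpt above:

\[
\widehat{\beta}(t) \;=\; \arg\min_{\beta}\; \Bigl\| \, y - \sum_{j=1}^{m} x_j \beta_j \Bigr\|^2
\qquad \text{subject to} \qquad \sum_{j=1}^{m} |\beta_j| \,\le\, t,
\]

where the standardization just described amounts to

\[
\sum_{i=1}^{n} y_i = 0, \qquad \sum_{i=1}^{n} x_{ij} = 0, \qquad \sum_{i=1}^{n} x_{ij}^{2} = 1 \quad \text{for } j = 1, \dots, m.
\]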

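As a computational footnote (not from the paper): the diabetes data of Table 1 and a LARS/Lasso path routine are both distributed with scikit-learn, so the full set of Lasso estimates promised in the abstract can be traced in a few lines. The sketch below assumes scikit-learn's load_diabetes and lars_path behave as documented; the printed quantities are illustrative.

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import lars_path

    # The 442-patient, 10-covariate diabetes data of Table 1; scikit-learn ships
    # it with each covariate centred and scaled to unit length, matching the
    # standardization assumed above.
    X, y = load_diabetes(return_X_y=True)

    # method="lasso" runs the Lasso modification of LARS, returning the entire
    # piecewise-linear path of Lasso solutions rather than a single fit.
    alphas, active, coefs = lars_path(X, y, method="lasso")

    print("order of entry:", active)    # covariate indices as they join the model
    print("path shape:", coefs.shape)   # (10 covariates, number of path breakpoints)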