Generalized additive models

Trevor Hastie (1) and Robert Tibshirani (2)

(1) Department of Statistics and Division of Biostatistics, Stanford University, Stanford, California 94305; trevor@stat.stanford.edu
(2) Department of Preventive Medicine and Biostatistics, and Department of Statistics, University of Toronto; tibs@playfair.stanford.edu; tibs@utstat.toronto.edu

1 Introduction

In the statistical analysis of clinical trials and observational studies, the identification of and adjustment for prognostic factors is an important component. Valid comparisons of different treatments require the appropriate adjustment for relevant prognostic factors. The failure to consider important prognostic variables, particularly in observational studies, can lead to errors in estimating treatment differences. In addition, incorrect modeling of prognostic factors can result in the failure to identify nonlinear trends or threshold effects on survival.

This article describes flexible statistical methods that may be used to identify and characterize the effect of potential prognostic factors on an outcome variable. These methods are called "generalized additive models", and extend the traditional linear statistical model. They can be applied in any setting where a linear or generalized linear model is typically used. These settings include standard continuous response regression, categorical or ordered categorical response data, count data, survival data and time series.

One of the most commonly used statistical models in medical research is the logistic regression model for binary data. We use it here as a specific illustration of a generalized additive model. Logistic regression (and many other techniques) models the effects of prognostic factors $x_j$ in terms of a linear predictor of the form $\sum_j x_j \beta_j$, where the $\beta_j$ are parameters. The generalized additive model replaces $\sum_j x_j \beta_j$ with $\sum_j f_j(x_j)$, where $f_j$ is an unspecified ("non-parametric") function. This function is estimated in a flexible manner using a scatterplot smoother. The estimated function $\hat{f}_j(x_j)$ can reveal possible nonlinearities in the effect of the $x_j$.

We first give some background on the methodology, and then discuss the details of the logistic regression model and its generalization. Some related developments are discussed in the last section.

2 Smoothing methods and generalized additive models

The building block of the generalized additive model algorithm is the scatterplot smoother. We will first describe scatterplot smoothing in a simple setting, and then indicate how it is used in generalized additive modeling. Suppose that we have a scatterplot of points $(x_i, y_i)$ like that shown in figure 1. Here $y$ is a response or outcome variable, and $x$ is a prognostic factor. We wish to fit a smooth curve $f(x)$ that summarizes the dependence of $y$ on $x$. If we were to find the curve that simply minimizes $\sum_i (y_i - f(x_i))^2$, the result would be an interpolating curve that would not be smooth at all.

The cubic spline smoother imposes smoothness on $f(x)$. We seek the function $f(x)$ that minimizes

$$\sum_i (y_i - f(x_i))^2 + \lambda \int f''(x)^2 \, dx \tag{1}$$

Notice that $\int f''(x)^2 \, dx$ measures the "wiggliness" of the function $f$: linear $f$s have $\int f''(x)^2 \, dx = 0$, while nonlinear $f$s produce values bigger than zero. $\lambda$ is a non-negative smoothing parameter that must be chosen by the data analyst. It governs the tradeoff between the goodness of fit to the data (as measured by $\sum_i (y_i - f(x_i))^2$) and the wiggliness of the function. Larger values of $\lambda$ force $f$ to be smoother.

For any value of $\lambda$, the solution to (1) is a cubic spline, i.e., a piecewise cubic polynomial with pieces joined at the unique observed values of $x$ in the dataset. Fast and stable numerical procedures are available for computation of the fitted curve. The right panel of figure 1 shows a cubic spline fit to the data.

Figure 1: Left panel shows a fictitious scatterplot of an outcome measure y plotted against a prognostic factor x. In the right panel, a scatterplot smooth has been added to describe the trend of y on x.

What value of $\lambda$ did we use in figure 1? In fact it is not convenient to express the desired smoothness of $f$ in terms of $\lambda$, as the meaning of $\lambda$ depends on the units of the prognostic factor $x$. Instead, it is possible to define an "effective number of parameters" or "degrees of freedom" of a cubic spline smoother, and then use a numerical search to determine the value of $\lambda$ that yields this number. In figure 1 we chose the effective number of parameters to be 5. Roughly speaking, this means that the complexity of the curve is about the same as that of a polynomial regression of degree 4. However, the cubic spline smoother "spreads out" its parameters in a more even manner, and hence is much more flexible than a polynomial regression. Note that the degrees of freedom of a smoother need not be an integer.

The above discussion tells how to fit a curve to a single prognostic factor. With multiple prognostic factors, if $x_{ij}$ denotes the value of the $j$th prognostic factor for the $i$th observation, we fit the additive model

$$\hat{y}_i = \sum_j f_j(x_{ij}) \tag{2}$$

A criterion like (1) can be specified for this problem, and a simple iterative procedure exists for estimating the $f_j$s. We apply a cubic spline smoother to the outcome $y_i - \sum_{j \neq k} \hat{f}_j(x_{ij})$ as a function of $x_{ik}$, for each prognostic factor in turn. The process is continued until the estimates $\hat{f}_j$ stabilize. This procedure is known as "backfitting", and the resulting fit is analogous to a multiple regression for linear models.

When generalized additive models are fit to binary response data (and in many other settings), the appropriate error criterion is a penalized log likelihood or a penalized log partial-likelihood. To maximize it, the backfitting procedure is used in conjunction with a maximum likelihood or max
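The backfitting iteration for the additive model (2) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a Gaussian kernel (Nadaraya-Watson) smoother stands in for the cubic spline smoother of the text to keep the code dependency-free; the intercept `alpha` and the centering of each fitted function are a common identifiability convention that the text does not state; and the names `kernel_smooth`, `backfit`, `bandwidth`, `tol` and `max_iter` are hypothetical.

```python
import math


def kernel_smooth(x, r, bandwidth):
    """Smooth the values r against x with a Gaussian kernel.

    Stand-in for the cubic spline smoother (an assumption of this sketch).
    Returns the smoothed value at every observed x[i].
    """
    fitted = []
    for xi in x:
        weights = [math.exp(-0.5 * ((xi - xk) / bandwidth) ** 2) for xk in x]
        total = sum(weights)
        fitted.append(sum(w * rk for w, rk in zip(weights, r)) / total)
    return fitted


def backfit(columns, y, bandwidth=0.3, tol=1e-6, max_iter=100):
    """Estimate f_1, ..., f_p in y_i ~ alpha + sum_j f_j(x_ij) by backfitting.

    columns: list of p lists, columns[j][i] is x_ij.
    Returns (alpha, f) where f[j][i] is the fitted value of f_j at x_ij.
    """
    n, p = len(y), len(columns)
    alpha = sum(y) / n                      # intercept (assumed convention)
    f = [[0.0] * n for _ in range(p)]       # start from f_j = 0
    for _ in range(max_iter):
        max_change = 0.0
        for k in range(p):
            # Partial residual: outcome minus all other fitted functions.
            partial = [
                y[i] - alpha - sum(f[j][i] for j in range(p) if j != k)
                for i in range(n)
            ]
            new_fk = kernel_smooth(columns[k], partial, bandwidth)
            mean_fk = sum(new_fk) / n
            new_fk = [v - mean_fk for v in new_fk]   # center each f_k
            change = max(abs(a - b) for a, b in zip(new_fk, f[k]))
            max_change = max(max_change, change)
            f[k] = new_fk
        if max_change < tol:                # estimates have stabilized
            break
    return alpha, f


if __name__ == "__main__":
    # Illustrative data only: two factors with nonlinear effects.
    n = 100
    x1 = [2.0 * i / (n - 1) - 1.0 for i in range(n)]
    x2 = [2.0 * ((37 * i) % n) / (n - 1) - 1.0 for i in range(n)]
    y = [math.sin(3.0 * a) + b * b for a, b in zip(x1, x2)]
    alpha, f = backfit([x1, x2], y, bandwidth=0.2)
    print("intercept:", alpha)
```

Each pass smooths the partial residual for one prognostic factor while holding the others fixed, which is the "for each prognostic factor in turn" step in the text. Replacing `kernel_smooth` with a cubic smoothing spline routine would bring the sketch closer to the procedure described.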
