SPSS Data Statistical Analysis and Practice
Lecturer: Associate Professor Zhou Tao (周涛)
College of Resources, Beijing Normal University
2007-12-11
Course website:

Chapter 16: Ridge Regression (岭回归)
Multicollinearity Remedial Measures

Contents:
1. Some Remedial Measures for Multicollinearity
2. Principles of Ridge Regression
3. SPSS Example for Ridge Regression
4. Comments for Ridge Regression

Some Remedial Measures for Multicollinearity

1. As we saw in Chapter 9, the presence of serious multicollinearity often does not affect the usefulness of the fitted model for estimating mean responses or making predictions. Hence, one remedial measure is to restrict the use of the fitted regression model to inferences for values of the predictor variables that follow the same multicollinearity pattern as the data used to fit the model.

2. One or several predictor variables may be dropped from the model in order to lessen the multicollinearity and thereby reduce the standard errors of the estimated regression coefficients of the predictor variables remaining in the model.
• This remedial measure has two important limitations.
• First, no direct information is obtained about the dropped predictor variables.
• Second, the magnitudes of the regression coefficients for the predictor variables remaining in the model are affected by the correlated predictor variables not included in the model.

3. Another remedial measure for multicollinearity that can be used with ordinary least squares is to form one or several composite indexes based on the highly correlated predictor variables, an index being a linear combination of the correlated predictor variables. The methodology of principal components provides composite indexes that are uncorrelated.
• A limitation of principal components regression, also called latent root regression (特征根回归), is that it may be difficult to attach concrete meaning to the indexes.

4. Ridge regression is one of several methods that have been proposed to remedy multicollinearity problems by modifying the method of least squares to allow biased estimators of the regression coefficients.

Principles of Ridge Regression

Ridge Regression: Biased Estimation

• When an estimator has only a small bias and is substantially more precise than an unbiased estimator, it may well be the preferred estimator, since it will have a larger probability of being close to the true parameter value.
• The figure illustrates this situation. Estimator $b$ is unbiased but imprecise, whereas estimator $b^R$ is much more precise but has a small bias. The probability that $b^R$ falls near the true value $\beta$ is much greater than that for the unbiased estimator $b$.

Ridge Estimators for the Standardized Regression Model

For ordinary least squares, the normal equations are given by:

$$(\mathbf{X}'\mathbf{X})\,\mathbf{b} = \mathbf{X}'\mathbf{Y} \tag{1}$$

When all variables are transformed by the correlation transformation (3), with the standard deviations defined in (2), the transformed regression model is given by (4):

$$S_Y = \sqrt{\frac{\sum_i (Y_i - \bar{Y})^2}{n-1}}, \qquad S_k = \sqrt{\frac{\sum_i (X_{ik} - \bar{X}_k)^2}{n-1}} \quad (k = 1, \ldots, p-1) \tag{2}$$

$$Y_i^* = \frac{1}{\sqrt{n-1}}\left(\frac{Y_i - \bar{Y}}{S_Y}\right), \qquad X_{ik}^* = \frac{1}{\sqrt{n-1}}\left(\frac{X_{ik} - \bar{X}_k}{S_k}\right) \quad (k = 1, \ldots, p-1) \tag{3}$$

$$Y_i^* = \beta_1^* X_{i1}^* + \beta_2^* X_{i2}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^* + \varepsilon_i^* \tag{4}$$

The least squares normal equations for the transformed model are given by (5):

$$\mathbf{r}_{XX}\,\mathbf{b} = \mathbf{r}_{YX} \tag{5}$$

where $\mathbf{r}_{XX}$ is the correlation matrix of the X variables defined in (6), and $\mathbf{r}_{YX}$ is the vector of coefficients of simple correlation between Y and each X variable defined in (7).

$$\underset{(p-1)\times(p-1)}{\mathbf{r}_{XX}} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1,p-1} \\ r_{21} & 1 & \cdots & r_{2,p-1} \\ \vdots & \vdots & & \vdots \\ r_{p-1,1} & r_{p-1,2} & \cdots & 1 \end{bmatrix} \tag{6}$$

Here, $r_{12}$ denotes the coefficient of simple correlation between $X_1$ and $X_2$, and so on.

$$\underset{(p-1)\times 1}{\mathbf{r}_{YX}} = \begin{bmatrix} r_{Y1} \\ r_{Y2} \\ \vdots \\ r_{Y,p-1} \end{bmatrix} \tag{7}$$

The ridge standardized regression estimators are obtained by introducing into the least squares normal equations (5) a biasing constant $k \ge 0$, in the following form:

$$(\mathbf{r}_{XX} + k\mathbf{I})\,\mathbf{b}^R = \mathbf{r}_{YX} \tag{8}$$

$$\underset{(p-1)\times 1}{\mathbf{b}^R} = \begin{bmatrix} b_1^R \\ b_2^R \\ \vdots \\ b_{p-1}^R \end{bmatrix} \tag{9}$$

where $\mathbf{b}^R$ is the vector of the standardized ridge regression coefficients $b_k^R$, and $\mathbf{I}$ is the $(p-1)\times(p-1)$ identity matrix.

Solution of the normal equations (8) yields the ridge standardized regression coefficients:

$$\mathbf{b}^R = (\mathbf{r}_{XX} + k\mathbf{I})^{-1}\,\mathbf{r}_{YX} \tag{10}$$

The constant $k$ reflects the amount of bias in the estimator. When $k = 0$, formula (10) reduces to the ordinary least squares regression coefficients in standardized form. When $k > 0$, the ridge regression coefficients are biased but tend to be more stable (i.e., less variable) than ordinary least squares estimators.

Choice of Biasing Constant k

• Clearly, when $k = 0$ the ridge estimator $\mathbf{b}^R$ reduces to the least squares estimator, and as $k \to \infty$ it tends to 0. Hence $k$ should not be too large.
• Because the choice of $k$ is arbitrary, an important question in ridge regression analysis is which value of $k$ is appropriate.
• Because ridge regression is a biased estimator, $k$ should not be too large; in general we also want to retain as much information as possible, that is, to keep $k$ as small as possible.
• We can therefore observe how the fitted equation changes across different values of $k$, and then take the smallest $k$ at which the equation becomes essentially stable.
• A commonly used method of determining the biasing constant $k$ is based on the ridge trace (岭迹) and the variance inflation factors $(VIF)_k$.
• The ridge trace is a simultaneous plot of the values of the $p-1$ estimated ridge standardized regression coefficients for different values of $k$, usually between 0 and 1.
• One therefore examines the ridge trace and chooses the smallest value of $k$ at which the regression coefficients are deemed to first become stable.

SPSS Example for Ridge Regression

Example of body fat

• Table 16.1 contains a portion of the data for a study of the relation of amount of body fat (Y) to several possible predictor variables, based on a sample of 20 healthy females 25-34 years old. The possible predictor variables are triceps (肱三头肌) skinfold thickness (X1), thigh circumference (大腿围) (X2), and midarm circumference (X3). The amount of body fat in the table for each of the 20 persons was obtained by a cumbersome and expensive procedure requiring the immersion of the person in water. It would therefore be very helpful if a regression model with some or all of these p