Notes on Nonparametric and Semiparametric Regression in R

evilevo
2019-12-15


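Compiled by Zhan Peng, for exchange and study only. Adapted from the lecture slides of Associate Professor Sun Ruibo, Department of Statistics, Nanjing University of Finance and Economics; many thanks to Professor Sun for the hard work!
Textbook: Luke Keele: Semiparametric Regression for the Social Sciences. John Wiley & Sons, Ltd. 2008.

-------------------------------------------------------------------------
Chapter 1 Introduction: Global versus Local Statistic

I. Main references and notes
1. Härdle (1994). Applied Nonparametric Regression. An early classic.
2. Härdle et al. (2004). Nonparametric and Semiparametric Models: An Introduction. Springer. Clearly structured.
3. Li and Racine (2007). Nonparametric Econometrics: Theory and Practice. Princeton. A fairly comprehensive and in-depth treatment, on the difficult side.
4. Pagan and Ullah (1999). Nonparametric Econometrics. A classic.
5. Yatchew (2003). Semiparametric Regression for the Applied Econometrician. Good examples.
6. Gao Tiemei (2009). Econometric Analysis Methods and Modeling: EViews Applications and Examples (2nd ed.). Tsinghua University Press. (P127/143)
7. Li Xuesong (2008). Advanced Econometrics. China Social Sciences Press. (P45, ch. 3)
8. Chen Qiang (2010). Advanced Econometrics and Stata Applications. Higher Education Press. (ch. 23/24)
[For the rest, see Chapter 1 of the original slides]

II. Overview of contents
Methods:
-- Moving average
-- Kernel smoothing
-- K-nearest-neighbor smoothing (K-NN)
-- Local polynomial regression
-- Loess and Lowess
-- Smoothing spline
-- B-spline
-- Friedman supersmoother
Models:
-- Nonparametric density estimation
-- Nonparametric regression models
-- Semiparametric models for time series
-- Semiparametric models for panel data
-- Quantile regression

III. Different model forms
1. Linear models
2. Nonlinear in variables
3. Nonlinear in parameters

IV. Data transformation
Power transformation (for parametric methods)
In the GLM framework, models are equally prone to some misspecification from an incorrect functional form. It would be prudent to test that no independent variable in the model has a nonlinear effect. If one does have a nonlinear effect, analysts in the social sciences usually rely on power transformations to address the nonlinearity.
[ADD: for the test methods, see Sanford Weisberg. Applied Linear Regression (Third Edition). John Wiley & Sons, Inc. (the textbook for the undergraduate applied regression analysis course)]

----------------------------------------------------------------------------
Chapter 2 Nonparametric Density Estimation

I. Three methods
1. Histogram
2. Kernel density estimate
3. K nearest-neighbors estimate

II. Histogram
A numerical interpretation of the histogram:
Suppose $x_1, \dots, x_N \sim f(x)$, where the density function $f(x)$ is unknown. One can use the following function to estimate $f(x)$:
$\hat{f}(x) = \frac{1}{N} \cdot \frac{\#\{x_i : |x_i - x| < h\}}{2h}$
[The numerator is the number of points whose distance from x is less than h]

III. Kernel density estimate
Bandwidth: h; window width: 2h.
1. Conditions on the kernel function
The kernel function K(.) is a continuous function, symmetric around zero, that integrates to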
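unity and satisfies additional boundedness conditions:
(1) $K(z)$ is symmetric around 0 and is continuous;
(2) $\int K(z)\,dz = 1$, $\int z K(z)\,dz = 0$, $\int |K(z)|\,dz < \infty$;
(3) Either (a) $K(z) = 0$ if $|z| \ge z_0$ for some $z_0$, or (b) $|z| K(z) \to 0$ as $|z| \to \infty$;
(4) $\int z^2 K(z)\,dz = \kappa$, where $\kappa$ is a constant.
2. Main functional forms
[formulas omitted in source]
3. Confidence interval
[formula omitted in source]
4. Bandwidth selection
In practice, [formula omitted in source], where s is the sample standard deviation and iqr is the sample interquartile range.

IV. K nearest-neighbors estimate

V. R section

da <- read.table("PSID.txt", header=TRUE)
lhwage <- da$lhwage
# *** same bandwidth, different kernels ***
den1 <- density(lhwage, bw=0.45, kernel="epan")
den2 <- density(lhwage, bw=0.45, kernel="gauss")
den3 <- density(lhwage, bw=0.45, kernel="biwe")
den4 <- density(lhwage, bw=0.45, kernel="rect")
plot(den4, lty=4, main="", xlab="Log Hourly Wage", ylab="Kernel density estimates")
lines(den3, lty=3, col="red")
lines(den2, lty=2, col="green")
lines(den1, lty=1, col="blue")
# *** different bandwidths and different kernels ***
den5 <- density(lhwage, bw=0.545, kernel="epan")
den6 <- density(lhwage, bw=0.246, kernel="gauss")
den7 <- density(lhwage, bw=0.646, kernel="biwe")
den8 <- density(lhwage, bw=0.214, kernel="rect")
plot(den8, lty=4, main="", xlab="Log Hourly Wage", ylab="Kernel density estimates")
lines(den7, lty=3, col="red")
lines(den6, lty=2, col="green")
lines(den5, lty=1, col="blue")

----------------------------------------------------------------------------
Chapter 3 Smoothing and Local Regression

I. Simple smoothing
1. Local averaging
Sort the sample by x and divide it into several parts (intervals or "bins"); the mean of the y values belonging to each part is used as the estimate of f(x).
Three different approaches:
(1) Equal-width bins: uniformly distributed.
(2) Bins with an equal number of observations: k-nearest neighbor.
(3) Moving average.
K-NN: [formula omitted in source]
Equal window width: [formula omitted in source]
2. Kernel smoothing
[formula omitted in source]

II. Local polynomial regression
1. Main structure
Local polynomial estimation is an extension of kernel smoothing and is likewise built on locally weighted averages.
-- local constant regression
-- local linear regression
-- lowess (Cleveland, 1979)
-- loess (Cleveland, 1988)
[For this part, see: Takezawa (2006). Introduction to Nonparametric Regression. (P185, 3.7 and P195, 3.9); Chambers and Hastie (1993). Statistical Models in S. (P312, ch. 8)]
2. Approach
(1) For each point xi, construct an interval of a predetermined width centered at that point;
(2) Within each such neighborhood, estimate the parameters by weighted least squares (WLS), and use the fitted model to predict y at that focal x value; this is the estimate of y|xi (only the estimate at this single point is kept);
(3) Move on to the next point xj;
(4) Connect the estimates of y|xi.
[R usage:
library(KernSmooth)  # function locpoly()
library(locpol)      # locpol(); locCteSmootherC()
library(locfit)      # locfit()
# weight function: kernel="tcub"; also "rect", "trwt", "tria", "epan", "bisq", "gauss"]
3. Estimation form for each method
(1) Number of regressor terms (polynomial degree) p=0, local constant regression (kernel smoothing): min [objective omitted in source]
(2) Number of regressor terms (polynomial degree) p=1, local linear regression: min [objective omitted in source]
(3) Lowess (Locally Weighted Scatterplot Smoothing), p=1: min [objective omitted in source]
[There is also a reweighting correction step, omitted here; see the original book or the slides]
(4) Loess (Local regression), p=1,2: min [objective omitted in source]
[There is also a reweighting correction step, omitted here; see the original book or the slides]
(5) Friedman supersmoother: symmetric k-NN, using a local linear fit with a varying span that is determined by local CV; not robust to outliers; fast to compute; supsmu() in R.

III. Model selection
What needs to be chosen:
(1) the span (window width);
(2) the degree of the polynomial for the local regression models;
(3) the weight function.
[The rest is omitted]

IV. R section

library(foreign)
library(SemiPar)
library(mgcv)
jacob <- read.table("jacob.txt", header=TRUE)
################################################################################
# Part 1: simple smoothing estimates
# 1. Kernel Density Estimation
# Illustration of Kernel Concepts
# Defining the Window Width
attach(jacob)
x0 <- sort(perotvote)[75]
diffs <- abs(perotvote - x0)
which.diff <- sort(diffs)[120]
# Applying the Tricube Weight
# ... Tricube function: (1 - |z|^3)^3 for |z| < 1, and 0 otherwise
tricube <- function(z) {
  ifelse(abs(z) < 1, (1 - (abs(z))^3)^3, 0)
}
# ...
a <- seq(0, 1, by=.1)
tricube(a)
# Figure 2.5
plot(range(perotvote)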