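Meng Shengwang
School of Statistics, Renmin University of China
.sina/mengshw

Structure of a GLM
- Distributional assumption
- Linear predictor
- Link function
$Y_i \sim \mathrm{Gamma}(\mu_i, \phi)$, $\log(\mu_i) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$

Testing and comparing GLMs
- Explanatory variables:
  - Are they significant? Analysis of variance, p-values.
  - Should they be split into segments? Are they linear? Smooth functions.
  - Are there interaction effects?
- Distributional assumption:
  - Residual analysis: QQ plot, worm plot.
  - A run-off triangle contains little data, so the distributional assumption has a marked effect.
- Link function?
- Model comparison: AIC = -2l + 2 × (number of parameters)

Data formats for reserve estimation
- Run-off triangle
- Data frame

Incremental claims in run-off triangle format (Taylor, 1983)
Rows: accident year (AY). Columns: development year.

AY          1          2          3          4          5          6          7          8          9         10
 1    357,848    766,940    610,542    482,940    527,326    574,398    146,342    139,950    227,229     67,948
 2    352,118    884,021    933,894  1,183,289    445,745    320,996    527,804    266,172    425,046
 3    290,507  1,001,799    926,219  1,016,654    750,816    146,923    495,992    280,405
 4    310,608  1,108,250    776,189  1,562,400    272,482    352,053    206,286
 5    443,160    693,190    991,983    769,488    504,851    470,639
 6    396,132    937,085    847,498    805,037    705,960
 7    440,832    847,631  1,131,398  1,063,269
 8    359,480  1,061,648  1,443,370
 9    376,686    986,608
10    344,014

Incremental claims in data frame format

Accident year   Development year   Incremental claims
            1                  1               357848
            1                  2               766940
            1                  3               610542
            1                  4               482940
            1                  5               527326
            1                  6               574398
            1                  7               146342
            1                  8               139950
            1                  9               227229
            1                 10                67948
            2                  1               352118
            2                  2               884021
            2                  3               933894
            2                  4              1183289
            2                  5               445745
            2                  6               320996
            2                  7               527804
            2                  8               266172
            2                  9               425046
            3                  1               290507
            …                  …                    …

Descriptive statistics

Is the development pattern of accident year 1 unusual?

Mean and variance
- Variance = 122,188,299,213
- Variance / mean = 195,597

   Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
  67950   352100   527300   624700   905100  1562000

Poisson fit: in the figure, the Poisson probabilities are scaled down to 1% of their actual values.

Gamma fit

Preliminary conclusions from the descriptive analysis
- Incremental claims are right-skewed, the variance is far larger than the mean, and the gamma distribution fits well.
- The development pattern of accident year 1 differs from that of the other accident years.

Modeling
- Distributional assumption: Poisson (chain-ladder method), gamma
- Explanatory variables: accident year, development year, calendar year
- Form of the variables: discrete or continuous
- Link function: log

Poisson regression
- Incremental claims ~ Poisson distribution
- log(incremental claims) = accident year + development year

Chain-ladder method (using cumulative claims data)
Rows: accident year (AY). Columns: cumulative claims by development year. Cells between the latest observed value and development year 10 are left blank; for accident years 2 to 10 the development year 10 entry is the projected ultimate.

AY          1          2          3          4          5          6          7          8          9         10    Reserve
 1    357,848  1,124,788  1,735,330  2,218,270  2,745,596  3,319,994  3,466,336  3,606,286  3,833,515  3,901,463
 2    352,118  1,236,139  2,170,033  3,353,322  3,799,067  4,120,063  4,647,867  4,914,039  5,339,085  5,433,719     94,634
 3    290,507  1,292,306  2,218,525  3,235,179  3,985,995  4,132,918  4,628,910  4,909,315             5,378,826    469,511
 4    310,608  1,418,858  2,195,047  3,757,447  4,029,929  4,381,982  4,588,268                        5,297,906    709,638
 5    443,160  1,136,350  2,128,333  2,897,821  3,402,672  3,873,311                                   4,858,200    984,889
 6    396,132  1,333,217  2,180,715  2,985,752  3,691,712                                              5,111,171  1,419,459
 7    440,832  1,288,463  2,419,861  3,483,130                                                         5,660,771  2,177,641
 8    359,480  1,421,128  2,864,498                                                                    6,784,799  3,920,301
 9    376,686  1,363,294                                                                               5,642,266  4,278,972
10    344,014                                                                                          4,969,825  4,625,811

From development year           1         2         3         4         5         6         7         8           9
Development factor      3.4906065  1.747333  1.457413  1.173852  1.103824  1.086269  1.053874  1.076555  1.01772473
Cumulative factor       14.446577  4.138701  2.368582  1.625196  1.384499  1.254276  1.154664  1.095637  1.01772473

Total reserve: 18,680,856

Poisson regression results
- Note: every variable in the Poisson regression is highly significant!
- The reserve estimate is identical to the chain-ladder estimate: 18,680,856.

Question
- Is the chain-ladder method, or equivalently Poisson regression, appropriate for this data set?
- The model needs to be diagnosed and compared with alternatives.

Residual analysis using the glm function

Randomized quantile residuals of the Poisson regression: gamlss

QQ plot of the randomized quantile residuals of the Poisson regression (infinite residuals removed)

Conclusion
- Poisson regression (the chain-ladder method) is not appropriate for this data set.
- Why are all variables highly significant? Because the standard errors are underestimated.
- Use gamma regression instead.

Gamma regression results

Comparison of regression coefficients

Term                 Poisson (chain ladder)    Gamma
(Intercept)                          12.506   12.560
factor(accyear)2                      0.331    0.317
factor(accyear)3                      0.321    0.283
factor(accyear)4                      0.306    0.165
factor(accyear)5                      0.219    0.231
factor(accyear)6                      0.270    0.273
factor(accyear)7                      0.372    0.352
factor(accyear)8                      0.553    0.462
factor(accyear)9                      0.369    0.307
factor(accyear)10                     0.242    0.189
factor(devyear)2                      0.913    0.909
factor(devyear)3                      0.959    0.932
factor(devyear)4                      1.026    0.998
factor(devyear)5                      0.435    0.415
factor(devyear)6                      0.080    0.111
factor(devyear)7                     -0.006   -0.054
factor(devyear)8                     -0.394   -0.450
factor(devyear)9                      0.009   -0.059
factor(devyear)10                    -1.380   -1.433

Comparison of standard errors (why are the Poisson regression coefficients highly significant?)

Term                 Poisson regression   Gamma regression
(Intercept)                    0.000754             0.1568
factor(accyear)2              0.0006694             0.1531
factor(accyear)3              0.0006877             0.1601
factor(accyear)4              0.0007008             0.1677
factor(accyear)5              0.0007324             0.1768
factor(accyear)6              0.0007445             0.1884
factor(accyear)7              0.0007606             0.2041
factor(accyear)8              0.0008133             0.2273
factor(accyear)9               0.001043             0.2673
factor(accyear)10              0.001864             0.3606
factor(devyear)2               0.000649             0.1531
factor(devyear)3              0.0006652             0.1601
factor(devyear)4               0.000684             0.1677
factor(devyear)5              0.0008019             0.1768
factor(devyear)6              0.0009364             0.1884
factor(devyear)7               0.001039             0.2041
factor(devyear)8               0.001353             0.2273
factor(devyear)9               0.001396             0.2673
factor(devyear)10               0.00391             0.3606

Comparison of AIC

Model                                df       AIC
Poisson regression (chain ladder)    19   1903877
Gamma                                20      1501

Is the gamma distributional assumption appropriate? Residual analysis

Residuals of the gamma regression (glm)

Residuals of the gamma regression (gamlss)

Residual analysis: worm plot of the gamma regression

Preliminary conclusions
- Gamma regression is better than Poisson regression.
- The distributional assumption of the gamma regression passes the diagnostic checks.

Partial residuals of the gamma regression

Can the gamma regression be improved further?
- Use smooth functions.
- Advantage: can improve the accuracy of the predictions.
- Disadvantage: makes the model harder to interpret.

Choosing a smooth function
- Perfect fit: a polyline through every data point
- Complete smoothness: a regression line
- The smooth function should fit well and be sufficiently smooth.
- Penalized spline:
$\mathrm{PLS} = \sum (y - f(x))^2 + \lambda \int (f''(x))^2 \, dx$

Penalized spline smoothing
(0) log(incremental claims) = accident year + development year
(1) log(incremental claims) = f(accident year) + development year
(2) log(incremental claims) = accident year + f(development year)
(3) log(incremental claims) = f(accident year) + f(development year)
(4) log(incremental claims) = f(development year)

Comparison of gamma regression models

Model                                                                    df    AIC
log(incremental claims) = f(accident year) + development year            15   1492
log(incremental claims) = f(development year)                             6   1496
log(incremental claims) = f(accident year) + f(development year)         10   1496
log(incremental claims) = accident year + development year               20   1501
log(incremental claims) = accident year + f(development year)            15   1505

How should AIC values be compared?

Difference from the minimum AIC   Relative probability that the model minimizes the information loss
                              1   0.607
                              2   0.368
                              3   0.223
                              4   0.135
                              5   0.082
                              6   0.050
                              7   0.030
                              8   0.018
                              9   0.011
                             10   0.007

Smoothed gamma regression

log(incremental claims) = f(accident year) + development year

                    Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         12.66056     0.12721   99.525    <2e-16  ***
ps(accyear)          0.03115     0.01726    1.805   0.07866  .
factor(devyear)2     0.90950     0.12553    7.245  8.56e-09  ***
factor(devyear)3     0.94580     0.12977    7.288  7.46e-09  ***
factor(devyear)4     0.99566     0.13648    7.295  7.31e-09  ***
factor(devyear)5     0.41717     0.14462    2.884   0.00629  **
factor(devyear)6     0.10278     0.15602    0.659   0.51382
factor(devyear)7    -0.03561     0.16886   -0.211   0.83405
factor(devyear)8    -0.42402     0.18812   -2.254   0.02974  *
factor(devyear)9    -0.02517     0.22113   -0.114   0.90993
factor(devyear)10   -1.46389     0.29581   -4.949  1.39e-05  ***

log(incremental claims) = f(accident year) + f(development year)

                    Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         13.48927     0.14075   95.836    <2e-16  ***
ps(accyear)          0.03056     0.01946    1.570     0.123
ps(devyear)         -0.10028     0.01946   -5.152  5.54e-06  ***

log(incremental claims) = f(development year)

                    Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         13.66911     0.08497  160.861    <2e-16  ***
ps(devyear)         -0.11292     0.01812   -6.233  1.02e-07  ***

Gamma regression

log(incremental claims) = f(accident year) + development year            18,293,369
log(incremental claims) = f(development year)                            16,710,915
log(incremental claims) = f(accident year) + f(development year)         18,225,116
log(incremental claims) = accident year + development year               18,085,822
log(incremental claims) = accident year + f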