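Logistic Regression Analysis: A Worked Example

Example 8.4. An investigator studied the clinicopathological factors associated with metastasis of renal cell carcinoma (RCC). Tumor specimen data were collected from a series of patients who underwent radical nephrectomy. Data from 26 of these patients are used here to illustrate logistic regression analysis. The symbols in the table are:

- i: observation (patient) number
- x1: patient age at diagnosis (years)
- x2: expression of vascular endothelial growth factor (VEGF) in the RCC tissue, 3 grades from low to high (coded 1 to 3)
- x3: microvessel count (MVC) in the RCC tissue
- x4: nuclear grade of the RCC cells, grades I to IV from low to high (coded 1 to 4)
- x5: RCC stage, stages I to IV from low to high (coded 1 to 4)
- y: metastasis status (y = 1: metastasis; y = 0: no metastasis)

Specimen data for 26 patients with RCC who underwent radical nephrectomy:

```text
 i   x1  x2     x3  x4  x5  y
 1   59   2   43.4   2   1  0
 2   36   1   57.2   1   1  0
 3   61   2  190.0   2   1  0
 4   58   3  128.0   4   3  1
 5   55   3   80.0   3   4  1
 6   61   1   94.4   2   1  0
 7   38   1   76.0   1   1  0
 8   42   1  240.0   3   2  0
 9   50   1   74.0   1   1  0
10   58   3   68.6   2   2  0
11   68   3  132.8   4   2  0
12   25   2   94.6   4   3  1
13   52   1   56.0   1   1  0
14   31   1   47.8   2   1  0
15   36   3   31.6   3   1  1
16   42   1   66.2   2   1  0
17   14   3  138.6   3   3  1
18   32   1  114.0   2   3  0
19   35   1   40.2   2   1  0
20   70   3  177.2   4   3  1
21   65   2   51.6   4   4  1
22   45   2  124.0   2   4  0
23   68   3  127.2   3   3  1
24   31   2  124.8   2   3  0
25   58   1  128.0   4   3  0
26   60   3  149.8   4   3  1
```

The response variable in this example is binary. The simplest logistic regression model is therefore fitted, and the explanatory variables are screened by stepwise selection. The program is:

```sas
data bk4_2;
  input i x1-x5 y;
  cards;
1 59 2 43.4 2 1 0
2 36 1 57.2 1 1 0
3 61 2 190.0 2 1 0
4 58 3 128.0 4 3 1
5 55 3 80.0 3 4 1
6 61 1 94.4 2 1 0
7 38 1 76.0 1 1 0
8 42 1 240.0 3 2 0
9 50 1 74.0 1 1 0
10 58 3 68.6 2 2 0
11 68 3 132.8 4 2 0
12 25 2 94.6 4 3 1
13 52 1 56.0 1 1 0
14 31 1 47.8 2 1 0
15 36 3 31.6 3 1 1
16 42 1 66.2 2 1 0
17 14 3 138.6 3 3 1
18 32 1 114.0 2 3 0
19 35 1 40.2 2 1 0
20 70 3 177.2 4 3 1
21 65 2 51.6 4 4 1
22 45 2 124.0 2 4 0
23 68 3 127.2 3 3 1
24 31 2 124.8 2 3 0
25 58 1 128.0 4 3 0
26 60 3 149.8 4 3 1
;
proc logistic data=bk4_2 descending;
  model y = x1-x5 / selection=stepwise;
run;
```

The main output of the program is as follows:

```text
The LOGISTIC Procedure

Data Set: A.BK4_2
Response Variable: Y
Response Levels: 2
Number of Observations: 26
Link Function: Logit

Response Profile

Ordered
  Value   Y   Count
      1   1       9
      2   0      17
```

The header gives five items:

- the data set used in the computation;
- the response variable;
- the number of response levels;
- the number of observations;
- the link function.

The Response Profile reorders the response levels according to the ORDER= and DESCENDING options. It lists each ordered value with its count. The model estimates the probability of the response level with ordered value 1, which here is y = 1 (metastasis).

```text
Model Fitting Information and Testing Global Null Hypothesis BETA=0

              Intercept    Intercept and
Criterion          Only       Covariates    Chi-Square for Covariates
AIC              35.542           17.826    .
SC               36.800           21.600    .
-2 LOG L         33.542           11.826    21.716 with 2 DF (p=0.0001)
Score                 .                .    15.844 with 2 DF (p=0.0004)
```

This table tests the model as a whole. The null hypothesis is that all population regression coefficients of the covariates are 0 (BETA = 0).

- The -2 LOG L row gives the likelihood ratio χ² test.
- The Score row gives the score test, which is analogous to Pearson's χ² test.

Both P values are below 0.05, so the model as a whole is statistically significant.

```text
Analysis of Maximum Likelihood Estimates

                 Parameter   Standard        Wald         Pr >   Standardized      Odds
Variable   DF     Estimate      Error   Chi-Square   Chi-Square       Estimate     Ratio
INTERCPT    1     -12.3285     5.4305       5.1540       0.0232              .         .
X2          1       2.4134     1.1960       4.0719       0.0436       1.185510    11.172
X4          1       2.0963     1.0879       3.7131       0.0540       1.230697     8.136
```

The columns are, in order:

- degrees of freedom;
- parameter estimate;
- standard error;
- Wald χ²;
- P value;
- standardized regression coefficient;
- odds ratio.

```text
Association of Predicted Probabilities and Observed Responses

Concordant = 94.1%     Somers' D = 0.902
Discordant =  3.9%     Gamma     = 0.920
Tied       =  2.0%     Tau-a     = 0.425
(153 pairs)            c         = 0.951
```

The last part describes the association between the predicted probabilities and the observed responses. It reports three things:

- the number of pairs of observations with different responses (9 × 17 = 153);
- the percentages of those pairs that are concordant, discordant, or tied;
- four rank correlation indices.

Stepwise selection retained two variables, X2 and X4. Their regression coefficients are 2.4134 and 2.0963, and their odds ratios are 11.172 and 8.136. The odds ratio is $OR = e^{\beta}$; for example, $e^{2.4134} \approx 11.172$. The fitted regression equation is therefore

$$\operatorname{logit}(P) = -12.3285 + 2.4134\,X_2 + 2.0963\,X_4$$

X2 is significant (P = 0.0436 < 0.05). The Wald P value for X4 (0.0540) is slightly above 0.05, yet X4 was not removed. Under the stepwise method used, X4's P value did not exceed the removal criterion, so X4 remained in the model. Subject-matter considerations also support X4, and it was kept in the final model.

In summary, stepwise selection identified two risk factors for metastasis of RCC. The higher the VEGF expression grade, and the higher the nuclear grade of the tumor cells, the greater the risk of metastasis.

The two standardized regression coefficients are 1.1855 for X2 and 1.2307 for X4. By this measure X4 has a slightly larger effect than X2, although the difference is small. X2 has the larger odds ratio per one-grade increase.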
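The estimates in the table can be checked outside SAS.

The sketch below refits only the final two-variable model, logit(P) = β0 + β2·X2 + β4·X4, to the (x2, x4, y) columns of the data table. It does not repeat the stepwise selection. The fit uses a hand-written Newton-Raphson iteration in plain Python. It is an illustration, not the SAS implementation, and the helper names (`sigmoid`, `solve`, `score_info_loglik`, `fit_logistic`) are hypothetical.

```python
import math

# (x2, x4, y) for patients i = 1..26, copied from the data table
DATA = [
    (2, 2, 0), (1, 1, 0), (2, 2, 0), (3, 4, 1), (3, 3, 1), (1, 2, 0),
    (1, 1, 0), (1, 3, 0), (1, 1, 0), (3, 2, 0), (3, 4, 0), (2, 4, 1),
    (1, 1, 0), (1, 2, 0), (3, 3, 1), (1, 2, 0), (3, 3, 1), (1, 2, 0),
    (1, 2, 0), (3, 4, 1), (2, 4, 1), (2, 2, 0), (3, 3, 1), (2, 2, 0),
    (1, 4, 0), (3, 4, 1),
]


def sigmoid(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score_info_loglik(beta, data):
    """Gradient, information matrix and log-likelihood of the logit model at beta."""
    grad = [0.0] * 3
    info = [[0.0] * 3 for _ in range(3)]
    loglik = 0.0
    for x2, x4, y in data:
        x = (1.0, float(x2), float(x4))
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        loglik += math.log(p) if y == 1 else math.log(1.0 - p)
        w = p * (1.0 - p)
        for j in range(3):
            grad[j] += (y - p) * x[j]
            for k in range(3):
                info[j][k] += w * x[j] * x[k]
    return grad, info, loglik


def fit_logistic(data, max_iter=50, tol=1e-10):
    """Newton-Raphson maximum likelihood fit; returns (beta, se, -2 log L)."""
    beta = [0.0, 0.0, 0.0]  # intercept, X2, X4
    for _ in range(max_iter):
        grad, info, _ = score_info_loglik(beta, data)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info, loglik = score_info_loglik(beta, data)
    # Standard errors: square roots of the diagonal of the inverse information matrix
    se = []
    for j in range(3):
        unit = [1.0 if k == j else 0.0 for k in range(3)]
        se.append(math.sqrt(solve(info, unit)[j]))
    return beta, se, -2.0 * loglik


beta, se, m2ll = fit_logistic(DATA)
for name, b, s in zip(("Intercept", "X2", "X4"), beta, se):
    wald = (b / s) ** 2
    p = math.erfc(math.sqrt(wald / 2.0))  # upper tail of chi-square with 1 DF
    print(f"{name:9s} b={b:9.4f} se={s:7.4f} wald={wald:7.4f} p={p:.4f} OR={math.exp(b):8.3f}")
print(f"-2 Log L = {m2ll:.3f}")
```

Up to rounding, the printed estimates, standard errors, Wald χ² values, P values and odds ratios should agree with the Analysis of Maximum Likelihood Estimates table. The -2 Log L should agree with the 11.826 reported for the model with covariates.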
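The model-fit criteria and the standardized estimates follow from short formulas.

The sketch below assumes these standard definitions:

- AIC = -2 log L + 2k and SC = -2 log L + k·ln(n), where k counts the parameters including the intercept and n = 26;
- the likelihood ratio χ² is the drop in -2 log L, and for 2 DF its upper-tail P value is exp(-χ²/2);
- the standardized estimate of a logit model is β·s_x/(π/√3), where s_x is the sample standard deviation of the covariate. This follows the SAS convention.

The function names are illustrative.

```python
import math
import statistics

# x2 and x4 columns of the data table (i = 1..26)
X2 = [2, 1, 2, 3, 3, 1, 1, 1, 1, 3, 3, 2, 1, 1, 3, 1, 3, 1, 1, 3, 2, 2, 3, 2, 1, 3]
X4 = [2, 1, 2, 4, 3, 2, 1, 3, 1, 2, 4, 4, 1, 2, 3, 2, 3, 2, 2, 4, 4, 2, 3, 2, 4, 4]

N = 26
M2LL_NULL, M2LL_MODEL = 33.542, 11.826  # -2 Log L: intercept only / intercept + X2 + X4
K_NULL, K_MODEL = 1, 3                  # number of parameters, including the intercept


def aic(m2ll, k):
    """Akaike information criterion."""
    return m2ll + 2 * k


def sc(m2ll, k, n):
    """Schwarz criterion."""
    return m2ll + k * math.log(n)


def chi2_sf_2df(x):
    """Upper-tail probability of chi-square with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def standardized(beta, x):
    """Standardized logit coefficient: beta * s_x / (pi / sqrt(3))."""
    return beta * statistics.stdev(x) / (math.pi / math.sqrt(3.0))


lr_chi2 = M2LL_NULL - M2LL_MODEL  # likelihood ratio chi-square, DF = K_MODEL - K_NULL = 2
print("AIC:", round(aic(M2LL_NULL, K_NULL), 3), round(aic(M2LL_MODEL, K_MODEL), 3))
print("SC: ", round(sc(M2LL_NULL, K_NULL, N), 3), round(sc(M2LL_MODEL, K_MODEL, N), 3))
print("LR chi-square:", round(lr_chi2, 3), "p =", chi2_sf_2df(lr_chi2))
print("Score test p =", round(chi2_sf_2df(15.844), 4))
print("Standardized X2, X4:", standardized(2.4134, X2), standardized(2.0963, X4))
```

Because the inputs are the rounded values printed by SAS, the standardized estimates agree with the SAS column only to about four decimals. The exact P value of the likelihood ratio test is about 2e-5, which the older SAS output displays as 0.0001.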
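The association statistics come from comparing every (metastasis, no-metastasis) pair of patients.

For each pair, the patient with y = 1 should have the higher predicted probability if the pair is concordant. The sketch below computes predicted probabilities from the reported equation logit(P) = -12.3285 + 2.4134·X2 + 2.0963·X4 and counts the pairs. From those counts it derives the four indices:

- c = (C + 0.5T)/pairs;
- Somers' D = (C - D)/pairs;
- Gamma = (C - D)/(C + D);
- Tau-a = (C - D)/(n(n - 1)/2).

It compares unrounded probabilities; some SAS versions group predicted probabilities into small bins before comparing, which this sketch does not do. The helper names are hypothetical.

```python
import math

# (x2, x4, y) for patients i = 1..26, copied from the data table
DATA = [
    (2, 2, 0), (1, 1, 0), (2, 2, 0), (3, 4, 1), (3, 3, 1), (1, 2, 0),
    (1, 1, 0), (1, 3, 0), (1, 1, 0), (3, 2, 0), (3, 4, 0), (2, 4, 1),
    (1, 1, 0), (1, 2, 0), (3, 3, 1), (1, 2, 0), (3, 3, 1), (1, 2, 0),
    (1, 2, 0), (3, 4, 1), (2, 4, 1), (2, 2, 0), (3, 3, 1), (2, 2, 0),
    (1, 4, 0), (3, 4, 1),
]
B0, B2, B4 = -12.3285, 2.4134, 2.0963  # coefficients reported by PROC LOGISTIC


def predicted_prob(x2, x4):
    """P(y = 1) from the fitted logit equation."""
    eta = B0 + B2 * x2 + B4 * x4
    return 1.0 / (1.0 + math.exp(-eta))


def association(data):
    """Concordant/discordant/tied pair counts and the four rank indices."""
    events = [predicted_prob(x2, x4) for x2, x4, y in data if y == 1]
    nonevents = [predicted_prob(x2, x4) for x2, x4, y in data if y == 0]
    conc = disc = tied = 0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                conc += 1
            elif pe < pn:
                disc += 1
            else:
                tied += 1
    pairs = conc + disc + tied
    n = len(data)
    return {
        "pairs": pairs,
        "concordant": conc,
        "discordant": disc,
        "tied": tied,
        "c": (conc + 0.5 * tied) / pairs,
        "somers_d": (conc - disc) / pairs,
        "gamma": (conc - disc) / (conc + disc),
        "tau_a": (conc - disc) / (0.5 * n * (n - 1)),
    }


for key, value in association(DATA).items():
    print(key, round(value, 3) if isinstance(value, float) else value)
```

For these data, the 153 pairs split into 144 concordant, 6 discordant and 3 tied pairs, that is, 94.1%, 3.9% and 2.0%. The indices reproduce c = 0.951, Somers' D = 0.902, Gamma = 0.920 and Tau-a = 0.425. The tied and discordant pairs all involve patient 11 (x2 = 3, x4 = 4, no metastasis). That patient has the highest predicted probability.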