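Logistic Regression

Linear models: general linear models

                        Linear regression     ANOVA                 ANCOVA
  Response variable     continuous, normal    continuous, normal    continuous, normal
  Explanatory variable  continuous            discrete              continuous and discrete
  Link function         identity              identity              identity
  SAS procedures (as listed on the slide): REG/GLM, MIXED, ANOVA/MIXED, GLM/GENMOD, GLM/MIXED, GENMOD

Linear models: generalized linear models

                        Logistic regression        Log-linear model
  Response variable     discrete, B(n, π)          event counts
  Explanatory variable  discrete or continuous     categorical
  Link function         logit, ln[P/(1-P)]         log

Assumptions of the linear model
• LINE
  L  Linear
  I  Independence
  N  Normal distribution
  E  Equal variance

Logistic model
• Logit transformation for binary (0, 1) data
  Let P be the probability that the event occurs:
  $\operatorname{logit}(P) = \ln\frac{P}{1-P}$
  $\operatorname{logit}(P) = \beta_0 + \beta x$

Figure 1. Relationship between the odds of an event and the sample proportion (x-axis: P; y-axis: odds = P/(1-P)).

Figure 2. The logit function (x-axis: P; y-axis: logit P).

Figure 3. The logistic curve (x-axis: u(x); y-axis: P), where
  $u(x) = \beta_0 + \beta x$
  $P = \frac{e^{\beta_0 + \beta x}}{1 + e^{\beta_0 + \beta x}}$

Example 1. Drinking and hypertension

  Age group        25~          35~         ...      75~85
  Hypertension     +     -      +     -              +     -
  Drinking  +      1     9      4     26             5     0
  Drinking  -      0     106    5     164            8     31

SAS Program 1

    data a;
      input y drink a1 a2 a3 a4 a5 count @@;
      cards;
    1 1 0 0 0 0 0 1     1 0 0 0 0 0 0 0
    1 1 1 0 0 0 0 4     1 0 1 0 0 0 0 5
    1 1 0 1 0 0 0 25    1 0 0 1 0 0 0 21
    1 1 0 0 1 0 0 42    1 0 0 0 1 0 0 34
    1 1 0 0 0 1 0 19    1 0 0 0 0 1 0 36
    1 1 0 0 0 0 1 5     1 0 0 0 0 0 1 8
    0 1 0 0 0 0 0 9     0 0 0 0 0 0 0 106
    0 1 1 0 0 0 0 26    0 0 1 0 0 0 0 164
    0 1 0 1 0 0 0 29    0 0 0 1 0 0 0 138
    0 1 0 0 1 0 0 27    0 0 0 0 1 0 0 138
    0 1 0 0 0 1 0 18    0 0 0 0 0 1 0 88
    0 1 0 0 0 0 1 0     0 0 0 0 0 0 1 31
    ;
    /* descending: model the probability that y = 1 */
    proc logistic descending;
      freq count;                      /* each record is a cell count */
      model y = a1 a2 a3 a4 a5 drink;
    run;

Example 1: interpreting the SAS output (variable coding)
• Response Profile

  Ordered Value    y    Total Frequency
  1                1    200
  2                0    774

• Probability modeled is y=1.

Defining dummy variables in the model

  Age group   25~   35~   45~   55~   65~   75~85
  Age         1     2     3     4     5     6
  a1          0     1     0     0     0     0
  a2          0     0     1     0     0     0
  a3          0     0     0     1     0     0
  a4          0     0     0     0     1     0
  a5          0     0     0     0     0     1

Vector representation of the dummy variables in the model:
  $A = (a_1, a_2, a_3, a_4, a_5)$  or  $A = (a_1, a_2, a_3, a_4, a_5)^{T}$

Parameter estimation and model testing
• Maximum likelihood: choose the parameters that maximize the likelihood function L.
• Goodness-of-fit test: H0: the model fits the observed data; H1: the model does not fit the observed data.
  Test statistic: in large samples, $-2\ln(L)$ approximately follows a $\chi^2$ distribution with $\nu = N - m - 1$ degrees of freedom.

Variable selection
• Likelihood ratio test (the most commonly used):
  $G = -2\ln L_0 - (-2\ln L_1)$, DF = 1, where $L_0$ and $L_1$ are the likelihoods without and with the candidate variable
• Score test: statistic SCORE (formula omitted)
• Wald test:
  $\chi^2 = \left(\frac{\hat\beta_i}{SE(\hat\beta_i)}\right)^2$, DF = 1

Example 1: model fit statistics
• Model Fit Statistics

  Criterion    Intercept Only    Intercept and Covariates
  AIC          991.029           802.456
  SC           995.910           836.626
  -2 Log L     989.029           788.456

Example 1: model test results
Testing Global Null Hypothesis: BETA=0

  Test                Chi-Square    DF    Pr > ChiSq
  Likelihood Ratio    200.5731      6     <.0001
  Score               183.5523      6     <.0001
  Wald                125.0228      6     <.0001

Example 1: agreement between model and data
• Association of Predicted Probabilities and Observed Responses

  Percent Concordant    75.1      Somers' D    0.594
  Percent Discordant    15.7      Gamma        0.654
  Percent Tied          9.1       Tau-a        0.194
  Pairs                 154800    c            0.797

Example 1 results: parameter estimates

  Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
  Intercept    1     -5.0534     1.0094            25.0637            <.0001
  a1           1     1.5426      1.0659            2.0944             0.1478
  a2           1     3.1990      1.0231            9.7763             0.0018
  a3           1     3.7182      1.0185            13.3264            0.0003
  a4           1     3.9667      1.0230            15.0337            0.0001
  a5           1     3.9616      1.0650            13.8375            0.0002
  drink        1     1.6671      0.1896            77.2908            <.0001

Example 1 results: odds ratios
Odds Ratio Estimates

  Effect    Point Estimate    95% Wald Confidence Limits
  a1        4.677             0.579      37.774
  a2        24.508            3.299      182.048
  a3        41.190            5.595      303.229
  a4        52.810            7.110      392.225
  a5        52.543            6.516      423.683
  drink     5.297             3.653      7.681

Meaning of the parameters
• Odds and log odds:
  $\text{Odds} = \frac{P}{1-P} = \frac{\text{probability that the event occurs}}{\text{probability that the event does not occur}}$
• Odds ratio:
  $OR = \frac{odds^{*}}{odds} = \frac{P(y=1 \mid x^{*}) / P(y=0 \mid x^{*})}{P(y=1 \mid x) / P(y=0 \mid x)}$
  Groups compared: case group and control group.

The constant term and its relation to prediction and discrimination
• In a case-control study, the constant term does not represent the log of the estimated odds of disease in the population when every variable equals zero. It must not be used for prediction or discrimination!
  $\ln\frac{P}{1-P} = \beta_0' + \beta x$

Conditional logistic model
• The problem of matched data
  Derived from Bayes' formula, for the two members A and B of a matched pair:
  $P_A(D) = \frac{e^{\beta_0 + \beta x_A}}{1 + e^{\beta_0 + \beta x_A}}$
  $P_B(D) = \frac{e^{\beta_0 + \beta x_B}}{1 + e^{\beta_0 + \beta x_B}}$
  $P_A(D \mid x_A, x_B) = \frac{1}{1 + e^{-\beta (x_A - x_B)}}$

Example 2: 1:1 matched design, gastric cancer and three lifestyle factors
• Each case was matched to a healthy control on age, sex, and place of residence, and three lifestyle factors were surveyed.
• X1: unhealthy eating habits
• X2: fondness for brined and pickled foods
• X3: mental state

SAS Program 2

    data a;
      input no y x1 x2 x3 @@;
      cards;
    1 0 2 4 0    1 1 3 1 0
    2 0 3 2 1    2 1 0 1 0
    3 0 3 0 0    3 1 2 0 1
    (remaining records omitted)
    49 0 1 2 1   49 1 0 0 1
    50 0 2 0 1   50 1 0 3 1
    ;
    proc phreg;
      model y = x1 x2 x3 / selection=stepwise slentry=0.05;
      strata no;                       /* each matched pair is one stratum */
    run;

Example 2, conditional logistic SAS results: parameter estimates
Analysis of Maximum Likelihood Estimates

  Var    DF    Parameter Estimate    Standard Error    Chi-Sq    Pr > ChiSq    Hazard Ratio
  x1     1     0.78547               0.25686           9.3513    0.0022        2.193
  x2     1     0.81411               0.30679           7.0420    0.0080        2.257

Example 2, conditional logistic SAS results: variable selection
Testing Global Null Hypothesis: BETA=0

  Test                Chi-Square    DF    Pr > ChiSq
  Likelihood Ratio    22.0017       2     <.0001
  Score               17.9046       2     0.0001
  Wald                12.4144       2     0.0020

NOTE: No (additional) variables met the 0.05 level for entry into the model.

Example 2, conditional logistic SAS results: all variables entered
Analysis of Maximum Likelihood Estimates

  Var    DF    Parameter Estimate    Standard Error    Chi-Sq    Pr > ChiSq    Hazard Ratio
  x1     1     0.82351               0.26700           9.5130    0.0020        2.278
  x2     1     0.82561               0.31141           7.0290    0.0080        2.283
  x3     1     0.49890               0.51744           0.9296    0.3350        1.647

Ordinal multi-category logistic model
• Cumulative logistic model: suppose the outcome variable y has c ordered levels, for example 1 = markedly effective, 2 = effective, 3 = ineffective. Then c-1 equations describe the relationship between y and x:
  $\operatorname{logit}[P(y \le k \mid x)] = \alpha_k + \beta x, \quad k = 1, 2, \ldots, c-1$

Program 3: cumulative model
x1: sex; x2: treatment method; y: efficacy (1 = markedly effective, 2 = effective, 3 = ineffective)

    data a;
      input y x1 x2 count @@;
      cards;
    1 1 1 16   1 0 1 5
    2 1 1 5    2 0 1 2
    3 1 1 6    3 0 1 7
    1 1 0 6    1 0 0 1
    2 1 0 7    2 0 0 0
    3 1 0 19   3 0 0 10
    ;
    proc logistic;
      freq count;                      /* each record is a cell count */
      model y = x1 x2 / scale=none aggregate;
    run;

Ordinal multi-category model: variable coding
Response Profile

  Ordered Value    y    Total Frequency
  1                1    28
  2                2    14
  3                3    42

Cumulative logistic model: parameter estimates
Analysis of Maximum Likelihood Estimates

  Parameter      DF    Estimate    Standard Error    Wald Chi-Sq    Pr > ChiSq
  Intercept 1    1     -2.6671     0.5997            19.7800        <.0001
  Intercept 2    1     -1.8127     0.5566            10.6064        0.0011
  x1             1     1.3187      0.5292            6.2096         0.0127
  x2             1     1.7973      0.4728            14.4493        0.0001

x1: sex; x2: treatment method; y: efficacy (1 = markedly effective, 2 = effective, 3 = ineffective)

The two equations
  $\operatorname{logit}[P(Y \le 1 \mid x)] = -2.667 + 1.318 x_1 + 1.797 x_2$
  $\operatorname{logit}[P(Y \le 2 \mid x)] = -1.813 + 1.318 x_1 + 1.797 x_2$

Question: what is the expected efficacy for a female patient given the new treatment?
Substituting $x_1 = 1$, $x_2 = 1$ into the equations gives
  $\operatorname{logit}[P(Y \le 1 \mid x)] = 0.448$, so $P(Y \le 1 \mid x) = 0.61$
  $\operatorname{logit}[P(Y \le 2 \mid x)] = 1.302$, so $P(Y \le 2 \mid x) = 0.79$
Result for this patient: markedly effective 0.61; effective 0.79 - 0.61 = 0.18; ineffective 1 - 0.79 = 0.21.

Nominal (unordered) multi-category logistic model
Suppose the outcome variable y has c unordered levels, for example 1 = squamous cell carcinoma, 2 = adenocarcinoma, 3 = large cell carcinoma. Then c-1 equations describe the relationship between y and x, each a generalized logit comparing one category with the last category:
  $\operatorname{logit}[P(y = 1 \mid x)] = \alpha_1 + \beta_1 x$
  $\operatorname{logit}[P(y = 2 \mid x)] = \alpha_2 + \beta_2 x$

Degree of differentiation, staining, and histological type

  Differentiation (X1)    Cell staining (X2)    Histological type (Y)
                                                Squamous (1)    Adeno (2)    Large cell (3)
  I (1)                   + (1)                 10              17           26
                          - (2)                 5               12           50
  II (2)                  +                     21              17           26
                          -                     16              12           36
  III (3)                 +                     15              15           16
                          -                     12              12           20

Program 4: nominal multi-category model

    data a;
      do x1 = 1 to 3;
        do x2 = 1 to 2;
          do y = 1 to 3;
            input count @@;
            output;
          end;
        end;
      end;
      cards;
    10 17 26 5 12 50 21 17 26 16 12 26 15 15 16 12 12 20
    ;
    proc catmod order=data;
      direct x1 x2;                    /* treat x1 and x2 as quantitative */
      weight count;
      model y = x1 x2;
    run;

Nominal multi-category model: SAS results
Analysis of Maximum Likelihood Estimates

  Parameter    Function Number    Estimate    Standard Error    Chi-Sq    Pr > ChiSq
  Intercept    1                  -0.9826     0.5707            2.96      0.0851
               2                  -0.3461     0.5413            0.41      0.5226
  x1           1                  0.6281      0.1799            12.19     0.0005
               2                  0.3454      0.1728            4.00      0.0456
  x2           1                  -0.6494     0.2833            5.26      0.0219
               2                  -0.6352     0.2725            5.43      0.0197

Applications and pitfalls of the logistic model
• Applications: screening risk factors / adjusting for confounders / prediction and discrimination
• Pitfalls
  1 The sample size must not be too small
  2 Do not rely solely on automated variable selection; consider the medical meaning of each variable
  3 The type of the independent variables and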