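Regression analysis:
Between two variables: 1. straight-line (simple linear) regression; 2. curvilinear regression.
Multiple linear regression: one dependent variable versus several independent variables.

Representativeness of the sample data:
1. Accurate measurement (measurement method, instruments, technical skill).
2. Surveys (interviews, questionnaires).
Sources of bad data: gross errors (mistakes made while surveying or recording) and the data themselves (outliers).

Diagnostics in regression analysis:
1. The data themselves (outliers).
2. Collinearity diagnosis.
Variable selection (8 methods), e.g. forward, backward and stepwise regression, used for both multiple linear and multiple logistic regression.

Multiple linear regression requires a quantitative outcome variable, ideally normally distributed. For a categorical outcome, whether binary or multi-category, use multiple logistic regression.

I. If drug type is taken into account, the study is a single-factor design with two groups (not a single-group design). If drug type is ignored, the data form a single group of 30 subjects with two variables, which calls for simple linear regression.

Suppose 30 patients with a certain disease are randomly divided into two equal groups; group 1 is treated with drug A and group 2 with drug B. For every patient, sex, age, weight, CD34+ and the mononuclear cell count (MNC) are recorded; the data are shown in Table 2.

Table 3-29  Observed factors and indices for patients with the same disease treated with two drugs

Drug  No.  Sex     Age (years)  Weight (kg)  MNC (×10^8/kg)  CD34+ (×10^6/kg)
A       1  Male             31           60            4.42              7.07
A       2  Female           43           58            2.67              1.39
A       3  Male             55           58            4.14              2.15
A       4  Male             55           58            3.23              1.58
A       5  Female           35           60            2.54              1.09
A       6  Male             24           58            2.37              1.42
A       7  Male             37           60            2.38              0.48
A       8  Male             37           60            2.58              1.55
A       9  Female           43           60            4.54              2.95
A      10  Male             26           60            1.24              0.31
A      11  Female           38           68            2.43              3.43
A      12  Female           29           73            2.16              1.19
A      13  Male             46           73            3.49              4.36
A      14  Male             43           85            3.06              5.51
A      15  Male             46           85            2.65              2.41
B       1  Female           38           55            3.86              4.98
B       2  Male             16           46            6.00              5.88
B       3  Female           28           58            4.57              3.66
B       4  Female           30           60            3.02              1.96
B       5  Female           32           60            3.75              2.66
B       6  Female           38           60            5.41              9.20
B       7  Male             38           68            2.68              3.64
B       8  Male             38           68            2.73              3.06
B       9  Male             46           56            3.99              3.83
B      10  Male             46           56            3.84              1.15
B      11  Male             20           60            5.79              6.54
B      12  Male             20           60            5.23              3.14
B      13  Female           49           57            3.42              2.33
B      14  Male             36           67            4.38              1.93
B      15  Female           43           75            7.60              8.36

Carry out the following statistical analyses as required, and state both the statistical and the subject-matter conclusions.

(1) Ignoring the effects of drug type, sex and age, examine only the correlation and the dependence between CD34+ and MNC (MNC is a quantitative index that is inconvenient to measure directly). Choose an appropriate statistical method to analyze the data.

    data list;
      input zu sex1 $ age weight cd34 mnc @@;   /* zu = drug group (1 = A, 2 = B) */
      if sex1='male' then sex=0; else sex=1;    /* sex: 0 = male, 1 = female */
      cards;
    1 male 31 60 7.07 4.42   1 female 43 58 1.39 2.67   1 male 55 58 2.15 4.14
    1 male 55 58 1.58 3.23   1 female 35 60 1.09 2.54   1 male 24 58 1.42 2.37
    1 male 37 60 0.48 2.38   1 male 37 60 1.55 2.58     1 female 43 60 2.95 4.54
    1 male 26 60 0.31 1.24   1 female 38 68 3.43 2.43   1 female 29 73 1.19 2.16
    1 male 46 73 4.36 3.49   1 male 43 85 5.51 3.06     1 male 46 85 2.41 2.65
    2 female 38 55 4.98 3.86 2 male 16 46 5.88 6        2 female 28 58 3.66 4.57
    2 female 30 60 1.96 3.02 2 female 32 60 2.66 3.75   2 female 38 60 9.2 5.41
    2 male 38 68 3.64 2.68   2 male 38 68 3.06 2.73     2 male 46 56 3.83 3.99
    2 male 46 56 1.15 3.84   2 male 20 60 6.54 5.79     2 male 20 60 3.14 5.23
    2 female 49 57 2.33 3.42 2 male 36 67 1.93 4.38     2 female 43 75 8.36 7.6
    ;
    run;

    /* SYMBOL statement: controls plotting symbols and lines (point color, shape and
       interpolation method); SYMBOL1 defines the settings for plot 1.
       cv = symbol color, v = symbol shape, ci = color of the fitted line,
       i=rlclm95 = linear regression line with 95% confidence limits for the mean,
       co = color of the confidence-limit lines */
    symbol1 cv=red v=diamond ci=yellow i=rlclm95 co=cyan;
    PROC GPLOT DATA=LIST;     /* the plot data come from LIST */
      PLOT mnc*cd34;          /* mnc = vertical axis, cd34 = horizontal axis;
                                 writing mnc*cd34='*' specifies the plotting symbol */
    run;                      /* RUN ends an ordinary procedure step; GPLOT itself
                                 stays active until QUIT */

    ODS HTML;
    PROC CORR DATA=LIST;* FISHER(alpha=0.05 biasadj=no) PEARSON;
      /* delete ";*" above to request the FISHER and PEARSON options */
      VAR cd34 mnc;           /* VAR names the variables to analyze */
    quit;                     /* QUIT fully ends certain (interactive) SAS procedures */
    ODS HTML CLOSE;

    ODS HTML;
    options ls=200;  /* up to 200 characters per output line. Set this before the
                        regression, because the residual output (R) is wide; the
                        default is usually about 70. PS=500 would allow 500 lines
                        per page. */
    PROC REG DATA=LIST;
      MODEL mnc=cd34/R;       /* R: residual analysis, used to find and diagnose
                                 outliers in the data */
    run;
    ODS HTML CLOSE;

    data ddd;                 * redo the analysis after removing the outlier;
      set list;
      if _n_=30 then delete;
    run;

    symbol cv=red v=diamond ci=yellow i=rlclm95 co=black;
    PROC GPLOT DATA=ddd;
      PLOT mnc*cd34;
    run;

    ODS HTML;
    PROC CORR DATA=ddd FISHER(alpha=0.05 biasadj=no) PEARSON;
      VAR cd34 mnc;
    RUN;
    ODS HTML CLOSE;

    ODS HTML;
    PROC REG DATA=ddd;
      MODEL mnc=cd34/R CLI CLM;* alpha=0.01;
      /* CLI: prediction limits for an individual value of Y.
         CLM: confidence limits for the population mean of Y at each given X.
         Delete ";*" to add alpha=0.01, which gives 99% limits instead of 95%. */
      plot r.*p.;
      /* residual (r.) on the vertical axis, predicted value of Y (p.) on the
         horizontal axis. Picture a horizontal line through 0 on the vertical axis:
         the fit is good if the points scatter evenly above and below it with no
         visible pattern. */
    quit;
    ODS HTML CLOSE;

Reading the output:
1. Coeff Var: coefficient of variation; it should generally be below 20 (i.e. 20%).
2. R-Square: coefficient of determination.
3. Pr > |t|: p-value for the test that each of the two parameters (intercept and slope) equals 0.
4. Root MSE: square root of the error mean square (MSE).
5. Plots: the yellow line is the fitted line Y = Ŷ, where the residual is 0. Here the residuals show no pattern and are randomly distributed.

(2) The researcher wishes to predict MNC from the patients' drug type, sex, age, weight and CD34+. Choose an appropriate statistical method to analyze the data. [SAS programs]: Exercise 2(1), Exercise 2(2).

Criteria for judging a multiple linear regression model:
1. The fitted multiple regression equation is statistically significant as a whole.
2. The hypothesis tests of all regression parameter estimates in the equation are statistically significant.
3. The sign of each parameter estimate agrees with the subject-matter meaning of its variable.
4. All predicted values of the dependent variable computed from the equation make sense in subject-matter terms.
5. When several equations are good, the best is the one with a smaller residual sum of squares and fewer independent variables.

    *Stepwise regression;
    ODS HTML;
    PROC REG data=list;
      MODEL mnc=zu sex age weight cd34/SELECTION=STEPWISE SLE=0.3 SLS=0.05 R STB;
      /* mnc is the outcome variable.
         SLE: significance level for entering the equation.
         SLS: significance level for staying in (being removed from) the equation.
         STB: standardized regression coefficients, showing which variables
         contribute more or less; meaningful for quantitative variables with units. */
    quit;
    ODS HTML CLOSE;

    data LIST1;               * redo the analysis after removing an outlier;
      set list;
      if _n_=30 then delete;
    run;
    ODS HTML;
    PROC REG data=list1;
      MODEL mnc=zu sex age weight cd34/SELECTION=STEPWISE SLE=0.3 SLS=0.05 R STB;
      plot r.*p.;
    quit;
    ODS HTML CLOSE;

    data LIST2;               * redo the analysis after removing an outlier;
      set list1;
      if _n_=29 then delete;
    run;
    ODS HTML;
    PROC REG data=list2;
      MODEL mnc=zu sex age weight cd34/SELECTION=STEPWISE SLE=0.3 SLS=0.05 R STB;
      plot r.*p.;
    quit;
    ODS HTML CLOSE;

    data LIST3;               * redo the analysis after removing an outlier;
      set list2;
      if _n_=27 then delete;
    run;
    ODS HTML;
    PROC REG data=list3;
      MODEL mnc=zu sex age weight cd34/SELECTION=STEPWISE SLE=0.3 SLS=0.05 R STB;
      plot r.*p.;
    quit;
    ODS HTML CLOSE;

    data LIST4;               * redo the analysis after removing an outlier;
      set list3;
      if _n_=10 then delete;
    run;
    ODS HTML;
    PROC REG data=list4;
      MODEL mnc=zu sex age weight cd34/SELECTION=STEPWISE SLE=0.3 SLS=0.05 R STB;
      plot r.*p.;
    quit;
    ODS HTML CLOSE;

    *Forward selection;
    ODS HTML;
    PROC REG data=list4;
      MODEL mnc=zu sex age weight cd34/SELECTION=forward SLE=0.05 R STB;
      plot r.*p.;
    quit;
    ODS HTML CLOSE;

    *Backward elimination;
    ODS HTML;
    PROC REG data=list4;
      MODEL mnc=zu sex age weight cd34/SELECTION=backward SLS=0.05 R STB;
      plot r.*p.;
    quit;
    ODS HTML CLOSE;

    *Collinearity diagnosis;
    ODS HTML;
    PROC REG data=list4;
      MODEL mnc=weight cd34/COLLIN COLLINOINT;
      /* Method 1: variance proportions (COLLIN, COLLINOINT).
         Method 2: variance inflation factor (VIF) and tolerance (TOL). */
    quit;
    ODS HTML CLOSE;

    *Re-create data set LIST with the same DATA step as in part (1) before running the code below;

    *The next three are best-subset regression methods;
    *R-square selection (RSQUARE);
    ODS HTML;
    PROC REG data=list;
      MODEL mnc=zu sex age weight cd34/SELECTION=RSQUARE;
    quit;
    ODS HTML CLOSE;

    *Adjusted R-square selection (ADJRSQ);
    ODS HTML;
    PROC REG data=list;
      MODEL mnc=zu sex age weight cd34/SELECTION=ADJRSQ;
    quit;
    ODS HTML CLOSE;

    *Mallows Cp selection (CP);
    ODS HTML;
    PROC REG data=list;
      MODEL mnc=zu sex age weight cd34/SELECTION=CP;
    quit;
    ODS HTML CLOSE;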
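In part (1) the outlier is removed by hard-coding `if _n_=30 then delete`. The sketch below shows a more explicit screen, assuming data set LIST from part (1). It saves the studentized residuals, Cook distance and leverage from the simple regression of MNC on CD34+ with the OUTPUT statement of PROC REG and lists the rows that exceed two common rules of thumb, |RSTUDENT| > 2 and Cook D > 4/n. These cutoffs are assumptions and are not stated in the notes. It then draws the residual-versus-predicted plot with the zero reference line described above. The data set names DIAG and FLAGGED are illustrative only.

```sas
/* Sketch (assumed approach): screen for outliers in the regression of MNC on
   CD34+ instead of hard-coding _N_=30. Assumes data set LIST from part (1).
   Cutoffs |RSTUDENT| > 2 and Cook D > 4/n are assumed rules of thumb. */
proc reg data=list noprint;
   model mnc=cd34;
   /* save fitted values, raw and studentized residuals, Cook D, leverage */
   output out=diag p=yhat r=resid rstudent=rstud cookd=cookd h=lev;
run;
quit;

data flagged;
   set diag nobs=nobs;        /* nobs = total number of observations (30 here) */
   obs = _n_;                 /* row number in LIST, the same numbering as _N_ */
   if abs(rstud) > 2 or cookd > 4/nobs then output;
run;

proc print data=flagged;
   var obs zu sex age weight cd34 mnc yhat resid rstud cookd lev;
run;

/* residual versus predicted plot with a reference line at residual = 0 */
proc sgplot data=diag;
   scatter x=yhat y=resid;
   refline 0 / axis=y;
run;
```

Any row listed in FLAGGED is only a candidate. Whether to delete it should still be decided on subject-matter grounds, such as a recording error.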
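In the LIST1 to LIST4 sequence, `_N_` counts rows of the data set currently being read, so each deletion refers to the already-reduced data set. Translated back to row numbers in the original LIST, the four steps remove observations 30, 29, 27 and 10. Row 10 keeps its number because all earlier deletions came later in the file. The sketch below builds the same 26-row data set in one step and checks it against LIST4 with PROC COMPARE. It assumes LIST and LIST4 exist as created above; LIST4_ONESTEP is a hypothetical name.

```sas
/* Sketch: drop the same four observations from LIST in one step.
   Row numbers refer to the original 30-row LIST. */
data list4_onestep;
   set list;
   if _n_ in (10, 27, 29, 30) then delete;
run;

/* should report no differences if LIST4 was built by the steps above */
proc compare base=list4 compare=list4_onestep;
run;
```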
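The collinearity notes name two methods, variance proportions and VIF/tolerance, but the program requests only the first (COLLIN, COLLINOINT). The sketch below adds the second through the VIF and TOL options of the MODEL statement, assuming LIST4 from the stepwise section. The ODS OUTPUT data set name PE_VIF is illustrative. A commonly quoted rule of thumb treats VIF > 10 (equivalently TOL < 0.1) as a sign of serious collinearity. That threshold is a convention assumed here, not a criterion from the notes.

```sas
/* Sketch: collinearity diagnostics, method 2 (VIF and tolerance).
   TOL = 1 - R-square from regressing each independent variable on the others;
   VIF = 1/TOL. Assumes LIST4 from the stepwise section. */
ods output ParameterEstimates=pe_vif;   /* keep estimates with VIF/TOL columns */
proc reg data=list4;
   model mnc=weight cd34 / vif tol collin collinoint;
run;
quit;

proc print data=pe_vif;
run;
```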
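The three best-subset runs can also be requested in a single call. With SELECTION=RSQUARE, the ADJRSQ and CP options add adjusted R-square and Mallows Cp columns to the same table. For RSQUARE, BEST= limits how many models of each size are listed. The sketch below assumes LIST from part (1); BEST=3 is an arbitrary choice, and SUBSETS is an illustrative data set name. In line with criterion 5 above, a common guideline (an assumption here) is to prefer a model with high adjusted R-square, Cp close to the number of parameters, and few variables.

```sas
/* Sketch: all-subsets regression reporting R-square, adjusted R-square and
   Mallows Cp side by side; BEST=3 lists at most 3 models per model size. */
ods output SubsetSelSummary=subsets;
proc reg data=list;
   model mnc=zu sex age weight cd34 / selection=rsquare adjrsq cp best=3;
run;
quit;

proc print data=subsets;
run;
```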