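Simple Linear Regression Analysis with R

Data on alloy steel strength and carbon content

    No.   Carbon content (%)   Alloy steel strength (10^7 Pa)
     1         0.10                  42.0
     2         0.11                  43.0
     3         0.12                  45.0
     4         0.13                  45.0
     5         0.14                  45.0
     6         0.15                  47.5
     7         0.16                  49.0
     8         0.17                  53.0
     9         0.18                  50.0
    10         0.20                  55.0
    11         0.21                  55.0
    12         0.23                  60.0

Here the carbon content is taken as x, an ordinary (non-random) variable, and the alloy steel strength is taken as y, a random variable.

We first use R to draw a scatter plot of the data. The program is as follows:

    # Enter the data row by row: column C = carbon content, column E = strength
    x <- matrix(c(0.1, 42, 0.11, 43, 0.12, 45, 0.13, 45, 0.14, 45, 0.15, 47.5,
                  0.16, 49, 0.17, 53, 0.18, 50, 0.2, 55, 0.21, 55, 0.23, 60),
                nrow = 12, ncol = 2, byrow = TRUE,
                dimnames = list(1:12, c("C", "E")))
    outputcost <- as.data.frame(x)
    plot(outputcost$C, outputcost$E)

[Figure: scatter plot of outputcost$E against outputcost$C]

The points clearly fall close to (but not exactly on) a straight line.

Next we run a regression analysis on the data entered above (each program continues from the previous one, here and below):

    lm.sol <- lm(E ~ C, data = outputcost)   # regress strength on carbon content
    summary(lm.sol)

This gives the following result:

    Call:
    lm(formula = E ~ C, data = outputcost)

    Residuals:
         Min       1Q   Median       3Q      Max
    -2.00449 -0.63600 -0.02401  0.71297  2.32451

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   28.083      1.567   17.92 6.27e-09 ***
    C            132.899      9.606   13.84 7.59e-08 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.309 on 10 degrees of freedom
    Multiple R-squared: 0.9503, Adjusted R-squared: 0.9454
    F-statistic: 191.4 on 1 and 10 DF, p-value: 7.585e-08

From these results, the estimated constant term is β0 = 28.083 and the estimated coefficient of the variable (carbon content) is β1 = 132.899, which gives the regression equation:

    y = 28.083 + 132.899x

The regression model is fitted by the method of least squares, which is a purely mathematical procedure and has a limitation: whether or not the variables are related, and whether or not any linear relationship between them is significant, least squares will always produce a straight line fitted to the data. After fitting the model we therefore still need to test its significance.

In the output above, the standard errors of the estimates are sd(β0) = 1.567 and sd(β1) = 9.606. The P-values for the two coefficients are 6.27e-09 and 7.59e-08, so both coefficients are highly significant. For the equation as a whole, the residual standard error is 1.309 and the squared correlation coefficient is R^2 = 0.9503. The P-value of the F test is 7.585e-08, which is also highly significant.

We now draw the fitted line on the scatter plot. The program is as follows:

    abline(lm.sol)

This gives the scatter plot with the corresponding regression line:

[Figure: scatter plot of outputcost$E against outputcost$C with the fitted regression line]

Next we analyse the residuals. In R, the function residuals() computes the residuals of a fitted regression. The program is as follows:

    y.res <- residuals(lm.sol)
    plot(y.res)

This gives the residual plot:

[Figure: residual plot of y.res against Index]

The residual plot shows that point 8 looks somewhat unusual, so we label its residual with the following program:

    text(8, y.res[8], labels = 8, adj = 1.2)

[Figure: residual plot of y.res against Index, with point 8 labelled]

This point may be problematic. As a simple treatment we remove this sample point and refit:

    i <- 1:12
    outputcost2 <- as.data.frame(x[i != 8, ])   # drop observation 8
    lm2 <- lm(E ~ C, data = outputcost2)
    summary(lm2)

The output is as follows:

    Call:
    lm(formula = E ~ C, data = outputcost2)

    Residuals:
        Min      1Q  Median      3Q     Max
    -1.7567 -0.5067 -0.1308  0.6821  1.6787

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   28.124      1.335   21.06 5.75e-09 ***
    C            131.293      8.217   15.98 6.51e-08 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.115 on 9 degrees of freedom
    Multiple R-squared: 0.966, Adjusted R-squared: 0.9622
    F-statistic: 255.3 on 1 and 9 DF, p-value: 6.506e-08

After removing point 8, the regression coefficients change only slightly, R^2 rises from 0.9503 to 0.966, and the p-value of the F test falls from 7.585e-08 to 6.506e-08. This suggests that sample point 8 can be removed, and the resulting model is reasonably satisfactory.

The complete program is summarised as follows:

    # Same data with observation 8 (0.17, 53) removed
    x2 <- matrix(c(0.1, 42, 0.11, 43, 0.12, 45, 0.13, 45, 0.14, 45, 0.15, 47.5,
                   0.16, 49, 0.18, 50, 0.2, 55, 0.21, 55, 0.23, 60),
                 nrow = 11, ncol = 2, byrow = TRUE,
                 dimnames = list(1:11, c("C", "E")))
    outputcost <- as.data.frame(x2)
    plot(outputcost$C, outputcost$E)
    lm.sol <- lm(E ~ C, data = outputcost)
    summary(lm.sol)
    abline(lm.sol)

Here summary(lm.sol) returns the same residual summary, coefficient table, and fit statistics as the output for lm2 above; only the Call line differs (data = outputcost).

This gives the final scatter plot and regression line:

[Figure: scatter plot of outputcost$E against outputcost$C for the 11 remaining points, with the fitted regression line]

The resulting regression equation is:

    y = 28.124 + 131.293x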
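
As a cross-check on the least-squares fits, the coefficients can be recomputed from the textbook closed-form formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The following is a minimal sketch, not part of the original programs: the helper ls_coef and the vector names carbon and strength are hypothetical names introduced for illustration only. The data are the twelve observations from the table.

```r
# Data from the table: carbon content (%) and strength (10^7 Pa)
carbon   <- c(0.10, 0.11, 0.12, 0.13, 0.14, 0.15,
              0.16, 0.17, 0.18, 0.20, 0.21, 0.23)
strength <- c(42, 43, 45, 45, 45, 47.5, 49, 53, 50, 55, 55, 60)

# Hypothetical helper (an illustration, not from the original programs):
# closed-form least-squares estimates for a simple linear regression.
ls_coef <- function(x, y) {
  sxx <- sum((x - mean(x))^2)               # sum of squares of x about its mean
  sxy <- sum((x - mean(x)) * (y - mean(y))) # sum of cross-products
  slope <- sxy / sxx
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

full    <- ls_coef(carbon, strength)          # all 12 points
reduced <- ls_coef(carbon[-8], strength[-8])  # observation 8 removed

print(round(full, 3))
print(round(reduced, 3))

# Compare the hand computation with lm() on the full data
fit <- lm(strength ~ carbon)
print(all.equal(unname(coef(fit)), unname(full)))
```

Up to rounding, the two sets of estimates should agree with the coefficients reported in the summaries (28.083 and 132.899 with all points; 28.124 and 131.293 without point 8). Using negative indexing (carbon[-8]) is simply a shorter way to drop the eighth observation than retyping the data.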