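Statistical Analysis of the Economies of Guangdong's Cities in 2009: Lab Report for "Data Analysis and Statistical Software" (《数据分析与统计软件》)

Wu Simin (伍思敏, continuing-education student) 1102020

1. Problem Background and Data Description

Since the Guangdong provincial Party committee and provincial government proposed building a "Happy Guangdong", three questions have drawn wide public attention: how to speed up the upgrading and transformation of the province's economy, how to promote coordinated urban-rural and regional development across its cities, and how to coordinate the province's economic and social development as a whole. Sound decisions require that we first understand the basic economic situation of the province's 21 prefecture-level cities and identify the weaknesses of the current economy. We study how each city's economy is performing by analyzing 8 socioeconomic indicators. The data are taken from the Guangdong Yearbook 2010 (《广东年鉴2010》) and are given in Table 1.

Table 1. Socioeconomic statistics for Guangdong's cities, 2009

Columns: GDP = gross domestic product (100 million yuan); GDP/capita = GDP per capita (yuan); Agri. = gross output of agriculture, forestry, animal husbandry and fishery (100 million yuan); Industry = gross industrial output (100 million yuan); Invest. = total fixed-asset investment (100 million yuan); Exports = total exports (100 million USD); Retail = total retail sales of consumer goods (100 million yuan); Wage = average wage of on-post employees in urban units (yuan).

City            GDP  GDP/capita   Agri.  Industry  Invest.  Exports   Retail   Wage
Guangzhou   9138.21       89082  295.62  11376.76  2659.85   374.05  3615.77  49519
Shenzhen    8201.32       92772   15.48  15416.24  1709.15  1619.79  2567.94  46723
Zhuhai      1038.66       69889   51.62   2405.04   410.51   177.83   404.46  31764
Shantou     1035.87       20385  104.71   1531.10   291.90    40.16   661.96  25389
Foshan      4820.90       80686  195.03  11711.28  1470.56   245.78  1408.78  34106
Shaoguan     578.75       19549  133.42    599.23   356.50     5.79   278.36  28276
Heyuan       405.50       13928   86.86    604.68   198.15    14.13   139.50  23803
Meizhou      519.29       12558  179.38    351.11   162.98     6.71   267.98  24097
Huizhou     1414.70       35819  147.91   3005.14   758.97   171.49   491.10  25786
Shanwei      390.04       13363  111.22    319.60   289.43     9.48   282.06  23238
Dongguan    3763.91       56601   25.31   6071.11  1094.08   551.67   959.07  42585
Zhongshan   1566.41       62304   77.77   4057.97   545.61   177.36   549.76  36165
Jiangmen    1340.88       32139  193.09   2933.26   492.07    79.49   562.07  24304
Yangjiang    527.27       22132  200.16    504.56   239.49    12.30   305.38  21439
Zhanjiang   1156.67       16647  397.68   1028.79   393.23    13.65   559.94  23944
Maoming     1231.25       19979  385.38   1098.13   180.01     5.32   591.05  24255
Zhaoqing     862.00       22415  256.81   1179.01   462.77    20.30   275.78  26174
Qingyuan     861.59       22796  158.71   2024.06   841.24    14.15   303.56  28379
Chaozhou     480.18       18681   61.35    581.07   162.98    18.70   207.89  21293
Jieyang      816.09       14159  149.61   1153.29   393.50    25.25   341.46  19881
Yunfu        344.51       14276  144.91    324.32   240.19     6.16   117.91  21913

2. Statistical Methods and SAS Implementation

To study the economic situation of each city, we analyze the data from several angles using basic descriptive statistics, factor analysis, and cluster analysis, and we carry out all of the computations in SAS.

2.1 Data Preparation and Processing

For ease of analysis and exposition, the text and programs below use the following variable names for the economic indicators, as shown in Table 2.

Table 2. Variables and their symbols

Variable                                                                               Symbol
Region                                                                                 region
GDP (100 million yuan)                                                                 x1
GDP per capita (yuan)                                                                  x2
Gross output of agriculture, forestry, animal husbandry and fishery (100 million yuan) x3
Gross industrial output (100 million yuan)                                             x4
Total fixed-asset investment (100 million yuan)                                        x5
Total exports (100 million USD)                                                        x6
Total retail sales of consumer goods (100 million yuan)                                x7
Average wage of on-post employees in urban units (yuan)                                x8

    data city;
      /* Without this, region gets the default length of 8 and
         9-letter names such as guangzhou are truncated. */
      length region $ 12;
      input region $ x1-x8;
      cards;
    guangzhou 9138.21 89082 295.62 11376.76 2659.85 374.05 3615.77 49519
    shengzhen 8201.32 92772 15.48 15416.24 1709.15 1619.79 2567.94 46723
    zhuhai 1038.66 69889 51.62 2405.04 410.51 177.83 404.46 31764
    shantou 1035.87 20385 104.71 1531.10 291.90 40.16 661.96 25389
    foshan 4820.90 80686 195.03 11711.28 1470.56 245.78 1408.78 34106
    shaoguan 578.75 19549 133.42 599.23 356.50 5.79 278.36 28276
    heyuan 405.50 13928 86.86 604.68 198.15 14.13 139.50 23803
    meizhou 519.29 12558 179.38 351.11 162.98 6.71 267.98 24097
    huizhou 1414.70 35819 147.91 3005.14 758.97 171.49 491.10 25786
    shanwei 390.04 13363 111.22 319.60 289.43 9.48 282.06 23238
    dongguan 3763.91 56601 25.31 6071.11 1094.08 551.67 959.07 42585
    zhongshan 1566.41 62304 77.77 4057.97 545.61 177.36 549.76 36165
    jiangmen 1340.88 32139 193.09 2933.26 492.07 79.49 562.07 24304
    yangjiang 527.27 22132 200.16 504.56 239.49 12.30 305.38 21439
    zhanjiang 1156.67 16647 397.68 1028.79 393.23 13.65 559.94 23944
    maoming 1231.25 19979 385.38 1098.13 180.01 5.32 591.05 24255
    zhaoqing 862.00 22415 256.81 1179.01 462.77 20.30 275.78 26174
    qingyuan 861.59 22796 158.71 2024.06 841.24 14.15 303.56 28379
    chaozhou 480.18 18681 61.35 581.07 162.98 18.70 207.89 21293
    jieyang 816.09 14159 149.61 1153.29 393.50 25.25 341.46 19881
    yunfu 344.51 14276 144.91 324.32 240.19 6.16 117.91 21913
    ;
    run;

2.2 Descriptive Statistics

To get a first impression of the data, we begin with a univariate analysis. The MEANS procedure computes descriptive statistics for each variable:

    proc means data=city maxdec=2 mean std max min range cv skew;
      var x1-x8;
    run;

The MEANS procedure computes the mean, standard deviation, maximum, minimum, range, coefficient of variation, and skewness of each variable. The option maxdec=2 limits the printed statistics to 2 decimal places. The results are shown in Table 3.

Table 3. Output of the MEANS procedure

Variable        Mean   Std Dev   Maximum   Minimum     Range      CV  Skewness
x1           1928.29   2500.51   9138.21    344.51   8793.70  129.68      2.20
x2          35721.90  27212.70  92772.00  12558.00  80214.00   76.18      1.13
x3            160.57    104.55    397.68     15.48    382.20   65.11      0.92
x4           3251.23   4313.25  15416.24    319.60  15096.64  132.67      1.91
x5            635.87    629.14   2659.85    162.98   2496.87   98.94      2.12
x6            170.93    361.77   1619.79      5.32   1614.47  211.65      3.57
x7            709.13    860.81   3615.77    117.91   3497.86  121.39      2.62
x8          28715.86   8484.20  49519.00  19881.00  29638.00   29.55      1.41

(CV = coefficient of variation, in percent.)

Table 3 supports the following conclusions:

(1) Apart from x8 (average wage of on-post employees in urban units), whose CV is modest, every variable has a CV above 50%, and the CVs of x1 (GDP), x4 (gross industrial output), x6 (total exports), and x7 (total retail sales of consumer goods) all exceed 100%. The province's 21 prefecture-level cities therefore differ greatly in these respects. The ranges show how large the gaps between cities are in absolute terms: development across the cities is very uneven.

(2) The CV of x8 (average wage of on-post employees in urban units) is 29.55%, the smallest of all the variables. This suggests that although wages are related to a city's level of development, they do not rise in step with its economy: as cities developed, employees did not receive a correspondingly larger share of the gains.

Next, we examine the correlation coefficients among the 8 variables:

    proc corr data=city;
      var x1-x8;
    run;

The CORR procedure reports the pairwise correlation coefficients and their significance probabilities (p-values), shown in Table 4. Many of the correlation coefficients exceed 0.7, and the p-values of the corresponding significance tests are very small. The variables are thus strongly correlated and carry overlapping information, so we consider reducing the dimensionality and describing each city's economy with fewer variables.

Table 4. Pearson Correlation Coefficients, N = 21; Prob > |r| under H0: Rho=0
(In each pair of rows, the first gives the correlation coefficient and the second the p-value. Only the upper triangle is shown.)

              x1        x2        x3        x4        x5        x6        x7        x8
x1       1.00000   0.85538  -0.00721   0.94793   0.95323   0.77392   0.97853   0.89577
                    <.0001    0.9752    <.0001    <.0001    <.0001    <.0001    <.0001
x2                 1.00000  -0.21812   0.90767   0.82766   0.72178   0.79809   0.89720
                              0.3422    <.0001    <.0001    0.0002    <.0001    <.0001
x3                           1.00000  -0.14073   0.03077  -0.37474   0.09488  -0.19909
                                        0.5429    0.8946    0.0942    0.6825    0.3869
x4                                     1.00000   0.89387   0.82924   0.87906   0.86331
                                                  <.0001    <.0001    <.0001    <.0001
x5                                               1.00000   0.62617   0.94154   0.87574
                                                            0.0024    <.0001    <.0001
x6                                                         1.00000   0.67634   0.74948
                                                                      0.0008    <.0001
x7                                                                   1.00000   0.85083
                                                                                <.0001
x8                                                                             1.00000

2.3 Factor Analysis

We use factor analysis to reduce the dimensionality of the data, combining the 8 economic indicators into a small number of composite factors. The program is:

    proc factor data=city;
      var x1-x8;
    run;

The FACTOR procedure computes the eigenvalues of the correlation matrix together with the proportion and cumulative proportion of variance explained, shown in Table 5.

Table 5. Eigenvalues of the correlation matrix and proportions of variance explained
Eigenvalues of the Correlation Matrix: Total = 8 Average = 1

     Eigenvalue  Difference  Proportion  Cumulative
1    6.10266563  4.89825823      0.7628      0.7628
2    1.20440740  0.89474547      0.1506      0.9134
3    0.30966193  0.12881511      0.0387      0.9521
4    0.18084682  0.05678755      0.0226      0.9747
5    0.12405927  0.06965556      0.0155      0.9902
6    0.05440371  0.03331218      0.0068      0.9970
7    0.02109153  0.01822783      0.0026      0.9996
8    0.00286370                  0.0004      1.0000

The first two eigenvalues of the correlation matrix are 6.10266563 and 1.20440740, and the two corresponding common factors together account for a cumulative proportion of 0.9134 of the variance. Two common factors therefore capture the information in the original variables quite fully. We next run the factor analysis with the number of common factors set to 2.

    proc fa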