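Chapter 5 Cluster Analysis: Lab Report

The table below gives eight major variables of per-capita annual consumption expenditure of urban households in China's 31 provinces, municipalities and autonomous regions in 2012.

X1: Food (yuan/person)
X2: Clothing (yuan/person)
X3: Housing (yuan/person)
X4: Household equipment and supplies (yuan/person)
X5: Household miscellaneous goods (yuan/person)
X6: Transport and communications (yuan/person)
X7: Education, culture and recreation (yuan/person)
X8: Health care (yuan/person)

Table 5-1 Per-capita annual consumption expenditure of urban residents in China, 2012 (unit: yuan/person)

| Region | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 |
|---|---|---|---|---|---|---|---|---|
| Beijing | 296.17 | 2638.90 | 1970.94 | 1610.70 | 604.57 | 3781.51 | 3695.98 | 1658.37 |
| Tianjin | 564.76 | 1881.43 | 1854.22 | 1151.16 | 411.28 | 3083.37 | 2254.22 | 1556.35 |
| Hebei | 177.91 | 1541.99 | 1502.41 | 876.10 | 340.15 | 1723.75 | 1203.80 | 1047.28 |
| Shanxi | 73.25 | 1529.47 | 1438.88 | 832.52 | 296.06 | 1672.29 | 1506.20 | 905.88 |
| Inner Mongolia | 144.44 | 2730.23 | 1583.56 | 1242.64 | 446.86 | 2572.93 | 1971.78 | 1354.09 |
| Liaoning | 474.35 | 2042.40 | 1433.28 | 1069.65 | 480.07 | 2323.29 | 1843.89 | 1309.62 |
| Jilin | 208.16 | 2044.80 | 1594.14 | 871.46 | 419.61 | 1780.67 | 1642.70 | 1447.50 |
| Heilongjiang | 213.69 | 1806.92 | 1336.85 | 742.22 | 338.71 | 1462.61 | 1216.56 | 1180.67 |
| Shanghai | 1011.40 | 2111.17 | 1790.48 | 1906.49 | 784.85 | 4563.80 | 3723.74 | 1016.65 |
| Jiangsu | 542.92 | 1915.97 | 1437.08 | 1288.42 | 533.96 | 2689.51 | 3077.76 | 1058.11 |
| Zhejiang | 949.22 | 2109.58 | 1551.69 | 1161.39 | 475.71 | 4133.50 | 2996.59 | 1228.02 |
| Anhui | 268.20 | 1540.66 | 1396.97 | 811.23 | 244.31 | 1809.72 | 1932.74 | 1142.96 |
| Fujian | 1219.95 | 1634.21 | 1753.86 | 1254.71 | 535.75 | 2961.78 | 2104.83 | 773.22 |
| Jiangxi | 291.38 | 1476.63 | 1173.91 | 966.23 | 442.49 | 1501.34 | 1487.30 | 670.71 |
| Shandong | 384.33 | 2196.98 | 1572.35 | 1125.99 | 406.20 | 2370.23 | 1655.91 | 1005.25 |
| Henan | 105.77 | 1885.99 | 1190.81 | 1145.42 | 395.96 | 1730.35 | 1525.33 | 1085.47 |
| Hubei | 334.46 | 1783.41 | 1371.15 | 978.26 | 405.30 | 1476.98 | 1651.92 | 1029.55 |
| Hunan | 272.49 | 1624.57 | 1301.60 | 1034.30 | 442.86 | 2084.15 | 1737.64 | 918.41 |
| Guangdong | 792.27 | 1520.59 | 2099.75 | 1467.20 | 695.32 | 4176.66 | 2954.13 | 1048.28 |
| Guangxi | 378.22 | 1146.46 | 1377.26 | 1125.39 | 369.54 | 2088.64 | 1626.05 | 883.56 |
| Hainan | 963.24 | 864.96 | 1521.04 | 777.20 | 420.37 | 2004.34 | 1319.54 | 993.24 |
| Chongqing | 266.13 | 2228.76 | 1177.02 | 1196.03 | 499.73 | 1903.24 | 1470.64 | 1101.56 |
| Sichuan | 185.40 | 1651.14 | 1284.09 | 1097.93 | 482.16 | 1946.72 | 1587.43 | 772.75 |
| Guizhou | 99.05 | 1399.00 | 1013.53 | 849.94 | 401.78 | 1891.03 | 1396.00 | 654.53 |
| Yunnan | 116.62 | 1759.89 | 973.76 | 634.09 | 274.62 | 2264.23 | 1434.30 | 939.13 |
| Tibet | 50.96 | 1361.57 | 845.18 | 474.69 | 233.80 | 1387.45 | 550.48 | 467.23 |
| Shaanxi | 116.03 | 1789.06 | 1322.22 | 986.82 | 447.07 | 1788.38 | 2078.52 | 1212.44 |
| Gansu | 88.24 | 1631.40 | 1287.93 | 833.15 | 338.12 | 1575.67 | 1388.21 | 1049.65 |
| Qinghai | 112.87 | 1512.24 | 1232.39 | 923.70 | 327.76 | 1549.76 | 1097.21 | 906.14 |
| Ningxia | 81.17 | 1875.70 | 1193.37 | 929.01 | 401.24 | 2110.41 | 1515.91 | 1063.09 |
| Xinjiang | 115.34 | 2031.14 | 1166.59 | 950.17 | 466.46 | 1660.27 | 1280.81 | 1027.60 |

Source: China Statistical Yearbook 2013

Using these eight indicators, we run both hierarchical clustering and K-means clustering of the regions in SPSS 19.0 to analyze structural differences in urban household consumption across the country.

I. Hierarchical Clustering

(1) Procedure

1. Define the variables and enter the data.
2. In the SPSS window, choose Analyze/Classify/Hierarchical Cluster to open the main Hierarchical Cluster Analysis dialog. Move the eight variables X1 to X8 into the Variables box, and move the variable diqu (region) into Label Cases by. Labeling the cases makes the clustering output easier to read. Under Cluster, select the Cases radio button, so that the samples (regions) are clustered. Under Display, check both Statistics and Plots so that the output window shows both the clustering statistics and the charts.
3. Click Statistics to choose which statistics appear in the output. Under Cluster Membership, select Range of solutions, enter 3 and 5, and click Continue to return to the main dialog.
4. Click Plots to choose which charts appear in the output. Check Dendrogram and select None under Icicle, so that only the dendrogram is produced and no icicle plot. Click Continue to return to the main dialog.
5. Click Method to set the clustering options. The Cluster Method drop-down list specifies the linkage method, including Between-groups linkage, Within-groups linkage, Nearest neighbor and Furthest neighbor; here we keep the default, Between-groups linkage. The Measure group selects the distance or similarity measure; here we keep the default, Squared Euclidean distance. The Transform Values group selects how the raw data are standardized, and Transform Measures selects how the computed distances are transformed; here we keep the system defaults, so the raw data are not standardized. Click Continue to return to the main dialog.
6. Click Save to specify the new variables in which the cluster memberships are stored in the data file. None saves no new variable, and Single solution creates one membership variable. Select Range of solutions and enter 3 and 5. This creates three new membership variables that record the 3-, 4- and 5-cluster solutions. Click Continue to return to the main dialog.
7. Click OK to run the hierarchical clustering procedure.

(2) Main results

1. Case clustering results

Table 5-2 Cluster Membership

| Case | 5 Clusters | 4 Clusters | 3 Clusters |
|---|---|---|---|
| 1: Beijing | 1 | 1 | 1 |
| 2: Tianjin | 2 | 2 | 2 |
| 3: Hebei | 3 | 3 | 3 |
| 4: Shanxi | 3 | 3 | 3 |
| 5: Inner Mongolia | 3 | 3 | 3 |
| 6: Liaoning | 3 | 3 | 3 |
| 7: Jilin | 3 | 3 | 3 |
| 8: Heilongjiang | 3 | 3 | 3 |
| 9: Shanghai | 4 | 1 | 1 |
| 10: Jiangsu | 2 | 2 | 2 |
| 11: Zhejiang | 4 | 1 | 1 |
| 12: Anhui | 3 | 3 | 3 |
| 13: Fujian | 2 | 2 | 2 |
| 14: Jiangxi | 3 | 3 | 3 |
| 15: Shandong | 3 | 3 | 3 |
| 16: Henan | 3 | 3 | 3 |
| 17: Hubei | 3 | 3 | 3 |
| 18: Hunan | 3 | 3 | 3 |
| 19: Guangdong | 4 | 1 | 1 |
| 20: Guangxi | 3 | 3 | 3 |
| 21: Hainan | 3 | 3 | 3 |
| 22: Chongqing | 3 | 3 | 3 |
| 23: Sichuan | 3 | 3 | 3 |
| 24: Guizhou | 3 | 3 | 3 |
| 25: Yunnan | 3 | 3 | 3 |
| 26: Tibet | 5 | 4 | 3 |
| 27: Shaanxi | 3 | 3 | 3 |
| 28: Gansu | 3 | 3 | 3 |
| 29: Qinghai | 3 | 3 | 3 |
| 30: Ningxia | 3 | 3 | 3 |
| 31: Xinjiang | 3 | 3 | 3 |

2. All 31 cases are valid. There are no missing values.

Table 5-3 Case Processing Summary

| Cases | N | Percent |
|---|---|---|
| Valid | 31 | 100.0 |
| Missing | 0 | .0 |
| Total | 31 | 100.0 |

3. The full agglomeration process is shown below. Each stage merges two clusters, and each cluster is identified by the lowest case number it contains. Coefficients is the between-groups (average) squared Euclidean distance at which the two clusters merge. Stage Cluster First Appears gives the stage at which each cluster was formed, with 0 meaning a single case. Next Stage is the stage at which the merged cluster is joined again.

Table 5-4 Agglomeration Schedule

| Stage | Cluster Combined: Cluster 1 | Cluster Combined: Cluster 2 | Coefficients | Stage Cluster First Appears: Cluster 1 | Stage Cluster First Appears: Cluster 2 | Next Stage |
|---|---|---|---|---|---|---|
| 1 | 18 | 23 | 76857.385 | 0 | 0 | 9 |
| 2 | 4 | 28 | 79096.645 | 0 | 0 | 3 |
| 3 | 3 | 4 | 126451.613 | 0 | 2 | 5 |
| 4 | 16 | 31 | 132889.702 | 0 | 0 | 8 |
| 5 | 3 | 29 | 170084.736 | 3 | 0 | 10 |
| 6 | 12 | 27 | 188923.971 | 0 | 0 | 15 |
| 7 | 6 | 15 | 190150.955 | 0 | 0 | 18 |
| 8 | 16 | 22 | 206930.613 | 4 | 0 | 13 |
| 9 | 18 | 30 | 207898.852 | 1 | 0 | 13 |
| 10 | 3 | 8 | 219549.743 | 5 | 0 | 12 |
| 11 | 14 | 24 | 244375.437 | 0 | 0 | 14 |
| 12 | 3 | 17 | 317188.666 | 10 | 0 | 14 |
| 13 | 16 | 18 | 377076.178 | 8 | 9 | 16 |
| 14 | 3 | 14 | 423989.873 | 12 | 11 | 16 |
| 15 | 7 | 12 | 457926.240 | 0 | 6 | 17 |
| 16 | 3 | 16 | 510241.689 | 14 | 13 | 17 |
| 17 | 3 | 7 | 659939.050 | 16 | 15 | 20 |
| 18 | 5 | 6 | 668028.096 | 0 | 7 | 26 |
| 19 | 20 | 21 | 679067.876 | 0 | 0 | 23 |
| 20 | 3 | 25 | 726856.133 | 17 | 0 | 23 |
| 21 | 11 | 19 | 849632.699 | 0 | 0 | 24 |
| 22 | 2 | 13 | 1177073.275 | 0 | 0 | 25 |
| 23 | 3 | 20 | 1293652.109 | 20 | 19 | 26 |
| 24 | 9 | 11 | 1453416.462 | 0 | 21 | 27 |
| 25 | 2 | 10 | 1516143.450 | 22 | 0 | 29 |
| 26 | 3 | 5 | 1584911.539 | 23 | 18 | 28 |
| 27 | 1 | 9 | 2162388.182 | 0 | 24 | 30 |
| 28 | 3 | 26 | 2642718.952 | 26 | 0 | 29 |
| 29 | 2 | 3 | 3571326.040 | 25 | 28 | 30 |
| 30 | 1 | 2 | 10190001.628 | 27 | 29 | 0 |

4. Dendrogram

Figure 5-1 Dendrogram

(3) Summary and analysis of the classification

With three or four clusters, most regions pile up in one large middle cluster while the two extremes are sharply polarized. A five-cluster solution is therefore preferable. (The clusters below are numbered differently from the SPSS labels in Table 5-2.)

Cluster 1: Beijing
Cluster 2: Shanghai, Zhejiang, Guangdong
Cluster 3: Tianjin, Jiangsu, Fujian
Cluster 4: all other provinces
Cluster 5: Tibet

II. K-Means Clustering

(1) Procedure

1. In the SPSS window, choose Analyze/Classify/K-Means Cluster to open the main K-Means Cluster Analysis dialog. Move the eight variables X1 to X8 into the Variables box, and move the label variable diqu into Label Cases by. Under Method, select Iterate and classify. With this option the K-means algorithm repeatedly computes new cluster centers and replaces the old ones. Enter 5 in Number of Clusters.
2. Click Iterate to set the iteration parameters. Maximum Iterations sets the maximum number of K-means iterations, and Convergence Criterion sets the convergence threshold. Keep the system defaults and click Continue to return to the main dialog.
3. Click Save to choose the new variables, stored in the data file, that record the clustering result. Check Cluster membership and Distance from cluster center, then click Continue to return to the main dialog.
4. Click Options to specify the statistics to compute. Check Initial cluster centers and Cluster information for each case, then click Continue to return to the main dialog.
5. Click OK to run the K-means clustering procedure.

(2) Main results

1. Final cluster centers, used to compare how high or low each category of consumption expenditure is in each cluster.

Table 5-5 Final Cluster Centers

| Variable | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
|---|---|---|---|---|---|
| x1 | 762.27 | 618.02 | 203.79 | 670.73 | 81.92 |
| x2 | 2095.06 | 2040.46 | 1781.05 | 1005.71 | 1436.91 |
| x3 | 1853.22 | 1657.18 | 1301.62 | 1449.15 | 1038.79 |
| x4 | 1536.45 | 1234.23 | 943.71 | 951.30 | 699.20 |
| x5 | 640.11 | 481.96 | 395.94 | 394.96 | 280.78 |
| x6 | 4163.87 | 2826.90 | 1846.07 | 2046.49 | 1468.61 |
| x7 | 3342.61 | 2352.15 | 1555.57 | 1472.80 | 823.85 |
| x8 | 1237.83 | 1185.44 | 1029.69 | 938.40 | 686.6 |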
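The Method settings in Part I (Between-groups linkage, Squared Euclidean distance, no standardization) can be reproduced outside SPSS. The sketch below is a minimal pure-Python version that runs on the raw Table 5-1 values. It is not SPSS's implementation, and the function names `sq_euclidean`, `average_linkage` and `cut` are hypothetical. The distance between two clusters is the mean squared Euclidean distance over all cross-cluster pairs of cases. A merged cluster keeps the lower case number, which mirrors how Table 5-4 labels clusters. With these inputs, the first three stages should reproduce the first rows of Table 5-4: Hunan with Sichuan at 76857.385, Shanxi with Gansu at 79096.645, then Hebei with that pair at 126451.613.

```python
from itertools import combinations

# Table 5-1 (X1..X8, yuan/person), in the case order of Table 5-2
DATA = {
    "Beijing": [296.17, 2638.90, 1970.94, 1610.70, 604.57, 3781.51, 3695.98, 1658.37],
    "Tianjin": [564.76, 1881.43, 1854.22, 1151.16, 411.28, 3083.37, 2254.22, 1556.35],
    "Hebei": [177.91, 1541.99, 1502.41, 876.10, 340.15, 1723.75, 1203.80, 1047.28],
    "Shanxi": [73.25, 1529.47, 1438.88, 832.52, 296.06, 1672.29, 1506.20, 905.88],
    "Inner Mongolia": [144.44, 2730.23, 1583.56, 1242.64, 446.86, 2572.93, 1971.78, 1354.09],
    "Liaoning": [474.35, 2042.40, 1433.28, 1069.65, 480.07, 2323.29, 1843.89, 1309.62],
    "Jilin": [208.16, 2044.80, 1594.14, 871.46, 419.61, 1780.67, 1642.70, 1447.50],
    "Heilongjiang": [213.69, 1806.92, 1336.85, 742.22, 338.71, 1462.61, 1216.56, 1180.67],
    "Shanghai": [1011.40, 2111.17, 1790.48, 1906.49, 784.85, 4563.80, 3723.74, 1016.65],
    "Jiangsu": [542.92, 1915.97, 1437.08, 1288.42, 533.96, 2689.51, 3077.76, 1058.11],
    "Zhejiang": [949.22, 2109.58, 1551.69, 1161.39, 475.71, 4133.50, 2996.59, 1228.02],
    "Anhui": [268.20, 1540.66, 1396.97, 811.23, 244.31, 1809.72, 1932.74, 1142.96],
    "Fujian": [1219.95, 1634.21, 1753.86, 1254.71, 535.75, 2961.78, 2104.83, 773.22],
    "Jiangxi": [291.38, 1476.63, 1173.91, 966.23, 442.49, 1501.34, 1487.30, 670.71],
    "Shandong": [384.33, 2196.98, 1572.35, 1125.99, 406.20, 2370.23, 1655.91, 1005.25],
    "Henan": [105.77, 1885.99, 1190.81, 1145.42, 395.96, 1730.35, 1525.33, 1085.47],
    "Hubei": [334.46, 1783.41, 1371.15, 978.26, 405.30, 1476.98, 1651.92, 1029.55],
    "Hunan": [272.49, 1624.57, 1301.60, 1034.30, 442.86, 2084.15, 1737.64, 918.41],
    "Guangdong": [792.27, 1520.59, 2099.75, 1467.20, 695.32, 4176.66, 2954.13, 1048.28],
    "Guangxi": [378.22, 1146.46, 1377.26, 1125.39, 369.54, 2088.64, 1626.05, 883.56],
    "Hainan": [963.24, 864.96, 1521.04, 777.20, 420.37, 2004.34, 1319.54, 993.24],
    "Chongqing": [266.13, 2228.76, 1177.02, 1196.03, 499.73, 1903.24, 1470.64, 1101.56],
    "Sichuan": [185.40, 1651.14, 1284.09, 1097.93, 482.16, 1946.72, 1587.43, 772.75],
    "Guizhou": [99.05, 1399.00, 1013.53, 849.94, 401.78, 1891.03, 1396.00, 654.53],
    "Yunnan": [116.62, 1759.89, 973.76, 634.09, 274.62, 2264.23, 1434.30, 939.13],
    "Tibet": [50.96, 1361.57, 845.18, 474.69, 233.80, 1387.45, 550.48, 467.23],
    "Shaanxi": [116.03, 1789.06, 1322.22, 986.82, 447.07, 1788.38, 2078.52, 1212.44],
    "Gansu": [88.24, 1631.40, 1287.93, 833.15, 338.12, 1575.67, 1388.21, 1049.65],
    "Qinghai": [112.87, 1512.24, 1232.39, 923.70, 327.76, 1549.76, 1097.21, 906.14],
    "Ningxia": [81.17, 1875.70, 1193.37, 929.01, 401.24, 2110.41, 1515.91, 1063.09],
    "Xinjiang": [115.34, 2031.14, 1166.59, 950.17, 466.46, 1660.27, 1280.81, 1027.60],
}


def sq_euclidean(a, b):
    """Squared Euclidean distance between two cases (unstandardized)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_linkage(rows):
    """Naive agglomerative clustering with between-groups (average) linkage.

    Returns one (label_a, label_b, coefficient) tuple per stage; each cluster
    is labeled by its lowest 0-based case index.
    """
    clusters = {i: [i] for i in range(len(rows))}
    schedule = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(sq_euclidean(rows[i], rows[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, _ = best
        clusters[a].extend(clusters.pop(b))  # merged cluster keeps the lower label
        schedule.append(best)
    return schedule


def cut(n, schedule, k):
    """Replay the first n - k merges to recover the k-cluster solution."""
    groups = {i: {i} for i in range(n)}
    for a, b, _ in schedule[: n - k]:
        groups[a] |= groups.pop(b)
    return [frozenset(g) for g in groups.values()]


names = list(DATA)
rows = list(DATA.values())
schedule = average_linkage(rows)
for stage, (a, b, d) in enumerate(schedule[:3], start=1):
    print(stage, a + 1, b + 1, round(d, 3))  # 1-based case numbers, as in Table 5-4
for group in cut(len(rows), schedule, 5):
    print(sorted(names[i] for i in group))
```

The naive search over all cluster pairs costs roughly cubic time in the number of cases. That cost is irrelevant for 31 regions, and it keeps the linkage rule easy to read. For larger data, a library routine such as SciPy's `linkage` with `method="average"` would be the usual choice. SciPy applies that method to the distances you pass in, so you would have to supply squared Euclidean distances yourself to match these SPSS settings.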
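The Iterate and classify option in Part II runs the K-means iteration. Each case is assigned to its nearest center, each center is recomputed as the mean of its members, and the process repeats until the assignments stop changing. The sketch below is a minimal version of that loop. SPSS chooses its own initial centers, reported under Initial cluster centers, and this sketch does not replicate that rule. Seeding with the first k regions is a hypothetical choice, so the clusters it finds may differ from Table 5-5. The names `kmeans`, `centroid` and `max_iter` are illustrative. The final centers themselves are plain means of their members. For example, the cluster 4 column of Table 5-5 matches, to rounding, the mean of the Guangxi and Hainan rows of Table 5-1.

```python
# Table 5-1 (X1..X8, yuan/person)
DATA = {
    "Beijing": [296.17, 2638.90, 1970.94, 1610.70, 604.57, 3781.51, 3695.98, 1658.37],
    "Tianjin": [564.76, 1881.43, 1854.22, 1151.16, 411.28, 3083.37, 2254.22, 1556.35],
    "Hebei": [177.91, 1541.99, 1502.41, 876.10, 340.15, 1723.75, 1203.80, 1047.28],
    "Shanxi": [73.25, 1529.47, 1438.88, 832.52, 296.06, 1672.29, 1506.20, 905.88],
    "Inner Mongolia": [144.44, 2730.23, 1583.56, 1242.64, 446.86, 2572.93, 1971.78, 1354.09],
    "Liaoning": [474.35, 2042.40, 1433.28, 1069.65, 480.07, 2323.29, 1843.89, 1309.62],
    "Jilin": [208.16, 2044.80, 1594.14, 871.46, 419.61, 1780.67, 1642.70, 1447.50],
    "Heilongjiang": [213.69, 1806.92, 1336.85, 742.22, 338.71, 1462.61, 1216.56, 1180.67],
    "Shanghai": [1011.40, 2111.17, 1790.48, 1906.49, 784.85, 4563.80, 3723.74, 1016.65],
    "Jiangsu": [542.92, 1915.97, 1437.08, 1288.42, 533.96, 2689.51, 3077.76, 1058.11],
    "Zhejiang": [949.22, 2109.58, 1551.69, 1161.39, 475.71, 4133.50, 2996.59, 1228.02],
    "Anhui": [268.20, 1540.66, 1396.97, 811.23, 244.31, 1809.72, 1932.74, 1142.96],
    "Fujian": [1219.95, 1634.21, 1753.86, 1254.71, 535.75, 2961.78, 2104.83, 773.22],
    "Jiangxi": [291.38, 1476.63, 1173.91, 966.23, 442.49, 1501.34, 1487.30, 670.71],
    "Shandong": [384.33, 2196.98, 1572.35, 1125.99, 406.20, 2370.23, 1655.91, 1005.25],
    "Henan": [105.77, 1885.99, 1190.81, 1145.42, 395.96, 1730.35, 1525.33, 1085.47],
    "Hubei": [334.46, 1783.41, 1371.15, 978.26, 405.30, 1476.98, 1651.92, 1029.55],
    "Hunan": [272.49, 1624.57, 1301.60, 1034.30, 442.86, 2084.15, 1737.64, 918.41],
    "Guangdong": [792.27, 1520.59, 2099.75, 1467.20, 695.32, 4176.66, 2954.13, 1048.28],
    "Guangxi": [378.22, 1146.46, 1377.26, 1125.39, 369.54, 2088.64, 1626.05, 883.56],
    "Hainan": [963.24, 864.96, 1521.04, 777.20, 420.37, 2004.34, 1319.54, 993.24],
    "Chongqing": [266.13, 2228.76, 1177.02, 1196.03, 499.73, 1903.24, 1470.64, 1101.56],
    "Sichuan": [185.40, 1651.14, 1284.09, 1097.93, 482.16, 1946.72, 1587.43, 772.75],
    "Guizhou": [99.05, 1399.00, 1013.53, 849.94, 401.78, 1891.03, 1396.00, 654.53],
    "Yunnan": [116.62, 1759.89, 973.76, 634.09, 274.62, 2264.23, 1434.30, 939.13],
    "Tibet": [50.96, 1361.57, 845.18, 474.69, 233.80, 1387.45, 550.48, 467.23],
    "Shaanxi": [116.03, 1789.06, 1322.22, 986.82, 447.07, 1788.38, 2078.52, 1212.44],
    "Gansu": [88.24, 1631.40, 1287.93, 833.15, 338.12, 1575.67, 1388.21, 1049.65],
    "Qinghai": [112.87, 1512.24, 1232.39, 923.70, 327.76, 1549.76, 1097.21, 906.14],
    "Ningxia": [81.17, 1875.70, 1193.37, 929.01, 401.24, 2110.41, 1515.91, 1063.09],
    "Xinjiang": [115.34, 2031.14, 1166.59, 950.17, 466.46, 1660.27, 1280.81, 1027.60],
}


def sq_dist(a, b):
    """Squared Euclidean distance; the nearest center is the same as under Euclidean."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(rows, k, max_iter=100):
    """Lloyd's K-means. Seeding with the first k rows is a hypothetical choice,
    not the rule SPSS uses for its initial cluster centers."""
    centers = [list(r) for r in rows[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case goes to its nearest current center.
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_labels == labels:  # assignments are stable, so the centers are final
            break
        labels = new_labels
        # Update step: each non-empty cluster's center becomes its members' mean.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers


names = list(DATA)
rows = list(DATA.values())
labels, centers = kmeans(rows, 5)
for c in range(5):
    print(c + 1, [n for n, lab in zip(names, labels) if lab == c])

# Cluster 4 column of Table 5-5 (x1..x8), compared with the Guangxi/Hainan mean
TABLE_5_5_COL4 = [670.73, 1005.71, 1449.15, 951.30, 394.96, 2046.49, 1472.80, 938.40]
check = centroid([DATA["Guangxi"], DATA["Hainan"]])
print([round(v, 2) for v in check], TABLE_5_5_COL4)
```

K-means finds only a local optimum, and that optimum depends on the starting centers. Running the sketch from several different seeds is a cheap way to see how stable the five-cluster split is before interpreting Table 5-5.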