R Language Learning Series 23: Descriptive Statistics


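21. Descriptive Statistics

I. Overall descriptive statistics

1. The summary() function

summary() reports the minimum, lower quartile, median, mean, upper quartile, and maximum of each variable.

    vars <- c("mpg", "hp", "wt")
    summary(mtcars[vars])

          mpg              hp              wt
     Min.   :10.40   Min.   : 52.0   Min.   :1.513
     1st Qu.:15.43   1st Qu.: 96.5   1st Qu.:2.581
     Median :19.20   Median :123.0   Median :3.325
     Mean   :20.09   Mean   :146.7   Mean   :3.217
     3rd Qu.:22.80   3rd Qu.:180.0   3rd Qu.:3.610
     Max.   :33.90   Max.   :335.0   Max.   :5.424

2. The stat.desc() function in the pastecs package

The basic form is:

    stat.desc(x, basic=TRUE, desc=TRUE, norm=FALSE, p=0.95)

where:

x is a data frame or a time series;
basic defaults to TRUE, which reports the number of values, null values, and missing values, together with the minimum, maximum, range, and sum;
desc defaults to TRUE, which reports the median, mean, standard error of the mean, 95% confidence interval for the mean, variance, standard deviation, and coefficient of variation;
norm defaults to FALSE; when set to TRUE it adds the normal-distribution statistics: skewness and kurtosis (with their significance) and the Shapiro-Wilk normality test result;
p sets the confidence level used for the confidence interval of the mean, 0.95 by default.

    library(pastecs)
    stat.desc(mtcars[vars], norm=TRUE)

                         mpg            hp           wt
    nbr.val       32.0000000   32.00000000  32.00000000
    nbr.null       0.0000000    0.00000000   0.00000000
    nbr.na         0.0000000    0.00000000   0.00000000
    min           10.4000000   52.00000000   1.51300000
    max           33.9000000  335.00000000   5.42400000
    range         23.5000000  283.00000000   3.91100000
    sum          642.9000000 4694.00000000 102.95200000
    median        19.2000000  123.00000000   3.32500000
    mean          20.0906250  146.68750000   3.21725000
    SE.mean        1.0654240   12.12031731   0.17296847
    CI.mean.0.95   2.1729465   24.71955013   0.35277153
    var           36.3241028 4700.86693548   0.95737897
    std.dev        6.0269481   68.56286849   0.97845744
    coef.var       0.2999881    0.46740771   0.30412851
    skewness       0.6106550    0.72602366   0.42314646
    skew.2SE       0.7366922    0.87587259   0.51048252
    kurtosis      -0.3727660   -0.13555112  -0.02271075
    kurt.2SE      -0.2302812   -0.08373853  -0.01402987
    normtest.W     0.9475647    0.93341934   0.94325772
    normtest.p     0.1228814    0.04880824   0.09265499

Descriptive statistics can also be computed with describe() from the psych package or describe() from the Hmisc package. Because the two functions share a name, the one from the most recently loaded package takes precedence; the call can also be qualified explicitly, for example Hmisc::describe().

II. Grouped descriptive statistics

1. The aggregate() function

    vars <- c("mpg", "hp", "wt")
    aggregate(mtcars[vars], by=list(am=mtcars$am), mean)

      am      mpg       hp       wt
    1  0 17.14737 160.2632 3.768895
    2  1 24.39231 126.8462 2.411000

    aggregate(mtcars[vars], by=list(am=mtcars$am), sd)

      am      mpg       hp        wt
    1  0 3.833966 53.90820 0.7774001
    2  1 6.166504 84.06232 0.6169816

Note: several grouping variables may be used. aggregate() is simplest with single-value functions such as mean and sd; a function that returns several values produces matrix-valued columns rather than one column per statistic.

2. The summaryBy() function in the doBy package

    summaryBy(formula, data=dataframe, FUN=function)

where formula takes the form

    var1 + … + varN ~ groupvar1 + groupvar2 + …

with ~ separating the variables to be analysed (left) from the grouping variables (right).

    library(doBy)
    myfun <- function(x) c(mean=mean(x), sd=sd(x))
    # FUN may be a user-defined function that returns several values
    summaryBy(mpg + hp + wt ~ am, data=mtcars, FUN=myfun)

      am mpg.mean   mpg.sd  hp.mean    hp.sd  wt.mean     wt.sd
    1  0 17.14737 3.833966 160.2632 53.90820 3.768895 0.7774001
    2  1 24.39231 6.166504 126.8462 84.06232 2.411000 0.6169816

3. The describeBy() function in the psych package

    library(psych)
    describeBy(mtcars[vars], mtcars$am)

    $`0`
        vars  n   mean    sd median trimmed   mad   min    max  range  skew
    mpg    1 19  17.15  3.83  17.30   17.12  3.11 10.40  24.40  14.00  0.01
    hp     2 19 160.26 53.91 175.00  161.06 77.10 62.00 245.00 183.00 -0.01
    wt     3 19   3.77  0.78   3.52    3.75  0.45  2.46   5.42   2.96  0.98
        kurtosis    se
    mpg    -0.80  0.88
    hp     -1.21 12.37
    wt      0.14  0.18

    $`1`
        vars  n   mean    sd median trimmed   mad   min    max  range skew
    mpg    1 13  24.39  6.17  22.80   24.38  6.67 15.00  33.90  18.90 0.05
    hp     2 13 126.85 84.06 109.00  114.73 63.75 52.00 335.00 283.00 1.36
    wt     3 13   2.41  0.62   2.32    2.39  0.68  1.51   3.57   2.06 0.21
        kurtosis    se
    mpg    -1.46  1.71
    hp      0.56 23.31
    wt     -1.17  0.17

4. The reshape package: building pivot tables

    library(reshape)
    myfun <- function(x) c(n=length(x), mean=mean(x), sd=sd(x))
    dfm <- melt(mtcars, measure.vars=c("mpg", "hp", "wt"), id.vars=c("am", "cyl"))
    cast(dfm, am + cyl + variable ~ ., myfun)

       am cyl variable  n       mean         sd
    1   0   4      mpg  3  22.900000  1.4525839
    2   0   4       hp  3  84.666667 19.6553640
    3   0   4       wt  3   2.935000  0.4075230
    4   0   6      mpg  4  19.125000  1.6317169
    5   0   6       hp  4 115.250000  9.1787799
    6   0   6       wt  4   3.388750  0.1162164
    7   0   8      mpg 12  15.050000  2.7743959
    8   0   8       hp 12 194.166667 33.3598379
    9   0   8       wt 12   4.104083  0.7683069
    10  1   4      mpg  8  28.075000  4.4838599
    11  1   4       hp  8  81.875000 22.6554156
    12  1   4       wt  8   2.042250  0.4093485
    13  1   6      mpg  3  20.566667  0.7505553
    14  1   6       hp  3 131.666667 37.5277675
    15  1   6       wt  3   2.755000  0.1281601
    16  1   8      mpg  2  15.400000  0.5656854
    17  1   8       hp  2 299.500000 50.2045815
    18  1   8       wt  2   3.370000  0.2828427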
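
As a cross-check on the package output shown in this section, the sketch below recomputes a few of the figures with base R only. It is an illustration under stated assumptions, not how pastecs or doBy work internally. The helper name desc_by_hand and the result names by_hand, grouped, and grouped_flat are hypothetical and do not appear in the original. The first part derives the SE.mean, CI.mean.0.95, and coef.var rows of stat.desc() from their usual definitions (standard error = sd / sqrt(n); the CI value is the half-width of the t-based interval; coefficient of variation = sd / mean). The second part obtains the grouped mean and sd per am group from aggregate() alone: aggregate() does accept a function returning several values, but stores them as matrix columns, which do.call(data.frame, ...) flattens into one column per statistic.

```r
# Base-R cross-check of selected figures; uses only the built-in mtcars data.
vars <- c("mpg", "hp", "wt")

# Hypothetical helper: derived rows of stat.desc(), computed by hand.
# Assumes x is numeric with no missing values (true for these mtcars columns).
desc_by_hand <- function(x, p = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                    # standard error of the mean
  ci <- qt((1 + p) / 2, df = n - 1) * se   # half-width of the t-based CI
  c(SE.mean = se, CI.mean = ci, coef.var = sd(x) / mean(x))
}

# One column per variable, one row per statistic.
by_hand <- sapply(mtcars[vars], desc_by_hand)
print(by_hand)

# Grouped mean and sd without doBy: FUN returns two values per variable,
# so aggregate() yields matrix-valued columns ...
myfun   <- function(x) c(mean = mean(x), sd = sd(x))
grouped <- aggregate(cbind(mpg, hp, wt) ~ am, data = mtcars, FUN = myfun)

# ... which are flattened here into ordinary columns (mpg.mean, mpg.sd, ...).
grouped_flat <- do.call(data.frame, grouped)
print(grouped_flat)
```

The values in by_hand can be compared with the corresponding rows of the stat.desc() output, and grouped_flat with the summaryBy() table.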
