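Statistics Basics and Using Prism Software
仝鑫, 魏健
2015-12

Contents
• Statistics basics
• Hypothesis testing (parametric and nonparametric tests)
• t test, F test (analysis of variance), and their use in Prism
• Linear regression and its use in Prism

I. Statistics Basics

The Gaussian Distribution
• The Gaussian function describing this shape is defined as follows:
  $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
  where $\mu$ represents the population mean and $\sigma$ the standard deviation.
• Few biological distributions, if any, really follow the Gaussian distribution.

The Central Limit Theorem
If your samples are large enough, the distribution of the sample means will follow a Gaussian distribution even if the population is not Gaussian. N = 10 or so is generally enough.

I. Descriptive Statistics (column statistics in Prism)

Measures of Location
A typical or central value that best describes the data (central tendency).
• Mean
• Median
• Mode
• Geometric mean

Measures of Dispersion
Describe the spread (variation) of the data around that central value.
• Range
• Variance
• Standard deviation (SD)
• Standard error (between-sample standard error, SEM = SD/$\sqrt{n}$)
• Coefficient of variation
• Confidence interval

No single parameter can fully describe the distribution of the data in the sample. Most statistics software will provide a comprehensive table describing the distribution.

Measures of Location: Mean
• More commonly referred to as "the average".
• It is the sum of the data points divided by the number of data points.

Migration Assay
Cell #    Distance travelled (microns)
1         49
2         27
3         132
4         24
5         78
6         80
7         62
8         39
9         200

$M = \frac{49 + 27 + 132 + 24 + 78 + 80 + 62 + 39 + 200}{9} = 76.78$ microns $\approx 77$ microns

Measures of Dispersion: Variance
• Defined as the average of the squared distances of each value from the mean (for a sample, the sum is divided by N − 1 rather than N).
• To calculate the variance, first calculate the mean, then measure the amount by which each value deviates from the mean.
• The formula for calculating the variance is:
  $S^2 = \frac{\sum (X - M)^2}{N - 1}$

Measures of Dispersion: Standard Deviation
• The most common and useful measure of dispersion.
• Tells you how tightly the samples are clustered around the mean. When the samples are tightly bunched together, the Gaussian curve is narrow and the standard deviation is small.
• When the samples are spread apart, the Gaussian curve is flat and the standard deviation is large.
• The formula for calculating the standard deviation is: $SD = \sqrt{S^2}$ (the square root of the variance).

Standard Deviation (SD) and Standard Error (SEM)
• The standard deviation is the amount by which you expect an individual measurement to vary from the average. It measures the dispersion of the sample values around the sample mean, reflects the variability between individuals, and is an indicator of the precision of the data.
• The standard error of the mean is how much you expect a value averaged from several measurements to vary from the true mean. It measures the dispersion of the sample mean around the population mean, reflects the size of the sampling error, and is an indicator of the precision of the result.

Should We Show Standard Deviation or Standard Error?
Use standard deviation
• If the scatter is caused by biological variability and you want to show that variability.
• For example: you aliquot 10 plates, each with a different cell line, and measure the integrin expression of each.
Use standard error
• If the variability is caused by experimental imprecision and you want to show the precision of the calculated mean. Then show the 95% confidence interval of the mean.
• For example: you aliquot 10 plates of the same cell line and measure the integrin expression of each.

Precision of the Mean
• In statistics, a confidence interval (CI) computed from a sample is an interval estimate of a parameter of the population the sample was drawn from. It shows the range around the measured result within which the true value of the parameter falls with a given probability.
• This "given probability" is called the confidence level.
  For a 90% confidence interval, Z = 1.645
  For a 95% confidence interval, Z = 1.96
  For a 99% confidence interval, Z = 2.576

The formula for calculating the CI:
  $CI = \bar{X} \pm (SEM \times Z)$
• $\bar{X}$ is the sample mean and Z is the critical value for the normal distribution.
• For the 95% CI, Z = 1.96.
• For our data set (mean 77 microns, SEM 19 microns): 95% CI = 77 ± (19 × 1.96) = 77 ± 37, so the 95% CI runs from 40 to 114.
• This means that there is a 95% chance that the CI you calculated contains the population mean.

CI: A Practical Example
Dataset A    Dataset B
80           90
85           52
90           30
88           44
79           68
92           77
88           55
85           62
88           75
86           88

               Dataset A    Dataset B
Mean           86.1         64.1
SD             4.1          19.3
SEM            1.3          6.1
Low 95% CI     83.2         50.3
High 95% CI    89.0         77.9

Between these two data sets, which mean do you think best reflects the population mean, and why?

Interpreting the CI of a Mean

SD / SEM / 95% CI Error Bars
[Figure: error bars drawn as SD, SEM, and 95% CI]

II. The Null Hypothesis (Hypothesis Testing)
• Appears in the form $H_0: \mu_1 = \mu_2$, where
  $H_0$ = null hypothesis
  $\mu_1$ = mean of population 1
  $\mu_2$ = mean of population 2
• An alternate form is $H_0: \mu_1 - \mu_2 = 0$.
• The null hypothesis is presumed true until statistical evidence in the form of a hypothesis test indicates otherwise (it is one or the other).

Some Basic Concepts of Hypothesis Testing
1. The difference you observe between samples ≠ the true difference between the populations. All you can do is calculate probabilities (P value, in [0, 1]). Before thinking about P values, you should:
   1) Assess the science.
   2) Review the assumptions of the analysis you chose.
   P values (for small P and big P, see pages 35 and 37).
2. Significance level (threshold significance level)
• When a sample is used to infer whether $H_0$ is correct, there is always a chance of making an error. The probability, or risk, of rejecting $H_0$ when it is in fact correct is denoted $\alpha$. $\alpha$ is called the significance level of the hypothesis test, that is, the risk taken in the decision.
Example: acceptance region and rejection region for $\alpha = 0.05$.
• Acceptance region: the range of variation that is allowed when the null hypothesis is true; the null hypothesis should be accepted.
• Rejection region: values that occur with only a small probability when the null hypothesis is true. When the test statistic falls in this region, the null hypothesis should be rejected.

Some Basic Concepts of Hypothesis Testing
Two-sided and one-sided tests
Depending on the practical need, hypothesis tests can be divided into:
• Two-sided (two-tailed) test: a test that emphasizes only the difference, not its direction. It is concerned only with whether there is a difference, not with which value is larger or smaller.
  $H_0: \mu_1 = \mu_0$,  $H_1: \mu_1 \neq \mu_0$
• One-sided (one-tailed) test: a test that emphasizes one particular direction.
  Left-sided test: $H_0: \mu_1 \geq \mu_0$,  $H_1: \mu_1 < \mu_0$
  Right-sided test: $H_0: \mu_1 \leq \mu_0$,  $H_1: \mu_1 > \mu_0$

[Figure: one-sided tests in hypothesis testing, showing the rejection region for (a) a right-sided test and (b) a left-sided test]

Some Basic Concepts of Hypothesis Testing
The two types of error in hypothesis testing
• A hypothesis test draws its inference from the information provided by a sample, that is, it infers the whole from a part. It therefore cannot be absolutely accurate and may be in error.
The two types of error:
• $\alpha$ error (Type I error): $H_0$ is true but is rejected ("rejecting the true").
• $\beta$ error (Type II error): $H_0$ is false but is accepted ("accepting the false").

Probabilities of the possible outcomes of a hypothesis test:
                 Accept H0, reject H1               Reject H0, accept H1
H0 is true       1 − α (correct decision)           α (Type I error)
H0 is false      β (Type II error)                  1 − β (correct decision)

(1) $\alpha$ and $\beta$ are probabilities under two different premises. $\alpha$ is the probability of making an error when rejecting the null hypothesis $H_0$, on the premise that $H_0$ is true; $\beta$ is the probability of making an error when accepting $H_0$, on the premise that $H_0$ is false. Therefore $\alpha + \beta$ does not equal 1.
(2) For a fixed n, $\alpha$ and $\beta$ generally cannot both be reduced at the same time. For a fixed n, the smaller $\alpha$ is, the larger $Z_{\alpha/2}$ is, so the acceptance interval $(-Z_{\alpha/2}, Z_{\alpha/2})$ becomes wider, $H_0$ is accepted more easily, and the probability $\beta$ of "accepting the false" becomes larger; and vice versa. In other words, when the sample size is fixed, the probability $\alpha$ of "rejecting the true" and the probability $\beta$ of "accepting the false" cannot both decrease: when one decreases, the other increases.
(3) One way to reduce both $\alpha$ and $\beta$ is to increase the sample size n. In the distribution of the sample mean, $\bar{X} \sim N(\mu, \sigma^2/n)$, if n increases then $\sigma^2/n$ decreases, the distribution becomes tall and narrow, and the probabilities $\alpha$ and $\beta$ of the two types of error are reduced.

Hypothesis Testing
Observe Phenomenon → Propose Hypothesis → Design Study → Collect and Analyze Data → Interpret Results → Draw Conclusions
Statistics are an important part o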