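6. Correlation & Simple Regression Analysis
Zhang Lun (张伦)
School of Journalism, Fudan University
2013 FIST Course · Communication Research Methods

Outline

Topics in this lecture
Correlation analysis
o Definition and formulas
o Interpreting results
o Statistical inference tests for the correlation coefficient
o Further study
Simple regression analysis
o Definition and formulas
o Steps for building a simple regression model
Multiple linear regression analysis

Learning objectives
Understand and master the basic concepts of correlation analysis and simple regression analysis
Be able to apply correlation analysis and linear regression analysis to practical problems
Be able to interpret analysis results correctly and appropriately
All formulas are for understanding only; there is no need to memorize them!

Part 1: Correlation Analysis

How can the relationship between two variables be described?

Approach 1: Scatter plot
What are the advantages and drawbacks of using a scatter plot to show the relationship between two variables?
o Intuitive
o Not comparable
o Not standardized
Approach 2: Testing the correlation coefficient
o Is it significant?
o Is it generalizable?

[Figure: four scatter plots, panels (a) to (d), each plotting y against x]

Correlation

The simple correlation coefficient, also called the Pearson correlation coefficient (Pearson’s product-moment correlation coefficient), measures how closely two variables are linearly related (direction, strength, and significance (along with sample size)).

Statistical Significance vs. Strength of Relationship

r                  Significant   Non-significant
|r| = 1            Yes           Impossible
1 > |r| ≥ .7       Strong        Trivial
.7 > |r| ≥ .3      Moderate      Trivial
.3 > |r| > 0       Weak          Trivial
r = 0              Impossible    Trivial

Caution: read Kozak, M. (2009). What is strong correlation? Teaching Statistics, 31, 85-86.

Formal Definition

Covariance between two variables divided by the product of their respective standard deviations

Raw-score formula:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Standardized-score formula:
$$r = 1 - \frac{\sum (Z_x - Z_y)^2}{2(n-1)}$$

Source: Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. P27

[Figure: scatter plot of No. of Publications against Time Since Ph.D. (Years)]

$$r = 1 - \frac{\sum (Z_x - Z_y)^2}{2(n-1)} = 1 - \frac{9.614}{2(15-1)} = .657$$

Testing the correlation coefficient

As n increases, the sampling distribution of the correlation coefficient comes ever closer to a t distribution with n - 2 degrees of freedom, so r is tested through the t statistic:
$$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}$$

E.g., r = .657, n = 15
$$t = \frac{0.657}{\sqrt{(1 - 0.657^2)/(15 - 2)}} = 3.14$$
o df = 15 - 2 = 13
o p < .05

Interpretation

r: the amount of change in $z_y$ given a unit increase in $z_x$ (or vice versa).
$r^2$: the proportion of variance in Y associated with/caused by X (or vice versa).

Properties of the Product-Moment Correlation Coefficient

It is independent of the units of measurement.
Its value varies between zero, when the variables have no linear relationship, and +1.00 or -1.00, when each variable is perfectly estimated by the other. The absolute value gives the degree of relationship.
Its sign indicates the direction of the relationship.
o A positive sign indicates a tendency for high values of one variable to occur with high values of the other, and low values to occur with low.
o A negative sign indicates a tendency for high values of one variable to be associated with low values of the other.
Reversing the direction of measurement of one of the variables will produce a coefficient of the same absolute value but of opposite sign.

Main uses of correlation analysis in social science research

Describing the relationships among variables and assessing data quality
o Correlation Matrix
A data analysis tool
A necessary means of making new discoveries
o Google Correlate by Google Labs
o Huffaker, D. Dimensions of leadership and social influence in online communities. Human Communication Research, 36(4), 593-617.
As a data analysis method
McCombs, M. E., & Shaw, D. L. (1972). The agenda-setting function of mass media. Public Opinion Quarterly, 36(2), 176-187.
Any Observations? Any Conclusions?

Part 2: Simple Regression Analysis

How social scientists understand the real world

Theory
Natural law: what people try to guess at or approximate; the "black box." All models are guesses about the black box, made in the hope that they come as close as possible to the true law.
Model
Statistics can use the information (data) currently available to establish the relationship between the variables people care about and other related variables. This relationship is called a model.
Dependent variable = f(independent variables, random noise, parameters)

[Diagram labels: Natural law, X, Y]

Differences between correlation analysis and regression analysis

No assumption of causality:
o Through the calculation of the correlation coefficient, one can tell whether X and Y vary linearly but CANNOT tell whether X affects Y or Y affects X.
Deterministic Model:
o An equation or set of equations that allows us to fully determine the value of the dependent variable from the values of the independent variables.
o How much does variable Y change when variable X changes?

A simple example first: data from 60 well-known business schools in the United States
o X: students' salary before entering the MBA program
o Y: students' salary after the MBA program

Basic tasks of a regression model:
o Find a straight line [fitting] that appropriately [least squares] represents the trend of the points in the scatter plot.
o Use the characteristics of this line (Y = ax + b) [estimation] to summarize the relationship between the independent variable and the dependent variable.
o Judge the "significance" of the line's characteristics [significance testing].
o Evaluate how well the line matches the scatter points [model fit test, coefficient of determination].

[Figure: scatter plot of Salary Post MBA against Salary Pre MBA]

Definition of the simple regression model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

o $Y_i$: Dependent (Response) Variable (e.g., Number of Publications)
o $X_i$: Independent (Explanatory) Variable (e.g., Years since graduation)
o $\beta_0$: Population Y-Intercept
o $\beta_1$: Population Slope
o $\varepsilon_i$: Random Error

Population (Unknown Relationship): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Random Sample: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
Illustration: how to understand inferring the population from a sample

[Figure: Simple regression line]

Data analysis steps for a simple regression model

Model Specification
• Define the DV and IVs
• Hypothesize the nature of relationship
• Expected effects (Coefficient Sign)
• Functional Form (Linear & Nonlinear)
• Based on previous research/theories
Model Estimation
• Identify possible outliers & Check the required conditions
• Model Estimation
• Choose the appropriate test statistics
• Specify the significance level (and critical value)
• Estimate the test statistics
• Test the null hypotheses
Model Assessment
• Evaluate the performance (i.e., goodness of the fit) of the model
Result Interpretation
• Interpret the statistical & social meaning of the parameters and performance of the model

Required Conditions of Simple Regression

Normality (residuals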