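Part III: Analysis of Categorical Data
Discrete Choice Models and Sample Selection Correction
School of Management, Xi'an Jiaotong University, Spring 2012

Contents
1. Binary choice: the logit model
   The model for analysis
   Maximum likelihood estimation of the model
2. Probit model
3. Multinomial choice models
4. Tobit model
5. Poisson regression model
6. Censored and truncated regression models
7. Sample selection correction

Types of Measurement
[Diagram. Top level: Quantitative, Qualitative. Second level: Continuous, Discrete, Ordinal, Nominal. Label: Categorical.]

I. Review of Basic Concepts:
   Cross-tables and Measures of Association
   Odds, Log of the odds = Logit, and Odds Ratio

Review of Basic Concepts
I. Basic Statistical Methods

Common statistical models for categorical data analysis (p. 28, Table 2-4)
Each row is a type of dependent variable; for each, the models are listed by type of independent variables.

Dependent variable: Binary
   All independent variables categorical: 2 × c × … contingency table analysis; probit model, logit model
   At least one integer or continuous independent variable: probit model, logistic regression
Dependent variable: Nominal (unordered, more than two categories)
   All independent variables categorical: r × c × … contingency table analysis; multinomial probit model, logit model
   At least one integer or continuous independent variable: multinomial probit model, logit model (logistic regression)
Dependent variable: Ordinal (ordered, more than two categories)
   All independent variables categorical: r × c × … contingency table analysis; ordered probit model, ordered logit model
   At least one integer or continuous independent variable: ordered probit model, ordered logit model
Dependent variable: Integer*
   All independent variables categorical: log-linear model; Poisson regression and its extensions
   At least one integer or continuous independent variable: Poisson regression and its extensions
Dependent variable: Continuous
   All independent variables categorical: analysis of variance (ANOVA); linear or nonlinear regression
   At least one integer or continuous independent variable: analysis of covariance (ANCOVA); linear or nonlinear regression

II. Bivariate Discrete Variables: Measures of Association for 2 × 2 Tables (Is there a difference or not?)
• Sample data: counts $n_{ij}$ and proportions $\hat{p}_{ij}$

$$
\begin{array}{c|cc|c}
    & Y_1    & Y_2    &        \\ \hline
X_1 & n_{11} & n_{12} & n_{1+} \\
X_2 & n_{21} & n_{22} & n_{2+} \\ \hline
    & n_{+1} & n_{+2} & n_{++}
\end{array}
$$

• Estimated Joint Probabilities $\hat{p}_{ij}$

$$
\begin{array}{c|cc|c}
    & Y_1 & Y_2 & \\ \hline
X_1 & \hat{p}_{11} = \dfrac{n_{11}}{n_{++}} & \hat{p}_{12} & \hat{p}_{1+} = \dfrac{n_{1+}}{n_{++}} \\
X_2 & \hat{p}_{21} & \hat{p}_{22} & \hat{p}_{2+} \\ \hline
    & \hat{p}_{+1} = \dfrac{n_{+1}}{n_{++}} & \hat{p}_{+2} &
\end{array}
$$

• Estimated Conditional Probabilities $\hat{p}_{j|i}$

$$
\begin{array}{c|cc|c}
    & Y_1 & Y_2 & \\ \hline
X_1 & \hat{p}_{1|1} = \dfrac{n_{11}}{n_{1+}} & \hat{p}_{2|1} = 1 - \hat{p}_{1|1} & \hat{p}_{1+} = \dfrac{n_{1+}}{n_{++}} \\
X_2 & \hat{p}_{1|2} = \dfrac{n_{21}}{n_{2+}} & \hat{p}_{2|2} = 1 - \hat{p}_{1|2} & \hat{p}_{2+}
\end{array}
$$

i. Scale of Measures of Association:
   1. Unit scale:
      ▪ 2 nominal variables: between 0 and 1, $[0, 1]$
      ▪ 2 ordinal variables: between -1 and +1, $[-1, +1]$
   2. Multiplicative scale: $[0, \infty)$
ii. The Unit Scale:
   1. Difference of proportions: $p_{1|1} - p_{1|2}$
   2. Chi-squared-based measures of association
   3. PRE statistics: proportional reduction in prediction errors

iii. The Multiplicative Scale
1. The Concept of "Odds": the expected number of successes for each failure
• Definition:
$$
\text{odds} = \frac{\Pr(\text{success})}{\Pr(\text{failure})} = \frac{p_{1|1}}{1 - p_{1|1}} = \frac{n_{11}}{n_{12}}
$$
• Range: $0 \le \text{odds} = \dfrac{p}{1 - p} < +\infty$
• odds = 1 means equal chance of success and failure
• Probability of success: $p = \dfrac{\text{odds}}{1 + \text{odds}}$

2. Log of the odds = ln(odds) = logit (due to Joseph Berkson, 1944)
• $-\infty < \text{logit} \equiv \ln\!\left(\dfrac{p}{1 - p}\right) < +\infty$, symmetric around 0
• logit = 0 means equal chance of success and failure
• exp(logit) = exp[ln(odds)] = odds

Probability of success π    odds = π/(1-π)    logit = ln(odds)
0                           0                 undefined
0.001                       0.001001001       -6.906754779
0.01                        0.01010101        -4.59511985
0.02                        0.020408163       -3.891820298
0.05                        0.052631579       -2.944438979
0.1                         0.111111111       -2.197224577
0.2                         0.25              -1.386294361
0.25                        0.333333333       -1.098612289
0.3                         0.428571429       -0.84729786
0.4                         0.666666667       -0.405465108
0.5                         1                 0
0.6                         1.5               0.405465108
0.7                         2.333333333       0.84729786
0.75                        3                 1.098612289
0.8                         4                 1.386294361
0.9                         9                 2.197224577
0.95                        19                2.944438979
0.98                        49                3.891820298
0.99                        99                4.59511985
0.999                       999               6.906754779
1                           infinity          infinity

[Figure: Probability, Odds, and Logit. Two panels, each with probability (0 to 1) on the horizontal axis; the vertical axes run from 0 to 250 and from -15 to 15.]

3. Odds Ratio (Cross-Product Ratio)
• Definition:
$$
0 \le \theta = \text{odds ratio} = \frac{p_{1|1} / (1 - p_{1|1})}{p_{1|2} / (1 - p_{1|2})} < +\infty
$$
• When X and Y are independent, $\theta = 1$
• The odds ratio treats the variables X and Y symmetrically
• Sample odds ratio (cross-product ratio): for a 2 × 2 table, the sample estimator of the odds ratio is also called the "cross-product ratio", because
$$
\hat{\theta} = \frac{\hat{p}_{1|1} / \hat{p}_{2|1}}{\hat{p}_{1|2} / \hat{p}_{2|2}}
= \frac{(n_{11}/n_{1+}) / (n_{12}/n_{1+})}{(n_{21}/n_{2+}) / (n_{22}/n_{2+})}
= \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}.
$$

Cross-tabulation of voting patterns in the 2008 legislative and presidential elections
Rows: 2008 legislative election vote. Columns: 2008 presidential election vote. Round brackets give the percentage of the whole sample; square brackets give the row percentage.

Pan-Blue in the legislative election: 690 (60.53%)
   Ma Ying-jeou (馬英九): stable Pan-Blue, 653 (57.28%) [94.64%]
   Hsieh Chang-ting (謝長廷): Blue to Green, 37 (3.25%) [5.36%]
Pan-Green in the legislative election: 450 (39.47%)
   Ma Ying-jeou (馬英九): Green to Blue, 50 (4.39%) [11.11%]
   Hsieh Chang-ting (謝長廷): stable Pan-Green, 400 (35.09%) [88.89%]
Column totals: Ma Ying-jeou 703 (61.67%); Hsieh Chang-ting 437 (38.33%); all 1,140 (100%)

1. Test of symmetry: McNemar $X^2 = 1.943$, df = 1, p = 0.163 > 0.05
2. Test of independence: Pearson $X^2 = 803.857$, df = 1, p < 0.001
3. Measures of association: Cramér's V = 0.840; Cohen's $\kappa$ = 0.840
4. Odds ratio = $\dfrac{653/37}{50/400} = 141.189$
Source: 黃紀、王德育 (2009, 41)

4. ln(odds ratio) = ln(odds$_1$) - ln(odds$_2$) = logit difference
5. Statistical inference for the odds ratio: since the sampling distribution of the odds ratio is highly skewed, use $\ln(\hat{\theta})$.
6. Relative Risk (RR) = $\dfrac{p_{1|1}}{p_{1|2}}$
7. Odds Ratio = $\dfrac{p_{1|1}}{p_{1|2}} \times \dfrac{p_{2|2}}{p_{2|1}} = RR \times \dfrac{p_{2|2}}{p_{2|1}}$
   If the event of interest occurs infrequently, then $p_{2|2}/p_{2|1} \approx 1$ and the odds ratio can be used as an estimate of RR.

Two Philosophies of Categorical Data Modeling
① One philosophy views categorical variables as being inherently categorical and relies on transformations of the data to derive regression-type models.
② The other philosophy presumes that categorical variables are conceptually continuous but are observed, or measured, as categorical. This approach relies on latent variables to derive regression-type models.

Regression Models
Characteristically, regression partitions an observation into two parts:
   observed = structural + stochastic
• The observed part represents the actual values of the dependent variable at hand.
• The structural part denotes the relationship between the dependent and independent variables.
• The stochastic part is the random component unexplained by the structural part:
   ▪ Omitted structural factors
   ▪ Measurement error
   ▪ "Noise"

➢ How to interpret regression models is contingent on one's conceptualization of what regression does to data. We propose three different conceptualizations:
   Causation:    observed = true mechanism + disturbance
   Prediction:   observed = predicted + error
   Description:  observed = summary + residual

Linear regressions: assessments are conducted using F-tests based on the reduction in residual sums of squares, or other proportionate reduction in error (PRE) criteria.
Nonlinear models: most assessments are based on likelihood-ratio tests that assess the increased likelihood of the data when a parameter is added to a model.

Models for Binary Data
➢ Logit Model (or Logisit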