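Basic Models for the Categorical Analysis of Individual-Level Data
Logit and Probit Models: An Introduction
黃紀 (Chi Huang)
Chair Professor, National Chengchi University

Outline
I. Review of Basic Concepts
   Cross-tables and Measures of Association
   Odds, Log of the Odds = Logit, and Odds Ratio
II. Binary Regression Models
   Logit Model (or Logistic Regression)
   Probit Model
III. Ordered Regression Models
   Cumulative Probability (or Proportional Odds) Model
   Continuation Ratio (or Sequential Logit) Model
IV. Multinomial Regression Models
   Multinomial Logit (MNL) Model
   Multinomial Probit (MNP) Model

Review of Basic Concepts

I. Basic Statistical Methods

Common Statistical Models (rows: dependent variable; columns: independent variables)

Dependent variable | Independent variables all categorical | At least one integer or continuous independent variable
Binary (2×c…) | Cross-table analysis; probit model, logit model | Probit model, logistic regression
Nominal (r×c…) | Cross-table analysis; multinomial probit and logit models | Multinomial probit and logit models (logistic regression)
Ordinal (r×c…) | Cross-table analysis; ordered probit model, sequential logit model | Ordered probit model, sequential logit model
Integer* | Log-linear model; Poisson regression and its extensions | Poisson regression and its extensions
Continuous | Analysis of variance (ANOVA); linear or nonlinear regression | Analysis of covariance (ANCOVA); linear or nonlinear regression

II. Bivariate Discrete Variables: Measures of Association for 2×2 Tables (Is there a difference or not?)

Sample data: cell counts $n_{ij}$ and proportions $\hat{\pi}_{ij}$

      | Y1 | Y2 | Total
X1 | $n_{11}$ | $n_{12}$ | $n_{1+}$
X2 | $n_{21}$ | $n_{22}$ | $n_{2+}$
Total | $n_{+1}$ | $n_{+2}$ | $n_{++}$

Estimated Joint Probabilities, $\hat{\pi}_{ij} = n_{ij}/n_{++}$

      | Y1 | Y2 | Total
X1 | $\hat{\pi}_{11} = n_{11}/n_{++}$ | $\hat{\pi}_{12}$ | $\hat{\pi}_{1+} = n_{1+}/n_{++}$
X2 | $\hat{\pi}_{21}$ | $\hat{\pi}_{22}$ | $\hat{\pi}_{2+} = n_{2+}/n_{++}$
Total | $\hat{\pi}_{+1} = n_{+1}/n_{++}$ | $\hat{\pi}_{+2} = n_{+2}/n_{++}$ | 1

Estimated Conditional Probabilities, $\hat{\pi}_{j\mid i}$

      | Y1 | Y2 | Total
X1 | $\hat{\pi}_{1\mid 1} = n_{11}/n_{1+}$ | $\hat{\pi}_{2\mid 1} = 1 - \hat{\pi}_{1\mid 1}$ | 1
X2 | $\hat{\pi}_{1\mid 2} = n_{21}/n_{2+}$ | $\hat{\pi}_{2\mid 2} = 1 - \hat{\pi}_{1\mid 2}$ | 1
Total | $\hat{\pi}_{+1} = n_{+1}/n_{++}$ | $\hat{\pi}_{+2}$ | 1

i. Scale of Measures of Association
1. Unit scale:
   two nominal variables: between 0 and 1, [0, 1]
   two ordinal variables: between -1 and +1, [-1, +1]
2. Multiplicative scale: [0, ∞)

ii. The Unit Scale
1. Difference of proportions: $\pi_{1\mid 1} - \pi_{1\mid 2}$
2. Chi-squared-based measures of association
3. PRE statistics: Proportional Reduction in Prediction Errors

iii. The Multiplicative Scale
1. The concept of "odds": the expected number of successes for each failure.
$$\text{odds}_1 = \frac{\Pr(\text{success})}{\Pr(\text{failure})} = \frac{\pi_{1\mid 1}}{1-\pi_{1\mid 1}} = \frac{\pi_{1\mid 1}}{\pi_{2\mid 1}}, \qquad \widehat{\text{odds}}_1 = \frac{n_{11}}{n_{12}}$$
The odds range over $[0, \infty)$. Odds below 1 mean failure is more likely than success, odds = 1 means equal chance of success and failure, and odds above 1 mean success is more likely.

2. Log of the odds = ln(odds) = logit (due to Joseph Berkson, 1944). The logit is symmetric around 0, and logit = 0 means equal chance of success and failure. Exponentiating recovers the odds: exp(logit) = exp[ln(odds)] = odds.
$$\text{logit} = \ln(\text{odds}) = \ln\left(\frac{\pi}{1-\pi}\right)$$

Probability of success $\pi$ | Odds $= \pi/(1-\pi)$ | Logit $= \ln(\text{odds})$
0 | 0 | undefined
0.001 | 0.001001001 | -6.906754779
0.01 | 0.01010101 | -4.59511985
0.02 | 0.020408163 | -3.891820298
0.05 | 0.052631579 | -2.944438979
0.1 | 0.111111111 | -2.197224577
0.2 | 0.25 | -1.386294361
0.25 | 0.333333333 | -1.098612289
0.3 | 0.428571429 | -0.84729786
0.4 | 0.666666667 | -0.405465108
0.5 | 1 | 0
0.6 | 1.5 | 0.405465108
0.7 | 2.333333333 | 0.84729786
0.75 | 3 | 1.098612289
0.8 | 4 | 1.386294361
0.9 | 9 | 2.197224577
0.95 | 19 | 2.944438979
0.98 | 49 | 3.891820298
0.99 | 99 | 4.59511985
0.999 | 999 | 6.906754779
1 | infinity | infinity

[Figure: Probability, Odds, and Logit. The odds (vertical axis 0 to 250) and the logit (vertical axis -15 to 15) are plotted against the probability (0 to 1).]

3. Odds Ratio (Cross-Product Ratio)
$$\theta = \frac{\text{odds}_1}{\text{odds}_2} = \frac{\pi_{1\mid 1}/(1-\pi_{1\mid 1})}{\pi_{1\mid 2}/(1-\pi_{1\mid 2})}, \qquad 0 \le \theta < \infty$$
When X and Y are independent, $\theta = 1$. The odds ratio treats the variables X and Y symmetrically.

Sample odds ratio (cross-product ratio): for a 2×2 table, the sample estimate of the odds ratio is also called the "cross-product ratio", because
$$\hat{\theta} = \frac{\hat{\pi}_{1\mid 1}/\hat{\pi}_{2\mid 1}}{\hat{\pi}_{1\mid 2}/\hat{\pi}_{2\mid 2}} = \frac{(n_{11}/n_{1+})/(n_{12}/n_{1+})}{(n_{21}/n_{2+})/(n_{22}/n_{2+})} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}n_{22}}{n_{12}n_{21}}$$

Cross-tabulation of voting patterns in the 2008 legislative and presidential elections
Cited from: 黃紀、王德育 (2009, 41)
Rows: 2008 legislative election. Columns: 2008 presidential election. Cells: count (% of total) [row %].

2008 legislative \ 2008 presidential | Ma Ying-jeou (馬英九) | Hsieh Chang-ting (謝長廷) | Total
Pan-Blue | Stable Pan-Blue: 653 (57.28%) [94.64%] | Blue to Green: 37 (3.25%) [5.36%] | 690 (60.53%)
Pan-Green | Green to Blue: 50 (4.39%) [11.11%] | Stable Pan-Green: 400 (35.09%) [88.89%] | 450 (39.47%)
Total | 703 (61.67%) | 437 (38.33%) | 1,140 (100%)

1. Test of symmetry: McNemar $X^2 = 1.943$, df = 1, p = 0.163 > 0.05
2. Test of independence: Pearson $X^2 = 803.857$, df = 1, p < 0.001
3. Measures of association: Cramér's V = 0.840; Cohen's $\kappa$ = 0.840
4. Odds ratio: $\hat{\theta} = \dfrac{653 \times 400}{37 \times 50} = 141.189$

4. $\ln(\text{odds ratio}) = \ln(\text{odds}_1) - \ln(\text{odds}_2)$ = logit difference.
5. Statistical inference for the odds ratio: because the sampling distribution of $\hat{\theta}$ is highly skewed, base inference on $\ln(\hat{\theta})$, whose sampling distribution is closer to normal.
6. Relative risk: $RR = \pi_{1\mid 1}/\pi_{1\mid 2}$
7. Odds ratio:
$$\theta = \frac{\pi_{1\mid 1}/\pi_{2\mid 1}}{\pi_{1\mid 2}/\pi_{2\mid 2}} = RR \times \frac{\pi_{2\mid 2}}{\pi_{2\mid 1}}$$
If the event of interest occurs infrequently, then $\pi_{2\mid 1}$ and $\pi_{2\mid 2}$ are both close to 1, and the odds ratio can be used as an estimate of RR.

Binary Regression Models

I. Historical Origins: Problems with Linear Probability Models (LPM)

II. Alternative Views of Binary Regression: Different Paths, Same Destination
Probability model:
$$\pi(\mathbf{x}_i) = \Pr(y_i = 1 \mid \mathbf{x}_i) = F(\mathbf{x}_i\boldsymbol{\beta})$$
where F is a cumulative distribution function (CDF).

i. Latent Variable Regression: Threshold Model
Latent: $y_i^* = \mathbf{x}_i\boldsymbol{\beta} + \varepsilon_i$
Observed: $y_i = 1$ if $y_i^* > 0$; $y_i = 0$ if $y_i^* \le 0$

Identifying assumptions of the latent variable models and their implications:
the threshold is 0;
the conditional mean of the error is 0;
the conditional variance of the error is a constant: $\operatorname{Var}(\varepsilon_i \mid \mathbf{x}_i) = 1$ in the probit model and $\pi^2/3 \approx 3.29$ in the logit model (here $\pi = 3.14159\ldots$).
Because $y^*$ is unobserved, the magnitude of the slope $\boldsymbol{\beta}$ depends on the assumed scale of the latent dependent variable and cannot be interpreted directly. But the identifying assumptions do not affect $\Pr(y_i = 1 \mid \mathbf{x}_i)$, which is an estimable function. We can therefore interpret changes in probabilities and odds.

ii. Random Utility Model (RUM)
The choice set $C_i$ and random utility: $U = V + e$ (a systematic part V plus a random part e).
Derivation of the choice probability from the utility-maximization principle (alternative a, coded $y_i = 1$, is chosen when it yields the higher utility):
$$\Pr(y_i = 1 \mid \mathbf{x}_i) = \Pr(U_{ia} > U_{ib} \mid \mathbf{x}_i) = \Pr(U_{ia} - U_{ib} > 0 \mid \mathbf{x}_i)$$

The IID Type I extreme value assumption of the error terms.
Implications of the RUM:
a) Only differences in utility matter: the difference between two IID Type I extreme value random variables follows a logistic distribution.
b) The overall scale of utility is irrelevant and is thus normalized: the variance of each error term is fixed at $\pi^2/6 \approx 1.645$.

iii. Generalized Linear Model (GLM)
GLM for binary data
Random (stochastic) component: Bernoulli distribution
$$f(y_i \mid \pi_i) = \pi_i^{y_i}(1-\pi_i)^{1-y_i}, \qquad y_i = 0, 1;\ 0 \le \pi_i \le 1$$
$$= \exp\left[y_i\ln\pi_i + (1-y_i)\ln(1-\pi_i)\right] = \exp\left[y_i\ln\frac{\pi_i}{1-\pi_i} + \ln(1-\pi_i)\right]$$
This has the exponential-family form $\exp\{[y_i\theta_i - b(\theta_i)]/a(\phi) + c(y_i, \phi)\}$, where
$\theta_i = \ln[\pi_i/(1-\pi_i)]$, so that $\pi_i = \exp(\theta_i)/[1+\exp(\theta_i)]$;
$a(\phi) = 1$;
$b(\theta_i) = -\ln(1-\pi_i) = \ln[1+\exp(\theta_i)]$;
$c(y_i, \phi) = 0$.

Systematic Component:
Link Function: the logi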