Endogeneity in Econometrics: Instrumental Variable Estimation
Ming LU

Endogeneity
• Omitted variable bias
• Simultaneity
• Measurement error

• Can we ignore the omitted variable bias? It can be satisfactory if the estimates are coupled with the direction of the biases for the key parameters.
• Can we use a proxy to eliminate omitted variable bias?
  – Sometimes.
• Can FE estimation solve the omitted variable problem? First differencing or fixed effects estimation eliminates time-constant omitted variables. However, panel data methods do not solve the problem of time-varying omitted variables.

Idea of IV Estimation
• Exogenous variable: the IV is uncorrelated with the structural error.
• Indirect effect of the IV: it affects y only through the endogenous explanatory variable.

Example
What can serve as an IV for edu?
• Mother's education?
• Number of siblings?
• The report of others?
• A dummy variable equal to 1 if a man is born in the first quarter of the year. Angrist and Krueger (1991). (Problematic.)
• In China, the years of primary education?
IV for skipped class?
• The distance from home to school.

Other examples of IV
• IV for institutions: Language? History?
• Mauro (1995) uses the ethnic and linguistic composition of the population as an IV for corruption. Hall and Jones (1999) use distance from the equator and the extent to which a Western European language is spoken as a first language as IVs for institutional quality. La Porta et al. (1997, 1998, 1999) use legal origin as an IV for various legal structures. Acemoglu, Johnson, and Robinson (2001, 2002) use mortality and population density in the colonial era (around 1500) as IVs for institutions.
• IV for school choice: Number of streams?

Identification
• Refer to (15.9) and (15.10).
The (asymptotic) standard error of the IV estimator involves SST, the total sum of squares of the $x_i$.

Self-selection
• Angrist (1990) studied the effect that being a veteran of the Vietnam War had on lifetime earnings.
• The draft lottery number is a good IV candidate for veteran status.
• Some additional words about natural experiments and DID.

Properties of IV with a Poor Instrumental Variable
A poor IV can cause serious bias.

R-squared
• Most regression packages compute an R-squared after IV estimation, using the standard formula $R^2 = 1 - SSR/SST$, where SSR is the sum of squared IV residuals and SST is the total sum of squares of y.
• $R^2$ can be negative in this case, because the IV SSR can exceed SST.

IV ESTIMATION OF THE MULTIPLE REGRESSION MODEL
• Structural equation
Estimation
Efficient IV
Equation (15.26) is an example of a reduced form equation, which means that we have written an endogenous variable in terms of exogenous variables.

TWO STAGE LEAST SQUARES
2SLS in words
• The first stage is to run the regression in (15.36), where we obtain the fitted values $\hat{y}_2$.
• The second stage is the OLS regression (15.38). Because we use $\hat{y}_2$ in place of $y_2$, the 2SLS estimates can differ substantially from the OLS estimates.
• Another interpretation:

Multiple Endogenous Explanatory Variables
• ORDER CONDITION FOR IDENTIFICATION OF AN EQUATION:
• We need at least as many excluded exogenous variables as there are included endogenous explanatory variables in the structural equation.

IV SOLUTIONS TO ERRORS-IN-VARIABLES PROBLEMS
One possibility is to obtain a second measurement on $x_1^*$, say $z_1$, and use it as an IV. An alternative is to use other exogenous variables as IVs for a potentially mismeasured variable.

TESTING FOR ENDOGENEITY AND TESTING OVERIDENTIFYING RESTRICTIONS
• The 2SLS estimator is less efficient than OLS when the explanatory variables are exogenous; as we have seen, the 2SLS estimates can have very large standard errors.

How to test endogeneity?
• 1. Compare the OLS and 2SLS estimates and determine whether the differences are statistically significant. (Hausman, 1978)
• 2. A regression test:

Another interpretation of 2SLS
• Including $\hat{v}_2$ in the OLS regression (15.51) clears up the endogeneity of $y_2$.
• We can also test for endogeneity of multiple explanatory variables. For each suspected endogenous variable, we obtain the reduced form residuals. Then, we test for joint significance of these residuals in the structural equation, using an F test.

Testing Overidentification Restrictions
• If we have more than one instrumental variable, we can effectively test whether some of them are uncorrelated with the structural error.
• Use one IV to estimate the equation and obtain the residuals, then test the correlation between the other IVs and these residuals.

TESTING OVERIDENTIFYING RESTRICTIONS:
• (i) Estimate the structural equation by 2SLS and obtain the 2SLS residuals, $\hat{u}_1$.
• (ii) Regress $\hat{u}_1$ on all exogenous variables. Obtain the R-squared, say $R_1^2$.
• (iii) Under the null hypothesis that all IVs are uncorrelated with $u_1$, $nR_1^2 \overset{a}{\sim} \chi^2_q$, where q is the number of instrumental variables from outside the model minus the total number of endogenous explanatory variables. If $nR_1^2$ exceeds (say) the 5% critical value in the $\chi^2_q$ distribution, we reject $H_0$ and conclude that at least some of the IVs are not exogenous.

Is it better to have more IVs?
• Adding instruments to the list improves the asymptotic efficiency of 2SLS. But this requires that any new instruments are in fact exogenous.
• With the typical sample sizes available, adding too many instruments (that is, increasing the number of overidentifying restrictions) can cause severe biases in 2SLS.

Omitted Topics
• 2SLS WITH HETEROSKEDASTICITY
• APPLYING 2SLS TO TIME SERIES EQUATIONS

APPLYING 2SLS TO POOLED CROSS SECTIONS AND PANEL DATA
• For pooled cross-section data: add time dummies.
• For panel data: in the first stage, use the differenced IV to get an estimate of the endogenous variable.
• Question: If the panel model is an FE model, how can we check the efficiency of the IV when the IV is time-invariant?

STATA commands
• To compare OLS and 2SLS
• ivreg y (x = iv) x2
• est store f2
• reg y x x2
• hausman f2
• The sequence is important.

STATA commands
• To compare FE and IV-FE
• xtivreg y (x = iv) x2, fe
• est store f2
• xtreg y x x2, fe
• hausman f2
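
The OLS versus IV comparison that the Stata commands perform can be illustrated on simulated data. The following is a minimal Python sketch under stated assumptions, not part of the lecture: the data-generating process (one endogenous regressor x, one instrument z, an omitted variable a, true slope 1.0), the sample size, and the helper names `simulate`, `cov`, `ols_slope` and `iv_slope` are all hypothetical choices made for illustration. It shows that OLS is biased upward when the omitted variable enters both x and y, that the simple IV estimator cov(z, y)/cov(z, x) recovers the true slope, and that running the two stages by hand gives the same slope as the IV formula in this just-identified case.

```python
import random


def simulate(n=20000, beta=1.0, seed=0):
    """Draw (x, y, z); x is endogenous because the omitted variable a enters x and y."""
    rng = random.Random(seed)
    x, y, z = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)  # instrument: independent of a and of the error
        ai = rng.gauss(0.0, 1.0)  # omitted variable (assumed, e.g. ability)
        xi = 0.8 * zi + ai + rng.gauss(0.0, 1.0)   # z is relevant for x
        yi = beta * xi + ai + rng.gauss(0.0, 1.0)  # a is left in the error term
        x.append(xi)
        y.append(yi)
        z.append(zi)
    return x, y, z


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(x, y, z):
    """Simple IV estimator of the slope, using z as the instrument for x."""
    return cov(z, y) / cov(z, x)


x, y, z = simulate()
b_ols = ols_slope(x, y)
b_iv = iv_slope(x, y, z)

# 2SLS by hand: first stage regresses x on z, second stage regresses y on fitted x.
pi1 = ols_slope(z, x)
pi0 = sum(x) / len(x) - pi1 * sum(z) / len(z)
x_hat = [pi0 + pi1 * zi for zi in z]
b_2sls = ols_slope(x_hat, y)

print(f"OLS slope:  {b_ols:.3f}")
print(f"IV slope:   {b_iv:.3f}")
print(f"2SLS slope: {b_2sls:.3f}")
```

The hand-computed second stage reproduces the coefficient only; its naive OLS standard errors would be wrong, which is one reason to rely on a dedicated IV routine in practice.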