

SCHOOL OF BUSINESS WORKING PAPER NO. 284

SUBSET SELECTION IN MULTIPLE LINEAR REGRESSION: A NEW MATHEMATICAL PROGRAMMING APPROACH

Riza Demirer and Burak Eksioglu

July 1998†

School of Business, University of Kansas, Summerfield Hall, Lawrence, KS 66045-2003. riza@ukans.edu

Department of Industrial Engineering, University of Florida, 303 Weil Hall, Gainesville, FL 32611. burak@grove.ufl.edu

† Comments and suggestions are welcome and will be gratefully appreciated.

Correspondence: Riza Demirer, University of Kansas, School of Business, Summerfield Hall, Lawrence, KS 66045, USA. E-mail: riza@ukans.edu

ABSTRACT

We address the so-called subset selection problem in multiple linear regression, where the objective is to select a minimal subset of predictor variables without sacrificing any explanatory power. A new mathematical programming model is proposed for this purpose. A parametric solution of this model yields a number of efficient subsets. To obtain this solution, we repeatedly use an exact or one of two heuristic algorithms. The subsets generated in this way are compared with the ones generated by several standard procedures. The results suggest that, in most cases, our approach finds subsets that compare favorably against the standard procedures (in terms of generally accepted measures such as adjusted R² and Mallows' C_p).

Keywords: Heuristics; Mathematical programming; Multivariate statistics; Regression.

1. Introduction

A common challenge for a regression analyst is the selection of the best subset from a set of predictor variables in terms of some specified criterion. Historically, when there are many predictor variables, one or more subsets with fewer predictor variables are generated using a method of the analyst's choice. Given data of the form {{y_i, x_1i, ..., x_ni}, i = 1, ..., k}, the so-called subset selection problem generally involves the selection of a subset M of N, where N = {1, ..., n} is the index set of the predictor variables {X_1, ..., X_n}, such that some measure of the model's explanatory power is maximized.

The main motivation for subset selection seems to be parsimony: "if 3 regressors can 'explain' or 'satisfactorily fit' a response Y, why use 4?", as Mandel (1989) notes. Some of the reasons for using only a subset of the available predictor variables are given by Miller (1984): (i) to estimate or predict at a lower cost by reducing the number of variables on which data are to be collected; (ii) to predict more accurately by eliminating uninformative variables; (iii) to describe a multivariate data set parsimoniously; and (iv) to estimate regression coefficients with smaller standard errors (particularly when some of the predictors are highly correlated).

The problem of selecting the best subset of predictor variables in regression has been the topic of a number of studies in the statistical literature. Roughly speaking, the focus of such studies has been the subset selection methodologies, the selection criteria, or a combination of both. The traditional selection methodologies can be purely enumerative (e.g., all subsets and best subsets procedures), sequential (e.g., forward selection, backward elimination, stepwise regression and stagewise regression procedures), and screening-based (e.g., ridge regression and principal components analysis). Standard texts like Draper and Smith (1981) and Montgomery and Peck (1991) provide a good description of these methodologies. Along the traditional route, new methodologies such as the stepwise directed search of Broersen (1986) and the nonnegative garrote of Breiman (1995) have recently surfaced. There is a parallel approach to the subset selection problem which uses a Bayesian perspective; see, for example, the work of Mitchell and Beauchamp (1988). With respect to the selection criteria, a number of measures have been proposed, such as adjusted R², Mallows' C_p and Akaike's AIC. Once again, texts such as Draper and Smith (1981) and Montgomery and Peck (1991) offer adequate coverage of this topic.

There are several papers that can help the reader understand the state of the art in subset selection research. The early work of Hocking (1976) provides a detailed overview of the field until the mid-70s. At about the same time, Berk (1977) reports a computational comparison of various selection procedures, and Thompson (1978a, 1978b) gives both a review and an evaluation of selection procedures and criteria. Subsequently, Miller (1984) offers a comprehensive survey of selection methods and criteria and discusses the potential pitfalls an analyst faces in using subset selection. Grechanovsky (1987) provides a somewhat similar account, though in a limited way. Sparks and Zucchini (1985) examine the same issues, but for the case when there are multiple Y variables. Hoerl et al. (1986) report a computational study involving ridge regression, and sequential and screening-based subset selection. The opinions regarding the advantages and disadvantages of the various procedures clearly differ, and no final word seems to be forthcoming.

In this paper, we propose a new mathematical programming based approach for doing subset selection. The method is similar in spirit to all subsets and best subsets procedures in that it concerns itself with the selection of good subsets; however, unlike the all subsets procedure, it identifies only a limited number of subsets, and, unlike the best subsets procedure, it uses a non-traditional selection criterion. The criterion used is based on the intuition that in a good model the correlations between
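
For readers less familiar with the traditional procedures mentioned in the introduction, the following Python sketch may help. It illustrates the all subsets procedure scored by adjusted R² and Mallows' C_p; it is not the mathematical programming approach proposed in this paper, whose details fall outside this excerpt. The function names (solve, rss, all_subsets) and the small data set are hypothetical and chosen only for illustration. The formulas are the usual textbook definitions, with p counting the intercept plus the selected predictors and the error variance estimated from the full model.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(X, y, subset):
    """Residual sum of squares when y is regressed on an intercept plus the columns in subset."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    p = len(subset) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def all_subsets(X, y):
    """Return (subset, adjusted R^2, Mallows' C_p) for every non-empty subset of predictors."""
    k, n = len(y), len(X[0])
    mean_y = sum(y) / k
    tss = sum((yi - mean_y) ** 2 for yi in y)
    sigma2 = rss(X, y, tuple(range(n))) / (k - n - 1)  # full-model error variance
    results = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            p = size + 1  # parameters, including the intercept
            r = rss(X, y, subset)
            adj_r2 = 1.0 - (r / (k - p)) / (tss / (k - 1))
            cp = r / sigma2 - k + 2 * p
            results.append((subset, adj_r2, cp))
    return results


# Hypothetical data: k = 8 observations, n = 3 predictors.
# y was built as 2*x0 + x1 plus small noise; x2 is unrelated.
X = [[1, 2, 5], [2, 1, 3], [3, 4, 8], [4, 3, 1],
     [5, 6, 7], [6, 5, 2], [7, 8, 6], [8, 7, 4]]
y = [4.3, 4.8, 10.1, 10.6, 16.2, 17.5, 21.7, 22.9]

results = all_subsets(X, y)
for subset, adj_r2, cp in sorted(results, key=lambda t: -t[1]):
    print(subset, round(adj_r2, 4), round(cp, 2))
```

With n predictors the loop visits 2^n − 1 subsets, which is why purely enumerative procedures become impractical as n grows and why methods that identify only a limited number of subsets are of interest. Solving the normal equations directly keeps the sketch dependency-free; a numerically careful implementation would more likely use a QR-based least-squares routine.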
