Bayesian Regression Analysis in the Large p, Small


Bayesian Regression Analysis in the “Large p, Small n” Paradigm with Application in DNA Microarray Studies

Mike West†, Joseph R Nevins, Jeffrey R Marks, Rainer Spang & Harry Zuzan
Duke University

Current draft: July 31st 2000 (May 2000 original)

Summary. Statistical modelling and inference problems in which sample sizes are substantially smaller than the number of available and potentially interesting predictors (explanatory variables) abound in applied science and medicine. These “Large p, Small n” problems pose challenges to standard statistical methods and demand new concepts and models for regression and classification. Our motivating applied context is in functional genomics; more specifically, in studies of phenotyping clinical or physiological outcomes in which the predictors are measured expression levels of large numbers of genes based on high-density DNA microarrays. In a canonical framework of binary regression, we discuss (a) issues of regression modelling utilising singular-value decompositions of design matrices that are massively rank deficient, (b) the imperatives for careful, informative prior specifications on high-dimension regression parameters, (c) the development of new classes of structured prior distributions for this problem, and (d) the development of appropriate computational methods and modes of posterior inference for regression estimation and predictive inference for out-of-sample classification. The latter enterprise is fundamental to genomic phenotyping applications. We study and exemplify the new statistical methodology in a problem of breast cancer phenotyping using DNA microarray expression profiles as predictors, and in discrimination of leukemia types.

Keywords: Bayesian regression analysis, binary regression, dimension reduction, gene expression profiles, DNA microarrays, high-dimensional covariates, regression prediction, singular value decompositions

† Institute of Statistics and Decision Sciences, Duke University, Durham NC 27708-0251, USA.

Regression models with large sets of higher-order interactions between predictor variables are an obvious context, though here we focus on the simpler paradigm in which n is really very small compared to p, so that the opportunities for identifying interactions are limited. Functional genomics provides a
motivating application of simply critical importance: large-scale gene expression profiling using DNA microarray data (Golub et al, 1999). The problem is exemplified and highlighted in phenotyping studies, where the central focus is on relating measured gene expression profiles to clinical and physiological outcomes. Challenging questions of modelling and analysis arise due to the high-dimensionality of the gene expression profile. Our main example here comes from a current Duke project in breast cancer phenotyping: linking the measured expression of large numbers of genes to clinical outcomes in breast cancer. First examples involve only two defined possible outcomes so leading to a binary regression format. Typically, we will have available rather small numbers of individual tumour tissue samples from which to produce the RNA required to hybridise to the DNA microarrays that deliver the genetic expression measures; hence the “small n”. Coupled with this, the number of genes so fingerprinted is, with current array technologies, in the several or tens of thousands, hence the “large p”. In the numerical example here, n = 27 and p = 7129. Further details and examples will be reported in West et al (2000). We further explore and illustrate our approach in analyses of leukemia data from a recent study of Golub et al (1999), where the model-based approach is extremely effective in out-of-sample predictive discrimination.

To address the modelling and analysis challenges, we develop a novel approach to Bayesian regression analysis, focusing on the binary regression context. In this framework, we utilise singular-value decompositions of matrices of measured values of large numbers of predictors across samples, generating factor representations and possibly massive dimension reduction to summary “super-predictors” of use in exploratory analyses; introduce classes of novel prior distributions for large regression parameters to reflect the dependence and singularity structure evident in likelihood functions based on large numbers of predictors, and that utilise the singular-value structure of the design matrices to induce a potentially massive reduction in the parameter space relevant to posterior computation; and, hence, develop easily implemented and standard MCMC methods for binary regression models to produce posterior inferences on the high-dimensional regression parameter, and consequent evaluation of out-of-sample predictive utility in probabilistic classification of new cases. The resulting analysis and methodology is illustrated via some analysis summaries in the breast cancer phenotyping context, and in the leukemia discrimination problem.

2. Binary regression and latent normal linear models

Consider the standard binary regression context in which binary responses $z_1, \ldots, z_n$ are assumed to be described by a probit regression on a set of $p$ predictors. That is, independently across cases $i = 1, \ldots, n$, each $z_i$ is a binary outcome with

$$\Pr(z_i = 1 \mid \beta) = \Phi(x_i'\beta) \tag{1}$$

where $x_i$ is the vector of $p$ predictor values for case $i$, $\beta$ is the $p$-vector regression parameter to be inferred, and $\Phi(\cdot)$ is the standard normal cum
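
The singular-value reduction and the probit link in (1) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes a design matrix `X` with one row per sample and one column per predictor (the layout is our assumption), uses synthetic Gaussian data in place of real expression measurements, and introduces the hypothetical names `F` (the "super-predictors"), `gamma` (a reduced-dimension coefficient vector) and `probit_prob`. Only the sizes n = 27 and p = 7129 are taken from the text.

```python
import math
import numpy as np


def probit_prob(eta):
    """Standard normal CDF applied elementwise: Pr(z_i = 1) = Phi(eta_i)."""
    return np.array(
        [0.5 * (1.0 + math.erf(e / math.sqrt(2.0))) for e in np.ravel(eta)]
    )


rng = np.random.default_rng(0)
n, p = 27, 7129                   # sizes quoted in the text
X = rng.standard_normal((n, p))   # synthetic stand-in for expression data

# Thin SVD: X = U @ diag(d) @ Vt. With n << p there are at most n
# nonzero singular values, so X is massively rank deficient.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
k = int(np.sum(d > 1e-10 * d[0]))  # numerical rank, at most n

# n x k matrix of "super-predictors" (factors): F = X V_k = U_k diag(d_k).
F = X @ Vt[:k].T

# Hypothetical reduced coefficient vector, mapped back to a p-vector.
# Any beta of the form V_k @ gamma gives x_i' beta = f_i' gamma.
gamma = 0.01 * rng.standard_normal(k)
beta = Vt[:k].T @ gamma

eta_full = X @ beta      # linear predictor using all p coefficients
eta_reduced = F @ gamma  # the same linear predictor using only k of them
probs = probit_prob(eta_reduced)  # Pr(z_i = 1 | beta) as in equation (1)
```

In this sketch the p = 7129 coefficients enter the likelihood only through k ≤ 27 combinations, which is the sense in which the decomposition can reduce the parameter space relevant to computation. The priors and MCMC scheme mentioned in the text are not shown here.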
