Parallel Algorithms for Forward and Back Substitut

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

ParallelAlgorithmsforForwardandBackSubstitutioninDirectSolutionofSparseLinearSystems∗ANSHULGUPTAIBMT.J.WATSONRESEARCHCENTERYORKTOWNHEIGHTS,NY10598ANSHUL@WATSON.IBM.COMVIPINKUMARDEPARTMENTOFCOMPUTERSCIENCEUNIVERSITYOFMINNESOTAMINNEAPOLIS,MN55455KUMAR@CS.UMN.EDU∗ThisworkwassupportedbyIST/BMDOthroughArmyResearchOfficecon-tractDA/DAAH04-93-G-0080,NSFgrantNSG/1RI-9216941,andbyArmyHighPerformanceComputingResearchCenterundertheauspicesoftheDepartmentoftheArmy,ArmyResearchLaboratorycooperativeagreementnumberDAAH04-95-2-0003/contractnumberDAAH04-95-C-0008,thecontentofwhichdoesnotnecessarilyreflectthepositionorthepolicyofthegovernment,andnoofficialendorsementshouldbeinferred.AccesstocomputingfacilitieswereprovidedbyMinnesotaSupercomputerInstitute,CrayResearchInc.andbythePitts-burghSupercomputingCenter.Relatedpapersareavailablevia://:reordering,symbolicfac-torization,numericalfactorization,andsolvingthelower-andupper-triangularsystemsresultingfromfactorization.Sincenumericalfac-torizationiscomputationallythemostexpensivephase,asignificantresearchefforthasbeendirectedtowardsdevelopingefficientandscalableparallelsparsefactorizationalgorithms.Wehaverecentlyproposed[4]aparallelsparseCholeskyfactorizationalgorithmthatisoptimallyscalableforawideclassofproblems.ExperimentshaveshownthatthisalgorithmcaneasilyspeedupCholeskyfactorizationbyafactorofatleastafewhundredonupto1024processors.Withsuchspeedupsinnumericalfactorization,itisimperativethatthere-mainingphasesofthesolutionprocessbeparallelizedeffectivelyinordertoscaletheperformanceoftheoverallsolver.Furthermore,withoutanoverallparallelsolver,thesizeofthesparsesystemsthatcanbesolvedmaybeseverelyrestrictedbytheamountofmemoryavailableonauniprocessorsystem.Inthispaper,weaddresstheproblemofperformingthefinalphaseofforwardandbackwardsubstitutioninparallelonadistributedmem-orymultiprocessor.Wepresentadetailedanalysisoftheparallelcomplexityandscalabilityofparallelalgorithmdescribedbrieflyin[5]toobtainasolutiontothesystemofsparselinearequationsoftheformsLY=BandUX=Y,whereLisalowertriangularmatrixandUisanuppertriangularmatrix.HereLandUareobtainedfromthenumericalfactorizationofasparsecoefficientmatrixAoftheoriginalsystemAX=Btobesolved.IfA,L,andUareN×Nmatrices,2thenX,Y,andBareN×mmatrices,wheremisthenumberofright-handsidevectorsforwhichthesolutiontothesparselinearsystemwithAasthecoefficientmatrixisdesired.Ouranalysisandexperi-mentsshowthat,althoughnotasscalableasthebestparallelsparseCholeskyfactorizationalgorithms,parallelsparsetriangularsolverscanyieldreasonablespeedupsinruntimeonhundredsofprocessors.Wealsoshowthatforawideclassofproblems,thesparsetriangularsolversdescribedinthispaperareoptimalandareasymptoticallyasscalableasadensetriangularsolver.Forasingleright-handside(m=1),ourexperimentsshowa256-processorperformanceofupto435MFLOPSonaCrayT3D,onwhichthesingle-processorperformanceforthesameproblemis≈8.6MFLOPS.Withm=30,themaximumsingle-processorand256-processorperformanceobservedinourexperimentsis≈30MFLOPSand≈3050MFLOPS,respectively.Tothebestofourknowledge,thisisthehighestperformanceandspeedupforthisproblemreportedonanymassivelyparallelcomputer.Inadditiontotheperformanceandscalabilityanalysisofparallelsparsetriangularsolvers,wediscusstheredistributionofthetriangu-larfactormatrixamongtheprocessorsbetweennumericalfactoriza-tionandtriangularsolution,anditsimpactonperformance.In[4],wedescribeanoptimaldata-distributionschemeforCholeskyfactoriza-tionofsparsematrices.ThisdistributionleavesgroupsofconsecutivecolumnsofLwithidenticalpatternofnon-zeros(henceforthcalledsupernodes)withatwo-dimensionalpartitioningamonggroupsofprocessors.However,thisdistributionisnotsuitableforthetriangularsolvers,whicharescalableonlywithaone-dimensionalpartition-ingofthesupernodalblocksofL.Weshowthatifthesupernodesaredistributedinasubtree-to-subcubemanner[2]thenthecostofconvertingthetwo-dimensionaldistributiontoaone-dimensionaldis-tributionisonlyaconstanttimesthecostofsolvingthetriangularsystems.Fromourexperiments,weobservedthatthisconstantisfairlysmallontheCrayT3D—atmost0.9forasingleright-handsid

1 / 20
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功