PARALLELDIRECTMETHODSFORSPARSELINEARSYSTEMSMichaelT.HeathDepartmentofComputerScienceandNCSAUniversityofIllinoisUrbana,Illinois61801ABSTRACTWepresentanoverviewofparalleldirectmethodsforsolvingsparsesystemsoflinearequations,focusingonsymmetricpositivede nitesystems.Weexaminetheperformanceimplicationsoftheimportantdi erencesbetweendenseandsparsesystems.Ourmainemphasisisonparallelimplementationofthenumericallyintensivefactorizationprocess,butwealsobrie yconsidertheothermajorcomponentsofdirectmethods,suchasparallelordering.IntroductionInthispaperwepresentabriefoverviewofparalleldirectmeth-odsforsolvingsparselinearsystems.Paradoxically,sparsematrixfactorizationo ersadditionalopportunitiesforexploitingparallelismbeyondthoseavailablewithdensematrices,yetitisoftenmoredif- culttoattaingoode ciencyinthesparsecase.Weexaminebothsidesofthisparadox:theadditionalparallelisminducedbysparsity,andthedi cultyinachievinghighe ciencyinspiteofit.WefocusonCholeskyfactorization,primarilybecausethisallowsustodiscussparallelisminrelativeisolation,withouttheadditionalcomplicationsofpivotingfornumericalstability.Mostofthelessonslearnedarealsoapplicabletoothermatrixfactorizations,suchasLUandQR.Ourmainpointinthecurrentdiscussionistoexplainhowthesparsecasedi ersfromthedensecase,andexaminetheperformanceimplicationsofthosedi erences.ConsiderasystemoflinearequationsAx=b;whereAisann nsymmetricpositivede nite(SPD)matrix,bisaknownvector,andxistheunknownsolutionvectortobecomputed.Onewaytosolvethelinearsystemis rsttocomputetheCholeskyfactorizationA=LLT;wheretheCholeskyfactorLisalowertriangularmatrixwithpositivediagonalelements.ThenthesolutionvectorxcanbecomputedbysuccessiveforwardandbacksubstitutionstosolvethetriangularsystemsLy=b;LTx=y:CholeskyFactorizationAlgorithmThealgorithmforCholeskyfactorizationisavariantofGaussianeliminationthattakesadvantageofsymmetrytoreducebothworkandstoragebyabouthalf.LikeGaussianelimination,thealgorithmconsistsofatriplenestedloop.Oneofthe3!waysofarrangingthatloopisshowninFigure1.forj=1;nfork=1;j 1fori=j;naij=aij aik ajkfcmod(j;k)gajj=pajjfork=j+1;nakj=akj=ajjfcdiv(j)gFigure1:SerialCholeskyfactorizationalgorithmWemakethefollowingimportantobservationsaboutthisalgorithm: SinceAisSPD,thesquarerootsareallofpositivenumbers,sothealgorithmiswellde ned. Pivotingisnotrequiredfornumericalstability. OnlythelowertriangularportionofAisaccessed. ThefactorLiscomputedinplace,overwritingthelowertrian-gleofA. Eachcolumnjismodi edbyamultipleofeachpriorcolumnk.Wedenotethisoperationbycmod(j;k). Ifthatmultipleiszero(i.e.,ajk=0),thentheinnermostloophasnoe ectandmayaswellbeskipped. ElementsofAthatwereinitiallyzeromaybecomenonzeroduetocmodoperationsbynonzeroelementsfrompreviouscolumns.Suchnewnonzerosarecalled ll. Whenallmodi cationstocolumnjarecomplete,itisscaledbythesquarerootofitsdiagonalelementtoproducecolumnjofthefactor.Wedenotethisoperationbycdiv(j).ForfurtherdetailsonCholeskyfactorization,see(GolubandVanLoan,1989).ThreeFormsofCholeskyFactorizationThethreechoicesofindexfortheouterloopyieldmarkedlydi er-entmemoryaccesspatterns,asillustratedinFigure2,andthesehaveimportantperformanceimplicationsinvariousarchitecturalsettings,suchase ectivecacheutilization,vectorization,parallelization,orout-of-coresolutions. Row-Cholesky:Withiintheouterloop,theinnerloopssolveatriangularsystemforeachnewrowintermsofthepreviouslycomputedrows. Column-Cholesky:Withjintheouterloop,theinnerloopscomputethematrix-vectorproductthatgivesthee ectofpre-viouslycomputedcolumnsonthecolumncurrentlybeingcom-puted. Submatrix-Cholesky:Withkintheouterloop,theinnerloopsapplythecurrentcolumnasarank-1updatetotheremainingunreducedsubmatrix.Althoughrow-orientedalgorithmscanbee ectiveinsomecon-texts,column-orientedalgorithmstendtobemuchmoree ectiveinpracticeforsparseproblems,sowewillrestrictourattentiontothelatter.row-Choleskycolumn-Choleskysubmatrix-Choleskymodi edusedformodi cationFigure2:ThreeformsofCholeskyfactorization.Column-Choleskyissometimessaidtobea\left-lookingalgo-rithm,sinceateachstageitaccessesneededcolumnstotheleftofthecurrentcolumninthematrix.Itcanalsobeviewedasa\demand-drivenalgorithm,sincetheinnerproductsthata ectagivencolumnarenotaccumulateduntilactuallyneededtomod-ifyandcompletethatcolumn.Forthisreason,column-Choleskyiscalleda\delayed-updatealgorithm.Itisalsoreferredtoasa\fan-inalgorithm,sincethebasicoperationistocombinethee ectsofmultiplepreviouscolumnsonasingletargetcolumn.Insubmatrix-Cholesky,assoonacolumnhasbeencomputed,itse ectsonallsubsequentcolumnsarecomputedimmediately.Thus,submatrix-Choleskyissaidtobea\right-lookingalgorithm,sinceateachstagecolumnstotherightofthecurrentcolumnaremodi- ed.Itcanalsobeviewedasa\data-drivenalgorithm,sinceeachnewcolumnisusedassoonasitiscompletedtomakeallmodi- cationstoallthesubsequentcolumnsita ects.Forthisreason,submatrix-Choleskyiscalledan\immediate-updatealgorithm.Itisalsoreferredtoasa\fan-outalgorithm,sincethebasicopera-tionisforasinglecolumntoa ectmultiplesubsequentcolumns.Wewillseethatthesecharacterizationsofthecolumn-Cho