Scalable Load Balancing Strategies for Parallel A

逃课也收上课费
1 ℃
2020-01-27

整理文档很辛苦，赏杯茶钱您下走！

还剩 ... 页未读，继续阅读 >>

免费阅读已结束，点击下载阅读编辑剩下 ... 页

阅读已结束，您可以下载文档离线阅读编辑

资源描述

PublishedintheJournalofParallelandDistributedComputing1994ScalableLoadBalancingStrategiesforParallelA*Algorithms1ShantanuDuttandNiharR.MahapatraDepartmentofElectricalEngineering,UniversityofMinnesota,Minneapolis,MN55455AbstractInthispaper,wedeveloploadbalancingstrategiesforscalablehigh-performanceparallelA*algorithmssuitablefordistributed-memorymachines.InparallelA*search,inecienciessuchasprocessorstarvationandsearchofnonessentialspaces(searchspacesnotexploredbythesequentialalgorithm)growwiththenumberofprocessorsPused,thusrestrictingitsscalability.Toalleviatethiseect,weproposeanovelparallelstartupphaseandanecientdynamicloadbalancingstrat-egycalledthequalityequalizing(QE)strategy.Ournewparallelstartupschemeexecutesoptimallyin(logP)timeand,inaddition,achievesgoodinitialloadbalance.TheQEstrategypossessescertainuniquequantitativeandqualitativeloadbalancingpropertiesthatenableittosignicantlyreducestarvationandnonessentialwork.Consequently,weobtainahighlyscalableparallelA*algorithmwithanalmost-linearspeedup.ThestartupandloadbalancingschemeswereemployedinparallelA*algorithmstosolvetheTravelingSalesmanProblemonannCUBE2hypercubemul-ticomputer.TheQEstrategyyieldsaveragespeedupimprovementsofabout20-185%and15-120%atlowandintermediateworkdensities(theratiooftheproblemsizetoP),respectively,overthreewell-knownloadbalancingmethods|theround-robin(RR),therandomcommunication(RC)andtheneighborhoodaveraging(NA)strategies.Theaveragespeedupobservedon1024processorsisabout985,representingaveryhigheciencyof0:96.Finally,weanalyzeandempiricallyevaluatethescalabilityofparallelA*algorithmsintermsoftheisoeciencymetric.Ouranalysisgives(1)a(P:logP)lowerboundontheisoeciencyfunctionofanyparallelA*algorithm,and(2)ageneralexpressionfortheupperboundontheisoeciencyfunctionofourparallelA*algorithmusingtheQEstrategyonanytopology|forthehypercubeand2-Dmesharchitecturestheupperboundsontheisoeciencyfunctionarefoundtobe(P:log2P)and(P:pP),respectively.Experimentalresultsvalidateouranalysis,andalsoshowthatparallelA*searchusingtheQEloadbalancingstrategyhasbetterscalabilitythanwhenusingtheRR,RCorNAstrategies.1ThisresearchwasfundedinpartbyaGrant-in-AidfromtheUniversityofMinnesotaandinpartbyNSFgrantMIP-9210049.SandiaNationalLabsprovidedaccesstotheir1024-processornCUBE2parallelcomputer.1IntroductionTheA*algorithm[20]isawellknowngeneralizedbranch-and-boundsearchprocedure,widelyusedinthesolutionofmanycomputationallydemandingcombinatorialoptimizationproblems(COPs)[4,22].Itsoperation,asdetailedlater,canbeviewedessentiallyasabest-rstsearchofastate-spacegraph.Parallelizationofbranch-and-boundmethodsprovidesaneectivemeanstomeetthecomputationalneedsofmanypracticalsearchproblems[3,7].Theaimofourworkistodevelopscalablehigh-performanceparallelA*algorithmsforsolvingCOPsondistributed-memorymachines.However,parallelizationofA*introducesanumberofineciencies.(1)First,thetimerequiredinitiallytosplitthewholesearchspaceamongallPprocessors,i.e.,thestartupphasetime,canbeasignicantfractionofthetotalexecutiontimeatlowworkdensities(theratiooftheproblemsizetoP).Thereforethestartupphaseneedstobeexecutedeciently.Also,itisdesirabletohaveagoodinitialloadbalancetoreduceidlingatthebeginningofparallelA*.(2)InsearchalgorithmssuchasA*,theamountofworkcorrespondingtodierentsearchsubspacesisverydiculttoestimateandcanvarywidely.Hencesomeformofdynamic,quantitativeloadbalancingiscrucialtoreducingtheidlingthatwouldotherwiseoccur.(3)Finally,processorsperformingbest-rstsearchoftheirlocalsubspacesinparallelA*maysearchspacesthatasequentialA*algorithmwillnotexplore.Thiscanleadtosubstantial\nonessentialwork.Toaddressthisproblem,itisimperativetoperformdynamicqualitativeloadbalancingsothatatalltimesdierentprocessorssearchspacesthatarecomparablypromising.Inadditiontotheaboveineciencies,duplicatedworkamongprocessorscanoccurwhenthesearchspaceisagraph.Thisproblemcanbetackledbyusingecientduplicatepruningtechniques[8,16,17].However,sincethefocusofthispaperisonloadbalancingstrategies,wewillrestrictourattentiontotreesearchspacessothatperformancecomparisonofparallelA*algorithmsemployingdierentloadbalancingmethodsreectstheeectivenessofthesealgorithmsinachievingloadbalance(ratherthantheirecacyinpruningduplicates).Thesameloadbalancingmethodscanbeappliedwithequaleectiveness,inconjunctionwithanyduplicatepruningtechniques,tograph-searchproblems[16]asexplainedbrieyinSection3.WhileanumberofinnovativeparallelA*methodshavebeenproposedinpastwork,theyhavenotadequatelyaddressedtheinecienciesofslowstartupandloadimbalance.Inpreviouswork,a(P)-timesequentialstartupphaseinwhichasingleprocessorgeneratesthePstartingnodesneededforparallelsearchbyallprocessorswasused[12].Also,noexplicitattemptsweremadeto1obtainagoodinitialloadbalanceinpriorstartupschemes.AnumberofdynamicloadbalancingmethodsforparallelA*havealsobeenpreviouslyproposed[1,2,9,10,11,12,14,19,21,23].WecriticallyanalyzetheeectivenessofsomerepresentativemethodsinSection4andpointouttheirdrawbacks.Inthispaper,weproposeaparallelA*algorithmwithsigni