PublishedintheJournalofParallelandDistributedComputing1994ScalableLoadBalancingStrategiesforParallelA*Algorithms1ShantanuDuttandNiharR.MahapatraDepartmentofElectricalEngineering,UniversityofMinnesota,Minneapolis,MN55455AbstractInthispaper,wedeveloploadbalancingstrategiesforscalablehigh-performanceparallelA*algorithmssuitablefordistributed-memorymachines.InparallelA*search,ine cienciessuchasprocessorstarvationandsearchofnonessentialspaces(searchspacesnotexploredbythesequentialalgorithm)growwiththenumberofprocessorsPused,thusrestrictingitsscalability.Toalleviatethise ect,weproposeanovelparallelstartupphaseandane cientdynamicloadbalancingstrat-egycalledthequalityequalizing(QE)strategy.Ournewparallelstartupschemeexecutesoptimallyin (logP)timeand,inaddition,achievesgoodinitialloadbalance.TheQEstrategypossessescertainuniquequantitativeandqualitativeloadbalancingpropertiesthatenableittosigni cantlyreducestarvationandnonessentialwork.Consequently,weobtainahighlyscalableparallelA*algorithmwithanalmost-linearspeedup.ThestartupandloadbalancingschemeswereemployedinparallelA*algorithmstosolvetheTravelingSalesmanProblemonannCUBE2hypercubemul-ticomputer.TheQEstrategyyieldsaveragespeedupimprovementsofabout20-185%and15-120%atlowandintermediateworkdensities(theratiooftheproblemsizetoP),respectively,overthreewell-knownloadbalancingmethods|theround-robin(RR),therandomcommunication(RC)andtheneighborhoodaveraging(NA)strategies.Theaveragespeedupobservedon1024processorsisabout985,representingaveryhighe ciencyof0:96.Finally,weanalyzeandempiricallyevaluatethescalabilityofparallelA*algorithmsintermsoftheisoe ciencymetric.Ouranalysisgives(1)a (P:logP)lowerboundontheisoe ciencyfunctionofanyparallelA*algorithm,and(2)ageneralexpressionfortheupperboundontheisoe ciencyfunctionofourparallelA*algorithmusingtheQEstrategyonanytopology|forthehypercubeand2-Dmesharchitecturestheupperboundsontheisoe ciencyfunctionarefoundtobe (P:log2P)and (P:pP),respectively.Experimentalresultsvalidateouranalysis,andalsoshowthatparallelA*searchusingtheQEloadbalancingstrategyhasbetterscalabilitythanwhenusingtheRR,RCorNAstrategies.1ThisresearchwasfundedinpartbyaGrant-in-AidfromtheUniversityofMinnesotaandinpartbyNSFgrantMIP-9210049.SandiaNationalLabsprovidedaccesstotheir1024-processornCUBE2parallelcomputer.1IntroductionTheA*algorithm[20]isawellknowngeneralizedbranch-and-boundsearchprocedure,widelyusedinthesolutionofmanycomputationallydemandingcombinatorialoptimizationproblems(COPs)[4,22].Itsoperation,asdetailedlater,canbeviewedessentiallyasabest- rstsearchofastate-spacegraph.Parallelizationofbranch-and-boundmethodsprovidesane ectivemeanstomeetthecomputationalneedsofmanypracticalsearchproblems[3,7].Theaimofourworkistodevelopscalablehigh-performanceparallelA*algorithmsforsolvingCOPsondistributed-memorymachines.However,parallelizationofA*introducesanumberofine ciencies.(1)First,thetimerequiredinitiallytosplitthewholesearchspaceamongallPprocessors,i.e.,thestartupphasetime,canbeasigni cantfractionofthetotalexecutiontimeatlowworkdensities(theratiooftheproblemsizetoP).Thereforethestartupphaseneedstobeexecutede ciently.Also,itisdesirabletohaveagoodinitialloadbalancetoreduceidlingatthebeginningofparallelA*.(2)InsearchalgorithmssuchasA*,theamountofworkcorrespondingtodi erentsearchsubspacesisverydi culttoestimateandcanvarywidely.Hencesomeformofdynamic,quantitativeloadbalancingiscrucialtoreducingtheidlingthatwouldotherwiseoccur.(3)Finally,processorsperformingbest- rstsearchoftheirlocalsubspacesinparallelA*maysearchspacesthatasequentialA*algorithmwillnotexplore.Thiscanleadtosubstantial\nonessentialwork.Toaddressthisproblem,itisimperativetoperformdynamicqualitativeloadbalancingsothatatalltimesdi erentprocessorssearchspacesthatarecomparablypromising.Inadditiontotheaboveine ciencies,duplicatedworkamongprocessorscanoccurwhenthesearchspaceisagraph.Thisproblemcanbetackledbyusinge cientduplicatepruningtechniques[8,16,17].However,sincethefocusofthispaperisonloadbalancingstrategies,wewillrestrictourattentiontotreesearchspacessothatperformancecomparisonofparallelA*algorithmsemployingdi erentloadbalancingmethodsre ectsthee ectivenessofthesealgorithmsinachievingloadbalance(ratherthantheire cacyinpruningduplicates).Thesameloadbalancingmethodscanbeappliedwithequale ectiveness,inconjunctionwithanyduplicatepruningtechniques,tograph-searchproblems[16]asexplainedbrie yinSection3.WhileanumberofinnovativeparallelA*methodshavebeenproposedinpastwork,theyhavenotadequatelyaddressedtheine cienciesofslowstartupandloadimbalance.Inpreviouswork,a (P)-timesequentialstartupphaseinwhichasingleprocessorgeneratesthePstartingnodesneededforparallelsearchbyallprocessorswasused[12].Also,noexplicitattemptsweremadeto1obtainagoodinitialloadbalanceinpriorstartupschemes.AnumberofdynamicloadbalancingmethodsforparallelA*havealsobeenpreviouslyproposed[1,2,9,10,11,12,14,19,21,23].Wecriticallyanalyzethee ectivenessofsomerepresentativemethodsinSection4andpointouttheirdrawbacks.Inthispaper,weproposeaparallelA*algorithmwithsigni