SubmittedtoIEEETransactionsonParallelandDistributedSystems,1994.ScalableGlobalandLocalHashingStrategiesforDuplicatePruninginParallelA*GraphSearch1NiharR.MahapatraandShantanuDuttDepartmentofElectricalEngineering,UniversityofMinnesota,Minneapolis,MN55455PrincipalContact:ShantanuDuttPhone:612-625-0323;Fax:612-625-4583emailaddress:dutt@ee.umn.eduJuly19,1995AbstractFormanyapplicationsoftheA*algorithm,thestatespaceisagraphratherthanatree.TheimplicationofthisforparallelA*algorithmsisthatdi erentprocessorsmayperformsigni cantduplicatedworkifinter-processorduplicatesarenotpruned.Inthispaper,weconsidertheproblemofduplicatepruninginparallelA*graph-searchalgorithmsimplementedondistributed-memorymachines.Acommonlyusedmethodforduplicatepruningusesahashfunctiontoassociatewitheachdistinctnodeofthesearchspaceaparticularprocessortowhichduplicatenodesarisingindi erentprocessorsaretransmittedandtherebypruned.Thisapproachhastwomajordrawbacks.First,loadbalanceisdeterminedsolelybythehashfunction.Second,nodetransmissionsforduplicatepruningareglobal;thiscanleadtohotspotsandslowermessagedelivery.Toovercometheseproblems,weproposetwodi erentduplicatepruningstrategies:(1)Toachievegoodloadbalance,wedecouplethetaskofduplicatepruningfromloadbalancing,byusingahashfunctionfortheformerandaloadbalancingschemeforthelatter.(2)Anovelsearch-spacepartitioningschemethatallocatesdisjointpartsofthesearchspacetodisjointsubcubesinahypercube(ordisjointprocessorgroupsinthetargetarchitecture),sothatduplicatepruningisachievedwithonlyintra-subcubeoradjacentinter-subcubecommunication.Thusmessagelatencyandhot-spotprobabilityaregreatlyreduced.TheaboveduplicatepruningschemeswereimplementedonannCUBE2hypercubemulticomputertosolvetheTravelingSalesmanProblem(TSP).Foruniformlydistributedinter-citycosts,ourstrategiesyieldaspeedupimprovementof13-35%on1024processorsoverpreviousmethodsthatdonotpruneanyduplicates,and13-25%overtheprevioushashing-onlyscheme.Fornormallydistributeddatathecorresponding guresare135%and10-155%.Finally,weanalyzethescalabilityofourparallelA*algorithmsonk-aryn-cubenetworksintermsoftheisoe ciencymetric,andshowthattheyhaveisoe ciencylowerandupperboundsof (PlogP)and (Pkn2),respectively.Keywords:A*algorithm,branch-and-boundsearch,communicationdelay,duplicatepruning,graphsearch,isoe ciencyfunction,k-aryn-cubes,parallelA*,scalability,travelingsalesmanproblem.1ThisresearchwasfundedinpartbyaGrant-in-AidfromtheUniversityofMinnesotaGraduateSchoolandinpartbytheNSFgrantMIP-9210049.SandiaNationalLabsprovidedaccesstotheir1024-processornCUBE2parallelcomputer.1IntroductionTheA*algorithm[24]isawell-known,generalizedbranch-and-boundsearchprocedure,widelyusedinthesolutionofmanycomputationallydemandingcombinatorialoptimizationproblems(COP’s)[27].Itsoperation,asdetailedlater,canbeviewedessentiallyasabest- rstsearchofastate-spacegraph.Parallelizationofbranch-and-boundmethodsprovidesane ectivemeanstomeetthecomputationalneedsofmanypracticalsearchproblems[7].Manyresearchershaveadoptedatreesearch-spaceformulationinproblemsliketheTravel-ingSalesmanProblem(TSP)[12,23],the15-puzzleproblem[12]andthevertexcoverproblem[12],wherethenaturalformulationisagraph.Thisimpliesthatnodesrepresentingidenticalsub-problemsarearti ciallyde nedtohavedi erentstates.Thusidentical-subproblemnodesremainundetectedresultinginduplicationofsearch.Thisleadstoanine cient,althougheasilyparal-lelizable,sequentialA*/best- rstsearchalgorithm:eachprocessorexploresadi erentpartofthesearchspacethatisdisjointfromthesearchspacesofotherprocessors.Thusa\misleadinggoodspeedupcanbeobtained.Agraphformulation,ontheotherhand,willenabledetectionofidenticalsubproblemsinA*andhenceavoidduplicatedsearch.However,itisnotaseasilyamenabletoparallelizationondistributed-memorymachinesasatreeformulation,sinceduplicatenodesmaybegeneratedindi erentprocessors.Acommonlyusedmethodforpruninginter-processorduplicatesingraphsearch,utilizesasuitablehashfunctiontoassociatean\ownerprocessorwitheachdistinctnodeofthesearchspace.Thenduplicatenodesarisingindi erentprocessorsaretransmittedtothesameownerwheretheyarepruned[8,21].Therearetwosigni cantshortcomingsinthisapproach.First,loadbalanceisdeterminedsolelybythehashfunctionandhencemaynotbeverye ective.Second,nodetransmissionsforduplicatepruningareglobal;thiscanleadtohighermessagecontention(hotspots)andlatency.Theoverallgoalofourworkistodevelopscalablehigh-performanceparallelA*algorithmsfordistributed-memorymachinesthatcanbeappliedtomostCOP’s.InourpreviousworkonparallelA*,wedevelopede cientloadbalancingstrategies[5,6]thatareequallyusefulforbothtree-searchandgraph-searchproblems,anddemonstratedtheirsuperiorperformanceoverothercompetingmethods[1,11,13,17,25].Inthispaper,wedevelope cientinter-processorduplicatepruningmethods,andincorporatetheminparallelA*algorithmstoobtainhighspeedupoversequentialA*graphsearch.InSec.2,we rstdescribeA*andthenpresentanimprovedversionusedinourimplementations.Next,inSec.3weoutlineourapplicationofA*toTSP,thetestproblemusedtodeterminethee cacyofourparallelizationtechniqu