Scalable global and local hashing strategies for d

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

SubmittedtoIEEETransactionsonParallelandDistributedSystems,1994.ScalableGlobalandLocalHashingStrategiesforDuplicatePruninginParallelA*GraphSearch1NiharR.MahapatraandShantanuDuttDepartmentofElectricalEngineering,UniversityofMinnesota,Minneapolis,MN55455PrincipalContact:ShantanuDuttPhone:612-625-0323;Fax:612-625-4583emailaddress:dutt@ee.umn.eduJuly19,1995AbstractFormanyapplicationsoftheA*algorithm,thestatespaceisagraphratherthanatree.TheimplicationofthisforparallelA*algorithmsisthatdierentprocessorsmayperformsignicantduplicatedworkifinter-processorduplicatesarenotpruned.Inthispaper,weconsidertheproblemofduplicatepruninginparallelA*graph-searchalgorithmsimplementedondistributed-memorymachines.Acommonlyusedmethodforduplicatepruningusesahashfunctiontoassociatewitheachdistinctnodeofthesearchspaceaparticularprocessortowhichduplicatenodesarisingindierentprocessorsaretransmittedandtherebypruned.Thisapproachhastwomajordrawbacks.First,loadbalanceisdeterminedsolelybythehashfunction.Second,nodetransmissionsforduplicatepruningareglobal;thiscanleadtohotspotsandslowermessagedelivery.Toovercometheseproblems,weproposetwodierentduplicatepruningstrategies:(1)Toachievegoodloadbalance,wedecouplethetaskofduplicatepruningfromloadbalancing,byusingahashfunctionfortheformerandaloadbalancingschemeforthelatter.(2)Anovelsearch-spacepartitioningschemethatallocatesdisjointpartsofthesearchspacetodisjointsubcubesinahypercube(ordisjointprocessorgroupsinthetargetarchitecture),sothatduplicatepruningisachievedwithonlyintra-subcubeoradjacentinter-subcubecommunication.Thusmessagelatencyandhot-spotprobabilityaregreatlyreduced.TheaboveduplicatepruningschemeswereimplementedonannCUBE2hypercubemulticomputertosolvetheTravelingSalesmanProblem(TSP).Foruniformlydistributedinter-citycosts,ourstrategiesyieldaspeedupimprovementof13-35%on1024processorsoverpreviousmethodsthatdonotpruneanyduplicates,and13-25%overtheprevioushashing-onlyscheme.Fornormallydistributeddatathecorrespondingguresare135%and10-155%.Finally,weanalyzethescalabilityofourparallelA*algorithmsonk-aryn-cubenetworksintermsoftheisoeciencymetric,andshowthattheyhaveisoeciencylowerandupperboundsof(PlogP)and(Pkn2),respectively.Keywords:A*algorithm,branch-and-boundsearch,communicationdelay,duplicatepruning,graphsearch,isoeciencyfunction,k-aryn-cubes,parallelA*,scalability,travelingsalesmanproblem.1ThisresearchwasfundedinpartbyaGrant-in-AidfromtheUniversityofMinnesotaGraduateSchoolandinpartbytheNSFgrantMIP-9210049.SandiaNationalLabsprovidedaccesstotheir1024-processornCUBE2parallelcomputer.1IntroductionTheA*algorithm[24]isawell-known,generalizedbranch-and-boundsearchprocedure,widelyusedinthesolutionofmanycomputationallydemandingcombinatorialoptimizationproblems(COP’s)[27].Itsoperation,asdetailedlater,canbeviewedessentiallyasabest-rstsearchofastate-spacegraph.Parallelizationofbranch-and-boundmethodsprovidesaneectivemeanstomeetthecomputationalneedsofmanypracticalsearchproblems[7].Manyresearchershaveadoptedatreesearch-spaceformulationinproblemsliketheTravel-ingSalesmanProblem(TSP)[12,23],the15-puzzleproblem[12]andthevertexcoverproblem[12],wherethenaturalformulationisagraph.Thisimpliesthatnodesrepresentingidenticalsub-problemsarearticiallydenedtohavedierentstates.Thusidentical-subproblemnodesremainundetectedresultinginduplicationofsearch.Thisleadstoaninecient,althougheasilyparal-lelizable,sequentialA*/best-rstsearchalgorithm:eachprocessorexploresadierentpartofthesearchspacethatisdisjointfromthesearchspacesofotherprocessors.Thusa\misleadinggoodspeedupcanbeobtained.Agraphformulation,ontheotherhand,willenabledetectionofidenticalsubproblemsinA*andhenceavoidduplicatedsearch.However,itisnotaseasilyamenabletoparallelizationondistributed-memorymachinesasatreeformulation,sinceduplicatenodesmaybegeneratedindierentprocessors.Acommonlyusedmethodforpruninginter-processorduplicatesingraphsearch,utilizesasuitablehashfunctiontoassociatean\ownerprocessorwitheachdistinctnodeofthesearchspace.Thenduplicatenodesarisingindierentprocessorsaretransmittedtothesameownerwheretheyarepruned[8,21].Therearetwosignicantshortcomingsinthisapproach.First,loadbalanceisdeterminedsolelybythehashfunctionandhencemaynotbeveryeective.Second,nodetransmissionsforduplicatepruningareglobal;thiscanleadtohighermessagecontention(hotspots)andlatency.Theoverallgoalofourworkistodevelopscalablehigh-performanceparallelA*algorithmsfordistributed-memorymachinesthatcanbeappliedtomostCOP’s.InourpreviousworkonparallelA*,wedevelopedecientloadbalancingstrategies[5,6]thatareequallyusefulforbothtree-searchandgraph-searchproblems,anddemonstratedtheirsuperiorperformanceoverothercompetingmethods[1,11,13,17,25].Inthispaper,wedevelopecientinter-processorduplicatepruningmethods,andincorporatetheminparallelA*algorithmstoobtainhighspeedupoversequentialA*graphsearch.InSec.2,werstdescribeA*andthenpresentanimprovedversionusedinourimplementations.Next,inSec.3weoutlineourapplicationofA*toTSP,thetestproblemusedtodeterminetheecacyofourparallelizationtechniqu

1 / 34
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功