

To be published in American Mathematical Society's Proc. in the DIMACS Series on Discrete Mathematics and Theoretical Computer Sc., Apr. 1995.

New Anticipatory Load Balancing Strategies for Parallel A* Algorithms

Nihar R. Mahapatra and Shantanu Dutt
{mahapatra,dutt}@ee.umn.edu
Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455

This research was funded in part by a Grant-in-Aid from the University of Minnesota and in part by NSF grant MIP-9210049. Sandia National Labs provided access to their 1024-processor nCUBE2 parallel computer.

Abstract

In this paper, we develop load balancing strategies for scalable high-performance parallel A* algorithms suitable for distributed-memory machines. In parallel A* search, inefficiencies such as processor starvation and search of non-essential spaces (search spaces not explored by the sequential algorithm) grow with the number of processors P used, thus restricting its scalability. To alleviate this effect, we propose a novel parallel startup phase and an efficient dynamic load balancing strategy called the quality equalizing (QE) strategy. Our new parallel startup scheme executes optimally in $\Theta(\log P)$ time and, in addition, achieves good initial load balance. The QE strategy employs near-neighbor quantitative and qualitative load balancing schemes to achieve load balance. These schemes utilize anticipatory mechanisms to detect and correct load imbalance before its actual occurrence; such mechanisms are particularly useful at lower work densities (the ratio of the problem size to P) and for lower granularity applications. The QE strategy possesses certain unique load balancing properties that enable it to significantly reduce starvation and non-essential work, and that make its performance robust across applications with different cost distributions for search-space nodes. Consequently, we obtain a highly scalable parallel A* algorithm with an almost-linear speedup. The startup and load balancing schemes were employed in parallel A* algorithms to solve the Traveling Salesman Problem on an nCUBE2 hypercube multicomputer. The QE strategy yields average speedup improvements of about 20-185% and 15-120% at low and intermediate work densities, respectively, over three well-known load balancing methods: the round-robin (RR), the random communication (RC) and the neighborhood averaging (NA) strategies. The average speedup observed on 1024 processors is about 985, representing a very high efficiency of 0.96. We also tested the effect of including an anticipatory qualitative load balancing scheme in the QE strategy and found that it reduces the average execution time by 3.32% and 8.77% on 256 and 512 processors, respectively, at lower work densities. Finally, we present analytical and empirical results on the scalability of parallel A* algorithms in terms of the isoefficiency metric. Our analytical results include (1) a $\Theta(P \log P)$ lower bound on the isoefficiency function of any parallel A* algorithm, and (2) a general expression for the upper bound on the isoefficiency function of our parallel A* algorithm using the QE strategy on any topology; for the hypercube and 2-D mesh architectures the upper bounds on the isoefficiency function are found to be $\Theta(P \log^2 P)$ and $\Theta(P \sqrt{P})$, respectively. Experimental results validate our analysis, and also show that parallel A* search using the QE load balancing strategy has better scalability than when using the RR, RC or NA strategies.

1 Introduction

The A* algorithm [21] is a well-known, generalized branch-and-bound search procedure, widely used in the solution of many computationally demanding combinatorial optimization problems (COPs) [4, 23]. Its operation, as detailed later, can be viewed essentially as a best-first search of a state space graph. Parallelization of branch-and-bound methods provides an effective means to meet the computational needs of many practical search problems [3, 8]. The aim of our work is to develop scalable high-performance parallel A* algorithms for solving COPs on distributed-memory machines.

However, parallelization of A* introduces a number of inefficiencies. (1) First, the time required initially to split the whole search space among all P processors, i.e., the startup phase time, can be a significant fraction of the total execution time at low work densities (the ratio of the problem size to P). Therefore the startup phase needs to be executed efficiently. Also, it is desirable to have a good initial load balance to reduce idling at the beginning of parallel A*. (2) In search algorithms such as A*, the amount of work corresponding to different search subspaces is very difficult to estimate and can vary widely. Hence some form of dynamic, quantitative load balancing is crucial to reducing the idling that would otherwise occur. (3) Finally, processors performing best-first search of their local subspaces in parallel A* may search spaces that a sequential A* algorithm will not explore. This can lead to substantial "non-essential work." To address this problem, it is imperative to perform dynamic qualitative load balancing so that at all times different processors search spaces that are comparably promising.

In addition to the above inefficiencies, duplicated work among processors can occur when the search space is a graph.¹ This problem can be tackled by using efficient duplicate pruning techniques [9, 17, 18]. However, since the focus of this paper is on load balancing strategies, we will restrict our attention to tree search spaces so that performance comparison of parallel A* algorithms employing different load balancing methods reflects the effectiveness of these algorithms in achieving load balanc

¹ We use the term "graph" mainly to denote graphs that are not trees, but sometimes we use it more generally to mean trees as well; this will be clear from the context.
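
The efficiency and work-density figures quoted in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the usual definition of parallel efficiency as speedup divided by the number of processors P, and taking work density as the ratio of the problem size to P as stated in the text. The function names and the sample problem size are illustrative assumptions and do not come from the paper.

```python
def efficiency(speedup: float, num_processors: int) -> float:
    """Parallel efficiency E = S / P (standard definition, assumed here)."""
    if num_processors <= 0:
        raise ValueError("num_processors must be positive")
    return speedup / num_processors


def work_density(problem_size: float, num_processors: int) -> float:
    """Work density: ratio of the problem size to P, as defined in the text."""
    if num_processors <= 0:
        raise ValueError("num_processors must be positive")
    return problem_size / num_processors


# Figures reported in the abstract: speedup of about 985 on 1024 processors.
reported = efficiency(985, 1024)
print(round(reported, 2))  # 0.96

# Hypothetical problem size, chosen only to illustrate that work density
# falls as P grows for a fixed problem.
for p in (256, 512, 1024):
    print(p, work_density(1_000_000, p))
```

For a fixed problem size the work density halves each time P doubles, which is the regime in which the text expects startup cost and idling to matter most.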
