Basic Communication Operations

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

BasicCommunicationOperationsAnanthGrama,AnshulGupta,GeorgeKarypis,andVipinKumarToaccompanythetext``IntroductiontoParallelComputing'',AddisonWesley,2003TopicOverview•One-to-AllBroadcastandAll-to-OneReduction•All-to-AllBroadcastandReduction•All-ReduceandPrefix-SumOperations•ScatterandGather•All-to-AllPersonalizedCommunication•CircularShift•ImprovingtheSpeedofSomeCommunicationOperationsBasicCommunicationOperations:Introduction•Manyinteractionsinpracticalparallelprogramsoccurinwell-definedpatternsinvolvinggroupsofprocessors.•Efficientimplementationsoftheseoperationscanimproveperformance,reducedevelopmenteffortandcost,andimprovesoftwarequality.•Efficientimplementationsmustleverageunderlyingarchitecture.Forthisreason,werefertospecificarchitectureshere.•Weselectadescriptivesetofarchitecturestoillustratetheprocessofalgorithmdesign.BasicCommunicationOperations:Introduction•Groupcommunicationoperationsarebuiltusingpoint-to-pointmessagingprimitives.•Recallfromourdiscussionofarchitecturesthatcommunicatingamessageofsizemoveranuncongestednetworktakestimets+tmw.•Weusethisasthebasisforouranalyses.Wherenecessary,wetakecongestionintoaccountexplicitlybyscalingthetwterm.•Weassumethatthenetworkisbidirectionalandthatcommunicationissingle-ported.One-to-AllBroadcastandAll-to-OneReduction•Oneprocessorhasapieceofdata(ofsizem)itneedstosendtoeveryone.•Thedualofone-to-allbroadcastisall-to-onereduction.•Inall-to-onereduction,eachprocessorhasmunitsofdata.Thesedataitemsmustbecombinedpiece-wise(usingsomeassociativeoperator,suchasadditionormin),andtheresultmadeavailableatatargetprocessor.One-to-AllBroadcastandAll-to-OneReductionOne-to-allbroadcastandall-to-onereductionamongprocessors.One-to-AllBroadcastandAll-to-OneReductiononRings•Simplestwayistosendp-1messagesfromthesourcetotheotherp-1processors-thisisnotveryefficient.•Userecursivedoubling:sourcesendsamessagetoaselectedprocessor.Wenowhavetwoindependentproblemsderinedoverhalvesofmachines.•Reductioncanbeperformedinanidenticalfashionbyinvertingtheprocess.One-to-AllBroadcastOne-to-allbroadcastonaneight-nodering.Node0isthesourceofthebroadcast.Eachmessagetransferstepisshownbyanumbered,dottedarrowfromthesourceofthemessagetoitsdestination.Thenumberonanarrowindicatesthetimestepduringwhichthemessageistransferred.All-to-OneReductionReductiononaneight-noderingwithnode0asthedestinationofthereduction.BroadcastandReduction:ExampleConsidertheproblemofmultiplyingamatrixwithavector.•Thenxnmatrixisassignedtoannxn(virtual)processorgrid.Thevectorisassumedtobeonthefirstrowofprocessors.•Thefirststepoftheproductrequiresaone-to-allbroadcastofthevectorelementalongthecorrespondingcolumnofprocessors.Thiscanbedoneconcurrentlyforallncolumns.•Theprocessorscomputelocalproductofthevectorelementandthelocalmatrixentry.•Inthefinalstep,theresultsoftheseproductsareaccumulatedtothefirstrowusingnconcurrentall-to-onereductionoperationsalongthecolumns(usingthesumoperation).BroadcastandReduction:Matrix-VectorMultiplicationExampleOne-to-allbroadcastandall-to-onereductioninthemultiplicationofa4x4matrixwitha4x1vector.BroadcastandReductiononaMesh•Wecanvieweachrowandcolumnofasquaremeshofpnodesasalineararrayof√pnodes.•Broadcastandreductionoperationscanbeperformedintwosteps-thefirststepdoestheoperationalongarowandthesecondstepalongeachcolumnconcurrently.•Thisprocessgeneralizestohigherdimensionsaswell.BroadcastandReductiononaMesh:ExampleOne-to-allbroadcastona16-nodemesh.BroadcastandReductiononaHypercube•Ahypercubewith2dnodescanberegardedasad-dimensionalmeshwithtwonodesineachdimension.•Themeshalgorithmcanbegeneralizedtoahypercubeandtheoperationiscarriedoutind(=logp)steps.BroadcastandReductiononaHypercube:ExampleOne-to-allbroadcastonathree-dimensionalhypercube.Thebinaryrepresentationsofnodelabelsareshowninparentheses.BroadcastandReductiononaBalancedBinaryTree•Considerabinarytreeinwhichprocessorsare(logically)attheleavesandinternalnodesareroutingnodes.•Assumethatsourceprocessoristherootofthistree.Inthefirststep,thesourcesendsthedatatotherightchild(assumingthesourceisalsotheleftchild).Theproblemhasnowbeendecomposedintotwoproblemswithhalfthenumberofprocessors.BroadcastandReductiononaBalancedBinaryTreeOne-to-allbroadcastonaneight-nodetree.BroadcastandReductionAlgorithms•Allofthealgorithmsdescribedaboveareadaptationsofthesamealgorithmictemplate.•Weillustratethealgorithmforahypercube,butthealgorithm,ashasbeenseen,canbeadaptedtootherarchitectures.•Thehypercubehas2dnodesandmy_idisthelabelforanode.•Xisthemessagetobebroadcast,whichinitiallyresidesatthesourcenode0.BroadcastandReductionAlgorithmsOne-to-allbroadcastofamessageXfromsourceonahypercube.BroadcastandReductionAlgorithmsSingle-nodeaccumulationonad-dimensionalhypercube.EachnodecontributesamessageXcontainingmwords,andnode0isthedestination.CostAnalysis•Thebroadcastorreductionprocedureinvolveslogppoint-to-pointsimplemessagetransfers,eachatatimecostofts+twm.•Thetotaltimeisthereforegivenby:All-to-AllBroadcastandReduction•Generalizationofbroadcastinwhicheachprocessoristhesourceaswellasdestination.•Apr

1 / 70
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功