AComparisonofLocalandGangSchedulingonaBeowulfClusterPeterStrazdinsandJohnUhlmann,DepartmentofComputerScience,AustraliunNationalUniversityActonACT0200AUSTRALIA{Peter.Strazdins@cs.anu.edu.au,John.Uhlmann@anu.edu.au]AbstractGangSchedulingandrelatedtechniquesarewidelybe-lievedtobenecessaryforefficientjobschedulingondis-tributedmemoryparallelcomputers.Thisishecausetheyminimizecontextswitchingoverheadsandpermittheparal-leljobcurrentlyrunningtoprogressatthefastestpossiblerate.Howevecinthecaseofclustercomputers,andpanicu-larlythosewithCOTSnetworks,thesebenefitscanbeout-weighedinthemultiplejobtime-sharingcontextbythelosstheabilitytoutilizetheCPUforotherjobswhenthecurrent;ohiswaitingformessages.ExperimentsonaLinuBeowulfclusterwithIOOMbfastEthernetswitchesaremndecomparingtheScorebuddy-basedgangschedulingwithlocalscheduling(pmvidedbytheLinu2.4kernelwithMPIimplementedoverTCP/IP).Resultsforcommunication-intensivenumericalapplicationson16nodesrevealthatgangschedulingresultsin'slow-downs'uptoafactoroftwogreaterfor8simultaneousjobs.ThisphenomenonisnotduetoanydeficienciesinScorebutduetotherelarivecostsofcontextswitchingversusmessageoverhead,andweexpectsimilarresultswillholdforanygangschedulingimplementation.Aperformanceanalysisoflocalschedulingindicatesthatcachepollutionduetocontextswitchingismoresignijicantthanthedirectcontextswitchingoverheadontheapplica-tionsstudied.Whenthisistakenintoaccount,localschedul-ingbehaviourcomesclosetoachievingidealslowdownsforfiner-grainedcomputationssuchasLinpack.Theperfor-mancemodelsalsoindicatethatsimilartrendsaretobeexpectedforclusterswithfasternetworks.Keywords:parallelcomputing,jobscheduling,gangscheduling,clustercomputing1IntroductionGangscheduling[6],oroneofitsmanyvariants[II,16,191,iswidelybelievedtobenecessaryforoptimaldis-tributedmemorymultiprocessorresourceutilizationwithtime-sharedjobswithmedium-tofine-grainedcommuni-cationpatterns.Thesetechniquesrequirethat,oneachnode,theprocessesrelatedtoaparticularparalleltask(gangs)arescheduledtorunsimultaneously.Gangsarepackedtogetherandarepreemptedandrescheduledasaunit.Aswellasreducingcontextswitchingoverheads.thisensuresmessagesassoci-atedwithaparalleljobarereceivedassoonaspossiblearterdelively,andhencepermitsthatparalleljobtoprogressop-timally.GangScheduling(alsoknownasexplicitco-scheduling)normallyrequiresconsiderableinfrastructuretobeaddedtocluster.ExistingsystemsincludeScore[IO]andParPar[9].Analternativeisimplicitco-scheduling,wherethepro-cessesinthegangarenotexplicitlycoscheduledbutshouldcoschedulethemselvesthroughtheirbehaviourandlocalschedulingpolicies,e.g.ifamessagearrivesforanon-currentjob,aCPUmightpreemptthecurrentjobandsched-ulethenewone[12,3].Thesearepotentiallylesscomplextoimplement.Inthecontextofmorecoarse-grainedjobs(withinfrequentcommunicationphases)onclusters,ithasbeensuggestedthatcc-schedulingcanoccurwithoutanymeasuresbeingtaken,simplyduetothefactthatprocessesgethigherprioritywhencommunicating[4].However,thesimplesttechniqueofalltoimplementis(uncoordinated)localscheduling,wherenomeasuresaretakentocoscheduleprocessesinthegang.Inthiscase,gangsmaynotbecoscheduledatall.ThiswouldbethedefaultschedulingpolicyonarawLinuxcluster.Issueswiththechoiceofthesetechniquesincludestheircomplexitytoimplement(and,inpractice,todeploy)onanactualcluster,andtheirpotentialforperformance(e.g.inoveralljobthroughputandfairness).Thispapermakesacomparisonbetweenthetwoextremes(gangandlocal0-7803-8694-9/04/$20.0002004IEEE55CLUSTER2004scheduling)onaCOTSBeowulfcluster[2].Themainoriginalcontributionsofthispaperareasfol-lows.Itgivesadirectcomparisonofgangandlocalschedul-ingonaBeowulf-styleclustercomputer.Italsousesrealap-plicationstomakethiscomparison,anddemonstrateswhytheuscoftheseisimportantinevaluatingschedulingpoli-cies.ConclusionsthatcanbedrawnfromtheresultsofthepaperdonotseemtobereHectedinthecurrentliter-ature.Finally,italsoprovidesaperformanceanalysisforlocalscheduling,andproposesandevaluatesanoptimisticperformancemodelforit.Thispaperisorganizedasfollows.Relatedworkisde-scribedinSection2.Section3describestheBunyipBeowulfclusterusedinourexperiments,andSection4describestheScoreclustermanagementsystem.Ourexperimentalsetupandacomparisonofgangandlocalschedulingisgivenusing(non-synthetic)parallelapplicationsinSection5.Theper-formanceoflocalschedulingisfurtheranalyscdinSection6.ConclusionsaregiveninSection7.2RelatedworkThestudiesestablishingthebeliefthatgang(orsimilar)schedulingpoliciesareneededforefficienttime-sharinginmultiprocessorsbymakingdirectcomparisonsseemtobeveryfew.Theonlyoneknowntousthatmakesadirecteval-uationofgangscheduling(comparedwithlocalscheduling)is[6];thisusesasyntheticprogram(whichrepeatedlyexe-cutesacomputationloopfollowedbyabarrier)onanearlyNUMAsharedmemoryarchitecture.Butsharedmemoryparallelprocessorsandolder-styleMassivelyParallelPro-cessors(MPPs)haverelativelylowercommunicationover-headsthanformodemclusters.Furthermore,theassump-tionunderlyinggangscheduling:thatanymessagesreceivedaredestinedforthecurrentlyexecutingprocess,simplifies(