I.J.InformationTechnologyandComputerScience,2015,03,1-11PublishedOnlineFebruary2015inMECS()DOI:10.5815/ijitcs.2015.03.01Copyright©2015MECSI.J.InformationTechnologyandComputerScience,2015,03,1-11ADistributedFaultToleranceGlobalCoordinatorElectionAlgorithminUnreliableHighTrafficDistributedSystemsDanialRahdariDepartmentofComputerEngineering,ScienceandResearchBranch,IslamicAzadUniversity,IranE-mail:d.rahdary@srbiau.ac.irAmirMasoudRahmani,NiushaAboutalebyDepartmentofComputerEngineering,ScienceandResearchBranch,IslamicAzadUniversity,IranE-mail:rahmani@sr.iau.ac.ir;n.aboutaleby@gmail.comAliSheidaeiKarambastiNo15,GhafariAlley,MashahirSt.,GhaemMaghameFarahaniAve.,Tehran,IranE-mail:a.sheidaei@etadbir.comAbstract—Distributedsystemsconsistofseveralmanagementsiteswhichhavedifferentresourcesharinglevels.Resourcescanbesharedamonginnersiteandoutersiteprocessesatfirstandsecondlevelrespectively.Globalcoordinatorshouldexistinordertocoordinateaccesstomultisite’ssharedresources.Moreover;someothercoordinatorsshouldmanageaccesstoinnersite’ssharedresourcessothatexertingappropriatecoordinatorelectionalgorithmsineachleveliscrucialtoachievemostefficientsystem.Inthispaperahierarchicaldistributedelectionalgorithmisproposedwhicheliminatessinglepointoffailureofelectionlauncher.Meanwhiletrafficisappliedtonetworkatdifferenttimesandthenumberofelectionmessagesisextremelydecreasedaswellwhichappliesmoreefficiencyespeciallyinhightrafficnetworks.Astandbysystembetweencoordinatorsandtheirfirstalternativeisconsideredtoinductlesswaittimetoprocesseswhichwanttocommunicatewithcoordinator.IndexTerms—DistributedAlgorithm,CoordinatorElection,FaultTolerance,CloudComputing,HotStandby,Unreliability,·GlobalCoordinatorI.INTRODUCTIONProcessesandsystemscollaborationaregrowingmoreandmorebyemergingnewtechnologiesandconceptsintheworldofdistributedsystemswhichgivecommunicationthemostcrucialroleofthesesystems.ItisclearthatProcessescommunicatewitheachotherthroughoperatingsystemandTransitionlayer[1].Thefunctionsofsomeprocessesduringexecutionprocedurearedependentontheothersstatebecauseofthesamesharedresources.Asanexampleifaprocesswantstoaccessanexclusivesharedresource,itshouldcheckovertoensuretobetheonlyonewhichusesthecriticalregiontofulfillmutualexclusionproperty.Therearelotsofchallenges,fewofwhicharelistedbelow,thatshouldbeconsideredduringthisprocess.Processesshouldaskalltheothersforpermissionbeforeaccessingthesharedresources.Thereforeasignificantnumberofmessagesshouldbepassed.Duringpermissionaskingprocess,theprocessesresponseisn’tguaranteed.Itusuallyhappensbecauseoftheirfailureor100%usageofsystem’sprocessor.Thereforetheycannotaccesstheresourcebecauseoflackofpermission.Theexistenceofcentralcontrollingprocessisnecessarytohandlethesesituations.Singlepointoffailureisclearlythesothatifitcrashes,newelectionshouldbelaunchedtosetnewprocessascoordinator.Leaderelectionalgorithmswhichareusedforelectingcoordinatoranditsalternativesareusefulinmanyvariousareas.Theyareusedindistributedsystemsforloadbalancingandkeepingresourcereplicasconsistency[2].Whenacoordinatorcrashes,somewaittimeisappliedtoprocessessupposedtocommunicatetocoordinator.Thistimeisaffectedbythepowerofanelectionalgorithmwhichgoesmorebyhigherspeedandlessbandwidthusage.Hence,morepowerfulalgorithmsapplylesswaittimeandappreciatesystemperformance.Meanwhile,thereshouldbeatrade-offbetweenthesafetyandpowerofanelectionalgorithminthesystem,becauseifweneedmorepowerfulalgorithms,wemustattendtosafetylesssinceitneedstoexchangemoreandmoremessagesbetweenprocessorswhichcausetheweaknessofthealgorithm.Thisproblemisalsoknownaselectionproblem[3].Coordinatorsplayahugeroleinspreadareassuchasvideoconferencing,multiplayergames,recognizingprocessororcomputerfailurefordatatransferring,loadbalancing,andmanyotherswhichshowtheimportanceofthecoordinatorsoncemoreandconvincetheresearcherstogivemoreattentiontoit.2ADistributedFaultToleranceGlobalCoordinatorElectionAlgorithminUnreliableHighTrafficDistributedSystemsCopyright©2015MECSI.J.InformationTechnologyandComputerScience,2015,03,1-11II.RELATEDWORKThisareawelcomedwiderangeofalgorithmswiththepassingoftime.Algorithmsdifferinspecificationssuchasnetworktopologies,thekindofcommunicationsbetweenprocesses,andalsosettingthenameoftheprocesses.Bully[4]andRing[5]aretwoclassiconesthatarereferredtoinmanypapers.Bullyalgorithmwhosenetworktopologyisusedinthispaperlauncheselectionwhenprocessesfindcoordinatorcrashed.Inthefirststepofelection,theseprocessessendElectionmessagestotheprocesseswithanupperprocessnumberthanthemselves.ThenwhenprocessesreceiveElectionlabeledmessage,theywillrespondbyanOKmessage.However,ifnoprocessresponds,thesenderwouldintroduceitselfasthenewcoordinatortothesystembysendingaCoordinatormessagetothem.IfprocessP2repliesthesender,P2willsendanotherElectionmessageinthesystembyusingthepreviousprocedure.ThesestepscontinueuntilnootherprocesswithanuppernumberthanthesenderprocessexistsoranyotherOKmessagesfromtheuppernumberprocessesdidn’treceivetoinformer.Authorin[6]proposedaunif