3

Software Development for Parallel and Multi-Core Processing

Kenn R. Luecke
The Boeing Company
USA

1. Introduction

The embedded software industry wants microprocessors with increased computing functionality that maintain or reduce space, weight, and power (SWaP). Single core processors were the key embedded industry solution between 1980 and 2000, when large performance increases were being achieved on a yearly basis and were fulfilling the prophecy of Moore’s Law. Moore’s Law states that “the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years.”¹ With the increased transistor counts came microprocessors with greater computing throughput, while space, weight, and power were decreasing. However, this ‘free lunch’ did not last forever.² Starting in 2000, the additional power required for greater performance improvements became too great. Hence, single core microprocessors are no longer an optimal solution. Although distributed and parallel programming solutions provide greater throughput, these solutions unfortunately increase SWaP. The most likely solution is multi-core processors, which have been introduced into the embedded processor markets.

Most microprocessor manufacturers have converted from developing single core processors to multi-core processors. With this conversion, the prophecy of Moore’s Law is still being achieved. See Figure 1 and notice how the single core processors are not keeping pace with the multi-core processors.

Multi-core processors increase throughput while maintaining or reducing SWaP for embedded environments, which makes them a good hardware solution for the aerospace industry. Intel, in particular, has estimated that by 2011, 95% of all the processors it ships will contain a multi-core design. However, the software market shows less optimism about multi-core processors. For instance, only 40% of software vendors thought their tools would be ready for multi-core processing by 2011. The reasons for software engineering’s lack of excitement about multi-core processors include the following drawbacks:

• Lack and immaturity of multi-core specific development and debug software tools.
• Lack of multi-core processor standards.
• Lack and immaturity of multi-core enabled system software.
• Lack of parallel programming experience in the software community.
• Lack of parallel programming models to support these multi-core processors.
• An abundance of differentiated multi-core processors from multiple suppliers. Greater differentiation combined with inexperience can be problematic for software developers converting applications for multi-core processors.

¹ (March, 2005). “The free lunch is over. A fundamental turn toward concurrency in software,” Dr. Dobb’s Journal, Volume 30, Number 3.

– High Performance Systems, Applications and Projects

Fig. 1. Processor Transistor Counts and Moore’s Law.³

These problems led Chuck Moore, a Senior Fellow at AMD, to state, “To make effective use of Multi-core hardware today, you need a PhD in computer science.”⁴ In short, multi-core software development has fallen behind multi-core hardware development.

This chapter will provide information on the current best technologies, tools, methodologies, programming languages, models, and frameworks for software development on multi-core processors. Where different software development options exist, comparisons and recommendations will be provided to the reader.

³ Fittes, Dale (October 30, 2009). “Using Multicore Processors in Embedded Systems – Part 1,” EE Times.
⁴ Moore, Chuck (May 12, 2008). “Solving the Multi-core Programming Problem,” Dr. Dobb’s Journal.

2. Multicore definition

Previous multiprocessing solutions (as opposed to multi-core processing solutions), such as parallel and distributed programming, involved two or more processors, which doubled, tripled, or [...] the space, weight, and power (SWaP) consumed and heat generated by the processing system. These solutions could comprise large networks, leading to data latencies between processing components. Multi-core processors, however, place multiple processing cores on a single chip to increase processing power without noticeably increasing the system’s SWaP and heat dissipation. Also, with multiple cores on a single chip, the data latencies of distributed programming are mostly negated.

With multi-core processing, the computer industry continues pushing the performance/power envelope through parallel processing rather than by increasing the processor clock speed. For the most part, serial computing has been the standard software development model. With multiple cores on a processor, parallel computing is now emerging as the new standard, yet very few programmers are well versed in parallel computing.

A multi-core processor, in general, appears similar to the dual core and quad core processors displayed in Figure 2. In both cases, each core has an associated L1 cache while the L2 cache is shared between all the cores. For systems with L1, L2, and L3 cache, the L3 cache is normally shared between all cores, each core has its own segregated L1 cache, and the L2 cache may either be shared between cores or be segregated, with one L2 cache devoted to each core.

Fig. 2. Example Dual Core (left) and Quad Core (right) Multi-core Processors.

3. Multiprocessing models and frameworks

Traditionally, there were two multiprocessing models: Asymmetric Multi-Processing (AMP) and Symmetric Multiprocessing (SMP). For highly integrated processing, AMP designs incorporate several cores on a chip with each processor using its own L1 cache, and all processors share a common glo