UnderconsiderationforpublicationinJ.FunctionalProgramming1CompilationofaSpecializedFunctionalLanguageforMassivelyParallelComputersPascalFRADETandJulienMALLETIRISA,CampusdeBeaulieu,35042Rennes,France(e-mail:ffradet,malletg@irisa.fr)AbstractWeproposeaparallelspecializedlanguagethatensuresportableandcost-predictableim-plementationsonparallelcomputers.Thelanguageisbasicallyarst-order,recursion-less,strictfunctionallanguageequippedwithacollectionofhigher-orderfunctionsorskeletons.Theseskeletonsapplyon(nested)vectorsandcanbegroupedinfourclasses:computation,reorganization,communication,andmaskskeletons.Thecompilationprocessisdescribedasaseriesoftransformationsandanalysesleadingtospmd-likefunctionalprogramswhichcanbedirectlytranslatedintorealparallelcode.Thelanguagerestrictionsenforceapro-grammingdisciplinewhosebenetistoallowastatic,symbolic,andaccuratecostanalysis.Theparallelcosttakesintoaccountbothloadbalancingandcommunications,andcanbestaticallyevaluatedevenwhentheactualsizeofvectorsorthenumberofprocessorsareunknown.Itisusedtoautomaticallyselectthebestdatadistributionamongasetofstandarddistributions.Interestingly,thisworkcanbeseenasacrossfertilizationbe-tweentechniquesdevelopedwithintheFortranparallelization,skeleton,andfunctionalprogrammingcommunities.1IntroductionAgoodparallelprogrammingmodelmustbeportableandcostpredictable.Gen-eralpurposelanguagessuchasFortranachieveportabilitybutcostestimationsareoftenveryapproximate.Aprecisecostanalysisisespeciallyimportantinthiscontextsincethegoalofparallelizationiseciencyanditsimpactontheoverallcostis,atbest,adivisionbyaconstant.So,ordersofmagnitudeormaximumcomplexitiesareinsucienttoguideparallelimplementationchoices.Theapproachdescribedinthispaperisbasedonarestrictedpurefunctionallanguagethatisportableandallowsustodesignanautomaticandaccuratecostanalysis.Thelanguagerestrictionscanbeseenasenforcingaprogrammingdisci-plinethatensuresapredictableperformanceonthetargetparallelcomputer(therewillbeno\performancebugs).Generalrecursionandconditionalsarereplacedbyskeletonsthatencapsulatecontrolanddataflowinthesenseof(Cole,1988)or(Darlingtonetal.,1993).Theskeletons,whichacton(potentiallynested)vec-tors,canbegroupedintofourclasses:thecomputationskeletons(classicaldataparallelfunctions),thereorganizationskeletons(creatingandstructuringvectors),thecommunicationskeletons(datamotionovervectors),andthemaskskeletons2P.FradetandJ.Mallet(conditionaldataparallelfunctions).Theskeletonsanddatastructureshavebeenchosenwithscienticcomputinginmind.Matrixcomputationsandnestedforloopsareeasytodescribeandmanystandardnumericalalgorithmshavebeenex-pressedeasilyinourkernellanguage.Concerningthetargetparallelarchitecture,weaimedatmimd(MultipleInstructionsMultipleData)computerswithsharedordistributedmemorybutsimd(SingleInstructionMultipleData)computerscouldbeaccommodatedaswell.Thecompilationprocessisdescribedasaseriesofprogramtransformationslead-ingtospmd-like(SingleProgramMultipleData)functionalprogramswhichcanbedirectlytranslatedintotrueparallelcode.Eachcompilationsteptransformsaskeleton-basedlanguageintoanotherclosertoacodewithexplicitparallelism.Themaincompilationstepsconsistofasizeanalysis,anupdate-in-placeanalysis,atransformationmakingallcommunicationsexplicit,transformationsimplementingdatadistribution,andasymboliccodeanalysis.Notethatusingafunctionallan-guageavoidsthedependenceanalysisneededbyimperativelanguagestodeterminethecomputationswhichcanbeexecutedinparallel.Akeytaskinthecompilationofdataparallelismisthechoiceofthedistributionsinceitdeterminesboththeloadbalancingandthecommunicationsbetweenprocessors.Inourapproach,therestrictionsimposedbythesourcelanguagemakeitpossibletochooseautomati-callythegloballybestdatadistributionamongasetofstandarddistributions.Thischoicereliesonthecostanalysiswhichevaluatesaccurateparallelexecutiontimes.Oneofthemainchallengesofourworkwastotacklethisproblemwithoutknowingtheactualsizeofvectorsorthenumberofprocessors.Inotherwords,theanalysisoughttobesymbolic.Thecontributionsofthisworkarebothtechnicalandmethodological.Thecompilationisecientandoriginal:itintegratesverydierentanalysistechniquesinasequenceofprogramtransformations.Maybethemostimpor-tantcontributionliesinthedenitionofthespecializedlanguage.Weestablishthenecessarylanguagerestrictionstoensuretheaccuracyofasymboliccostanalysis.Thecompilationtakesadvantageoftheanalysistochoosethebestparallelimplementationamongasetofstandardimplementations.Theanal-ysesforFortrannestedloops(Gupta&Banerjee,1992;Feautrier,1994)donotevaluatepreciselythecostofthecommunications.Inparticular,collectivecommunications(diusion,translation...)cannotbeautomaticallydetectedinFortranprograms.Inourapproach,theanalysistakesintoaccountbothloadbalancingandcommunicationcostssincethecollectivecommunicationsappearexplicitlythroughcommunicationskeletons.Mostexistingskeletonsimplementations(Blellochetal.,1994;Darlingtonetal.,1995)arebasedonxedimplementationsforeachskeleton.Thislocalviewmayleadtoredis-tributingdatabeforeeachskeletonapplication