Under consideration for publication in J. Function

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

UnderconsiderationforpublicationinJ.FunctionalProgramming1CompilationofaSpecializedFunctionalLanguageforMassivelyParallelComputersPascalFRADETandJulienMALLETIRISA,CampusdeBeaulieu,35042Rennes,France(e-mail:ffradet,malletg@irisa.fr)AbstractWeproposeaparallelspecializedlanguagethatensuresportableandcost-predictableim-plementationsonparallelcomputers.Thelanguageisbasicallya rst-order,recursion-less,strictfunctionallanguageequippedwithacollectionofhigher-orderfunctionsorskeletons.Theseskeletonsapplyon(nested)vectorsandcanbegroupedinfourclasses:computation,reorganization,communication,andmaskskeletons.Thecompilationprocessisdescribedasaseriesoftransformationsandanalysesleadingtospmd-likefunctionalprogramswhichcanbedirectlytranslatedintorealparallelcode.Thelanguagerestrictionsenforceapro-grammingdisciplinewhosebene tistoallowastatic,symbolic,andaccuratecostanalysis.Theparallelcosttakesintoaccountbothloadbalancingandcommunications,andcanbestaticallyevaluatedevenwhentheactualsizeofvectorsorthenumberofprocessorsareunknown.Itisusedtoautomaticallyselectthebestdatadistributionamongasetofstandarddistributions.Interestingly,thisworkcanbeseenasacrossfertilizationbe-tweentechniquesdevelopedwithintheFortranparallelization,skeleton,andfunctionalprogrammingcommunities.1IntroductionAgoodparallelprogrammingmodelmustbeportableandcostpredictable.Gen-eralpurposelanguagessuchasFortranachieveportabilitybutcostestimationsareoftenveryapproximate.Aprecisecostanalysisisespeciallyimportantinthiscontextsincethegoalofparallelizationiseciencyanditsimpactontheoverallcostis,atbest,adivisionbyaconstant.So,ordersofmagnitudeormaximumcomplexitiesareinsucienttoguideparallelimplementationchoices.Theapproachdescribedinthispaperisbasedonarestrictedpurefunctionallanguagethatisportableandallowsustodesignanautomaticandaccuratecostanalysis.Thelanguagerestrictionscanbeseenasenforcingaprogrammingdisci-plinethatensuresapredictableperformanceonthetargetparallelcomputer(therewillbeno\performancebugs).Generalrecursionandconditionalsarereplacedbyskeletonsthatencapsulatecontrolanddataflowinthesenseof(Cole,1988)or(Darlingtonetal.,1993).Theskeletons,whichacton(potentiallynested)vec-tors,canbegroupedintofourclasses:thecomputationskeletons(classicaldataparallelfunctions),thereorganizationskeletons(creatingandstructuringvectors),thecommunicationskeletons(datamotionovervectors),andthemaskskeletons2P.FradetandJ.Mallet(conditionaldataparallelfunctions).Theskeletonsanddatastructureshavebeenchosenwithscienti ccomputinginmind.Matrixcomputationsandnestedforloopsareeasytodescribeandmanystandardnumericalalgorithmshavebeenex-pressedeasilyinourkernellanguage.Concerningthetargetparallelarchitecture,weaimedatmimd(MultipleInstructionsMultipleData)computerswithsharedordistributedmemorybutsimd(SingleInstructionMultipleData)computerscouldbeaccommodatedaswell.Thecompilationprocessisdescribedasaseriesofprogramtransformationslead-ingtospmd-like(SingleProgramMultipleData)functionalprogramswhichcanbedirectlytranslatedintotrueparallelcode.Eachcompilationsteptransformsaskeleton-basedlanguageintoanotherclosertoacodewithexplicitparallelism.Themaincompilationstepsconsistofasizeanalysis,anupdate-in-placeanalysis,atransformationmakingallcommunicationsexplicit,transformationsimplementingdatadistribution,andasymboliccodeanalysis.Notethatusingafunctionallan-guageavoidsthedependenceanalysisneededbyimperativelanguagestodeterminethecomputationswhichcanbeexecutedinparallel.Akeytaskinthecompilationofdataparallelismisthechoiceofthedistributionsinceitdeterminesboththeloadbalancingandthecommunicationsbetweenprocessors.Inourapproach,therestrictionsimposedbythesourcelanguagemakeitpossibletochooseautomati-callythegloballybestdatadistributionamongasetofstandarddistributions.Thischoicereliesonthecostanalysiswhichevaluatesaccurateparallelexecutiontimes.Oneofthemainchallengesofourworkwastotacklethisproblemwithoutknowingtheactualsizeofvectorsorthenumberofprocessors.Inotherwords,theanalysisoughttobesymbolic.Thecontributionsofthisworkarebothtechnicalandmethodological.Thecompilationisecientandoriginal:itintegratesverydi erentanalysistechniquesinasequenceofprogramtransformations.Maybethemostimpor-tantcontributionliesinthede nitionofthespecializedlanguage.Weestablishthenecessarylanguagerestrictionstoensuretheaccuracyofasymboliccostanalysis.Thecompilationtakesadvantageoftheanalysistochoosethebestparallelimplementationamongasetofstandardimplementations.Theanal-ysesforFortrannestedloops(Gupta&Banerjee,1992;Feautrier,1994)donotevaluatepreciselythecostofthecommunications.Inparticular,collectivecommunications(di usion,translation...)cannotbeautomaticallydetectedinFortranprograms.Inourapproach,theanalysistakesintoaccountbothloadbalancingandcommunicationcostssincethecollectivecommunicationsappearexplicitlythroughcommunicationskeletons.Mostexistingskeletonsimplementations(Blellochetal.,1994;Darlingtonetal.,1995)arebasedon xedimplementationsforeachskeleton.Thislocalviewmayleadtoredis-tributingdatabeforeeachskeletonapplication

1 / 46
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功