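Introduction to High Performance Computing (course)

Course Contents
- What is high performance computing
- Application demands on high performance computing
- Hardware support for high performance computing
- Software support for high performance computing

References
- Kai Hwang, Zhiwei Xu. Scalable Parallel Computing: Technology, Architecture, Programming. Beijing: China Machine Press, 2000.
- Guoliang Chen. Parallel Computing: Architecture, Algorithms, Programming. Beijing: Higher Education Press, 1999.
- Zhihui Du. High Performance Computing Parallel Programming Techniques: MPI Parallel Programming. Beijing: Tsinghua University Press, 2001.

MPI:
OpenMP:
CUDA:
OpenCL:

High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Today, computer systems approaching the teraflops region are counted as HPC computers.
- supercomputers and computer clusters → parallel computer
- advanced computation problems → parallel processing algorithm

Parallel Computer
A parallel computer is a set of processors that are able to work cooperatively to solve a computational problem. This definition is broad enough to include:
- parallel supercomputers that have hundreds or thousands of processors
- networks of workstations
- multiple-processor workstations
- computers with multi-core processors, and embedded systems
Parallel computers offer the potential to concentrate computational resources (processors) on important computational problems.
Large-scale concentrations of computational resources: grid, cloud.

Parallel Algorithm
A parallel algorithm specifies multiple operations on each step.

APPLICATIONS OF HPC / DEMANDS ON HPC
Application demands on high performance computing
Areas:
- numerical forecasting for meteorology, the environment, and the oceans
- high-energy physics research
- life sciences (gene sequence search and alignment, new drug development)
- computer-aided engineering (industrial production)
- oil exploration
- image rendering
What HPC provides:
- reduction of execution time
- concurrency

HIGH PERFORMANCE COMPUTER ARCHITECTURE
Hardware support for high performance computing
Parallel computer models: physical machine models
Physical (parallel) machine models:
- SIMD computers (single instruction, multiple data): CPU+GPU
- MIMD computers (multiple instruction, multiple data): SMP, MPP, cluster, blade server

PARALLEL COMPUTING MODEL
Abstract (parallel) machine models, used mainly to design and analyze parallel algorithms:
- PRAM model (parallel random-access machine): processors communicate by accessing shared memory.
- BSP model (bulk synchronous parallel): processors communicate by transferring data over a network.

PRAM model: the communication overhead is ignored.
BSP model: the communication latency and synchronization overhead are taken into account.

Computational behavior in the models
- PRAM: exclusive read, concurrent read, exclusive write, concurrent write (unsafe)
- BSP: computation proceeds in supersteps separated by synchronization

Cost functions
- PRAM: $T_{EREW}$, $T_{CREW}$, $T_{CRCW}$ (execution time under each read/write variant)
- BSP: $T_{BSP} = \sum_{i=0}^{s-1} P_i + C + sL$
  where s is the total number of supersteps, $i \in [0, s-1]$, $P_i$ is the computation time of superstep i, C is the communication overhead, and L is the synchronization overhead per superstep.

PROGRAMMING MODEL
Software support for high performance computing
A programming model is a collection of program abstractions that gives a programmer a simplified and transparent view of the computer hardware/software system. Parallel programming models are specifically designed for multiprocessors, multicomputers, or vector/SIMD computers.

Programming models:
- message passing: MPI
- shared memory: OpenMP
- data parallel: CUDA/OpenCL (CPU-GPU architecture)
- MapReduce programming model

PROGRAM STRUCTURE
- SPMD: single program, multiple data
- MPMD: multiple program, multiple data
- SIMD: single instruction, multiple data
- Master/Workers

Performance metrics for high performance systems
- Theoretical peak floating-point performance (FLOPS) = clock frequency × total number of cores × floating-point operations per clock cycle per core
- Measured peak floating-point performance (FLOPS): measured with a benchmark suite (Linpack)
- Speedup of a parallel system: speedup is an important measure of the benefit gained from parallelism.
  - Amdahl's law describes speedup for a fixed problem size.
  - Gustafson's law applies to scalable problems.

Amdahl's law
- W: computational workload (measured on the original, unimproved system), $W = W_s + W_p$
- $W_s$: serial component; $W_p$: parallel component
- p: speedup of the parallel part relative to its original serial execution
- f: serial fraction, $f = W_s / (W_s + W_p)$; 1 − f: parallel fraction
Speedup of the parallel system:
$$S = \frac{W_s + W_p}{W_s + W_p/p} = \frac{1}{f + (1-f)/p}$$
As $p \to \infty$, $S \to 1/f$.

Gustafson's law
- W: computational workload, $W = W_s + W_p$
- $W_s$: serial component; the parallel component is scaled up to $p \times W_p$
- p: speedup of the parallel part relative to its original serial execution
- f: serial fraction, $f = W_s / (W_s + W_p)$; 1 − f: parallel fraction
Speedup of the parallel system:
$$S = \frac{W_s + pW_p}{W_s + W_p} = f + p(1-f)$$
The smaller f and the larger p, the larger S.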
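The performance formulas above (theoretical peak FLOPS, Amdahl's law, Gustafson's law) can be checked numerically. The following Python is a minimal sketch, not part of the original course material. The machine parameters (2.5 GHz, 64 cores, 16 FLOPs per cycle per core) and the 10% serial fraction are hypothetical values chosen for illustration, and the function names are illustrative helpers, not a library API.

```python
"""Sketch of the HPC performance metrics (hypothetical numbers only)."""


def peak_flops(clock_hz: float, total_cores: int, flops_per_cycle: int) -> float:
    """Theoretical peak = clock frequency x total cores x FLOPs per cycle per core."""
    return clock_hz * total_cores * flops_per_cycle


def amdahl_speedup(f: float, p: float) -> float:
    """Fixed problem size: S = 1 / (f + (1 - f) / p), f = serial fraction."""
    if not 0.0 <= f <= 1.0 or p <= 0:
        raise ValueError("need 0 <= f <= 1 and p > 0")
    return 1.0 / (f + (1.0 - f) / p)


def gustafson_speedup(f: float, p: float) -> float:
    """Scaled problem size: S = f + p * (1 - f)."""
    if not 0.0 <= f <= 1.0 or p <= 0:
        raise ValueError("need 0 <= f <= 1 and p > 0")
    return f + p * (1.0 - f)


def amdahl_from_workload(ws: float, wp: float, p: float) -> float:
    """Workload form of Amdahl's law: S = (Ws + Wp) / (Ws + Wp / p)."""
    return (ws + wp) / (ws + wp / p)


def gustafson_from_workload(ws: float, wp: float, p: float) -> float:
    """Workload form of Gustafson's law: S = (Ws + p * Wp) / (Ws + Wp)."""
    return (ws + p * wp) / (ws + wp)


if __name__ == "__main__":
    # Hypothetical machine: 2.5 GHz, 64 cores, 16 FLOPs per cycle per core.
    print(f"peak = {peak_flops(2.5e9, 64, 16) / 1e12:.2f} TFLOPS")  # 2.56 TFLOPS

    # Hypothetical workload: 10% serial work (f = 0.1), p = 16.
    f, p = 0.1, 16
    print(f"Amdahl:    S = {amdahl_speedup(f, p):.2f}")     # 6.40
    print(f"Gustafson: S = {gustafson_speedup(f, p):.2f}")  # 14.50

    # Amdahl's speedup approaches the bound 1/f = 10 as p grows.
    for p_big in (64, 1024, 10**6):
        print(p_big, round(amdahl_speedup(f, p_big), 3))
```

With the same f and p, the fixed-size (Amdahl) speedup stays below 1/f, while the scaled-size (Gustafson) speedup keeps growing with p, which is the contrast the two laws are meant to show. Taking $W_s = 1$, $W_p = 9$ gives $f = 0.1$, so the workload forms return the same values as the fraction forms.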