High Performance Computing with POWER7 -- IBM Power 755 Server
Gao ZhiQiang

Outline
• HPC and Market Overview
• Power 755 (Processor core) Overview
• Power 755 HPC Industry Benchmarks
• Power 755 HPC Application Benchmarks
• Power 755 Energy Efficiency
• Power 755 Clusters and HPC Software Stack
• Power 755 Sales Focus

HPC Structure and Node Types

Cluster Basics
What is a Cluster?
Clustering is the technique of organizing multiple computers to work together so that they behave like a single, more powerful computer.
Types
• High Availability Cluster
• High Performance Computing Cluster
• Horizontal Scaling Cluster (high scalability)

High Availability Cluster
An HA cluster aims to reduce downtime: when a node fails, the applications running on the failed node are moved to another server that has not failed.
HA clusters fall into the following categories:
• Active/passive: the backup node stays idle, which gives the best performance guarantee.
• Active/active: all nodes are active, which gives maximum resource utilization.
• Hybrid: all nodes are active, but failover is provided only for critical tasks.

High Performance Computing Cluster
An HPC cluster is built for computation: it achieves high computing power through parallel processing across many ordinary nodes.
HPC clusters are mainly based on symmetric multiprocessing (SMP), massively parallel processing (MPP), and vector architectures.
• SMP -- Symmetric Multiprocessing: multiple processors that share memory, the bus data path, and the operating system exchange messages implicitly (through shared memory) to achieve high-performance computing. The largest SMP systems currently have 32 processors.
• MPP -- Massively Parallel Processing: connects many processor/memory units (each with its own copy of the operating system and applications) for large-scale computation. The system is integrated by partitioning functions across the nodes.
• Vector processors: the CPU is optimized for operations on vector arrays, so system performance is high. This architecture dominated HPC systems from the 1980s to the early 1990s.

Focus Application Areas for IBM Power HPC Systems
• Weather & Environmental models: Predicting the path of the next hurricane
• Medical and Life Sciences: Modeling the Human Brain
• Basic Research: Discovering the secrets of the Universe
• Engineering/Scientific and emerging technologies: Tomorrow's technologies today
BRINGING OUR STRENGTH TO BEAR

The Technical Computing Server Market returns to growth in 2010, with the Supercomputer Segment (systems selling for over $500K) being the largest, fastest growing segment.

Worldwide Technical Computer Server Market Opportunity
WW Opportunity ($B)
Segment                           2008   2009   2010   2011   2012   2013   2014
Workgroup (under $100K)            2.5    1.7    1.8    1.9    2.0    2.1    2.2
Departmental ($100K to $250K)      3.2    2.5    2.7    2.9    3.1    3.2    3.5
Divisional ($250K to $500K)        1.4    1.1    1.1    1.2    1.2    1.3    1.4
Supercomputer (over $500K)         2.7    3.4    3.6    3.9    4.1    4.4    4.6
Total                             $9.7   $8.6   $9.2   $9.8  $10.4  $11.0  $11.7
Growth annotations on chart: 6.3% CAGR; 6.5%, 5.0%, 6.7%, 5.9%
Source: IDC, March 2010

R4Q HPC Server Share
[Chart: Rolling 4 Quarter Trend of HPC server share, Q405 to Q409, for IBM, HP, Dell, Sun, Cray, SGI, Other. Source: IDC HPC Quarterly Tracker]

Q409 Results
IBM now leads HP in the HPC Server market in 4Q09 with 31.2% share, up 0.2 pts, on revenue growth of 30.6% QTQ. IBM also leads for full year 2009 with 29.3% share, up 2.8 pts YTY. Source: IDC HPC Quarter Tracker.

Overall HPC Market (IDC: WW, $M)
              Revenue   Revenue Growth     Revenue Share            Share Change
HPC Servers   Q409      YTY       QTQ      Q408    Q309    Q409     YTY        QTQ
IBM           $806      20.6%     30.6%    26.8%   30.9%   31.2%    4.4 Pts    0.2 Pts
HP            $715     -19.6%     27.8%    35.6%   28.0%   27.6%    -8 Pts    -0.4 Pts
Dell          $293     -20.4%      3.2%    14.8%   14.3%   11.3%   -3.4 Pts   -2.9 Pts
Cray          $208      51.6%    542.5%     5.5%    1.6%    8.0%    2.5 Pts    6.4 Pts
Sun           $94       -6.5%    -15.0%     4.0%    5.5%    3.6%   -0.4 Pts   -1.9 Pts
SGI           $67      140.3%    108.9%     1.1%    1.6%    2.6%    1.5 Pts    1 Pts
Others        $403      32.6%     12.5%    12.2%   18.0%   15.6%    3.4 Pts   -2.4 Pts
Total         $2,586     3.6%     29.7%    100%    100%    100%     0 Pts      0 Pts

Power 755 Overview

POWER7 Processor Chip
• Cores: 8 (4/6 core options)
• 567 mm²
• Technology:
  – 45 nm lithography, Cu, SOI, eDRAM
• Transistors: 1.2 B
  – Equivalent function of 2.7 B
  – eDRAM efficiency
• Eight processor cores
  – 12 execution units per core
  – 4 Way SMT per core: up to 4 threads per core
  – 32 Threads per chip
  – L1: 32 KB I Cache / 32 KB D Cache
  – L2: 256 KB per core
  – L3: Shared 32 MB on chip eDRAM
• Dual DDR3 Memory Controllers
  – 100 GB/s Memory bandwidth per chip
• Scalability up to 32 Sockets
  – 360 GB/s SMP bandwidth/chip
  – 20,000 coherent operations in flight
• Binary Compatibility with POWER6
[Chip diagram: eight POWER7 cores, each with its L2 cache, around the L3 Cache and Chip Interconnect with its fast L3 region; memory controllers MC0 and MC1; Local SMP Links; Remote SMP & I/O Links]

POWER7: Core
• 64-bit PowerPC architecture v2.07
• Execution Units
  – 2 Fixed Point Units
  – 2 Load Store Units
  – 4 Double Precision Floating Point Units
  – 1 Branch
  – 1 Condition Register
  – 1 Vector Unit
  – 1 Decimal Floating Point Unit
  – Units include distributed Recovery Function
• Out of Order Execution
[Core floorplan: IFU, ISU, CRU/BRU, FXU, LSU, VSX FPU, DFU, L2 Cache]

• POWER7 continues to support VMX
• Extends SIMD support with VSX
  – 2 VSX units that can each handle 2 Double-Precision FP instructions
  – 8 FLOPS per cycle
  – VSX units can also handle 4 Single Precision instructions per cycle
  – VSX instruction set support for vector and scalar instructions

POWER7 Vector/Scalar Unit
• 64 Entry Vector/Scalar Register File
  – 128-bit wide registers
  – Used for 32b/64b scalar as well as 4x32b/2x64b SIMD instructions
• Four floating-point execution units
  – Each FP unit capable of single or double precision
  – Each FP unit can complete a multiply-add instruction per cycle (2 FLOPS)
  – Maximum throughput 2 FLOPS x 4 = 8 FLOPS/cycle
  – Each FPU can also execute FP divide and sqrt
• Floating Point Operations are ANSI/IEEE standard 754-1985 Compliant

IBM Power 755
Performance per node
• 2X improvement in Single Instruction Multiple Data (SIMD) acceleration
  – Full AltiVec™ (VMX) instruction set support
  – Extended VSX instruction set
• Up to 8.4 TFlops per Rack (10 nodes per Rack)
• Cluster Interconnect
  – 2-Port InfiniBand 12X DDR
• IBM HPC software stack
• Boost frequency for better performance & performance/watt

IBM Power 755 (8236-E8C)
• 4-socket, 4U server
• 8-core POWER7 processors
• 32-core 3.3 GHz configuration
• Up to 256 GB of memory
• Up to 64 clustered nodes
• Energy Star-qualified
• GA: February
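
The peak figures on these slides can be cross-checked with a short calculation. The sketch below is only illustrative: it multiplies the quoted 8 FLOPS per cycle per core by the 32-core, 3.3 GHz configuration and by 10 nodes per rack. The function and constant names are hypothetical, and the result is a theoretical peak at the nominal clock, assuming no boost frequency and ignoring real application efficiency.

```python
# Theoretical peak double-precision throughput, using only figures quoted on the slides.
# All names here are illustrative; none come from an IBM tool or API.

FLOPS_PER_CYCLE_PER_CORE = 8  # 4 FP units x 2 FLOPS (one multiply-add) per cycle
CORES_PER_NODE = 32           # 4 sockets x 8-core POWER7 processors
CLOCK_GHZ = 3.3               # nominal clock of the 32-core configuration
NODES_PER_RACK = 10


def peak_gflops_per_node(cores=CORES_PER_NODE,
                         clock_ghz=CLOCK_GHZ,
                         flops_per_cycle=FLOPS_PER_CYCLE_PER_CORE):
    """Return the theoretical peak of one node in GFLOPS."""
    # cores x (1e9 cycles/s per GHz) x FLOPS per cycle, expressed in GFLOPS
    return cores * clock_ghz * flops_per_cycle


def peak_tflops_per_rack(nodes=NODES_PER_RACK):
    """Return the theoretical peak of one rack in TFLOPS."""
    return nodes * peak_gflops_per_node() / 1000.0


if __name__ == "__main__":
    print(f"Peak per node: {peak_gflops_per_node():.1f} GFLOPS")
    print(f"Peak per rack: {peak_tflops_per_rack():.2f} TFLOPS")
```

Under these assumptions the per-node peak comes to about 844.8 GFLOPS and the per-rack peak to about 8.4 TFLOPS, which is consistent with the "up to 8.4 TFlops per Rack" figure quoted for 10 nodes.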