William Stallings
Computer Organization and Architecture, 8th Edition

Chapter 18
Multicore Computers

Hardware Performance Issues
• Microprocessors have seen an exponential increase in performance
  – Improved organization
  – Increased clock frequency
• Increase in parallelism
  – Pipelining (more stages require more logic, interconnections, and control signals)
  – Superscalar (more pipelines require more logic to manage hazards and to stage instruction resources)
  – Simultaneous multithreading (SMT) (complexity limits the number of threads and pipelines)
• Diminishing returns
  – More complexity requires more logic
  – Increasing chip area for coordinating and signal transfer logic
    – Harder to design, make, and debug

Alternative Chip Organizations
(figure)

Intel Hardware Trends
(figure)

Increased Complexity
• Power requirements grow exponentially with chip density and clock frequency
  – Can use more chip area for cache
    – Smaller
    – Order of magnitude lower power requirements (than control logic)
• By 2015
  – 100 billion transistors on a 300 mm² die
    – Cache of 100 MB
    – 1 billion transistors for logic
• Pollack's rule:
  – Performance is roughly proportional to the square root of the increase in complexity
    – Doubling complexity gives 40% more performance
• Multicore has potential for near-linear improvement
• Unlikely that one core can use all the cache effectively

Power and Memory Considerations
(figure)

Chip Utilization of Transistors
(figure)

Software Performance Issues
• Performance benefits depend on effective exploitation of parallel resources
• Even small amounts of serial code impact performance
• $\text{Speedup} = \dfrac{1}{(1 - f) + \dfrac{f}{N}}$, where $f$ is the fraction of the code that can be parallelized and $N$ is the number of processors
• 10% inherently serial code on an 8-processor system gives only 4.7 times the performance
• Communication, distribution of work, and cache coherence overheads
• Some applications effectively exploit multicore processors

Effective Applications for Multicore Processors
• Database
• Servers handling independent transactions
• Multi-threaded native applications
  – Lotus Domino, Siebel CRM
• Multi-process applications
  – Oracle, SAP, PeopleSoft
• Java applications
  – The Java VM is multi-threaded, with scheduling and memory management
  – Sun's Java Application Server, BEA's Weblogic, IBM Websphere, Tomcat (application servers, and all applications using the Java 2 platform, benefit quickly from multicore technology)
• Multi-instance applications
  – One application running multiple times
• E.g. Valve game software

Multicore Organization
• Number of core processors on chip
• Number of levels of cache on chip
• Amount of shared cache
• Next slide: examples of each organization:
• (a) ARM11 MPCore
• (b) AMD Opteron
• (c) Intel Core Duo
• (d) Intel Core i7

Multicore Organization Alternatives
(figure)

Advantages of Shared L2 Cache
• Constructive interference reduces the overall miss rate
• Data shared by multiple cores is not replicated at cache level
• With proper frame replacement algorithms, the amount of shared cache dedicated to each core is dynamic
  – Threads with less locality can have more cache
• Easy inter-processor communication through shared memory
• Cache coherency confined to L1
• Dedicated L2 cache gives each core more rapid access
  – Good for threads with strong locality
• Shared L3 cache may also improve performance

Individual Core Architecture
• Intel Core Duo uses superscalar cores
• Intel Core i7 uses simultaneous multi-threading (SMT)
  – Scales up the number of threads supported
    – 4 SMT cores, each supporting 4 threads, appear (at the application level) as 16 cores
    – As software is developed to more fully exploit parallel resources, an SMT approach appears to be more attractive than a superscalar approach.

Intel x86 Multicore Organization - Core Duo (1)
• 2006
• Two x86 superscalar cores, shared L2 cache
• Dedicated L1 cache per core
  – 32 KB instruction and 32 KB data
• Thermal control unit per core
  – Manages chip heat dissipation
  – Maximizes performance within constraints
  – Improved ergonomics (cooling system, lower fan noise)
• Advanced Programmable Interrupt Controller (APIC)
  – Inter-processor interrupts between cores
  – Routes interrupts to the appropriate core
  – Includes a timer so the OS can interrupt the core (the OS sets the timer in each APIC to interrupt its local core)

Intel x86 Multicore Organization - Core Duo (2)
• Power management logic
  – Monitors thermal conditions and CPU activity
  – Adjusts voltage and power consumption (to run at lower power)
  – Can switch individual logic subsystems
• 2 MB shared L2 cache
  – Dynamic allocation
  – MESI support for L1 caches
  – Extended to support multiple Core Duo in SMP
    – L2 data shared between local cores or external
• Bus interface

Intel x86 Multicore Organization - Core i7
• November 2008
• Four x86 SMT processors
• Dedicated L2, shared L3 cache
• Speculative pre-fetch for caches
• On-chip DDR3 memory controller
  – Three 8-byte channels (192 bits) giving 32 GB/s
  – No front side bus
• QuickPat