Performance Measurement

Execution time (latency): the time between the start and the completion of an event.
Performance is inversely proportional to execution time: $\text{Performance} \propto 1 / \text{Execution time}$.
Throughput (bandwidth): the total amount of work done in a given time.

"Machine X is n% faster than Machine Y" means:

\[
\frac{\text{Execution time}_Y}{\text{Execution time}_X}
= \frac{\text{Performance}_X}{\text{Performance}_Y}
= 1 + \frac{n}{100}
\]

Example: Machine A runs a program in 10 seconds and Machine B runs the same program in 15 seconds. A is 50% faster than B:

\[
1 + \frac{n}{100} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15}{10} = 1.5
\quad\Rightarrow\quad n = 50
\]

Make the Common Case Fast

Perhaps the most important and pervasive principle of computer design is to make the common case fast: in making a design trade-off, favor the frequent case over the infrequent case.

Improving the frequent event, rather than the rare event, will obviously help performance.
Example: the overflow and no-overflow cases in addition, where overflow is the rare case.

Amdahl's Law

Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

Speedup is defined as:

\[
\text{Speedup}
= \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}
= \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}
\]

The new execution time is:

\[
\text{Execution time}_{new} = \text{Execution time}_{old} \times \left( (1 - f_E) + \frac{f_E}{s_E} \right)
\]

where
$f_E$: fraction enhanced (the fraction of the original execution time that can use the enhancement)
$s_E$: speedup enhanced (the improvement gained by the enhancement mode)

The overall speedup is therefore:

\[
\text{Speedup} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - f_E) + \dfrac{f_E}{s_E}}
\]

Example: an enhancement runs 10 times faster than the original machine, but it is usable only 40% of the time. What is the speedup?

Solution: $f_E = 0.4$, $s_E = 10$

\[
\text{Speedup} = \frac{1}{(1 - 0.4) + 0.4/10} = \frac{1}{0.64} \approx 1.56
\]
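The two formulas used so far can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names `percent_faster` and `amdahl_speedup` are hypothetical and chosen only for illustration. It reproduces the 50% and 1.56 results from the examples above.

```python
def percent_faster(time_x: float, time_y: float) -> float:
    """Return n such that machine X is n% faster than machine Y.

    Uses Execution time_Y / Execution time_X = 1 + n/100.
    (Hypothetical helper name, for illustration only.)
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return (time_y / time_x - 1.0) * 100.0


def amdahl_speedup(f_e: float, s_e: float) -> float:
    """Overall speedup from Amdahl's Law.

    f_e: fraction of the original execution time that can use the enhancement
    s_e: speedup of the enhanced mode itself
    (Hypothetical helper name, for illustration only.)
    """
    if not 0.0 <= f_e <= 1.0:
        raise ValueError("f_e must be between 0 and 1")
    if s_e <= 0:
        raise ValueError("s_e must be positive")
    return 1.0 / ((1.0 - f_e) + f_e / s_e)


if __name__ == "__main__":
    # Machine A: 10 s, Machine B: 15 s
    print(f"A is {percent_faster(10, 15):.0f}% faster than B")  # 50%
    # Enhancement 10x faster, usable 40% of the time
    print(f"speedup = {amdahl_speedup(0.4, 10):.2f}")           # 1.56
```

Note how far the overall speedup of about 1.56 falls short of the enhancement's own factor of 10: the 60% of the time that cannot use the enhancement dominates the new execution time.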
Amdahl's Law can also be applied to compare two CPU design alternatives. For example, implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics.

Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for a total of 50% of the execution time for the application. Compare these two design alternatives.

Answer: we can compare the two alternatives by comparing their speedups:

\[
\text{Speedup}_{FPSQR} = \frac{1}{(1 - 0.2) + \dfrac{0.2}{10}} = \frac{1}{0.82} \approx 1.22
\]

\[
\text{Speedup}_{FP} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23
\]

Improving the performance of the FP operations overall is slightly better because of the higher frequency.

Amdahl's Law: Extreme Cases

With $f_E$ the fraction enhanced and $s_E$ the speedup enhanced:

\[
\text{Speedup} = \frac{1}{(1 - f_E) + \dfrac{f_E}{s_E}}
\]

$f_E = 0$: Speedup $= 1$
$f_E = 1$: Speedup $= s_E$

CPU Performance

Most computers are constructed using a clock running at a constant rate.
The clock is referred to by its length (period), e.g., 10 ns, or by its rate, e.g., 100 MHz.
ms = $10^{-3}$ sec, μs = $10^{-6}$ sec, ns = $10^{-9}$ sec
Hz = 1/sec, kHz = $10^{3}$ Hz, MHz = $10^{6}$ Hz, GHz = $10^{9}$ Hz
Clock cycle time = 1 / clock rate

CPI (clock cycles per instruction):

\[
\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}
\]

CPU time for a program:

\[
\text{CPU time} = \text{CPU clock cycles for a program} \times \text{clock cycle time}
= \frac{\text{CPU clock cycles for a program}}{\text{clock rate}}
= \frac{\text{CPI} \times \text{Instruction count}}{\text{clock rate}}
\]

CPI × Instruction count × 1/(clock rate) = CPU time
BUT, not every instruction takes the same