Execution time (latency)

2020-02-15

Performance Measurement

Performance
Execution time (latency): time between the start and the completion of an event.
Performance ∝ 1 / (Execution time): performance is inversely proportional to execution time.
Throughput (bandwidth): total amount of work done in a given time.

Performance Measurement 1
Machine X is n% faster than Machine Y:

\[
\frac{\text{Execution time}_Y}{\text{Execution time}_X}
= \frac{\text{Performance}_X}{\text{Performance}_Y}
= 1 + \frac{n}{100}
\]

Performance Measurement 2
Example: Machine A runs a program in 10 seconds, Machine B runs the same program in 15 seconds. A is __% faster than B.

\[
\frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15}{10} = 1 + \frac{n}{100}
\quad\Rightarrow\quad n = 50
\]

Make the Common Case Fast
Perhaps the most important and pervasive principle of computer design is to make the common case fast: in making a design trade-off, favor the frequent case over the infrequent case.
Improving the frequent event, rather than the rare event, will obviously help performance. Example: the overflow case and the no-overflow case in addition.

Amdahl's Law 1
Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

Amdahl's Law 2

\[
\text{Speedup}
= \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}
= \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}
\]

Amdahl's Law 3

\[
\text{Execution time}_{new} = \text{Execution time}_{old} \times \left( (1 - f_E) + \frac{f_E}{s_E} \right)
\]

where
f_E: fraction of the execution time that can use the enhancement
s_E: improvement (speedup) gained by the enhancement mode

\[
\text{Speedup} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - f_E) + \dfrac{f_E}{s_E}}
\]

Amdahl's Law 4
Example: An enhancement runs 10 times faster than the original machine, but it is usable only 40% of the time. The speedup = __.
Sol: f_E = 0.4, s_E = 10

\[
\text{Speedup} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56
\]

Amdahl's Law can also be applied to compare two CPU design alternatives, for example:
Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for a total of 50% of the execution time for the application. Compare these two design alternatives.

Answer: we can compare these two alternatives by comparing the speedups:

\[
\text{Speedup}_{FPSQR} = \frac{1}{(1 - 0.2) + \dfrac{0.2}{10}} = \frac{1}{0.82} \approx 1.22
\]

\[
\text{Speedup}_{FP} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23
\]

Improving the performance of the FP operations overall is slightly better because of the higher frequency.

Amdahl's Law 6
Extreme cases, with f_E the fraction enhanced and s_E the speedup of the enhanced mode:

\[
\text{Speedup} = \frac{1}{(1 - f_E) + \dfrac{f_E}{s_E}}
\]

f_E = 0: Speedup = 1
f_E = 1: Speedup = s_E

CPU Performance 1
Most computers are constructed using a clock running at a constant rate.
The clock is referred to by its length (time), e.g., 10 ns, or by its rate, e.g., 100 MHz.
ms = 10^-3 sec, μs = 10^-6 sec, ns = 10^-9 sec
Hz = 1/sec, kHz = 10^3 Hz, MHz = 10^6 Hz, GHz = 10^9 Hz
Clock cycle time = 1 / clock rate

CPU Performance 2
CPI: clock cycles per instruction.

\[
\text{CPU time for a program}
= \text{CPU clock cycles for a program} \times \text{clock cycle time}
= \frac{\text{CPU clock cycles for a program}}{\text{clock rate}}
\]

\[
\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}
\]

CPU Performance 3

\[
\text{CPU time} = \text{CPI} \times \text{Instruction count} \times \frac{1}{\text{clock rate}}
\]

BUT, not every instruction takes the same
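
The formulas in these slides (the "n% faster" relation, Amdahl's Law, and the CPU time equation) can be checked numerically with a minimal Python sketch. The function names `percent_faster`, `amdahl_speedup`, and `cpu_time` are illustrative choices, not names from the slides, and the instruction count and CPI in the last call are hypothetical numbers picked only to exercise the formula.

```python
def percent_faster(time_slow, time_fast):
    """Return n where the faster machine is n% faster:
    time_slow / time_fast = 1 + n / 100."""
    return (time_slow / time_fast - 1.0) * 100.0


def amdahl_speedup(f_e, s_e):
    """Overall speedup when a fraction f_e of the execution time
    is sped up by a factor s_e (Amdahl's Law)."""
    return 1.0 / ((1.0 - f_e) + f_e / s_e)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Machine A: 10 s, Machine B: 15 s (example from the slides).
    print(f"A is {percent_faster(15, 10):.0f}% faster than B")

    # Enhancement 10x faster, usable 40% of the time.
    print(f"Speedup: {amdahl_speedup(0.4, 10):.2f}")

    # FPSQR-only enhancement versus speeding up all FP instructions.
    print(f"FPSQR: {amdahl_speedup(0.2, 10):.2f}")
    print(f"FP:    {amdahl_speedup(0.5, 1.6):.2f}")

    # Hypothetical workload: 1e9 instructions, CPI of 2, 100 MHz clock.
    print(f"CPU time: {cpu_time(1e9, 2.0, 100e6):.1f} s")
```

Comparing the two `amdahl_speedup` calls for the FP example shows why the smaller per-operation gain (1.6x) can win overall: it applies to a larger fraction of the execution time.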