Yangdong Deng: GPU-Based High-Performance Embedded Computing (IT168 CUDA Technology Salon)


High Performance Embedded Computing with Massively Parallel Processors
Yangdong Steve Deng 邓仰东
dengyd@tsinghua.edu.cn
Tsinghua University

Outline
  Motivation and background
  Morphing GPU into a network processor
  High performance radar DSP processor
  Conclusion

High Performance Embedded Computing
  Future IT infrastructure demands even higher computing power
    • Core Internet router throughput: up to 90 Tbps
    • 4G wireless base station: 1 Gbit/s data rate per customer and up to 200 subscribers in service area
    • CMU driverless car: 270 GFLOPs (Giga FLoating point Operations Per second)
    • …

Fast Increasing IC Costs
  Fabrication Cost
    • Moore's Second Law: The cost of doubling circuit density increases in line with Moore's First Law.
  Design Cost
    • Now $20-50M per product
    • Will reach $75-120M at 32nm node
  The 4-year development of the Cell processor by Sony, IBM, and Toshiba cost over $400M.
  [Figure label: ~$1M]

Implications of the Prohibitive Cost
  ASICs would be unaffordable for many applications!
  Scott MacGregor, CEO of Broadcom:
    • "Broadcom is not intending a move to 45nm in the next year or so as it will be too expensive."
  David Turek, VP of IBM:
    • "IBM will be pulling out of Cell development, with PowerXCell 8i to be the company's last entrance in the technology."

Multicore Machines Are Really Powerful!

  Manufacturer  Processor Type  Model           Model Number  # Cores    GFLOPs FP64  GFLOPs FP32
  AMD           GPGPU           FireStream      9270          160/800    240          1200
  AMD           GPU             Radeon HD       5870          320/1600   544          2720
  AMD           GPU             Radeon HD       5970          640/3200   928          4640
  AMD           CPU             Magny-Cours                   12         362.11       362.11
  Fujitsu       CPU             SPARC64         VII           4          128          128
  Intel         CPU             Core 2 Extreme  QX9775        4          51.2         51.2
  nVidia        GPU             Fermi           480           512        780          1560
  nVidia        GPGPU           Tesla           C1060         240        77.76        933.12
  nVidia        GPGPU           Tesla           C2050         448        515.2        1288
  Tilera        CPU             TilePro         64                       166          166

  [Figures: AMD 12-Core CPU; Tilera Tile Gx100 CPU; NVidia Fermi GPU]
  GPU: Graphics Processing Unit
  GPGPU: General Purpose GPU

Implications
  An increasing number of applications would be implemented with multi-core devices
    • Huawei: multi-core base stations
    • Intel: cluster based Internet routers
    • IBM: signal processing and radar applications on Cell processor
    • …
  Also meets the strong demands for customizability and extendibility

Outline
  Motivation and background
  Morphing GPU into a network processor
  High performance radar DSP processor
  Conclusion

Software Routing with GPU
  Background and motivation
  GPU based routing processing
    • Routing table lookup
    • Packet classification
    • Deep packet inspection
  GPU microarchitecture enhancement
    • CPU and GPU integration
    • QoS-aware scheduling

Ever-Increasing Internet Traffic

Fast Changing Network Protocols/Services
  New services are rapidly appearing
    • Data-center, Ethernet forwarding, virtual LAN, …
  Personal customization is often essential for QoS
  However, today's Internet heavily depends on 2 protocols
    • Ethernet and IPv4, both developed in the 1970s!

Internet Router
  …

Internet Router
  Backbone network device
  Packet forwarding and path finding
  Connect multiple subnets
  Key requirements
    • High throughput: 40G-90Tbps
    • High flexibility
  [Figure: Packets, Router, Packets]
  [Figure: Cisco GSR 12416; dimensions 6ft, 19", 2ft; Capacity: 160Gb/s; Power: 4.2kW]

Current Router Solutions
  Hardware routers
    • Fast
    • Long design time
    • Expensive
    • And hard to maintain
  Network processor based router
    • Network processor: data parallel packet processor
    • No good programming models
  Software routers
    • Extremely flexible
    • Low cost
    • But slow

Outline
  Background and motivation
  GPU based routing processing
    • Routing table lookup
    • Packet classification
    • Deep packet inspection
  GPU microarchitecture enhancement
    • CPU and GPU integration
    • QoS-aware scheduling

Critical Path of Routing Processing
  [Figure labels: Header Processing (IP Address Lookup, Update Header); Routing Table (IP Addr, Next Hop); Buffer Memory; Packet Classification; Data, Hdr; Queue Packet; Rule Set (Hdr Fields, Flow); Switch Fabric; Deep Packet Inspection]

GPU Based Software Router
  [Figure labels: CPU0, CPU1, CPU2, CPU3; Front Side Bus (FSB); North Bridge (Memory controller); NIC, NIC; PCIe 16-lane; PCIe 4-lane, PCIe 4-lane; Main Memory; Memory Bus; GPU; GPU Memory; Graphics Card; Internet]
  Data level parallelism = packet level parallelism

Routing Table Lookup
  Routing table contains network topology information
  Find the output port according to destination IP address
  Potentially large routing table (~1M entries)
    • Can be updated dynamically

  Destination Address Prefix  Next-Hop        Output Port
  24.30.32/20                 192.41.177.148  2
  24.30.32.160/28             192.41.177.3    6
  208.12.32/20                192.41.177.196  1
  208.12.32.111/32            192.41.177.195  5

  An exemplar routing table

Routing Table Lookup
  Longest prefix match
  Memory bound
  Usually based on a trie data structure
    • Trie: a prefix tree with strings as keys
    • A node's position directly reflects its key
    • Pointer operations
    • Widely divergent branches!
  [Figure: the exemplar routing table and its Search Trie, with nodes 24.30.32/20, 24.30.32.160/28, 208.12.32/20, 208.12.32.111/32]

GPU Based Routing Table Lookup
  Organize the search trie into an array
  Pointer converted to offset with regard to array head
  6X speedup even with frequent routing table updates

Packet Classification
  Match header fields with predefined rules
  Size of rule-sets can be huge (e.g., over 5000 rules)

  Rule                  Example
  Priority              Treat packets destined to 166.111.66.70 - 166.111.66.77 as highest priority
  Packet filtering      Deny all traffic from ISP3 destined to 166.111.66.77
  Traffic rate limit    Ensure ISP2 does not inject more than 10Mbps email traffic on interface 2
  Accounting & billing  Treat video traffic to 166.111.X.X as highest priority and perform accounting

Packet Classification
  Hardware solution
  Usually with Ternary CAM (TCAM)
    • Expensive and power hu
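
The "GPU Based Routing Table Lookup" slide describes organizing the search trie into an array and converting pointers into offsets from the array head. The sketch below illustrates that idea in plain Python rather than in CUDA, so it is not the implementation behind the reported 6X speedup. The names build_flat_trie, lookup, ROUTES, and NO_NODE are hypothetical, the binary (one bit per level) trie layout is an assumption, and the prefixes from the exemplar routing table are written in full dotted form (24.30.32.0/20 for 24.30.32/20) because Python's ipaddress module requires it.

```python
import ipaddress

NO_NODE = -1  # assumed sentinel: "no child" or "no route stored at this node"

# Exemplar routing table from the slides: (prefix, output port).
ROUTES = [
    ("24.30.32.0/20", 2),
    ("24.30.32.160/28", 6),
    ("208.12.32.0/20", 1),
    ("208.12.32.111/32", 5),
]

def build_flat_trie(routes):
    """Build a binary trie stored as three parallel arrays.

    left[i] / right[i] hold the array offset of node i's children
    (instead of pointers); port[i] holds the output port if a prefix
    ends at node i. Node 0 is the root.
    """
    left, right, port = [NO_NODE], [NO_NODE], [NO_NODE]
    for prefix, out_port in routes:
        net = ipaddress.ip_network(prefix)
        addr = int(net.network_address)
        node = 0
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            child = right if bit else left
            if child[node] == NO_NODE:
                # Append a new node; its offset is the current array length.
                child[node] = len(port)
                left.append(NO_NODE)
                right.append(NO_NODE)
                port.append(NO_NODE)
            node = child[node]
        port[node] = out_port
    return left, right, port

def lookup(trie, dst):
    """Longest prefix match: walk the trie bit by bit, remembering the
    last output port seen. Returns NO_NODE if no prefix matches."""
    left, right, port = trie
    addr = int(ipaddress.ip_address(dst))
    node, best = 0, NO_NODE
    for i in range(33):
        if port[node] != NO_NODE:
            best = port[node]
        if i == 32:
            break
        bit = (addr >> (31 - i)) & 1
        node = (right if bit else left)[node]
        if node == NO_NODE:
            break
    return best

TRIE = build_flat_trie(ROUTES)
print(lookup(TRIE, "24.30.32.165"))   # matches both /20 and /28; the /28 wins
print(lookup(TRIE, "208.12.32.112"))  # matches only 208.12.32.0/20
```

Because the trie lives in flat arrays of integer offsets, it can be copied to device memory as-is, and each packet's lookup is independent of the others. This matches the deck's observation that data level parallelism equals packet level parallelism: in a GPU version, one thread would plausibly run the body of lookup for one destination address.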
