THE END OF THE GPU ROADMAP
Tim Sweeney
CEO, Founder
Epic Games
tim@epicgames.com

Background: Epic Games
- Independent game developer
- Located in Raleigh, North Carolina, USA
- Founded in 1991
- Over 30 games released
  - Gears of War
  - Unreal series
- Unreal Engine 3 is used by 100’s of games

History: Unreal Engine

Unreal Engine 1 (1996-1999)
- First modern game engine
  - Object-oriented
  - Real-time, visual toolset
  - Scripting language
- Last major software renderer
  - Software texture mapping
  - Colored lighting, shadowing
  - Volumetric lighting & fog
  - Pixel-accurate culling
- 25 games shipped

Unreal Engine 2 (2000-2005)
- PlayStation 2, Xbox, PC
- DirectX 7 graphics
- Single-threaded
- 40 games shipped

Unreal Engine 3 (2006-2012)
- PlayStation 3, Xbox 360, PC
- DirectX 9 graphics
  - Pixel shaders
  - Advanced lighting & shadowing
- Multithreading (6 threads)
- Advanced physics
- More visual tools
  - Game Scripting
  - Materials
  - Animation
  - Cinematics…
- 150 games in development

Unreal Engine 3 Games
- Mass Effect (BioWare)
- Army of Two (Electronic Arts)
- BioShock (2K Games)
- Undertow (Chair Entertainment)

Game Development: 2009

Gears of War 2: Project Overview
- Project Resources
  - 15 programmers
  - 45 artists
  - 2-year schedule
  - $12M development budget
- Software Dependencies
  - 1 middleware game engine
  - ~20 middleware libraries
  - Platform libraries

Gears of War 2: Software Dependencies
- Gears of War 2 Gameplay Code: ~250,000 lines C++, script code
- Unreal Engine 3 Middleware Game Engine: ~2,000,000 lines C++ code
- Middleware libraries
  - DirectX Graphics
  - OpenAL Audio
  - SpeedTree Rendering
  - FaceFX Face Animation
  - Bink Movie Codec
  - ZLib Data Compression
  - …

Hardware: History

Computing History
- 1985 Intel 80386: Scalar, in-order CPU
- 1989 Intel 80486: Caches!
- 1993 Pentium: Superscalar execution
- 1995 Pentium Pro: Out-of-order execution
- 1999 Pentium 3: Vector floating-point
- 2003 AMD Opteron: Multi-core
- 2006 PlayStation 3, Xbox 360: “Many-core”
- …and we’re back to in-order execution

Graphics History
- 1984 3D workstation (SGI)
- 1997 GPU (3dfx)
- 2002 DirectX 9, Pixel shaders (ATI)
- 2006 GPU with full programming language (NVIDIA GeForce 8)
- 2009? x86 CPU/GPU Hybrid (Intel Larrabee)

Hardware: 2012-2020
[Diagram: ten in-order processors with 4 threads each, every processor with its own I$ and D$, all sharing one L2 cache]
- NVIDIA GeForce 8
  - General Purpose GPU
  - CUDA “C” Compiler
  - DirectX/OpenGL
  - Many-core, vector architecture
  - Teraflop-class performance
- Intel Larrabee
  - x86 CPU-GPU Hybrid
  - C/C++ Compiler
  - DirectX/OpenGL
  - Many-core, vector architecture
  - Teraflop-class performance

Hardware: 2012-2020
- CONCLUSION: CPU, GPU architectures are getting closer

THE GPU TODAY
- Large frame buffer
- Complicated pipeline
- It’s fixed-function
- But we can specify shader programs that execute in certain pipeline stages

Shader Program Limitations
- No random-access memory writes
  - Can write to current pixel in frame buffer
  - Can’t create data structures
- Can’t traverse data structures
  - Can hack it using texture accesses
- Hard to share data between main program and shader programs
- Weird programming language
  - HLSL rather than C/C++
- Result: “The Shader ALU Plateau”

Antialiasing Limitations
- MSAA & Oversampling
  - Every 1 bit of output precision costs up to 2X memory & performance!
  - Ideally want 10-20 bits
- Discrete sampling (in general)
  - Texture filtering only implies antialiasing when shader equation is linear
  - Most shader equations are nonlinear
- Aliasing is the #1 visual artifact in Gears of War

Texture Sampling Limitations
- Inherent artifacts of bilinear/trilinear
- Poor approximation of Integrate(color, area) in the presence of:
  - Small triangles
  - Texture seams
  - Alpha translucency
  - Masking
- Fixed-function = poor scalability
  - Megatexture, etc

Frame Buffer Model Limitation
- Frame buffer: 1 (or n) layers of 4-vectors, where n = small constant
- Ineffective for
  - General translucency
  - Complex shadowing models
- Memory bandwidth requirement = FPS * Pixel Count * Layers Depth * pow(2,n), where n = quality of MSAA

Summary of Limitations
- “The Shader ALU Plateau”
- Antialiasing limitations
- Texture Sampling limitations
- Frame Buffer limitations

The Meta-Problem:
- The fixed-function pipeline is too fixed to solve its problems
- Result:
  - All games look similar
  - Derive little benefit from Moore’s Law
  - Crysis on high-end NVIDIA SLI solution only looks at most marginally better than top Xbox 360 games
- This is a market BEGGING to be disrupted :-)

SO...

Return to 100% “Software” Rendering
- Bypass the OpenGL/DirectX API
- Implement a 100% software renderer
  - Bypass all fixed-function pipeline hardware
  - Generate image directly
  - Build & traverse complex data structures
  - Unlimited possibilities
- Could implement this…
  - On Intel CPU using C/C++
  - On NVIDIA GPU using CUDA (no DirectX)

Software Rendering in Unreal 1 (1998)
- Ran 100% on CPU
- No GPU required!
- Features
  - Real-time colored lighting
  - Volumetric Fog
  - Tiled Rendering
  - Occlusion Detection

Software Rendering in 1998 vs 2012
- 60 MHz Pentium could execute: 16 operations per pixel at 320x200, 30 Hz
- In 2012, a 4 Teraflop processor would execute: 16000 operations per pixel at 1920x1080, 60 Hz
- Assumption: Using 50% of computing power for graphics, 50% for gameplay

Future Graphics: Raytracing
- For each pixel
  - Cast a ray off into scene
  - Determine which objects were hit
  - Continue for reflections, refraction, etc
- Consider
  - Less efficient than pure rendering
  - Can use for reflections in traditional render

Future Graphics: The REYES Rendering Model
- “Dice” all objects in scene down into sub-pixel-sized triangles
- Rendering with
  - Flat Shading (!)
  - Analytic antialiasing
  - Per-p