Accelerating VASP Electronic Structure Calculations Using Graphic Processing Units

Mohamed Hacene,[a] Ani Anciaux-Sedrakian,[a] Xavier Rozanska,[b]† Diego Klahr,[a]‡ Thomas Guignon,*[a] and Paul Fleurat-Lessard*[b]

[a] M. Hacene, A. Anciaux-Sedrakian, D. Klahr, T. Guignon
IFP Energies Nouvelles, 1 et 4 avenue de Bois-Préau, F-92852 Rueil-Malmaison Cedex, France
E-mail: Thomas.Guignon@ifpen.fr
[b] X. Rozanska, P. Fleurat-Lessard
Laboratoire de Chimie de l'ENS de Lyon, Université de Lyon, UMR CNRS 5182, 46 Allée d'Italie, F-69364 Lyon Cedex 07, France
E-mail: Paul.Fleurat-Lessard@ens-lyon.fr
† Present address: Materials Design, 18 rue de Saisset, F-92120 Montrouge, France.
‡ Present address: Total E&P, Centre Scientifique et Technique J. Féger, Avenue Larribau, F-64000 Pau, France.
Contract/grant sponsor: King Abdullah University of Science and Technology (KAUST, Award No. UK-C0017).

We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may drastically reduce the computation time when the time-consuming sections of the code are offloaded to the graphics chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better suited for the chosen graphic accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPUs using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) compared to n CPU cores only, without any accuracy loss.

© 2012 Wiley Periodicals, Inc. Journal of Computational Chemistry 2012, 33, 2581–2589
DOI: 10.1002/jcc.23096

Introduction

Computational chemistry has evolved to become a highly valuable tool for the characterization and analysis of materials and chemical phenomena. Quantum chemical calculations are now routinely used to complement experiments.[1,2] Because the studied systems are increasingly complex, reducing the computation time is an important issue. Known bottlenecks in quantum chemical-physics simulations at the atomistic level are matrix products, vector operations, and Fast Fourier Transforms (FFTs).

Two approaches may be followed to reduce the overall computation time and to allow simulating larger chemical systems. In the first one, we could optimize (i) the mathematical approximations for the theoretical chemical equations (e.g., resolution of identity density functional theory (DFT),[3–6] wavelet DFT,[7,8] linear scaling approaches[9]), and (ii) the physicochemical approximations for the chemical system and/or its environment (e.g., cluster, periodic or hybrid approaches, implicit electrostatic environments). In the second one, we could completely or partially (re)design the software to take advantage of the newest hardware technologies. We take the latter approach.

Current hardware designs include multicore CPUs (that is, central processing units containing more than two cores), manycore CPUs (more than 32 cores per central processing unit), and GPU (graphics processing unit) platforms. Heterogeneous GPU-based multicore platforms are composed of GPUs and multicore CPUs. The efficiency of these architectures is demonstrated by different benchmarks such as FFT or sparse matrix-vector multiplication tests.[10,11] In this article, we study the performance of the heterogeneous GPU-based multicore architecture in numerical simulations of chemical systems.

In recent years, theoretical chemistry software packages have been modified or developed from scratch to benefit from the massively parallel GPU technology.[8,12–24] The Vienna Ab initio Simulation Package (VASP) is an efficient plane-wave code based on periodic DFT.[25–28] It allows the theoretical study of chemical systems via energy and force calculations, which permit geometry optimizations, molecular dynamics simulations, and determination of a wide range of physicochemical properties for solids or surfaces.[1,2,25–29] It shows good performance on CPU hardware and could become more attractive once ported to GPU hardware. Maintz et al.[24] recently ported the VASP blocked Davidson wave function optimization to GPU. Their modification reduced the computation time by a factor of 7 on a C2050 graphics card (Fermi architecture) in comparison to an Intel Xeon X5560 2.8 GHz processor. However, the RMM-DIIS algorithm is more efficient for large chemical systems and molecular dynamics simulations.[28] It is thus desirable to port this algorithm to GPU.

Therefore, in this work, we focus on the routines involved in the electronic minimization and their behavior on GPU-based clusters. In particular, we study the GPU version of the blocked Davidson (ALGO = Normal keyword), RMM-DIIS (ALGO = …), and mixed blocked Davidson and RMM-DIIS (ALGO = FAST) algorithms. Before describing the applied approach and implementation details, we briefly summarize the background necessary to understand the hardware and software constraints imposed by a GPU architecture.

This article is organized as follows: In the first part, we present the GPU evolution, the main features of current GPUs, and their specificities in terms of programming models. The second part describes the porting of some VASP routines to GPU, with special care taken for multicore CPU and multi-GPU architectures. Results are gathered and analyzed in the third section, while the fourth section concludes this