一种三维快速傅里叶变换并行算法-方维

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

ISSN1000-1239PCN11-1777PTPJournalofComputerResearchandDevelopment48(3):440-446,2011:2009-06-04;:2010-05-28:(61033009,60873210);/0(2009AA01A134);方维孙广中吴超陈国良(230027)(230027)(fangwei@mail.ustc.edu.cn)AParallelAlgorithmofThree-DimensionalFastFourierTransformFangWei,SunGuangzhong,WuChao,andChenGuoliang(SchoolofComputerScienceandTechnology,UniversityofScienceandTechnologyofChina,Hefei230011)(KeyLaboratoryonHighPerformanceComputingofAnhuiProvince,Hefei230027)AbstractThree-dimensionalfastFouriertransform(3D-FFT)iswidelyusedinphysics.Itiscrucialtomanyapplicationsbecauseitdemandsheavycalculationandcommunications.Thusinmostcasesitis3D-FFTthatdominatesthecomputationaltime.Thetraditionalparallelalgorithmsof3D-FFTarenotsuitableforthesparselatticewhichisoftenencounteredinthefieldofquantumcomputing,becausetheblockpartitioningusedmayinvolvemanyredundantcomputingandcommunications,duetothesparseofnon-zeroelementsinFFTgrid.Inthispaperweproposeanovalparallelalgorithmof3D-FFT.Unlikethepreviousmethods,thenewalgorithmusesslicepartitioning,andredesignsthecomputingorderinordertominimizethecalculationtimeandcommunicationcost.Takingadvantageoftheslicepartitioning,thenewmethodarehighlyscalableandcanautomaticallysatisfythedemandsofloadbalancing.Wecompareitwithtraditionalalgorithmsintheoryandinpractice.Theoreticalperformanceanalysisshowsthatthenewmethodcangreatlyreducethecomputationaltimeandincreaseparallelspeedup.Theexperimentshavebeencarriedcutinsomehigh-performancemachines,suchasKD-50,IBMJS22andDAWNING.Theresultsshowthatournewalgorithmbehavesmuchbetterthantraditionalalgorithmsinperforming3D-FFTforsparselattice.Keywords3DfastFouriertransform;parallelalgorithm;parallelcomputing;speedup三维快速傅里叶变换在物理计算领域中被广泛地使用.传统并行算法所使用的面划分和块划分方法并不适合稀疏三维向量的傅里叶变换.提出了一种新三维快速傅里叶变换的并行算法,针对稀疏三维向量的傅里叶变换,新算法通过重新调整x,y,z三个方向的计算顺序,能最大限度地减少计算量以及进程间的通信量,从而减少计算时间,提高并行加速比.详尽的理论分析以及多个高性能计算平台上的实验结果证明:在对稀疏三维向量作傅里叶变换时,新算法优于传统算法.三维快速傅里叶变换;并行算法;并行计算;加速比TP301,(fastFouriertransform,FFT).,,,.FFT.FFT,FFT.1),:;(firstprinciple).,,(densityfunctionaltheory),,,,,,.,:E=E[n0(r)]=min{E[n(r]}.(1)Kohn-Sham-Ü22m$2+Veff(r)Ui(r)=EiUi(r),(2)-Ü22m$2,Veff(r).Ui(r)Ui(r)=Ejcjqj,(2)EkqTk,:EkqTk-Ü22m$2+Veff(r)Ejcjqj=EiEkqTkEjcjqj=EiEjcj,(3),,.,:EkqTk[Veff(r)]Ejcjqj=EkqTkexp(-iqr)[Veff(r)]exp(iqr)Ejcjqj.(4),,Veff(r),,.2)[1].{xk},0k[n,{Xk}:Xk=En-1j=0Xkjnxj,(5),Xnn,Xn=exp(-2PiPn).(5){Xk}F(x0,,,xn-1),Fkj=Xkjn./(n2).n=2m,(1)Xk=En-1j=0,evenXkjnxj+En-1j=0,oddXkjnxj=E2m-1j=0Xkjnxej+XknE2m-1j=0Xkjnxoj.(6)(6)X2kn=XknP2,xek=x2k,xok=x2k+1.(6),n2nP2.,/(nlbn).n.,Xkxkykz=Enx-1jx=0XkxjxnxEny-1jy=0XkyjynyEnz-1jz=0Xkzjznzxjxjyjz.(7),(,,xjxjyjz,,).(7),.,n@n@n,x,y,zn@n,n@n,n@n.FFT[2-4]..FFTW[5-6],.Haynes[7].,FFT,FFT,FFT,.IBMJS22DAWNING4000AKD-50,.12,.42.441:,{ax,y,z}FFT,{axc,y,z}xFFT,yz.1.1.:FFT{ax,y,z};:FFT{ax,y,z}.Step1.FFT4.1{axc,yc,z},z(1(a));Step2.,{axc,y,zc},y(1(b));Step3.,{ax,yc,zc},x(1(c)).Fig.1ThefirsttraditionalparallelFFTalgorithm.112.2.:FFT{ax,y,z};:FFT{ax,y,z}.Step1.FFTy-z4,1{axc,y,z},zy(2(a));Step2.alltoall,{ax,y,zc},x(2(b)).Fig.2ThesecondtraditionalparallelFFTalgorithm.222,21,2.22.1FFT,.,{ax,y,z}(x,y,z),/xiyj-0:FFTxxi,yyjz.N={1,2,,,n},n.8N@N@N.8fz8fx,z:fz:8yN@N,(x,y,z)y(x,y);fx,z:8yN,(x,y,z)y(y).3.FFT.:FFT,(xi,yj,zk),(i,j,k)I80;:FFT.Step1.(i,j)Ifz(8),/xiyj-0FFT;Step2.(i,j)IN@fx,z(8),/yizj-0FFT;Step3.(i,j)IN@N,/xizj-0FFT.3,,0.4.8FFT.4422011,48(3):FFT;:FFT(8).Step1.(i,j)IN@N,/xizj-0FFT;Step2.(i,j)IN@fx,z(8),/yizj-0FFT;Step3.(i,j)Ifz(8),/xiyj-0FFT.4.FFT,43.,,31/xiyi-0,.p,:5.FFT.:FFT,80;:FFT.Step1./xiyi-0p;Step2./xiyi-0;Step3.alltoall,/xiyi-0,NPpxy;Step4.(i,j)IN@fx,z(8),/yizj-0FFT;Step5.(i,j)IN@N,/xizj-0FFT.,5FFT,.213.2.2,,(4)cin@n@nFFT,3,nP2ci,FFT0.ciA,:1)AaiFFT(xi,yi,zi).ai(xi,yi,zi)FFT;2)FFTFFT(),,FFT();3)FFT(xi,yi,zi)aic()A.4.Fig.3Anexampleinquantumcomputing.3Fig.4UsingtranditionalparallelFFTalgorithminquantumcomputing.4n@n@nFFT,p,,n3(p-1)Pp,n3(p-1)Pp.FFT.FFT,:6.FFT.:A;:A.Step1.A,/xy-0,FFT.0;Step2.5FFT;1FFT,4,443:FFT.Step3.(i,j)IN@N,/xizj-0FFT;Step4.y/yz-0FFT;Step5.,/xy-0FFT;Step6.alltoall,/xy-0;Step7./xy-0FFT;Step8.Step1A.5:Fig.5UsingnewparallelFFTalgorithminquantumcomputing.5,(p-1)@n@(xy-)Pp.FFT,.2.33.n@n@nFFT,nP2.,Pn3(p-1)P16p,PP(4@4)U012.FFTP@nP16+n@nP2+n@nU117n2,3n2,0156.,0xy-m.:1).,n@mPp,n@n@nPp.2).mxy-y,mxy-my,1y,yFFT:E(m)=C(m)N;(8),m[n,:C(m)=C1nQm1+2C2nQm2+,+kCknQmk+,+(m-1)Cm-1nQmm-1+mCmnQmm=Emk=1kCknQmk;N=m+n-1m.Qmkmk,.Nmn.mn,:C(m)=Enk=-mn-kCknRmk,Rmkmk,[1,n].,Rmk=EEni=1iki=mk!Fnj=1kj!,N=EEni=0iki=mn!Fnj=0kj!.E(m)=Emk=1knkm-1k-1m+n-1m=nmn+m-1(m[n);Enk=-mn-knkEEni=1iki=mk!Fnj=1kj!EEni=0iki=mn!Fnj=0kj!(mn).(9)xx-,T(m)=m+E(y)@n+n2,Tt(m)=3@n@n.m+E(y)@n+n23@n@n.1[E(y)[n,4442011,48(3)2P3,1P3.FFT0r.,xy-0,0..3IBMJS224000AKD-50,KD-50.KD-50GNUPLinux,GCC410,MPImpich2-11016.2F750MHz,1GB,.IBMJS22IBMAIX5LV513,Power6410GHz,SMP,16GB,Infiniband.4000AGNUPLinux,OpteronQuadCore119GHzPGiga-E,4GB,.,C.FFTW,FFTW21115.Fig.6ThecomparisonoftimecostbetweennewparallelFFTalgorithmandtranditionalFFTalgorithminKD-50.6KD-50FFT6FFT.FFT64@64@64,/xy-0673.6,1,,if/xy-0/yz-0,,,.2,,,,FFT,65536().,FFT,3589,.4,8,16,.7KD-50.7,,48,.cac

1 / 7
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功