利用Matlab进行极值统计的一个例子——GPD方法科研菜鸟数据和例子均来自于S.Coles,Anintroductiontostatisticalmodelingofextremevalues,Springer,2007一、数据数据是south-westEngland这个地方的日降雨量(1914-1962),原始数据可以从R语言包中调出:二、GPD的极大似然估计1、超越量的似然估计广义Pareto分布为:{}1/|()11yPXuyXuHyξξσ−−≡=−+其中,0y,1/0yξσ+,并且广义Pareto分布和广义极值分布的参数关系为:ξξ=,()uσσξµ=+−。Matlab中涉及这一方面的函数主要有:050001000015000020406080Indexrainparmhat=gpfit(X)returnsmaximumlikelihoodestimatesoftheparametersforthetwo-parametergeneralizedPareto(GP)distributiongiventhedatainX.parmhat(1)isthetailindex(shape)parameter,Kandparmhat(2)isthescaleparameter,sigma.gpfitdoesnotfitathreshold(location)parameter.[parmhat,parmci]=gpfit(X)returns95%confidenceintervalsfortheparameterestimates.-----------------------------------------------------------程序----------------------------------------------------clear;clc;load('rain.txt');srain=rain(rain30);[parmhat,parmci]=gpfit(srain-30);bin=min(srain-30):0.05:max(srain-30);fori=1:length(bin)g(i)=sum(srain-30=bin(i))./length(srain-30);endx=parmhat(1).*(bin./parmhat(2));y=-parmhat(1).*log(1-g);plot(x,y,'ko')holdon;plot(0.5:0.01:1.5,log(1+(0.5:0.01:1.5)));xlabel('\xi(y/\sigma)');ylabel('-\xilog(1-H(y))');-------------------------------------------------------结果---------------------------------------------------------该结果与Coles的结果类似。还可利用evim包进行极值统计,evim包的下载地址:~rgencay/evim.html,evim包涉及到这一方面的函数为:out=gpd(data,threshold,nextremes,’information’);wheredataistheinputvectorandthresholdistheexceedancethresholdvalue.Theoptionnextremessetsthenumberofextremes.Noticethateithernextremesorthresholdshouldbeprovided.Theoptioninformationisastringinputthatdetermineswhetherstandarderrorswillbecalculatedwithdataorwillbepostulatedvalue.Itcanbesettoeitherexpectedorobserved;thedefaultvalueisobserved.Outputoutisastructurewiththefollowingsevenfields:•out.parests:vectorofestimatedparameters:thefirstelementisestimatedξandthesecondisestimatedβ•out.funval:valueofthenegativelog-likelihoodfunction•out.terminated:terminationcondition:1ifsuccessfullyterminated,0otherwise•out.details:detailsofthenonlinearminimizationprocessofthenegativelikelihood•out.varcov:estimatedvariance-covariancematrixoftheparameters•out.parses:estimatedstandarddeviationsoftheparameters•out.data:vectorofexceedances-----------------------------------------程序----------------------------------clear;clc;load('portpirie.txt');out=gpd(rain,30,[],’observed’);%out.par_ests(1):shapepara.%out.par_ests(2):scalepara.%95%confidenceintervals:ci_plus=out.par_ests+1.96.*out.par_ses;ci_neg=out.par_ests-1.96.*out.par_ses;%out.par_ses:stdofpara.-------------------------------------------------------结果---------------------------------------------------------2、回归水平的估计当0ξ≠时,1upzupξςσξ=+−其中,{}pPXzp=(pzu)为方便起见,上式中的参数ξ和σ均去掉了头上的帽子,下文类同,且:{}ukPXunς≡≈,其中,k表示n个观测数据中大于u的个数。当0ξ=时,logupzupςσ=+画pz关于1logp的图。显然,在这种图中,可以较为方便的看出形状参数的符号。下图中的直线就是根据方框中的公式计算的。也可以直接根据样本来画returnlevel图。将大于u的数据从大到小排列,即:(1)(2)()Nzzz≤≤,()iz所对应的回归间隔为:111TNin=−++,注意上式中n是数据的总数目,N只是大于u的数据数目。()izT的图如下图圆点所示。下图中的虚线是95%信度区间,计算方法为:1.96(),1.96()ppppzVarzzVarz−+其中:()TpppVarzzVz=∇∇{}{}1211[()1()log()()1]Tpuuuuuzmmmmmξξξξξσςσξςσξςςξς−−−−∇=−−+−,其中,1mp=,注意到此时方差-协方差矩阵的对角线顺序为:(1)/000()()0()()uunVVarEEVarςςξσξξσσ−=右下角部分是参数的方差和协方差矩阵。下图是以天为单位的returnlevel图:下图是以年为单位的returnlevel图:-----------------------------------------程序----------------------------------functionreturn_level_gpd(par,vcov,data0,threshold)%returenlevelplot%parofGPDfittedtodata%par(1):shapepara.%par(2):scalepara.%vcov:variance-covariancematrix%data0:originaldatadata=data0(data0threshold);N=length(data);%empiricalreturnlevelemp_rl=sort(data);emp_rp=1./((N:-1:1)./(length(data0)+1));%modelreturnlevelx=0.0000005:0.000001:0.008;ku=sum(data0threshold)./length(data0);mod_rl=threshold+(par(2)./par(1)).*((ku./x).^par(1)-1);mod_rp=1./x;%varianceofreturnlevel.new_vcov=zeros(3,3);new_vcov(1,1)=(ku*(1-ku))/length(data0);new_vcov(2:3,2:3)=vcov;fori=1:length(x)m=1./x(i);dzp=[par(2).*m.^par(1).*ku.^(par(1)-1);-par(2).*par(1).^(-2).*((m.*ku).^par(1)-1)+par(2).*par(1).^(-1).*(m.*ku).^par(1).*log(m.*ku);par(1).^(-1).*((m.*ku).^par(1)-1)];varzp(i)=dzp'*new_vcov*dzp;clearypdzp;end%95%confidenceintervalposi_ci=mod_rl+1.96.*sqrt(varzp);neg_ci=mod_rl-1.96.*sqrt(varzp);figure(1)semilogx(emp_rp,emp_rl,'k.');holdon;plot(mod_rp,mod_rl);plot(mod_rp,posi_ci,'k--','LineWidth',1);plot(mod_rp,neg_ci,'k--','LineWidth',1);xlabel('returnperiod(day)');ylabel('returnlevel');figure(2)semilogx(emp_rp./365,emp_rl,'k.');holdon;plot(mod_rp./365,mod_rl);plot(mod_rp./365,posi_ci,'k--','LineWidth',1);plot(mod_rp./365,neg_ci,'k--','LineWidth',1);xlabel('returnperiod(year)');ylabel('returnlevel');3、模型的诊断PP图经验CDF为:()()/(1)iHyiN=+根据拟合的GPD计算的CDF为:1/()()ˆ()11iiyHyξξσ−=−+画()()ˆ()()iiHyHy的图即为P-P图。--------------------------------------------------------程序---------------------------------------------------functiongpd_prob_plot(par,data0,threshold)%p-pplot%parofGPDfittedtodata%par(1):shapepara.%par(2):scal