《马氏距离判别与贝叶斯判别》实验报告姓名:学号:班级:一、目的:1.熟练掌握matlab软件进行距离判别贝叶斯判别的方法与步骤。2.掌握判别分析的回代误判率与交叉误判率的编程。3.掌握贝叶斯判别的误判率的计算。二、内容:我国山区某大型化工厂,在厂区及邻近地区挑选有代表性的15个大气取样点,每日4次同时抽取大气样品,测定其中含有的6种气体的浓度,前后共4天,每个取样点每种气体实测16次,计算每个取样点每种气体的平均浓度,数据见表1。气体数据对应的污染地区分类见表1中最后一列。现有两个取自该地区的4个气体样本,气体指标见表1中的后4行,试解决一下问题:1.判别两类总体的协方差矩阵是否相等,然后用马氏距离判别这4个未知气体样本的污染类别,并计算回代误判率与交叉误判率;若两类总体服从正太分布,第一类与第二类的先验概率分别为7/15、8/15,利用贝叶斯判别样本的污染分类。2.先验概率为多少时,距离判别与贝叶斯判别相同?调整先验概率对判别结果的影响是什么?3.对第一类与第二类的先验概率分别为7/15、8/15,计算误判概率。表1大气样品数据表气体氯硫化氢二氧化硫碳4环氧氯丙烷环已烷污染分类10.0560.0840.0310.0380.00810.022120.040.0550.10.110.0220.0073130.050.0740.0410.0480.00710.02140.0450.050.110.10.0250.0063150.0380.130.0790.170.0580.043260.030.110.070.160.050.046270.0340.0950.0580.160.20.029180.030.090.0680.180.220.039190.0840.0660.0290.320.0120.0412100.0850.0760.0190.30.010.042110.0640.0720.020.250.0280.0382120.0540.0650.0220.280.0210.042130.0480.0890.0620.260.0380.0362140.0450.0920.0720.20.0350.0322150.0690.0870.0270.050.0890.0211样品10.0520.0840.0210.0370.00710.022待定样品20.0410.0550.110.110.0210.0073待定样品30.030.1120.0720.160.0560.021待定样品40.0740.0830.1050.190.021待定三、程序马氏距离判别:A=load('shiyan4.txt');x1=A([1:47815],2:7);x2=A([569:14],2:7);m1=mean(x1);m2=mean(x2);n1=size(x1,1);n2=size(x2,1);s1=cov(x1);s2=cov(x2);p=6;s=((n1-1)*s1+(n2-1)*s2)/(n1+n2-2);Q1=(n1-1)*(log(det(s))-log(det(s1))-p+trace(inv(s)*s1));Q2=(n2-1)*(log(det(s))-log(det(s2))-p+trace(inv(s)*s2));ifQ1chi2inv(0.95,21)&&Q2chi2inv(0.95,21)disp('两组数据协方差相等')elsedisp('两组数据协方差不全相等')end;%判断两总体协方差是否相等%协方差不相等,马氏距离判别x=A(16:19,2:7);w=mahal(x,x1)-mahal(x,x2);fori=1:4ifw(i)=0disp(['第',num2str(i),'个样品污染类别为1']);elsedisp(['第',num2str(i),'个样品污染类别为2']);endend%计算回代误判率fori=1:n1d11(i)=mahal(x1(i,:),x1)-mahal(x1(i,:),x2);endfori=1:n2d22(i)=mahal(x2(i,:),x2)-mahal(x2(i,:),x1);endn11=length(find(d110));n22=length(find(d220));p0=(n11+n22)/(n1+n2)%计算交叉误判率fori=1:n1B=x1([1:i-1,i+1:n1],:);n1=length(B(:,1));n2=length(x2(:,1));m1=mean(B);m2=mean(x2);S1=cov(B);S2=cov(x2);S=((n1-1)*S1+(n2-1)*S2)/(n1+n2-2);Q1=(n1-1)*(log(det(S))-log(det(S1))-p+trace(inv(S)*S1));Q2=(n2-1)*(log(det(S))-log(det(S2))-p+trace(inv(S)*S2));ifQ1chi2inv(0.95,21)&&Q2chi2inv(0.95,21)D11(i)=(x1(i,:)-mean(B))*inv(S)*(x1(i,:)-mean(B))'-(x1(i,:)-mean(x2))*inv(S)*(x1(i,:)-mean(x2))';elseD11(i)=mahal(x1(i,:),x1)-mahal(x1(i,:),x2);end;endfori=1:n2D=x2([1:i-1,i+1:n2],:);n1=length(x1(:,1));n2=length(D(:,1));S1=cov(x1);S2=cov(D);S=((n1-1)*S1+(n2-1)*S2)/(n1+n2-2);Q1=(n1-1)*(log(det(S))-log(det(S1))-p+trace(inv(S)*S1));Q2=(n2-1)*(log(det(S))-log(det(S2))-p+trace(inv(S)*S2));ifQ1chi2inv(0.95,21)&&Q2chi2inv(0.95,21)D22(i)=(x1(i,:)-mean(B))*inv(S)*(x1(i,:)-mean(B))'-(x1(i,:)-mean(x2))*inv(S)*(x1(i,:)-mean(x2))';elseD22(i)=mahal(x2(i,:),x1)-mahal(x2(i,:),x2);end;endN11=length(find(D110));N22=length(find(D220));p1=(N11+N22)/(n1+n2)贝叶斯判别:A=load('shiyan4.txt');x1=A([1:47815],2:7);x2=A([569:14],2:7);n1=size(x1,1);n2=size(x2,1);s1=cov(x1);s2=cov(x2);p=2;s=((n1-1)*s1+(n2-1)*s2)/(n1+n2-2);Q1=(n1-1)*(log(det(s))-log(det(s1))-p+trace(inv(s)*s1));Q2=(n2-1)*(log(det(s))-log(det(s2))-p+trace(inv(s)*s2));%判断两总体协方差是否相等ifQ1chi2inv(0.95,3)&&Q2chi2inv(0.95,3)disp('两组数据协方差相等')elsedisp('两组数据协方差不全相等')end;%判断两总体协方差是否相等%协方差不相等贝叶斯判别x=A(16:19,2:7);p1=7/15;p2=8/15;fori=1:4d1(i)=mahal(x(i,:),x1)-log(det(s1))-2*log(p1);d2(i)=mahal(x(i,:),x2)-log(det(s2))-2*log(p2);ifd1(i)=d2(i)disp(['第',num2str(i),'个样品污染类别为1']);elsedisp(['第',num2str(i),'个样品污染类别为2']);endend%计算回代误判率n11=0;n22=0;fori=1:n1d1(i,1)=(x1(i,:)-m1)*inv(s1)*(x1(i,:)-m1)'-log(det(s1))-2*log(p1);d1(i,2)=(x1(i,:)-m2)*inv(s2)*(x1(i,:)-m2)'-log(det(s2))-2*log(p2);forj=1:2ifd1(i,j)==min(d1(i,:))&j~=1n11=n11+1;endendendfori=1:n2d2(i,1)=(x2(i,:)-m1)*inv(s1)*(x2(i,:)-m1)'-log(det(s1))-2*log(p1);d2(i,2)=(x2(i,:)-m2)*inv(s2)*(x2(i,:)-m2)'-log(det(s2))-2*log(p2);forj=1:2ifd2(i,j)==min(d2(i,:))&j~=2n22=n22+1;endendendp0=(n11+n22)/(n1+n2)%计算交叉误判率N11=0;N22=0;fork=1:n1A=x1([1:k-1,k+1:n1],:);N1=length(A(:,1));M1=mean(A,1);m2=mean(x2);s11=cov(A);s2=cov(x2);S1=((N1-1)*s11+(n2-1)*s2)/(N1+n2-k);fori=1:n1d11(i,1)=M1*inv(S1)*x1(i,:)'-1/2*M1*inv(S1)*M1'+log(p1);d11(i,2)=m2*inv(S1)*x1(i,:)'-1/2*m2*inv(S1)*m2'+log(p2);forj=1:2ifd11(i,j)==min(d11(i,:))&j~=1N11=N11+1;endendendendfork=1:n2B=x2([1:k-1,k+1:n2],:);N2=length(B(:,1));M2=mean(B,1);m1=mean(x1);s22=cov(B);s1=cov(x1);S2=((n1-1)*s1+(N2-1)*s22)/(n1+N2-k);fori=1:n2d22(i,1)=m1*inv(S2)*x2(i,:)'-1/2*m1*inv(S2)*m1'+log(p1);d22(i,2)=M2*inv(S2)*x2(i,:)'-1/2*M2*inv(S2)*M2'+log(p2);forj=1:2ifd22(i,j)==min(d22(i,:))&j~=2N22=N22+1;endendendendp1=(N11+N22)/(n1+n2)四、结果马氏距离判别:两组数据协方差不全相等,第1、2个样品污染类别为1第3、4个样品污染类别为2,回代误判率p=0,交叉误判率p=0.5714。贝叶斯判别:两组数据协方差不全相等,第1、2个样品污染类别为1,第3、4个样品污染类别为2,回代误判率p=0,交叉误判率p=0.2667