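Appendix 4  Discriminant Analysis

In production, scientific research and daily life, one often has to decide which of several possible situations an object under study belongs to. For example, one may want to predict from the weather of the past two days whether it will rain tomorrow. Similarly, a doctor may need to decide from a patient's temperature, white blood cell count and other symptoms whether the patient has a certain disease. From the standpoint of probability theory, such discrimination problems can be cast in the following model.

Suppose there are n populations

$\xi_1, \xi_2, \dots, \xi_n$,

where each $\xi_i$ is an m-dimensional random variable with distribution function $F_i(x_1, \dots, x_m)$, $i = 1, 2, \dots, n$, and $(x_1, \dots, x_m)$ are the values of the m random variables that characterize the populations. In discriminant analysis these m variables are called discriminant factors. Given a new sample point $x = (x_1, \dots, x_m)^T$, we must decide which population it belongs to.

The MATLAB Statistics Toolbox provides the discriminant function classify, which is called as

    [CLASS,ERR]=CLASSIFY(SAMPLE,TRAINING,GROUP,TYPE)

Here SAMPLE is the matrix of samples to be classified and TRAINING is the matrix of samples whose classes are already known. Both matrices have the same number of columns m. Let s be the number of samples to be classified (the number of rows of SAMPLE) and t the number of known samples (the number of rows of TRAINING). GROUP is then a t-element column vector: if row j of TRAINING belongs to population $\xi_i$, the j-th element of GROUP can be set to i. TYPE selects the classification method. Its default value is 'linear' (linear discriminant), and it can also be 'quadratic' or 'mahalanobis' (classification by Mahalanobis distance). The returned CLASS is an s-element column vector giving the group assigned to each row of SAMPLE. ERR is an estimate of the misclassification rate computed from the training data.

Example. Eight breast tumour tissue samples are known: the first 3 are benign and the last 5 are malignant. Each sample is described by five quantitative features of the cell nuclei measured from microscope images: nucleus radius, texture, perimeter, area and smoothness. Use the known samples to classify three unknown samples.

Known samples:

    No.  Radius  Texture  Perimeter  Area    Smoothness  Type
    1    13.54   14.36    87.46      566.3   0.09779     benign
    2    13.08   15.71    85.63      520     0.1075      benign
    3    9.504   12.44    60.34      273.9   0.1024      benign
    4    17.99   10.38    122.8      1001    0.1184      malignant
    5    20.57   17.77    132.9      1326    0.08474     malignant
    6    19.69   21.25    130        1203    0.1096      malignant
    7    11.42   20.38    77.58      386.1   0.1425      malignant
    8    20.29   14.34    135.1      1297    0.1003      malignant

Samples to be classified:

    No.  Radius  Texture  Perimeter  Area    Smoothness
    1    16.6    28.08    108.3      858.1   0.08455
    2    20.6    29.33    140.1      1265    0.1178
    3    7.76    24.54    47.92      181     0.05263

Solution. The program is as follows:

    % Known (training) samples, one row per sample; columns are
    % radius, texture, perimeter, area, smoothness
    a = [13.54, 14.36,  87.46,  566.3, 0.09779
         13.08, 15.71,  85.63,  520,   0.1075
          9.504, 12.44, 60.34,  273.9, 0.1024
         17.99, 10.38, 122.8,  1001,   0.1184
         20.57, 17.77, 132.9,  1326,   0.08474
         19.69, 21.25, 130,    1203,   0.1096
         11.42, 20.38,  77.58,  386.1, 0.1425
         20.29, 14.34, 135.1,  1297,   0.1003];
    % Samples to be classified (same five columns)
    x = [16.6,  28.08, 108.3,  858.1, 0.08455
         20.6,  29.33, 140.1, 1265,   0.1178
          7.76, 24.54,  47.92, 181,   0.05263];
    % Group labels: 1 = benign (first 3 rows), 2 = malignant (last 5 rows)
    g = [ones(3,1); 2*ones(5,1)];
    % Linear discriminant (default TYPE 'linear'); the output is named cls
    % rather than class so that the built-in function class is not shadowed
    [cls, err] = classify(x, a, g)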
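The MATLAB call above hides how the 'linear' rule reaches its decision. As a rough illustration, here is a minimal pure-Python sketch under stated assumptions: equal prior probabilities for the groups, and each group modelled as a normal distribution with its own mean $\bar{x}_i$ and a common pooled covariance $S = \frac{1}{t-n}\sum_{i=1}^{n}\sum_{x_j \in G_i}(x_j-\bar{x}_i)(x_j-\bar{x}_i)^T$, where $G_i$ is the set of training rows whose GROUP value is i. Under these assumptions a sample $x$ is assigned to the group with the smallest $(x-\bar{x}_i)^T S^{-1}(x-\bar{x}_i)$. The helper names (`linear_classify`, `pooled_covariance`, `solve`, `column_mean`) are hypothetical, this is not MATLAB's own implementation, and the returned error figure (per-group resubstitution error averaged over groups) is only an approximation of what ERR may report.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def column_mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Mean of each column over the given rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(groups: List[List[Sequence[float]]]) -> Matrix:
    """Pooled within-group covariance: centered cross-products summed over
    all groups, divided by (number of training rows - number of groups)."""
    m = len(groups[0][0])
    s = [[0.0] * m for _ in range(m)]
    total = 0
    for rows in groups:
        mu = column_mean(rows)
        for r in rows:
            d = [r[j] - mu[j] for j in range(m)]
            for i in range(m):
                for j in range(m):
                    s[i][j] += d[i] * d[j]
        total += len(rows)
    dof = total - len(groups)
    return [[v / dof for v in row] for row in s]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def linear_classify(sample, training, group) -> Tuple[list, float]:
    """Hypothetical helper: equal-prior linear discriminant with pooled
    covariance. Returns (label for each sample row, training error estimate)."""
    labels = sorted(set(group))
    groups = [[row for row, gk in zip(training, group) if gk == k] for k in labels]
    s = pooled_covariance(groups)
    means = [column_mean(rows) for rows in groups]

    def predict(x: Sequence[float]):
        # Pick the group with the smallest (x - mean)' S^-1 (x - mean)
        best, best_d2 = None, float("inf")
        for k, mu in zip(labels, means):
            d = [xi - mi for xi, mi in zip(x, mu)]
            d2 = sum(di * yi for di, yi in zip(d, solve(s, d)))
            if d2 < best_d2:
                best, best_d2 = k, d2
        return best

    cls = [predict(x) for x in sample]
    # Resubstitution error rate of each group, averaged with equal weights
    rates = [sum(predict(r) != k for r in rows) / len(rows)
             for k, rows in zip(labels, groups)]
    return cls, sum(rates) / len(rates)


# Data from the example: radius, texture, perimeter, area, smoothness
a = [[13.54, 14.36, 87.46, 566.3, 0.09779], [13.08, 15.71, 85.63, 520, 0.1075],
     [9.504, 12.44, 60.34, 273.9, 0.1024], [17.99, 10.38, 122.8, 1001, 0.1184],
     [20.57, 17.77, 132.9, 1326, 0.08474], [19.69, 21.25, 130, 1203, 0.1096],
     [11.42, 20.38, 77.58, 386.1, 0.1425], [20.29, 14.34, 135.1, 1297, 0.1003]]
x = [[16.6, 28.08, 108.3, 858.1, 0.08455], [20.6, 29.33, 140.1, 1265, 0.1178],
     [7.76, 24.54, 47.92, 181, 0.05263]]
g = [1] * 3 + [2] * 5
cls, err = linear_classify(x, a, g)
print(cls, err)
```

The pooled covariance is what makes the linear rule usable on this small data set: with only three benign samples in five dimensions, the benign group's own covariance estimate has rank at most 2 and is singular, so rules that need each group's own covariance inverse (such as the quadratic or Mahalanobis options) are not well defined here.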