Nanjing University of Aeronautics and Astronautics (南京航空航天大学)
Master's Thesis

Research and Implementation of Clustering and Outlier Detection Algorithms

Name: Zheng Jian (郑健)
Degree applied for: Master
Major: Computer Application Technology
Advisor: Pi Dechang (皮德常)
Date: 2007-12-01

Abstract

Data mining techniques can be used to discover potentially useful knowledge in vast amounts of data. With the rapid development of data mining techniques, clustering analysis and outlier detection are widely applied in fields such as pattern recognition, data analysis, image processing, and market research. Research on clustering analysis and outlier detection algorithms has become a highly active topic in the data mining field.

This thesis introduces the theory of data mining and analyzes clustering and outlier detection algorithms in depth. Based on an analysis of density-based clustering and outlier detection algorithms, we present the Outlier Detection algorithm Based on Symmetric Neighborhood (ODBSN) and the r-Neighborhood Based Clustering algorithm (RNBC).

In the ODBSN algorithm, we introduce the concept of reverse k nearest neighbors. Based on this concept, we design an outlier detection algorithm based on the symmetric neighborhood to improve the efficiency of density-based outlier detection algorithms. The ODBSN algorithm does not need to compute the reachability distance and reachability density, so the computation cost can be greatly reduced. Meanwhile, detecting outliers with the Symmetric Neighborhood-based Outlier Factor (SNOF) also makes outlier detection more accurate.

In the RNBC algorithm, we introduce the concept of the relative density factor. Based on this concept, we design a new density-based clustering algorithm. Compared with the clustering algorithm DBSCAN, this algorithm has two advantages. First, we use the relative density factor to distinguish local core points from local border points, so datasets can be clustered based on the local data distribution; in this way, multi-density clusters can be found. Second, the algorithm can detect outliers during the clustering process by measuring the outlierness of data objects with the relative density factor.

We have implemented the ODBSN, RNBC, LOF and DBSCAN algorithms in Java. As the experimental results show, the ODBSN and RNBC algorithms correctly discover outliers and clusters respectively, and they are better in effectiveness and efficiency than LOF and DBSCAN respectively.

Keywords: Data Mining, Clustering Analysis, Outlier Detection, Symmetric Neighborhood, r-Neighborhood

List of Figures

2.1  …
2.2  …
2.3  …
3.1  p1 … k-…
3.2  …
3.3  Dataset1
3.4  Dataset2
3.5  Dataset3
3.6  ODBSN … Dataset1
3.7  ODBSN … Dataset2
3.8  ODBSN … Dataset3
3.9  ODBSN … LOF
3.10  k …
4.1  …
4.2  Dataset1
4.3  Dataset2
4.4  Dataset3
4.5  RNBC … Dataset1
4.6  RNBC … Dataset2
4.7  RNBC … Dataset3
4.8  …
4.9  DBSCAN …
4.10  RNBC …
4.11  RNBC … DBSCAN

List of Tables

3.1  LOF …
3.2  Lymphography Dataset
3.3  Lymphography Dataset …
3.4  Wisconsin Breast Cancer Dataset
3.5  Wisconsin Breast Cancer Dataset …
3.6  …