Research and Implementation of Clustering and Outlier Detection Algorithms

Nanjing University of Aeronautics and Astronautics, Master's Thesis

Author: Zheng Jian
Degree: Master
Major: Computer Application Technology
Supervisor: Pi Dechang
Date: 2007-12-01

Abstract

Data mining techniques can be used to discover potentially useful knowledge in large volumes of data. With the rapid development of data mining techniques, clustering analysis and outlier detection have been widely applied in fields such as pattern recognition, data analysis, image processing, and market research. Research on clustering and outlier detection algorithms has become a highly active topic in data mining.

This thesis introduces the theory of data mining and analyzes clustering and outlier detection algorithms in depth. Based on an analysis of density-based clustering and outlier detection algorithms, we present an Outlier Detection algorithm Based on Symmetric Neighborhood (ODBSN) and an r-Neighborhood Based Clustering algorithm (RNBC).

In the ODBSN algorithm, we introduce the concept of reverse k-nearest neighbors. Based on this concept, we design an outlier detection algorithm based on the symmetric neighborhood to improve the efficiency of density-based outlier detection. ODBSN does not need to compute the reachability distance or the reachability density, so its computational cost is greatly reduced. Detecting outliers with the Symmetric Neighborhood-based Outlier Factor (SNOF) also makes the detection more accurate.

In the RNBC algorithm, we introduce the concept of the relative density factor. Based on this concept, we design a new density-based clustering algorithm. Compared with DBSCAN, it has two advantages. First, the relative density factor is used to distinguish local core points from local border points, so a data set is clustered according to its local data distribution, and clusters of differing densities can be found. Second, the algorithm can detect outliers during clustering by using the relative density factor to measure the outlierness of data objects.

We implemented ODBSN, RNBC, LOF, and DBSCAN in Java. The experimental results show that ODBSN and RNBC correctly discover outliers and clusters, respectively, and that they outperform LOF and DBSCAN, respectively, in both effectiveness and efficiency.

Keywords: Data Mining, Clustering Analysis, Outlier Detection, Symmetric Neighborhood, r-Neighborhood

List of Figures
Figure 2.1
Figure 2.2
Figure 2.3
Figure 3.1 p1 k-…
Figure 3.2
Figure 3.3 Dataset1
Figure 3.4 Dataset2
Figure 3.5 Dataset3
Figure 3.6 ODBSN, Dataset1
Figure 3.7 ODBSN, Dataset2
Figure 3.8 ODBSN, Dataset3
Figure 3.9 ODBSN, LOF
Figure 3.10 k
Figure 4.1
Figure 4.2 Dataset1
Figure 4.3 Dataset2
Figure 4.4 Dataset3
Figure 4.5 RNBC, Dataset1
Figure 4.6 RNBC, Dataset2
Figure 4.7 RNBC, Dataset3
Figure 4.8
Figure 4.9 DBSCAN
Figure 4.10 RNBC
Figure 4.11 RNBC, DBSCAN

List of Tables
Table 3.1 LOF
Table 3.2 Lymphography Dataset
Table 3.3 Lymphography Dataset
Table 3.4 Wisconsin Breast Cancer Dataset
Table 3.5 Wisconsin Breast Cancer Dataset
Table 3.6 …
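The abstract's central idea for ODBSN is a symmetric neighborhood built from a point's k-nearest neighbors and its reverse k-nearest neighbors, scored without reachability distances. The Java sketch below illustrates that idea under explicit assumptions, because this excerpt does not give the thesis's formulas: the symmetric neighborhood is taken as SN(p) = kNN(p) union RkNN(p), a point's density is assumed to be 1 / k-distance(p), and the outlier factor is assumed to be the mean density over SN(p) divided by p's own density. The class name SymmetricNeighborhoodOutlier and its methods are hypothetical, not the author's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a symmetric-neighborhood outlier score. The density and
// factor definitions below are assumptions, not the thesis's published formulas.
class SymmetricNeighborhoodOutlier {

    // Euclidean distance between two points of equal dimension.
    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // Brute-force kNN: for each point, the indices of its k closest other points.
    // Ties at the k-th position are broken by index (Arrays.sort on objects is stable).
    static int[][] kNearest(double[][] pts, int k) {
        int n = pts.length;
        int[][] knn = new int[n][k];
        for (int p = 0; p < n; p++) {
            double[] d = new double[n];
            for (int q = 0; q < n; q++) d[q] = dist(pts[p], pts[q]);
            Integer[] order = new Integer[n - 1];
            int m = 0;
            for (int q = 0; q < n; q++) if (q != p) order[m++] = q;
            Arrays.sort(order, (x, y) -> Double.compare(d[x], d[y]));
            for (int i = 0; i < k; i++) knn[p][i] = order[i];
        }
        return knn;
    }

    // Score every point; assumes 1 <= k < pts.length and no k exact duplicates of a point.
    static double[] snof(double[][] pts, int k) {
        int n = pts.length;
        int[][] knn = kNearest(pts, k);

        // Reverse kNN: q belongs to RkNN(p) whenever p is one of q's k nearest neighbors.
        List<Set<Integer>> rknn = new ArrayList<>();
        for (int p = 0; p < n; p++) rknn.add(new HashSet<>());
        for (int q = 0; q < n; q++) {
            for (int p : knn[q]) rknn.get(p).add(q);
        }

        // Assumed density: reciprocal of the k-distance (distance to the k-th neighbor).
        // No reachability distance or reachability density is computed.
        double[] den = new double[n];
        for (int p = 0; p < n; p++) den[p] = 1.0 / dist(pts[p], pts[knn[p][k - 1]]);

        double[] score = new double[n];
        for (int p = 0; p < n; p++) {
            // Symmetric neighborhood: SN(p) = kNN(p) union RkNN(p).
            Set<Integer> sn = new HashSet<>(rknn.get(p));
            for (int q : knn[p]) sn.add(q);
            double sum = 0.0;
            for (int q : sn) sum += den[q];
            // Assumed factor: mean density over SN(p) divided by p's own density;
            // values well above 1 mark p as sparser than its symmetric neighbors.
            score[p] = (sum / sn.size()) / den[p];
        }
        return score;
    }

    public static void main(String[] args) {
        // Five clustered points plus one isolated point at (5, 5).
        double[][] pts = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {0.5, 0.5}, {5, 5}};
        double[] s = snof(pts, 2);
        for (int i = 0; i < s.length; i++) {
            System.out.printf("point %d: %.3f%n", i, s[i]);
        }
    }
}
```

With k = 2 on these six points, the isolated point (5, 5) scores about 7.7, while the five clustered points all score below 1.25. The neighbor search here is brute force, O(n² log n); larger data sets would need a spatial index, which this sketch does not attempt.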