华中科技大学硕士学位论文反馈式聚类算法研究姓名:罗如意申请学位级别:硕士专业:计算机应用技术指导教师:肖国强20080606IWebCUREMyEclipseCURE(20)CRMIIAbstractClusteranalysisisoneofthebasicmethodsindataanalysis.Itcanbeusedasanindependentdataminingtoolstoobtainthedistributionofdatas,orasapretreatmentstepforotherdataminingalgorithms.Therefore,ithasbroadprospectsinthedomainsofmarketorcustomersegmentation,patternrecognition,biology,spacedataanalysis,Webandotherdocumentsclassification,etc.Ithasbeenwidespreadconcernedbydomesticandforeignscholars,andfruitfulresearchresultshasachieved.However,conclusionfromthepracticalapplicationundercurrentsituationshowsthatmostclusteringalgorithmsconsideronlyhowtoconductaone-wayanalysisofrawdata,andtheeffectisoflittlesatisfaction.Basedonthis,firstofall,acomprehensiveanalysisoftheexistingclusteringalgorithmsandtheirapplicationathomeandabroadismade,thentheirdeficienciesarepointedout.BasedonthetraditionalCUREalgorithm,theideaoffeedbackclusteringalgorithmisputforward,whichmeansfeedingtheresultdataofclusteranalysisbacktothetheinitialphaseofclusteranalysis,toadjusttheprocessofclusteranalysisbasedonfeedbackdata.Thisprocessofresultbacktotheclusteranalysisstageistobecontinuedforrefinement.Inthefeedbackclusteringalgorithm,conceptsoffeedbacksets,setsofrelationshipbetweentwofeedbacksetsandresultsetsareestablishedinnovatively,andtherelatedcharacteroftheseconceptsarestudied.Fourstepsofthefeedbackclusteringalgorithmaredefinedasresultsetsconstruction,resultsetsmerging,clusterinitializing,clusterpost-treatment,anddetailalgorithmineachstepisgiven.UsingMyEclipsedevelopmenttool,asimulationsystemisdesignedandimplemented.Basedonthesimulationsystem,experimentsaremadefromtheaspectsofalgorithmcomplexity,clusteringaccuracyoftheresults,andcheckrateofabnormaldataforanalysisandverification.ThecomparisonwithCUREalgorithmismade,too.Finally,thefeedbackclusteringalgorithmisusedinoneoftelecomoperators’scustomerrelationshipmanagementsystem(thetotalnumberofexperimentaldatasetsis200,000)forcustomersegment.Theresultsandanalysisoftheapplicationshowsthat,theIIIbasicperformanceofthesystemislittleaffected,whileitcancontinuefindingdataqualityissuesoftheoriginalCRMsystem,andintheclusterstageoverdisturbcustomerscanbeexcludedoutofthetargetcustomer.Sotheclusteringaccuracyoftheresultshasamarkedimprovement.Keywords:ClusterAnalysis,FeedbackClusteringAlgorithm,CustomerRelationshipManagement___“√”111.11.1.1(Clustering)()(Cluster)[1]Web[2,3,4][5]1.1.2(CRM)CRMCRM(provideCorrectServicetoCorrectPersonthroughCorrectChannelinCorrectTime)[6]1()[7]2[8][9]2[10]()[11](1)[12](2)3(3)[13]1.21.2.1[14,15]40K-K-CLARANSBIRCHCURE[16,17,18]1XnKK-K-CLARACLARANS[19,20]4(1)K-KMaCQuenKnKO(nkt)(t)K[21](2)K-KK-(3)ClaraClara(ClusteringLARgeApplications)K-ClaraO(ks2+k(n-k))s[22](4)ClaransClarans(ClusteringLargeApplicationsbaseduponRANdomizedSearch)ClaraClaraClaraClaransClaraO(n2)52[23]BIRCHCURE(1)BIRCHBIRCH(BalancedIterativeReducingandClusteringusingHierarchies)(CF)BT[24]BIRCHO(n)T(2)CURECURE(ClusteringUsingREprisentatives)Guha1998[25]O(n2)3DBSCANOPTICSDENCLUE(1)DBSCANDBSCAN(DensityBasedSpatialClusteringofApplicationswithNoise)εεMinpts6[26]εMinptsR*-O(nlogn)O(n2)(2)OPTICSOPTICS(OrderingPointToIdentifytheClusteringStructure)DBSCANεMinpts[27]OPTICSDBSCAN(3)DENCLUEDENCLUE(DENsitybasedCLUstEring)DENCLUEδζ4(cell)STINGWaveClusterCLIQUE(1)STINGSTING(STatisticalINformationGrid)[28]O(n)O(g)g7(2)WaveClusterWaveCluster(ClusteringUsingWaveletTransformation)O(n)[29](3)CLIQUECLIQUE(CLusteringInQUEst)kk-1CLIQUE5[30,31,32](1)COBWEBCOBWEBCOBWEB(2)(CompetitiveLearning)RuxnelhartZipser()8(3)SOFM(SOFM,SelfOrganizingFeatureMaps)()(HardClustering)[33]1.2.2[34][35][36][37]c[38]9KKCRM[39]K-Means[40]1.2.3BIRCHDBSCANCURE(1)ETL(2)1.310(1)(2)CURECURE(3)CURE(4)CRMCURE1122.1AX()Xi=(Xi1…Xid)ЄAXi()XijXN*dXNXi(i=1…,N)XkCm(m=1…k)CnX,[41,42]≠Φ==)(CjCi...21jiCCCCXnkIUUUU(2.1)Cm2.22.2.1nXi(i=12…n)pijXijnpn*pnpnnppXXXXXXXXX.....................212222111211(2.2)[43]nn*n0...)2,()1,(.........0)1,2(0ndndd(2.3)122.2.2pnPnPXi=(Xi1Xi2…Xip)TXj=(Xj1Xj2…Xjp)T[44](Euclid)2112),(−=−=∑=pkjkikjiXXXXXjXid(2.4).(1)jiXX−=0Xi=Xj(2)ij0≥−jiXX(3)ijjiXX−=ijXX−(4)ijkkjjikiXXXXXX−+−≤−2.2.3GnXi(i=12…n)∑==niiXNm11(2.5)∑=−−=niTiiGmXmXA1))(((2.6)1−=NASGG(2.7)13)),((,i1XjXidMAXDnjG≤≤=(2.8)∑=−−=niiTiGmXmXD1)()((2.9)2.3CURE(1)CURECURE(2)CURECURE(3)CURECRMCURECUREKSS1CURE2.114HeapKD-TreeKD-TreeKD-Tree()K-DTQSKuvwQu,v,Tu,v,wQQxxwwxXuv?XxwXwwQQxTwQx2.1CURE22.215cα0α1ααα=1ww.meanCentroid-baseapproachα=0All-pointsapproach2.2CURE16cc34pn/pn/pqq1CUREFirstpass)log)1((O22pnqqpn−(2.10)Secondpass)log(O22qnqn(2.11)pqn/pqk2~35CUREc6CURE(1)17(2)Fraction()(1/3)2.4CURECURECURE1833.13.1.1U,vUU={u|uЄNodes}v1U={u|uЄerrNodes}19U2()U={u|uЄClusterN}U3.1.2A,B,rABPr1}ClusterNaClusterNbClusterN,bClusterNaB,bA,a|PB,A{∈→∈∈→∈∈∀∈∀∈3.1rArB,BrC→ArC(1)ClusterNbClusterNaArBBbAa∈→∈∴∈∀∈∀Q,,(2)ClusterNcClusterNbBrCCcBb∈→∈∴∈∀∈∀Q,,20(3)(1)(2)ClusterNcClusterNaCcAa∈→∈∈∀∈∀,,(4)ClusterNaClusterNcCcAa∈→∈∈∀∈∀,,rArC2}ClusterNaClusterNbClusterN,bClusterNaB,bA,a|PB,A{∉→∈∉→∈∈∀∈∀∈3.2rtA