China Three Gorges University (三峡大学)
Master's Thesis
Application of Data Mining Technology to Customer Churn Analysis in the Telecommunications Industry
Name: Tian Rui (田瑞)
Degree applied for: Master
Major: Computer Application Technology
Supervisor: Zhou Xuejun (周学君)
Date: 2009-04-01

Abstract

With the reform of China's telecommunications system and the progress of data mining technology, competition among mobile telecommunications operators is becoming increasingly fierce, and the importance of data mining is recognized by more and more people. Operators therefore urgently need to work out how to retain their existing customers. To do so, they need to predict the likelihood of churn before customers give up their services. This thesis surveys existing data-mining-based systems for predicting telecom churn. The method builds a model from known data, extracts the business rules implied by it, and then uses those rules to make predictions that guide decision-making.

Experts and scholars in China and abroad have carried out a great deal of research on forecasting customer churn in the telecommunications industry. However, the existing methods still have shortcomings, and this thesis addresses them in order to improve the prediction model. The work consists of the following parts.

First, the theory of data mining is introduced, and the decision tree and artificial neural network algorithms are analyzed.

Second, based on the practical situation of telecom companies, the importance of applying data mining (DM) is analyzed, and a basic description of the prediction system is given according to practical requirements. On most existing data, a single mining model is at a disadvantage. To take full advantage of the strengths of both the decision tree and the neural network algorithms, a hybrid churn prediction model combining the two is established.

Third, the process of building the prediction model for PHS customer churn is described in detail, and evaluation of the model on actual data demonstrates that it can predict customer churn comparatively accurately.

Finally, we carry out a case study using the mixed model
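for prediction at the L branch of Shanxi Telecom and obtain a preliminary set of important factors affecting churn; in light of the company's actual situation, we then propose measures to retain customers. The results indicate that the forecasting model is sound and practical, and can provide decision-makers with predictive information and candidate solutions. We expect that, as data mining technology progresses, more valuable customer information will be discovered to guide the company.

Keywords: Data Mining, Customer Churn Prediction, Mixed Model, Decision Tree, Neural Network

1
1.1 [1] [2] [3]. Percentages quoted: 25%, 25%, 15%, 50%, 85%, 70%, 60%.
1.2 1999 [4]; 25%, 30%, 48%; (BSS) [5] [6].
1.2.1 Madden et al., ISP [7]; Lee and Feick [8]; Kim and Kwon [9]; Gerpott et al. [10]; [11] [12] [13] [14].
1.2.2 [15] [16] [17] [18]; Mozer [19]; KIM [20]; [21] [22].
1.3 Figure 1.1.
1.3.1 [23]
1.3.2
1.4 CRISP-DM; Clementine; L.

2
2.1
2.1.1 [24]; Zekulin [23]; Ferruzza; Jonn Parsay; OLAP; KDD (Knowledge Discovery in Database) [24].
2.1.2 Table 2.1: comparison involving OLAP (cell contents not recoverable).
2.1.3 [24]
2.1.4 [25] [26]
1. [27]
2. Quinlan, ID3 [28]; Schlimmer and Fisher, ID4 [29]; IBLE.
3. BP [30]; Hopfield; ART; Kohonen.
4. Apriori [31]; C4.5.
5. Rough Set, Pawlak, 1982 [32] [33] [34].

2.2
2.2.1 [35] [36]. Building; Pruning [37]. Information Theory; GINI (lowest GINI index). ID3, C4.5, CART, SLIQ, SPRINT. Cost Complexity Pruning; Pessimistic Pruning; MDL (Minimum Description Length). C4.5 (Quinlan, 1993) [38].

$$\mathrm{Info}(S) = -\sum_{i=1}^{k} \left( \frac{\mathrm{freq}(C_i, S)}{|S|} \times \log_2 \frac{\mathrm{freq}(C_i, S)}{|S|} \right) \tag{2.1}$$

Here $\mathrm{freq}(C_i, S)$ is the number of samples in $S$ that belong to class $C_i$, $k$ is the number of classes, and $|S|$ is the number of samples in $S$.

$$\mathrm{Info}_x(T) = \sum_{i} \left( \frac{|T_i|}{|T|} \times \mathrm{Info}(T_i) \right) \tag{2.2}$$

$$\mathrm{Gain}(X) = \mathrm{Info}(T) - \mathrm{Info}_x(T) \tag{2.3}$$

Worked example (nine training samples, three in one class and six in the other):

$$\mathrm{Info}(T) = -\tfrac{3}{9}\log_2\tfrac{3}{9} - \tfrac{6}{9}\log_2\tfrac{6}{9} \approx 0.918 \quad \text{(printed as 0.9184)}$$

Values as printed: $\mathrm{Info}_{x_1}(T) = 0.9$, so $\mathrm{Gain}(x_1) = 0.9184 - 0.9 = 0.0184$; $\mathrm{Info}_{x_3}(T) = 0.9183$, so $\mathrm{Gain}(x_3) = 0.9184 - 0.9183 = 0.0001$. For $x_2$, whose sorted values are {23, 24, 30, 35, 43, 45, 46, 48, 51}, with 48 as the threshold, $\mathrm{Info}_{x_2}(T) = 0.721$ and $\mathrm{Gain}(x_2) = 0.9184 - 0.721 = 0.1974$, the largest of the three.

Figures 2.2, 2.3, 2.4: decision trees with branches at 48 (images not recoverable).

C4.5 and C5.0 [38] scale the gain by a factor $F$ when attribute values are unknown:

$$\mathrm{Gain}(X) = F \times \left( \mathrm{Info}(T) - \mathrm{Info}_x(T) \right) \tag{2.4}$$

where $F$ is a ratio (in Quinlan's C4.5, the fraction of samples whose value of $X$ is known).

Pruning. For a leaf covering $N$ samples of which $E$ are misclassified, the probability of $E$ errors at error rate $p$ is

$$f(E; N, p) = \frac{N!}{E!\,(N-E)!} \times p^{E} (1-p)^{N-E} \tag{2.5}$$

The error rate $p$ is obtained from

$$\sum_{i=0}^{E} f(i; N, p) = a$$

where $a$ defaults to 0.25 in C4.5, and the predicted number of errors is $E' = N \times p$. If a subtree $T$ has $m$ leaves with predicted errors $E'_i$, and the leaf $T^*$ that would replace it has predicted error $E'^{*}$, the subtree is replaced when $E'^{*} \le \sum_{i=1}^{m} E'_i$ [39].

2.2.2 Artificial Neural Network (ANN); link; weight. Figure 2.5 [40]; W [30]. Feed Forward Network; Recurrent Network [41]. Sigmoid. 1989, Robert Hecht-Nielsen.
2.2.3 Table 2.2.
2.2.4
1. CRISP-DM [42]: CRISP-DM 1.0, 1999; SPSS, NCR. Figure 2.6 [43]: CRISP-DM.
2. Clementine (SPSS). Figure 2.7: Clementine and CRISP-DM. The CRISP-DM phases are Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. Figure 2.8: Clementine 11.1. Clementine models named: C5.0, C&R Tree, Apriori, Logistic. Data sources and outputs named: SAS, SPSS, Excel, Publisher.

3
3.1
3.1.1
3.1.2 STATE values: F0A, F0X, F0J, F0M. Percentages listed under 2.9: 67.89%, 0.33%, 13.75%, 3.29%, 14.74%.
3.2 OLAP (On-Line Analytical Processing).
3.3

4
4.1 SCV; VIP. Tables 4.1, 4.2, 4.3, 4.4.

Table 4.1
ADDRESS_ID      PK   NUMBER(9)      NOT NULL
PROVINCE_NAME        VARCHAR2(20)   NULL
CITY_NAME            VARCHAR2(40)   NULL
STREET_NAME          VARCHAR2(40)   NULL
STREET_NBR           VA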