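Shanghai Jiao Tong University
Master's Thesis

A Study on Applying Machine Learning to the Integration of Historical Orders for an Insurance Company

Author: 谢昉
Degree applied for: Master
Major: Business Administration
Advisor: 徐博艺
Date: 2008-06-01

A STUDY ON HISTORICAL ORDER INTEGRATION FOR INSURANCE COMPANY

ABSTRACT

Data mining is an important technology in Customer Relationship Management (CRM) and marketing research. In industries that focus on individual or small-business customers, such as insurance and banking, analysis of customer data plays an influential role in corporate strategy as well as in marketing plans. However, it is hard to mine the data of a massive number of individual or small-business customers, especially for orders placed before the CRM system was introduced. To improve the forecasting of business data, the Business Intelligence system needs to determine which orders belong to the same customer, because one customer may open several accounts with different addresses, phone numbers and even names. The first step of customer data processing is therefore to pre-process the order data to identify which orders or accounts belong to the same customer.

In this paper, we apply a naïve Bayesian classifier, a decision tree and a neural network to learn from order matching results, and we evaluate the performance of these three classifiers. The paper introduces the algorithms, the steps for establishing the models, the performance evaluation and the results. The tools and models are applied to a data set from an insurance company for a real business application. The models discussed can also be applied in other industries and applications, including Internet, banking and insurance.

KEYWORDS: data mining, classifier, naïve Bayesian, decision tree, neural network

[Body text lost in extraction; surviving terms: CRM (Customer Relationship Management), IT, MIS, POS, Excel.]

1.2 CRM [section title and text lost in extraction]

Data mining tools named in the text:

• IBM Intelligent Miner: Intelligent Miner for Data and Intelligent Miner for Text (surviving terms: Web, Lotus Notes)
• SAS Enterprise Miner (surviving terms: SAS, OLAP)
• SPSS Clementine (surviving terms: SMART, CRISP-DM)

Other tools named: LEVEL5 Quest, MineSet (SGI), Partek, SE-Learn, SPSS, Snob, Ashraf Azmy SuperQuery, WINROSA, XmdvTool.

[Description of PresenceCare® lost in extraction; surviving terms: CRM, ABB.]

Table 1 [caption lost]: LName, FName, MI (middle name), StartDate, EndDate, StreetNo, StreetName, City, State, ZIP, ZIPExt, HomePhone, WorkPhone, EmailAdd, DOB, Gender, MaritalStatus, Smoker, PCPNo (PCP), Employer.

Classifier [text lost; the class labels are 1 and 0]

For an instance $d$ and classes $C_i$, $i = 1, 2, \dots, M$, the naïve Bayesian classifier evaluates the posterior $P(C_i \mid d)$ from the prior $P(C_i)$ and the likelihood $P(d \mid C_i)$. With $d$ described by features $w_j$, $j = 1, 2, \dots, N$, the likelihood is built from the per-feature probabilities $P(w_j \mid C_i)$:

\[ P(C_i \mid d) \propto P(C_i)\, P(d \mid C_i), \qquad P(d \mid C_i) = \prod_{j=1}^{N} P(w_j \mid C_i) \]

[Text lost; it cites [1][2] and refers to CRM and PresenceCare®.]

[Decision tree text lost; it cites [15], refers to PresenceCare® and names the ID3 and C4.5 algorithms.]

Artificial Neural Networks (ANN) [3] [text lost; surviving terms: activation function, weight, Levenberg-Marquardt, CPU]

[Text lost: a bulleted description and three numbered steps concerning PresenceCare® and CRM.]

The training set is $\{(x_1, y_1), \dots, (x_n, y_n)\}$ with $x \in X$ and $y \in Y$, where each $x_i$ carries a label $y$ of 0 or 1.

Figure 1 Part of Order Match Result [image not recoverable; surviving terms: Access, A, B, 0, 1]

Field            Variable
ID, CompID       x0
Match            y (1 = match, 0 = no match)
LName            x1
FName            x2
MI               x3
StartDate        x4
EndDate          x5
StreetNo         x6
StreetName       x7
City             x8
State            x9
ZIP              x10
ZIPExt           x11
HomePhone        x12
WorkPhone        x13
EmailAdd         x14
DOB              x15
Gender           x16
MaritalStatus    x17
Smoker           x18
PCPNo            x19
Employer         x20

Table 2: Table of features

With the features of Table 2, the data set is $\{(x_{1,1}, x_{1,2}, \dots, x_{1,20}, y_1), \dots, (x_{n,1}, x_{n,2}, \dots, x_{n,20}, y_n)\}$. The goal is a hypothesis $h: X \to Y$ with $[x_1, \dots, x_{20}] \in X$ and $y \in Y$, where $y$ is 1 or 0.

The naïve Bayesian model requires $p(y=0)$, $p(y=1)$, $p(x_i \mid y=0)$ and $p(x_i \mid y=1)$.

1. Frequency chart [text lost]
2. Goal value [text lost; surviving terms: 0, µ, [4]]
3. [text lost; surviving terms: µ, $p(x_i \mid y=0)$, $p(x_i \mid y=1)$, $p(y=0)$, $p(y=1)$, PresenceCare®, 0, 1]

Figure 2 Simplified Decision Tree [image not recoverable]

[Text lost; it refers to PresenceCare® and the labels 0 and 1.]

1. Entropy
2. Information gain
3. [text lost]
4. [text lost]

Let $p_i = \Pr(X = x_i)$. Then

\[ H(X) = H(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i, \qquad H(X \mid Y) = H(X, Y) - H(Y) \tag{3} \]

\[ \mathrm{InformationGain} = H(y) - H(y \mid x_i) \tag{4} \]

\[ \mathrm{GainRatio} = \mathrm{InformationGain} / \mathrm{SplitEntropy} \tag{5} \]

Figure 3 $H(y \mid x_1)$ calculation [image not recoverable; it shows the split on x1]

• x1 = 1: 731 of 8280 instances (Y=1: 72 instances, Y=0: 659 instances)
• x1 = 0: 7549 of 8280 instances (Y=1: 10 instances, Y=0: 7539 instances)
• $H(y \mid x_1) = \frac{731}{8280} H\!\left(\frac{72}{731}, \frac{659}{731}\right) + \frac{7549}{8280} H\!\left(\frac{10}{7549}, \frac{7539}{7549}\right) = 0.05427$
• SplitEntropy $= H\!\left(\frac{731}{8280}, \frac{7549}{8280}\right) = 0.4307$
• $H(y) = H\!\left(\frac{82}{8280}, \frac{8198}{8280}\right) = 0.08015$
• Information gain of x1 $= H(y) - H(y \mid x_1) = 0.02588$
• Gain ratio of x1 $= 0.02588 / 0.4307 = 0.0601$

[Text lost; surviving terms: Table 3, Excel, EmailAdd (x14), frequency chart.] In Table 3, EmailAdd (x14) has the highest gain ratio.

In Table 3, "Aff." is the Affiliation value of the feature (1 or 0), and the columns y=1 and y=0 give the number of instances in each class. The first row gives the totals for the whole data set.

Feature         Aff.=1                  Aff.=0                  Entropy    Infogain    Splitentropy  Gainratio
                Total   y=1    y=0      Total   y=1    y=0
(all)           8280    82     8198                             0.080152
Lname           731     72     659      7549    10     7539     0.05427    0.0258821   0.430719      0.06009
Fname           913     73     840      7367    9      7358     0.056415   0.0237368   0.500717      0.047406
MI              823     42     781      7457    40     7417     0.072289   0.0078629   0.467078      0.016834
StartDate       3390    43     3347     4890    39     4851     0.079765   0.000387    0.976195      0.000396
EndDate         3346    38     3308     4934    44     4890     0.080048   0.000104    0.973302      0.000107
StreetNo        867     56     811      7413    26     7387     0.066298   0.0138536   0.483751      0.028638
StreetName      1662    55     1607     6618    27     6591     0.072664   0.0074883   0.723375      0.010352
City            3325    64     3261     4955    18     4937     0.075844   0.0043078   0.971862      0.004432
State           8280    82     8198     0       0      0        0          0           0             0
ZIP             1693    60     1633     6587    22     6565     0.070867   0.0092853   0.730776      0.012706
ZIPExt          99      13     86       8181    69     8112     0.076092   0.0040596   0.093501      0.043417
HomePhone       876     62     814      7404    20     7384     0.063109   0.0170434   0.487107      0.034989
WorkPhone       1229    66     1163     7051    16     7035     0.064572   0.0155801   0.605897      0.025714
EmailAdd        110     45     65       8170    37     8133     0.054194   0.0259578   0.101858      0.254843
DOB             523     81     442      7757    1      7756     0.041017   0.0391348   0.33988       0.115143
Gender          4209    82     4127     4071    0      4071     0.070416   0.0097363   0.9998        0.009738
MaritalStatus   5840    61     5779     2440    21     2419     0.080099   5.311E-05   0.874698      6.07E-05
Smoker          6638    82     6556     1642    0      1642     0.076976   0.0031756   0.718534      0.00442
PCPNO           0       0      0        8280    82     8198     0          0           0             0
Employer        3294    75     3219     4986    7      4979     0.071576   0.0085762   0.969665      0.008844

Table 3: Looking for the root node of the decision tree

[Neural network text lost; it cites [5] and lists three numbered steps; surviving terms: δ, momentum.] In Weka, the learning rate is 0.3 and momentum = 0.2.

Field            Variable
ID, CompID       x0
Match            y
LName            x1
FName            x2
MI               x3
StartDate        x4
EndDate          x5
StreetNo         x6
Str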
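
The entropy, information gain, split entropy and gain ratio reported for LName (x1) in Table 3 can be checked with a short script. The following is a minimal Python sketch under stated assumptions, not the tooling used in the thesis: the helper names `entropy` and `gain_ratio` are hypothetical and introduced only for illustration, and the counts are copied from the LName row of Table 3.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    total = sum(counts)
    # Zero counts contribute nothing (the limit of p*log2(p) as p -> 0 is 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gain_ratio(branches):
    """Score a feature from its per-value class counts.

    branches: list of (n_match, n_no_match) pairs, one per feature value.
    Returns (H(y), H(y|x), information gain, split entropy, gain ratio).
    """
    sizes = [pos + neg for pos, neg in branches]
    n = sum(sizes)
    h_y = entropy([sum(p for p, _ in branches), sum(q for _, q in branches)])
    # Conditional entropy: branch entropies weighted by branch size.
    h_y_given_x = sum(
        size / n * entropy([pos, neg])
        for size, (pos, neg) in zip(sizes, branches)
        if size > 0
    )
    info_gain = h_y - h_y_given_x
    split = entropy(sizes)
    # A feature with a single non-empty branch has no split entropy.
    ratio = info_gain / split if split > 0 else 0.0
    return h_y, h_y_given_x, info_gain, split, ratio


# LName (x1) counts from Table 3:
#   x1 = 1 -> 72 matches, 659 non-matches; x1 = 0 -> 10 matches, 7539 non-matches
lname = [(72, 659), (10, 7539)]
h_y, h_cond, gain, split, ratio = gain_ratio(lname)
print(f"H(y)={h_y:.5f} H(y|x1)={h_cond:.5f} gain={gain:.5f} "
      f"split={split:.4f} ratio={ratio:.4f}")
```

Passing the counts of another row, for example EmailAdd as `[(45, 65), (37, 8133)]`, scores that feature in the same way, which is how candidate root nodes can be compared by gain ratio.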