5. Core Business Intelligence Technologies and Applications: Data Mining

数据挖掘业务理解数据理解数据准备建模评估部署挖掘模型数据挖掘引擎预测需求数据挖掘引擎预测结果训练数据挖掘模型挖掘模型CubeHistoricalDatasetNewDatasetDataTransform(ETL)ReportingModelBrowsingPredictionLOBApplicationMiningModelsAnalysisServicesOLAP&DataMiningIntegrationServicesSQLServerRelationalEngineReportingServicesManagementToolsDevToolsVisualStudio.NetExcelOWCMapPointDataAnalyzerBalancedScoreCardSharePointPortalServerWindowsServerWindowsClientCREATEMININGMODELCreditRisk(CustIDLONGKEY,GenderTEXTDISCRETE,IncomeLONGCONTINUOUS,ProfessionTEXTDISCRETE,RiskTEXTDISCRETEPREDICT)USINGMicrosoft_Decision_TreesINSERTINTOCreditRisk(CustId,Gender,Income,Profession,Risk)SelectCustomerID,Gender,Income,Profession,RiskFromCustomersSelectNewCustomers.CustomerID,CreditRisk.Risk,PredictProbability(CreditRisk)FROMCreditRiskPREDICTIONJOINNewCustomersONCreditRisk.Gender=NewCustomer.GenderANDCreditRisk.Income=NewCustomer.IncomeANDCreditRisk.Profession=NewCustomer.Profession决策树聚类时间序列序列聚类关联Naïve贝叶斯神经网络逻辑回归线性回归文本挖掘•已知–性别–年龄–交通距离–收入–汽车数目–子女数目–客户类型(”好”、”坏”)•预测–潜在客户•贝叶斯(NaiveBayes)•决策树(DecisionTrees)•神经网络(NeuralNetworks)•聚类(Clustering)•……好客户55%Y45%N3512030256055504540234567年龄月薪(千元)决策树原理:谁是我们的好客户?好客户55%Y45%N好客户73%Y27%N好客户33%Y67%N351203025605550454023456735+35-月薪(千元)年龄年龄决策树原理:谁是我们的好客户?好客户55%Y45%N好客户87%Y13%N好客户33%Y67%N好客户17%Y83%N好客户67%Y33%N好客户73%Y27%N好客户33%Y67%N3525年龄月薪35+35-5-5+2+2-月薪(千元)年龄决策树原理:谁是我们的好客户?•贝叶斯(NaiveBayes)、神经网络(NeuralNetworks)、聚类(Clustering)……•更多参数可以设置……•挑战:如何判断哪个算法更适合?•LiftChart•ProfitChart•ClassificationMatrixSELECTFLATTENEDt.[CustomerKey],[TMDecisionTree].[BikeBuyer],(PredictProbability([TMDecisionTree].[BikeBuyer]))as[Prob]From[TMDecisionTree]PREDICTIONJOIN@InputRowsetAStON[TMDecisionTree].[MaritalStatus]=t.[MaritalStatus]AND[TMDecisionTree].[Gender]=t.[Gender]AND[TMDecisionTree].[YearlyIncome]=t.[YearlyIncome]AND[TMDecisionTree].[TotalChildren]=t.[TotalChildren]AND[TMDecisionTree].[NumberChildrenAtHome]=t.[NumberChildrenAtHome]AND[TMDecisionTree].[HouseOwnerFlag]=t.[HouseOwnerFlag]AND[TMDecisionTree].[NumberCarsOwned]=t.[NumberCar
sOwned]AND[TMDecisionTree].[CommuteDistance]=t.[CommuteDistance]AND[TMDecisionTree].[Region]=t.[Region]AND[TMDecisionTree].[Age]=t.[Age]•依据过去预测未来•具有一定时间周期性的业务场景•Microsoft时序算法提供了一些针对连续值(例如一段时间内的产品销售额)预测进行了优化的回归算法。•影碟商店案例•会员制影碟商店•会员调查•“谁”买了“什么电影”•在历史数据中,快速找出产品之间的关联规则•可以处理海量数据•规则包括–一对一(AB的概率)–多对一(A,BC的概率)•找出经常同时出现的项集•画出关联网络CustIDGenderMaritalStatusEducationHomeOwnership980001MaleMarriedBachelorsRent980002MaleMarriedBachelorsOwn980003FemaleSingleMastersOwn980004MaleSingleSomeCollegeOwn980005FemaleMarriedBachelorsRent980006FemaleMarriedMastersRentCustIDMovie980001LordoftheRings980001Matrix980002StarTrek980002Terminator980002StarWars980003E.T980004StarWars980004SixthSense980004ABeautifulMind980005Hours980005Signs980006MoulinRouge980006DieHard980006ApocalypseNowCustIDGenderMaritalStatusEducationHomeOwnership980001MaleMarriedBachelorsRent980002MaleMarriedBachelorsOwn980003FemaleSingleMastersOwn980004MaleSingleSomeCollegeOwn980005FemaleMarriedBachelorsRent980006FemaleMarriedMastersRentLordoftheRingsMatrixStarTrekTerminatorStarWarsE.TStarWarsSixthSenseABeautifulMindHoursSignsMoulinRougeDieHardApocalypseNowMoviesAdomdConnectionconn=newAdomdConnection(DataSource=localhost\\sql2005;Catalog=MovieSample;IntegratedSecurity=SSPI);conn.Open();AdomdCommandcmd=conn.CreateCommand();cmd.CommandText=generateDMX();AdomdDataReaderdr=cmd.ExecuteReader();while(dr.Read()){suggestListBox.Items.Add(dr.GetString(0));}conn.Close();
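
The association bullets above mention one-to-one rules (the probability of A → B) and itemsets that frequently occur together. As a rough illustration of what those two quantities mean, the following Python sketch computes support and confidence by brute force over the CustID/Movie purchases listed above. It is not the Microsoft Association algorithm and does not use Analysis Services; the function names (`build_baskets`, `support`, `confidence`, `one_to_one_rules`) and the thresholds are assumptions chosen for this example.

```python
from collections import defaultdict
from itertools import combinations

# Purchases copied from the CustID/Movie table in the slides.
PURCHASES = [
    (980001, "Lord of the Rings"), (980001, "Matrix"),
    (980002, "Star Trek"), (980002, "Terminator"), (980002, "Star Wars"),
    (980003, "E.T"),
    (980004, "Star Wars"), (980004, "Sixth Sense"), (980004, "A Beautiful Mind"),
    (980005, "Hours"), (980005, "Signs"),
    (980006, "Moulin Rouge"), (980006, "Die Hard"), (980006, "Apocalypse Now"),
]


def build_baskets(purchases):
    """Group purchased movies by customer: {cust_id: set of movies}."""
    baskets = defaultdict(set)
    for cust_id, movie in purchases:
        baskets[cust_id].add(movie)
    return dict(baskets)


def support(baskets, items):
    """Fraction of customers whose basket contains every item in `items`."""
    wanted = set(items)
    hits = sum(1 for basket in baskets.values() if wanted <= basket)
    return hits / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent` (A -> B)."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    return support(baskets, set(antecedent) | set(consequent)) / base


def one_to_one_rules(baskets, min_support, min_confidence):
    """Enumerate A -> B rules between single movies that pass both thresholds."""
    movies = sorted(set().union(*baskets.values()))
    rules = []
    for a, b in combinations(movies, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support(baskets, [lhs, rhs])
            c = confidence(baskets, [lhs], [rhs])
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


if __name__ == "__main__":
    baskets = build_baskets(PURCHASES)
    # Hypothetical thresholds; real data sets would use much stricter ones.
    for lhs, rhs, s, c in one_to_one_rules(baskets, 0.15, 0.9):
        print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

With only six customers, every rule rests on one or two baskets, so the numbers only illustrate the definitions. Brute-force pair enumeration is also used here purely for readability; it would not scale to the large data volumes the slides mention.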
