Data Warehousing and Data Mining, Chapter 9


August 1, 2019 (Thursday)    Data Mining: Concepts and Techniques

Chapter 7: Classification and Prediction

- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian Classification
- Classification by Neural Networks
- Classification by Support Vector Machines (SVM)
- Classification based on concepts from association rule mining
- Other Classification Methods
- Prediction
- Classification accuracy
- Summary

Classification vs. Prediction

Classification:
- predicts categorical class labels (discrete or nominal)
- classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
Prediction:
- models continuous-valued functions, i.e., predicts unknown or missing values
Typical Applications
- credit approval
- target marketing
- medical diagnosis
- treatment effectiveness analysis

Classification: A Two-Step Process

Model construction: describing a set of predetermined classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The model is represented as classification rules, decision trees, or mathematical formulae
Model usage: for classifying future or unknown objects
- Estimate accuracy of the model
  - The known label of each test sample is compared with the classified result from the model
  - Accuracy rate is the percentage of test set samples that are correctly classified by the model
  - The test set must be independent of the training set; otherwise over-fitting goes undetected and the accuracy estimate is too optimistic
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Classification Process (1): Model Construction

Training Data

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Training Data -> Classification Algorithms -> Classifier (Model):

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction

Testing Data

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen Data: (Jeff, Professor, 4) -> Classifier -> Tenured?

Supervised vs. Unsupervised Learning

Supervised learning (classification)
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
- New data is classified based on the training set
Unsupervised learning (clustering)
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

Issues Regarding Classification and Prediction (1): Data Preparation

Data cleaning
- Preprocess data in order to reduce noise and handle missing values
Relevance analysis (feature selection)
- Remove the irrelevant or redundant attributes
Data transformation
- Generalize and/or normalize data

Issues Regarding Classification and Prediction (2): Evaluating Classification Methods

- Predictive accuracy
- Speed and scalability
  - time to construct the model
  - time to use the model
- Robustness
  - handling noise and missing values
- Scalability
  - efficiency in disk-resident databases
- Interpretability
  - understanding and insight provided by the model
- Goodness of rules
  - decision tree size
  - compactness of classification rules

Training Dataset

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no

This follows an example from Quinlan's ID3.

Output: A Decision Tree for "buys_computer"

age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes

Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm)
- Tree is constructed in a top-down recursive divide-and-conquer manner
- At start, all the training examples are at the root
- Attributes are categorical (if continuous-valued, they are discretized in advance)
- Examples are partitioned rec...
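The greedy, top-down, divide-and-conquer construction outlined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the slides' own procedure: the visible text does not say how the splitting attribute is chosen, so the sketch assumes information gain (the measure used by Quinlan's ID3, which the training dataset slide cites). The age values are written as "<=30", "31..40" and ">40", assuming the comparison signs were lost in extraction. The names entropy, info_gain, build_tree and classify are made up for this example.

```python
from collections import Counter
from math import log2

# "buys_computer" training data; the last column is the class label.
# Age values "<=30" and ">40" are assumed spellings (see lead-in).
ATTRS = ["age", "income", "student", "credit_rating"]
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def info_gain(rows, idx):
    """Entropy reduction from partitioning rows on attribute column idx."""
    parts = {}
    for r in rows:
        parts.setdefault(r[idx], []).append(r[-1])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy([r[-1] for r in rows]) - remainder


def build_tree(rows, attr_idxs):
    """Return a class label (leaf) or a (attribute_name, branches) node."""
    labels = [r[-1] for r in rows]
    # Stop when the partition is pure or no attributes remain (majority vote).
    if len(set(labels)) == 1 or not attr_idxs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest information gain.
    best = max(attr_idxs, key=lambda i: info_gain(rows, i))
    rest = [i for i in attr_idxs if i != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, rest)  # recurse on each partition
    return (ATTRS[best], branches)


def classify(tree, sample):
    """Walk the tree for a sample given as an attribute-name -> value dict."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[sample[attr]]
    return tree


tree = build_tree(ROWS, list(range(len(ATTRS))))
print(tree)
```

On this dataset the sketch splits on age at the root, then on student for the "<=30" branch and on credit_rating for the ">40" branch, which matches the decision tree shown for "buys_computer". A value that never occurs in the training data would raise a KeyError in classify; handling that case is left out to keep the sketch short.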
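The two-step process described earlier (construct a model, then estimate its accuracy on an independent test set before using it on unseen data) can also be illustrated with the tenure example. The sketch below is a minimal illustration, not part of the slides. It assumes the rule reads "years > 6": the comparison operator is missing in the extracted text, and ">" is the reading consistent with the training data. The function names predict_tenured and accuracy are hypothetical.

```python
# Tuples from the tenure example: (name, rank, years, tenured).
TRAINING = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
TESTING = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def predict_tenured(rank, years):
    """Classifier rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.

    The '>' is an assumption; the operator is missing in the source text.
    """
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(rows):
    """Fraction of rows whose known label matches the rule's prediction."""
    correct = sum(
        predict_tenured(rank, years) == label for _, rank, years, label in rows
    )
    return correct / len(rows)


print(accuracy(TRAINING))               # 1.0
print(accuracy(TESTING))                # 0.75 (Merlisa is misclassified)
print(predict_tenured("Professor", 4))  # yes (the unseen tuple for Jeff)
```

The rule fits the training data perfectly but reaches only 75% on the testing data, which is why the accuracy estimate has to come from tuples that were not used to build the model.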
