Chapter 5. Concept Description

Data Mining: Concepts and Techniques
Slides for Textbook: Chapter 5
Department of Computer Science and Engineering
Zhongmei Zhou
Email: zzm@zju.edu.cn
Wednesday, February 26, 2020

Chapter 5: Concept Description: Characterization and Comparison

- What is concept description?
- Data generalization and summarization-based characterization
- Analytical characterization: Analysis of attribute relevance
- Mining class comparisons: Discriminating between different classes
- Mining descriptive statistical measures in large databases
- Discussion
- Summary

Descriptive vs. predictive data mining

From a data analysis point of view, data mining can be classified into two categories: descriptive data mining and predictive data mining.

- Descriptive data mining describes the data set in a concise, summarized manner and presents interesting general properties of the data.
- Predictive data mining analyzes the data in order to construct one or a set of models, and attempts to predict the behavior of new data sets.

What is Concept Description?

The simplest kind of descriptive data mining is concept description.

- Concept description generates descriptions for characterization and comparison of the data. It is sometimes called class description.
- Characterization provides a concise and succinct summarization of the given collection of data.
- Comparison (also known as discrimination) provides descriptions comparing two or more collections of data.

Concept Description vs. OLAP

- Concept description: can handle complex data types of the attributes and their aggregations; a more automated process.
- OLAP: restricted to a small number of dimension and measure types; a user-controlled process.

Data Generalization and Summarization-based Characterization

Data generalization is a process which abstracts a large set of task-relevant data in a database from low conceptual levels to higher ones.

Approaches:

- Data cube approach (OLAP approach)
- Attribute-oriented induction approach

[Figure: conceptual levels 1 to 5]

Characterization: Data Cube Approach

- Data are stored in a data cube.
- Identify expensive computations, e.g., count(), sum(), average(), max().
- Perform computations and store results in data cubes.
- Generalization and specialization can be performed on a data cube by roll-up and drill-down.
- An efficient implementation of data generalization.

Data Cube Approach (Cont.)

Limitations:

- Dimensions are limited to simple nonnumeric data types, and measures to simple aggregated numeric values.
- Lack of intelligent analysis: it cannot tell which dimensions should be used or what level the generalization should reach.

Attribute-Oriented Induction

- Proposed in 1989 (KDD '89 workshop).
- Not confined to categorical data nor particular measures.
- How is it done?
  1. Collect the task-relevant data (initial relation) using a relational database query.
  2. Perform generalization by attribute removal or attribute generalization.
  3. Apply aggregation by merging identical, generalized tuples and accumulating their respective counts.
  4. Interactive presentation with users.

Example

DMQL: Describe general characteristics of graduate students in the Big-University database

    use Big_University_DB
    mine characteristics as "Science_Students"
    in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa
    from student
    where status in "graduate"

Corresponding SQL statement:

    select name, gender, major, birth_place, birth_date, residence, phone#, gpa
    from student
    where status in ('Msc', 'MBA', 'PhD')

Basic Principles of Attribute-Oriented Induction

- Data focusing: select the task-relevant data, including dimensions; the result is the initial relation.
- Attribute removal: remove attribute A if there is a large set of distinct values for A but (1) there is no generalization operator on A, or (2) A's higher-level concepts are expressed in terms of other attributes.
- Attribute generalization: if there is a large set of distinct values for A, and there exists a set of generalization operators on A, then select an operator and generalize A.
- Attribute-threshold control: the threshold is typically 2 to 8, either user-specified or a default.
- Generalized relation threshold control: controls the final relation/rule size.

See the example on p. 186.

Table 5.1

Name           | Gender   | Major         | Birth-Place           | Birth_date | Residence                | Phone #  | GPA
Jim Woodman    | M        | CS            | Vancouver, BC, Canada | 8-12-76    | 3511 Main St., Richmond  | 687-4598 | 3.67
Scott Lachance | M        | CS            | Montreal, Que, Canada | 28-7-75    | 345 1st Ave., Richmond   | 253-9106 | 3.70
Laura Lee      | F        | Physics       | Seattle, WA, USA      | 25-8-70    | 125 Austin Ave., Burnaby | 420-5232 | 3.83
…              | …        | …             | …                     | …          | …                        | …        | …
Removed        | Retained | Sci, Eng, Bus | Country               | Age range  | City                     | Removed  | Excl, VG, ..

The generalization proceeds as follows:

1. name: Since there are a large number of distinct values for name and there is no generalization operation defin
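
The attribute-oriented induction steps described in this chapter (attribute removal, attribute generalization, and merging identical generalized tuples while accumulating counts) can be illustrated with a minimal Python sketch. It is not the textbook's algorithm verbatim. The function names, the GPA cut-offs, and the attribute threshold of 2 are assumptions chosen for illustration, and only the three visible rows of Table 5.1 are used, with a subset of their attributes. A fuller implementation would keep climbing a concept hierarchy until each attribute falls under its threshold; this sketch applies a single generalization step.

```python
from collections import Counter

# Three visible rows of Table 5.1, restricted to five attributes.
rows = [
    {"name": "Jim Woodman", "gender": "M", "major": "CS",
     "birth_place": "Vancouver, BC, Canada", "gpa": 3.67},
    {"name": "Scott Lachance", "gender": "M", "major": "CS",
     "birth_place": "Montreal, Que, Canada", "gpa": 3.70},
    {"name": "Laura Lee", "gender": "F", "major": "Physics",
     "birth_place": "Seattle, WA, USA", "gpa": 3.83},
]

def to_country(place):
    """Hypothetical generalization operator: 'City, Region, Country' -> 'Country'."""
    return place.split(",")[-1].strip()

def gpa_band(gpa):
    """Hypothetical generalization operator; the cut-offs are assumed, not from the source."""
    if gpa >= 3.75:
        return "Excellent"
    if gpa >= 3.5:
        return "Very good"
    return "Other"

def attribute_oriented_induction(rows, operators, threshold):
    """One-step AOI sketch: remove or generalize attributes, then merge tuples."""
    kept = []  # list of (attribute, operator or None)
    for attr in rows[0]:
        distinct = {row[attr] for row in rows}
        if len(distinct) <= threshold:
            kept.append((attr, None))             # few distinct values: retain as is
        elif attr in operators:
            kept.append((attr, operators[attr]))  # attribute generalization
        # otherwise: many distinct values and no operator -> attribute removal

    counts = Counter()
    for row in rows:
        generalized = tuple(
            op(row[attr]) if op is not None else row[attr] for attr, op in kept
        )
        counts[generalized] += 1                  # merge identical tuples, accumulate count
    return [attr for attr, _ in kept], counts

attrs, counts = attribute_oriented_induction(
    rows, {"birth_place": to_country, "gpa": gpa_band}, threshold=2
)
for generalized, count in counts.items():
    print(dict(zip(attrs, generalized)), "count =", count)
```

In this run, name is removed because it has more distinct values than the threshold and no generalization operator, while birth_place and gpa are generalized and the two Canadian CS students merge into a single tuple with a count of 2.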

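
For comparison, the data cube approach precomputes aggregates such as count() and then generalizes by roll-up. The following Python sketch shows roll-up along one dimension under simple assumptions: the cube is a plain dictionary of cell counts, the `roll_up` helper is hypothetical, and the cells are derived from the three visible rows of Table 5.1.

```python
from collections import defaultdict

# Base cells: count() per (major, birth city), from the visible rows of Table 5.1.
base_cells = {
    ("CS", "Vancouver"): 1,
    ("CS", "Montreal"): 1,
    ("Physics", "Seattle"): 1,
}

# Concept hierarchy for the birth-place dimension: city -> country.
city_to_country = {"Vancouver": "Canada", "Montreal": "Canada", "Seattle": "USA"}

def roll_up(cells, dim_index, hierarchy):
    """Replace one dimension's values by their parent concept and sum the counts."""
    rolled = defaultdict(int)
    for key, count in cells.items():
        new_key = list(key)
        new_key[dim_index] = hierarchy[key[dim_index]]
        rolled[tuple(new_key)] += count
    return dict(rolled)

by_country = roll_up(base_cells, 1, city_to_country)
print(by_country)
```

Drill-down is the reverse navigation: because the finer-grained cells in `base_cells` remain stored, returning to the city level needs no recomputation.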