Data Mining: Concepts and Techniques
3rd Edition
Solution Manual

Jiawei Han, Micheline Kamber, Jian Pei
The University of Illinois at Urbana-Champaign
Simon Fraser University

Version January 2, 2012
© Morgan Kaufmann, 2011
For instructors' reference only

Contents

Chapter 1 Introduction
    1.1 Exercises
Chapter 2 Getting to Know Your Data
    2.1 Exercises
Chapter 3 Data Preprocessing
    3.1 Exercises
Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods
    6.1 Exercises
Chapter 8 Classification: Basic Concepts
    8.1 Exercises
Chapter 9 Classification: Advanced Methods
    9.1 Exercises
Chapter 10 Cluster Analysis: Basic Concepts and Methods
    10.1 Exercises

Chapter 1 Introduction

1.1 Exercises

1.1 What is data mining? In your answer, address the following:
(a) Is it another hype?
(b) Is it a simple transformation or application of technology developed from databases, statistics, machine learning, and pattern recognition?
(c) We have presented a view that data mining is the result of the evolution of database technology. Do you think that data mining is also the result of the evolution of machine learning research? Can you present such views based on the historical progress of this discipline? Address the same for the fields of statistics and pattern recognition.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.

Answer:
Data mining refers to the process or method that extracts or mines interesting knowledge or patterns from large amounts of data.

(a) Is it another hype?
Data mining is not another hype. Instead, the need for data mining has arisen due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. Thus, data mining can be viewed as the result of the natural evolution of information technology.

(b) Is it a simple transformation or application of technology developed from databases, statistics, machine learning, and pattern recognition?
No. Data mining is more than a simple transformation of technology developed from databases, statistics, and machine learning. Instead, data mining involves an integration, rather than a simple transformation, of techniques from multiple disciplines such as database technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis.

(c) Explain how the evolution of database technology led to data mining.
Database technology began with the development of data collection and database creation mechanisms, which led to the development of effective mechanisms for data management, including data storage and retrieval, and query and transaction processing. The large number of database systems offering query and transaction processing eventually and naturally led to the need for data analysis and understanding. Hence, data mining began its development out of this necessity.

(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
• Data cleaning, a process that removes or transforms noise and inconsistent data
• Data integration, where multiple data sources may be combined
• Data selection, where data relevant to the analysis task are retrieved from the database
• Data transformation, where data are transformed or consolidated into forms appropriate for mining
• Data mining, an essential process where intelligent and efficient methods are applied in order to extract patterns
• Pattern evaluation, a process that identifies the truly interesting patterns representing knowledge based on some