Data Mining: Concepts and Techniques
2nd Edition
Solution Manual

Jiawei Han and Micheline Kamber
The University of Illinois at Urbana-Champaign
© Morgan Kaufmann, 2006

Note: For Instructors' reference only. Do not copy! Do not distribute!

Contents

1 Introduction
  1.11 Exercises
2 Data Preprocessing
  2.8 Exercises
3 Data Warehouse and OLAP Technology: An Overview
  3.7 Exercises
4 Data Cube Computation and Data Generalization
  4.5 Exercises
5 Mining Frequent Patterns, Associations, and Correlations
  5.7 Exercises
6 Classification and Prediction
  6.17 Exercises
7 Cluster Analysis
  7.13 Exercises
8 Mining Stream, Time-Series, and Sequence Data
  8.6 Exercises
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining
  9.5 Exercises
10 Mining Object, Spatial, Multimedia, Text, and Web Data
  10.7 Exercises
11 Applications and Trends in Data Mining
  11.7 Exercises

Chapter 1
Introduction

1.11 Exercises

1.1. What is data mining? In your answer, address the following:
(a) Is it another hype?
(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
(c) Explain how the evolution of database technology led to data mining.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.

Answer:
Data mining refers to the process or method that extracts or "mines" interesting knowledge or patterns from large amounts of data.

(a) Is it another hype?
Data mining is not another hype. Instead, the need for data mining has arisen due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. Thus, data mining can be viewed as the result of the natural evolution of information technology.

(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
No. Data mining is more than a simple transformation of technology developed from databases, statistics, and machine learning. Instead, data mining involves an integration, rather than a simple transformation, of techniques from multiple disciplines such as database technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis.

(c) Explain how the evolution of database technology led to data mining.
Database technology began with the development of data collection and database creation mechanisms that led to the development of effective mechanisms for data management including data storage and retrieval, and query and transaction processing. The large number of database systems offering query and transaction processing eventually and naturally led to the need for data analysis and understanding. Hence, data mining began its development out of this necessity.

(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
• Data cleaning, a process that removes or transforms noise and inconsistent data
• Data integration, where multiple data sources may be combined
• Data selection, where data relevant to the analysis task are retrieved from the database
• Data transformation, where data are transformed or consolidated into forms appropriate for mining
• Data mining, an essential process where intelligent and efficient methods are applied in order to extract patterns
• Pattern evaluation, a process that identifies the truly interesting patterns representing knowledge based on some interestingness measures
• Knowledge presentation, where visualization and knowledge representation techniques are used to present the mined knowledge to the user

1.2. Present an example where data mining is crucial to the success of a business. What data mining functions does this business need? Can they be performed alternatively by data query processing or simple statistical analysis?

Answer:
A department store, for example, can use data mining to assist with its target marketing mail campaign. Using data mining functions such as association, the store can use the mined strong association rules to determine which products bought by one group of customers are likely to lead to the buying of certain other products. With this information, the store can then mail marketing materials only to those kinds of customers who exhibit a high likelihood of purchasing additional products. Data query processing is used for data or information retrieval and does not have the means for finding association rules. Similarly, simple statistical analysis cannot handle large amounts of data such as those of customer records in a department store.

1.3. Suppose your task as a software engineer at Big-University is to design a data mining system to examine their university course database, which contains the following information: the name, address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and their cumulative grade point average (GPA). Describe the architecture you would choose. What is