Data Mining: Concepts and Techniques
The Second Edition

Jiawei Han
Micheline Kamber
University of Illinois at Urbana-Champaign

Morgan Kaufmann Publishers
340 Pine Street, Sixth Floor, San Francisco, CA 94104-3205, USA

© 2006 Academic Press
All rights reserved
Printed in the United States of America

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise) without the prior written permission of the publisher.

To Y. Dora and Lawrence for your love and encouragement
J.H.

To Erik, Kevan, Kian, and Mikael for your love and inspiration
M.K.

scientific data, medical data, demographic data, financial data, and marketing data. People have no time to look at this data. Human attention has become a precious resource. So, we must find ways to automatically analyze the data, to automatically classify it, to automatically summarize it, to automatically discover and characterize trends in it, and to automatically flag anomalies. This is one of the most active and exciting areas of the database research community. Researchers in areas such as statistics, visualization, artificial intelligence, and machine learning are contributing to this field. The breadth of the field makes it difficult to grasp its extraordinary progress over the last few years.

Jiawei Han and Micheline Kamber have done a wonderful job of organizing and presenting data mining in this very readable textbook. They begin by giving quick introductions to database and data mining concepts with particular emphasis on data analysis. They review the current product offerings by presenting a general framework that covers them all. They then cover, in a chapter-by-chapter tour, the concepts and techniques that underlie classification, prediction, association, and clustering. These topics are presented with examples, a tour of the best algorithms for each problem class, and pragmatic rules of thumb about when to apply each technique. I found this presentation style to be very readable, and I certainly learned a lot from reading the book. Jiawei Han and Micheline Kamber have been leading contributors to data mining research. This is the text they use with their students to bring them up to speed on the field. The field is evolving very rapidly, but this book is a quick way to learn the basic ideas and to understand where the field is today. I found it very informative and stimulating, and I expect you will too.

On What Kind of Data?
  1.3.1 Relational Databases
  1.3.2 Data Warehouses
  1.3.3 Transactional Databases
  1.3.4 Advanced Database Systems and Advanced Database Applications
1.4 Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
  1.4.1 Concept/Class Description: Characterization and Discrimination
  1.4.2 Mining Frequent Patterns, Associations, and Correlations
  1.4.3 Classification and Prediction
  1.4.4 Cluster Analysis
  1.4.5 Outlier Analysis
  1.4.6 Evolution Analysis
1.5 Are All of the Patterns Interesting?
1.6 Classification of Data Mining Systems
1.7 Data Mining Task Primitives
1.8 Integration of a Data Mining System with a Database or Data Warehouse System
1.9 Major Issues in Data Mining
1.10 Summary
1.11 Exercises
1.12 Bibliographic Notes

2 Data Preprocessing
2.1 Why Preprocess the Data?
2.2 Descriptive Data Summarization
  2.2.1 Measuring the Central Tendency