Data Mining Techniques

About the Course
• Email:
  – dmnwpu@163.com

Review (Ⅰ)
• What is data mining?
  – Data mining is the task of discovering interesting patterns from large amounts of data, where the data can be stored in databases, data warehouses, or other information repositories. It is a young interdisciplinary field, drawing from areas such as database systems, data warehousing, statistics, machine learning, data visualization, information retrieval, and high-performance computing. Other contributing areas include neural networks, pattern recognition, spatial data analysis, image databases, signal processing, and many application fields, such as business, economics, and bioinformatics.

Review (Ⅱ)
• KDD: knowledge discovery in databases
• Data mining: the core of the knowledge discovery process
[Figure: architecture of a data mining system. Labeled components: data cleaning, integration, and selection; Database or Data Warehouse Server; Data Mining Engine; Pattern Evaluation; Graphical User Interface; Knowledge-Base; data sources (Database, Data Warehouse, World-Wide Web, Other Info Repositories).]

Review (Ⅲ)
• Define each of the following data mining functionalities: association analysis, classification, prediction, and clustering analysis. Give an example of each data mining functionality, using a real-life database with which you are familiar.
  – Association analysis
    • showing attribute-value conditions that occur frequently in a given set of data
  – Classification
    • finding a set of models that describe and distinguish data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown
  – Clustering analysis
    • analyzing data objects without consulting a known class label
  – Outlier analysis
    • finding data objects that do not comply with the general behavior or model of the data

Mining Frequent Patterns, Associations

Outline
• What is association rule mining and frequent pattern mining?
• Methods for frequent-pattern mining
• Constraint-based frequent-pattern mining
• Frequent-pattern mining: achievements, promises and research problems

Market Basket Analysis
This basket contains an assortment of products.
Which groups or sets of items are customers likely to purchase on a given trip to
the store?
Market basket analysis is a process that analyzes customer buying habits. Results can be used to plan marketing or advertising strategies, or in the design of a new catalog.

What Can Market Basket Analysis Help With?
• Customer: who are they? Why do they make certain purchases?
• Merchandise: which products tend to be purchased together? Which are most amenable to promotion? Does the brand of a product make a difference?
• Usage:
  – Store layout;
  – Product layout;
  – Coupon issuing;

Association Rules from Market Basket Analysis
Method:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels

Co-occurrence counts (each cell is the number of transactions containing both items; the diagonal is the number of transactions containing that item):

               Frozen Pizza   Milk   Cola   Potato Chips   Pretzels
Frozen Pizza        2           1      2         0            0
Milk                1           3      1         1            1
Cola                2           1      3         0            1
Potato Chips        0           1      0         1            0
Pretzels            0           1      1         0            2

The table hints that frozen pizza and cola may sell well together, and should be placed side by side in the convenience store.
Results: we could derive the association rules:
If a customer purchases Frozen Pizza, then they will probably purchase Cola.
If a customer purchases Cola, then they will probably purchase Frozen Pizza.

Use of Rule Associations
• Coupons, discounts
  – Don't give discounts on 2 items that are frequently bought together. Use the discount on 1 to "pull" the other
• Product placement
  – Offer correlated products to the customer at the same time. Increases sales
• Timing of cross-marketing
  – Send camcorder offer to VCR purchasers 2-3 months after VCR purchase
• Discovery of patterns
  – People who bought X, Y and Z (but not any pair) bought W over half the time

What are Frequent Patterns?
• Frequent patterns: patterns (itemsets, subsequences, substructures, etc.) that occur frequently in a database [AIS93]
For example:
  – A set of items, such as milk and bread, that appear frequently together in a transaction data set is a frequent itemset
  – A subsequence, such as buying first a PC, then a digital camera, and then a memory card, if it occurs frequently in a shopping history database, is a frequent sequential pattern
  – A substructure can refer to different structural forms, such as subgraphs, subtrees, or sublattices. If a substructure occurs frequently, it is called a frequent structured pattern.

Motivation
• Frequent pattern mining: finding regularities in data
  – What products were often purchased together?
  – Beer and diapers?!
  – What are the subsequent purchases after buying a PC?
  – What kinds of DNA are sensitive to a new drug?
  – Can we automatically classify web documents based on frequent key-word combinations?

Why Is Freq. Pattern Mining Important?
• Forms the foundation for many essential data mining tasks
  – Association, correlation, and causality analysis
  – Sequential, structural (e.g., sub-graph) patterns
  – Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  – Classification: associative classification
  – Cluster analysis: frequent pattern-based clustering
  – Data warehousing: iceberg cube and cube-gradient
  – Semantic data compression: fascicles
  – Broad applications: basket data analysis, cross-marketing, catalog design, sale campaign analysis, web log (click stream) analysis, …

Basic Concepts
• I is the set of items {i1, i2, …, id}
• A transaction T is a set of items: T = {ia, ib, …, it}, T ⊆ I. Each transaction is associated with an identifier, called TID.
• D, the task-relevan