Mining Association Rules Part II

Data Mining
- Data warehouses and OLAP (On Line Analytical Processing)
- Association Rules Mining
- Clustering: Hierarchical and Partitional approaches
- Classification: Decision Trees and Bayesian classifiers
- Sequential Patterns Mining
- Advanced topics: outlier detection, web mining

Problem Statement
- I = {i1, i2, …, im}: a set of literals, called items
- Transaction T: a set of items s.t. T ⊆ I
- Database D: a set of transactions
- A transaction contains X, a set of items in I, if X ⊆ T
- An association rule is an implication of the form X ⇒ Y, where X, Y ⊂ I
- The rule X ⇒ Y holds in the transaction set D with confidence c if c% of transactions in D that contain X also contain Y
- The rule X ⇒ Y has support s in the transaction set D if s% of transactions in D contain X ∪ Y
- Find all rules that have support and confidence greater than user-specified min support and min confidence

Problem Decomposition
1. Find all sets of items that have minimum support (frequent itemsets)
2. Use the frequent itemsets to generate the desired rules

Mining Frequent Itemsets: the Key Step
- Find the frequent itemsets: the sets of items that have minimum support
  - A subset of a frequent itemset must also be a frequent itemset, i.e., if {AB} is a frequent itemset, both {A} and {B} must be frequent itemsets
  - Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
- Use the frequent itemsets to generate association rules

The Apriori Algorithm
L_k: set of frequent itemsets of size k (those with min support)
C_k: set of candidate itemsets of size k (potentially frequent itemsets)

  L_1 = {frequent items};
  for (k = 1; L_k != ∅; k++) do begin
      C_{k+1} = candidates generated from L_k;
      for each transaction t in database do
          increment the count of all candidates in C_{k+1} that are contained in t
      L_{k+1} = candidates in C_{k+1} with min_support
  end
  return ∪_k L_k;

How to Generate Candidates?
Suppose the items in L_{k-1} are listed in order.
Step 1: self-joining L_{k-1}

  insert into C_k
  select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1}
  from L_{k-1} p, L_{k-1} q
  where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}

Step 2: pruning

  forall itemsets c in C_k do
      forall (k-1)-subsets s of c do
          if (s is not in L_{k-1}) then delete c from C_k

How to Count Supports of Candidates?
- Why is counting the supports of candidates a problem?
  - The total number of candidates can be very large
  - One transaction may contain many candidates
- Method:
  - Candidate itemsets are stored in a hash-tree
  - A leaf node of the hash-tree contains a list of itemsets and counts
  - An interior node contains a hash table
  - Subset function: finds all the candidates contained in a transaction

Hash-tree: search
- Given a transaction T and a set C_k, find all of its members contained in T
- Assume an ordering on the items
- Start from the root, use every item in T to go to the next node
- If you are at an interior node and you just used item i, then use each item that comes after i in T
- If you are at a leaf node, check the itemsets

Is Apriori Fast Enough? Performance Bottlenecks
- The core of the Apriori algorithm:
  - Use frequent (k-1)-itemsets to generate candidate frequent k-itemsets
  - Use database scan and pattern matching to collect counts for the candidate itemsets
- The bottleneck of Apriori: candidate generation
  - Huge candidate sets:
    - 10^4 frequent 1-itemsets will generate 10^7 candidate 2-itemsets
    - To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates
  - Multiple scans of the database:
    - Needs (n + 1) scans, where n is the length of the longest pattern

Max-Miner
- Max-Miner finds long patterns efficiently: the maximal frequent patterns
- Instead of checking all subsets of a long pattern, try to detect long patterns early
- Scales linearly with the size of the patterns

Max-Miner: the idea
[Figure: set enumeration tree of an ordered set, here {1, 2, 3, 4}]
  ∅
  1   2   3   4
  1,2   1,3   1,4   2,3   2,4   3,4
  1,2,3   1,2,4   1,3,4   2,3,4
  1,2,3,4
- Pruning: (1) set infrequency (2) superset frequency
- Each node is a candidate group g
- h(g) is the head: the itemset of the node
- t(g) is the tail: an ordered set that contains all items that can appear in the subnodes
- Example: h({1}) = {1} and t({1}) = {2, 3, 4}

Max-Miner pruning
- When we count the support of a candidate group g, we also compute the support for h(g), h(g) ∪ t(g) and h(g) ∪ {i} for each i in t(g)
- If h(g) ∪ t(g) is frequent, then stop expanding the node g and report the union as a frequent itemset
- If h(g) ∪ {i} is infrequent, then remove i from all subnodes (just remove i from any tail of a group after g)
- Expand the node g by one and do the same

The algorithm

  Max-Miner
      Set of candidate groups C ← {}
      Set of itemsets F ← {Gen-Initial-Groups(T, C)}
      while C not empty do
          scan T to count the support of all candidate groups in C
          for each g in C s.t. h(g) ∪ t(g) is frequent do
              F ← F ∪ {h(g) ∪ t(g)}
          Set of candidate groups C_new ← {}
          for each g in C such that h(g) ∪ t(g) is infrequent do
              F ← F ∪ {Gen-sub-nodes(g, C_new)}
          C ← C_new
          remove from F any itemset with a proper superset in F
          remove from C any group g s.t. h(g) ∪ t(g) has a superset in F
      return F

The algorithm (2)

  Gen-Initial-Groups(T, C)
      scan T to obtain F1, the set of frequent 1-itemsets
      impose an ordering on items in F1
      for each item i in F1 other than the greatest item do
          let g be a new candidate with h(g) = {i} and t(g) = {j | j follows i in the ordering}
          C ← C ∪ {g}
      return the itemset F1 (and the C, of course)

  Gen-sub-nodes(g, C)  /* generation of new itemsets at the next level */
      remove any item i from t(g) if h(g) ∪ {i} is infrequent
      reorder the items in t(g)
      for each i in t(g) other than the greatest do
          let g' be a new candidate with h(g') = h(g) ∪ {i} and t(g') = {j | j in t(g) and j is after i in t(g)}
          C ← C ∪ {g'}
      return h(g) ∪ {m} where m is the greatest item in t(g), or h(g) if t(g) is empty

Item Ordering
- By re-ordering items we try to increase the effectiveness of frequency-pruning
- Very frequent items have a higher probability of being contained in long patterns
- Put these items at the end of the ordering, so they appear in many tails
Min
