关于数据挖掘中关联规则挖掘算法的研究

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

上海交通大学硕士学位论文关于数据挖掘中关联规则挖掘算法的研究姓名:马建庆申请学位级别:硕士专业:计算机应用技术指导教师:谢康林20040101AgrawalKDDAprioriRInterest4RHRSC-RCRHSAprioriIUAIUAFT-treeTheAssociationrulethatwasfirstlyadvancedbyAgrawalisthefocusofKDDresearchandstillahottopicamongthesespecialistswhodevotetodataminingresearchtoday.Basingonreadingandanalyzinganumberofreferencebooksandthesesthatinvolvetheassociationrulesminingalgorithm,Iformulateseveralalgorithmsofassociationrulesminingandthenanalyzetheircorrectionandefficiency.Atfirst,afteranalyzingtheAprioriAlgorithmanditsapplication,wefindsomeproblemsofAprioriAlgorithm.Tosolvetheseproblems,weintroducetheconceptofinteresttoeliminatethattheseuseless,evenmisleadingrulesareminedanddefinetheassociationrulesagain.Theformuladesignofinterestisrelativelyreasonableandfeasiblebecausetheinterestformula-RInterest=4RHRSC-*(RC+RHS)considersthefactorsofRCandRHSvaluescomprehensively.Consideringtheusers’interest,aneffectiveassociationrulesalgorithmisdesignedbasedonAporiorialgorithmanduserinterest.Inaddition,wealsodiscusshowtopartitiontheconceptlayersandsolvetheproblemsthatareraisedbythiskindofpartition.Secondly,webegintodiscusstheweightedassociationrulesandillustratefourweighedassociationrulesminingalgorithm:theminingalgorithmbasingontheseparatingsupportandweightedvalue;theweightedalgorithmbasingonApriori;theparallelalgorithm;andtheweightedalgorithmbasingonFP-growth.Ofcourse,thetimecomplexitiesofthefouralgorithmsarealsoanalyzed.Finally,thepaperdiscussestheadaptivealgorithmsrespectivelywhentheminimumsupportandconfidencethreshold,andthetransactionnumberofdatabasevary.Accordingtotheanalysisforthesealgorithms,theperformance-improvingalgorithmsofIUAandthealgorithmoffrequentitemsetsminingbasingonFT-treeareeffectivetodealwiththesevarieties.Inconclusion,thispapermainlydevelopsseveralassociationrulesalgorithmsthatinvolvetheconceptofinterestandweightedvalues,paralleldesignandalgorithmanalysis.KEYWORDS:associationrules,algorithm,datamining,interest,weightedassociationrules120041182kCkC1-kL∪∪RInterest4RHRSC-RCRHSRHRSC-RHRSC+1i2imi1t2tntit⊆≤≤1p∧2p∧∧np→1q∧2q∧∧mq1p2pnp⊂1q2qmq⊂1p2pnp∩1q2qmqf1L1-kLkC1-kLtCkCtCkLkCkL1-kL1l1-kL2l1-kL1l2l1l2l1l2l1l2l1l2l1-kLkCkC1-kL∉1-kL1L2LrLkCkC1-kLkCkLkCkCkL_c_m_m∑row_c∑colcmc/∪coffeecmc/∪1p∧2p∧∧tp∧1+tp∧2+tp∧∧np→1q∧2q∧∧kq∧1+kq∧2+kq∧mqtp1+tpnp⊂Ikq1+kqmq⊂I1p2pnp∩1q2qmqfip∈jq∈'ip∉'jq∉1-≤≤tini≤≤'1-≤≤kjmj≤≤'→milkRC∪RHSRInterest4RHRSC-RCRHSRCRHSRHRSC-RInterestRCRHS1RcoffeeΛ1RC1RSHRS11RIRIRInterest2RΛ2RC2RSHRS22RI3RΛ3RC3RSHRS33RI1R2R2RCHRS23RCHRS32RI3RI2R2R3R2R1R1R2R1RcoffeeΛ1RC1RSHRS11RI2RΛ2RC2RSHRS22RI2R1R1-kLkC1L1L1L1L∪¬1L1-kL1l1-kL2l1-kL1l2l∧1l2l∧∧1l2l∧1l2l∧1l≠¬2l1l2l1-kLkCkC⇒∪∪∪≥≥ss≥≥s⇒⇒→→→→→→'1RI1RI2RI0I1RI4**1111RHRHRRSSCC-11RHRSC2RI4**2222RHRHRRSSCC-22RHRSC'1RC1RC2RC'1RHS1RHS2RHS'1RI4**'1'1'1'1RHRHRRSSCC-1RI2RI2**2121RHRHRRSSCC-2**2121RHRHRRSSCC-'1RI1RI2RI0I0A00AiA010+iAnA01R→00AiR→iA0nR→nA0iRRiCRiHSRiCRiHSRiCRiHSRiCRiHS0IRiCRiHS0IRiCRiHS0A'00A'01A00A01A03A04A05A'1R→'00A'2R→'01A'RiC'RiHSRiCRiHSHRS'1HRS1HRS2'1RC1RC2RCHRS'2HRS4HRS5HRS6'2RC4RC5RC6RC'RiI4''HRiRiSC-HRiRiSC''+→→01C01I→02C02I→03C03I→04C04I→→RC01C02C03C04C⇒RC⇒RCRI2)(RC2)(RHSRC01C02C03C04CRHSHS01HS02HS03HS04⇒RI01C02C03C04C2HS01HS02HS03HS04201CHS0102CHS0203CHS0304CHS04RI01C2HS01202C2HS02203C2HS03204C2HS042RI01I02I03I04I⇒RI⇒RI→'00A'00C'00I'00A'00A00A01A06A'00C'00I0A'00A'01A00A01A06A03A04A05ARiCRiHSiRRiCRiHSRiCRiHSRiCRiHSRiCRiHSxA0xA0xA1xA1xA2RiCRiHSabIabxgxfba--∫4)()(22iRRiCRiHSRiCRiHSRInterest4RHRSC-RCRHSRHRSC-RHRSC+5102105P910100230101i2ini1i2inijijW≤jW≤⇒∑∪∈YXIjjW))((supYXport∪∑∪∈YXIjjW))((supYXport∪≥⇒∪⇒⇒AprioriApriori11×××≥∑∈×XijWjTsupWmin×∑∈XijWj÷≥1ri2riqkri-∑∈YijWj∑-=qkjrjw1),(supWminkYWT×9.0)8.03.0(74.0++×kLkCkCjWfiCfiLf1CkC1-kCkCkCkCkLkC∪kL1C1-kC1-kCkCkCkC1-kC1-kCkCkCkCkL×××1C1C1C1-kCkC2CkC2L2C∪kL1F2F1F2FkLjW1Ff≤iCfiLf1C≤kC1-kCkLkC∪kLkCkC∉1-kC≤∈1-kC×≤1C×1F1C1L2F1F2FkL2f≤iCfiLf≤kC1-kCkLkC∪kL1F2F1F∪kL1C1-kC1-kCkC={1i2i…,mi}⊆.,..,1-XDiW×iaia∈XminsupXD,1-FP-tree4node-namenode-countnode-linknode-parentHtable,item-nameitem-headFP-tree1FP-tree(1)DFFwL;DwLTrans[p|P]p1P;pPTPPNpPTTNNpNNpT×134abcdefghij0.80.60.90.20.10.70.30.40.30.2klmnopqrst0.40.10.60.20.60.60.30.50.30.2aW×××wLwLWFPGWFPTWFPTWaa×iWia∈iaiaaiabWTreebWTreefbWTreeWFPTafffffffiaiabiabiaiaWFPTiaiabiabiabiabiaWFPTWFPG1a100a1a→→100a1a2a1a2a1a2a3a1a100aWFPGWFPGWFPGWFPGkL1m1m's'kL2m's's's1m'kL∈kL≥'sk∪'kL'skLkLkL''kL'kL'kLkL∪''kLkC's1i2iki∀≤≤ji∈kL1i2iki∀≤≤ji∈''kL1c2c1c∪2c1c∩2cf1c⊂kL2c⊂''kL'kL'kL1kL2kL3kL1kC2kC3kC1kC0kL1kL0kL∪kL'kL1kL∪2kL∪3kL1kL∩2kLf1kL∩3kLf2kL∩3kLf1kC2kC1kC11-kLkL2kC21-kL3kC3kC≤≤1jL3kCjitemp.1item2item3item1jL2jkL-∈3kC∉'1-kL3kCikLikL1+ikL's''1L'1L1L∪''1L'1-kLf≠1kC11-kLkL2kC21-kL3kCf≤3kC3kC∪1jL∈1tC1kC1kC1tC∈1tC1tC2tC2kC∈2tC3tC3kC∈3tC0kL∈1kC≥'s1kL0kL∪kL2kL∈2kC≥'s3kL∈3kC≥'s'kL1kL∪2kL∪3kLk∪'kL's's1LakC'1-kLkL1kC1kCakCkL3kL3kCkL3kC3kC1L''1L3kC's1C1L∈tC1C1CtC∈tCtC01L∈1C≥'s11L01L∪1L'1-kLf≠kC'1-kLkL∈kCkC∉'1-kLkC∈tCkC∈kC''kL∈kC≥'s'kL''kL∪kLk∪'kL's'skLkCkLkCkL'1-kL1-kL's3kC11-kL31-kL21-kL31-kL31-kL31-kL1kC2kC3kC1L''1L32C1L''1L32CfiI1LjI''1L32C32C∪iIjIiI1LjI''1L32C32C∪iIjI≥11-kL21-kL31-kL3kC3kC1.it

1 / 74
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功