Research Advisor

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

FreeParallelDataMiningbyBinLiAdissertationsubmittedinpartialfulllmentoftherequirementsforthedegreeofDoctorofPhilosophyDepartmentofComputerScienceNewYorkUniversitySeptember,1998Approved:ProfessorDennisShashaResearchAdvisorcBinLiAllRightsReserved,1998Tothememoryofmyfather,YongnianLiTomymother,YanyunSuAndtoNatalieiiiAcknowledgmentsIwishtoexpressmydeepestgratitudetomyadvisor,ProfessorDennisShasha,forhavingintroducedmersttotheeldofparallelcomputingandlatertothewonderfulworldofdatamining.IamenormouslyindebtedtoDennisforallhisguidance,encouragement,andsupportalongtheway.Withouthim,thisthesiswouldnothavebeenpossible.IamverygratefultoProfessorRalphGrishmanandProfessorVijayKaramchetiforreadingmythesisandgivingmemanygoodsuggestionsthatgreatlyimprovedthepresentationandenhancedtheclarityofmywork.IamdeeplyappreciativetoProfessorEdmondSchonbergandProfessorErnestDavisforservingonmythesiscommittee.ProfessorRalphGrishmanandProfessorEdmondSchonbergalsoservedonmyoralexamcommittee.Mysupervisoratwork,Mr.DavidM.Jacobs,wasatremendoushelpduringthenalmonths.HewasextremelyexibleaboutmyworkingscheduleandgavemetimeowhenIneededwithouthesitation.ThankstoKarpJeongforhelpingmewritemyrstPLindaprogram;thankstoPeterWyckoforhisexcellentworkonPLindawhichIgreatlybenetedfrom;thankstoPeterPiatkoformanyvaluablediscussionsaboutmywork;thankstoChin-YuanChengforworkingsohardtobringNyuMinerintoreality.ivIwouldliketothankallthepeoplewhoatonetimesharedanocewithmeatCourant:ChiYao,MartinGarciaKeller,LiddyShriver,PeterPiatko,ArashBaratloo,EugeneAgichtein,Tying-LuLiu.Theywereagreathelp.IthankallmyfriendsandfellowstudentswhomademylifeatCourantmoreenjoyable.Inparticular,thankstoSaugataBasu,LiangCao,Chao-yuCheng,Churng-weiChu,XianghuiDuan,RonEven,ZhijunLiu,Hsing-kuoPao,LaxmiParida,JihaiQiu,SurenTalla,HanpingXu,YuanyuanZhao.IwouldalsoliketothankDr.JuZhangforhisinterestinmyworkandhisencouragement.Finally,thankstomysisterDr.HongLiandherhusbandDr.JianmingCaofortheirloveandsupport.vAbstractFreeParallelDataMiningBinLiNewYorkUniversity,1998ResearchAdvisor:ProfessorDennisShashaDataminingistheemergingeldofapplyingstatisticalandarticialintelligencetech-niquestotheproblemofndingnovel,useful,andnon-trivialpatternsfromlargedatabases.Thisthesispresentsaframeworkforeasilyandecientlyparallelizingdataminingalgorithms.Weproposeanacyclicdirectedgraphstructure,explorationdag(E-dag),tocharacterizethecomputationmodelofdataminingalgorithmsinclassicationrulemining,associationrulemining,andcombinatorialpatterndiscovery.AnE-dagcanbeconstructivelyformedinparallelfromspecicationsofadataminingproblem,thenaparallelE-dagtraversalisperformedontheytoecientlysolvetheproblem.TheeectivenessoftheE-dagframeworkisdemonstratedinbiologicalpatterndiscoveryapplications.Wealsoexploredataparallelismindataminingapplications.Thecross-validationandthewindowingtechniquesusedinclassicationtreealgorithmsfacilitateeasyde-velopmentofecientdatapartitioningprograms.Inthisspirit,wepresentanewclassicationtreealgorithmcalledNyuMinerthatguaranteesthateverysplitinaclassicationtreeisoptimalwithrespecttoanygivenimpurityfunctionandanygivenmaximumnumberofbranchesallowedinasplit.NyuMinercanbeeasilyparallelizedviusingthedatapartitioningtechnique.Thisthesisalsopresentsasoftwarearchitectureforrunningparalleldataminingprogramsonnetworksofworkstations(NOW)inafault-tolerantmanner.Thesoft-warearchitectureisbasedonPersistentLinda(PLinda),arobustdistributedparallelcomputingsystemwhichautomaticallyutilizeidlecycles.TemplatesareprovidedforapplicationprogrammerstodevelopparalleldataminingprogramsinPLinda.Par-allelizationframeworksandthesoftwarearchitectureformasynergythatmakesfreeecientdataminingrealistic.viiTableofContentsDedicationiiiAcknowledgmentsivAbstractviListofFiguresxivListofTablesxixChapter1Introduction11.1WhyFreeParallelDataMining?......................11.2ApproachandContributions........................31.3OrganizationofThesis............................5Chapter2RelatedWork72.1ClassicationRuleMining..........................72.1.1AnExample.............................72.1.2ClassicationRules..........................92.1.3DecisionTrees............................9viii2.1.4BuildingDecisionTrees.......................102.1.5AttributeSelection..........................102.1.6ParallelizationAttempts.......................122.2AssociationRuleMining...........................142.2.1AnExample.............................142.2.2AssociationRules...........................152.2.3PropertiesofFrequentSets.....................152.2.4BasicSchemeforAssociationRuleMining.............162.2.5StateoftheArt...........................172.2.6ParallelizationAttempts.......................182.3PatternDiscoveryinCombinatorialDatabases..............192.3.1AnExample.............................192.3.2PatternDiscovery..........................192.3.3SequencePatternDiscovery.....................202.3.4SequencePatternDiscoveryAlgorithm...............202.4PlatformsforParallelDataMining.....................222.4.1NetworkedWorkstati

1 / 158
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功