Constraint Based Periodicity Mining in Time Series Databases (IJCNIS-V4-N10-4)


I.J. Computer Network and Information Security, 2012, 10, 37-46
Published Online September 2012 in MECS
DOI: 10.5815/ijcnis.2012.10.04
Copyright © 2012 MECS

Constraint Based Periodicity Mining in Time Series Databases

Dr. Ramachandra V. Pujeri, G. M. Karthik
Vice-Principal, KGiSL Institute of Technology, Saravanampatti, Coimbatore-641035, Tamil Nadu, INDIA
Assistant Professor, CSE Dept., SACS MAVMM Engineering College, Madurai-625301, Tamil Nadu, INDIA
sriramu.vp@gmail.com, gmkarthik16@gmail.com

Abstract—The search for periodicity in time series databases has a number of applications and is an interesting data mining problem. Because real-world data sets are mostly noisy and rarely exhibit perfect periodicity, the problem is not trivial. Periodicity detection is a common step in time series mining algorithms, since such algorithms typically try to discover a periodic signal with no time limit. We propose an algorithm that uses an FP-tree to find symbol, partial, and full periodicity in time series. The algorithm has complexity O(kN), where N is the length of the input sequence and k is the length of the periodic pattern. We show that our algorithm is fixed-parameter tractable with respect to a fixed symbol set size and a fixed length of input sequences. Experimental results on both synthetic and real data from different domains show that our algorithm is time-efficient and noise-resilient. A comparison with some current algorithms demonstrates the applicability and effectiveness of the proposed algorithm.

Index Terms—Data Mining, CBPM, FP-tree, Periodicity mining, Time series data, Noise resilient

I. INTRODUCTION

A time series is a collection of data gathered and observed at uniform intervals of time to reflect certain behavior of an entity. A time series is mostly discretized before it is analyzed [8], [9], [13], [18], [19]. Examples of time series include frequently sold products in a retail market, frequent regular-interval patterns in DNA sequences, stock growth, power consumption, computer network fault analysis, transactions in a superstore, and gene expression data analysis [7], [12], [22], [23]. In these examples, the periodicity of occurrences plays an important role in discovering interesting frequent patterns in a wide variety of application areas. Identifying repeating (periodic) patterns could reveal important observations about the behavior and future trends of the case represented by the time series [35], and hence would lead to more effective decision making. The goal of time series analysis is to find whether and how frequently a periodic pattern (full or partial) is repeated within the data.

Three types of periodic patterns (symbol, sequence, and segment) can be detected in a time series [26]. For example, consider a time series containing the hourly number of transactions in a retail store, with the following mapping of transaction ranges to symbols (referred to as the discretization process): a: {0} transactions, b: {1-300} transactions, c: {301-600} transactions, d: {601-1200} transactions, e: {>1200} transactions. Based on this mapping, the time series T' = 0, 212, 535, 0, 398, 178, 0, 78, 0, 0, 102, 423 can be discretized into T = abcacbabaabc.

Symbol periodicity means that at least one symbol is repeated periodically in the time series T. For example, in T = abdacbabaabc, the symbol 'a' is periodic with periodicity p = 3, starting at position zero. A sequence periodic (or partial periodic) pattern consists of more than one symbol and may be periodic in a time series. For example, in T = abdacbabaabc, the sequence 'ab' is periodic with p = 3 starting at position zero: it occurs at positions 0, 6, and 9 and is missing only at position 3. If the whole time series can be represented as repetitions of a pattern or segment, this is called segment or full-cycle periodicity. For example, T = abdcabdcabdc has segment periodicity of p = 4 starting at position zero.

Real-life examples are mostly not characterized by perfect periodicity in time series. A time series is said to have three types of periodic pattern: 1) symbol periodicity, 2) sequence periodicity or partial periodic pattern, and 3) segment or full-cycle periodicity [26]. The degree of perfection is measured by confidence, and real data are mostly characterized by the presence of noise. Many existing algorithms [8], [9], [13], [17] detect periods that span the entire time series. Some algorithms detect all three types of periodicity mentioned above, along with noise, within a subsection of the time series, but separately for each pattern [26]. Compared to this, we show that our Constraint Based Periodicity Mining (CBPM) technique is more efficient and flexible. We also demonstrate through empirical evaluation that CBPM is more scalable and faster than existing methods.

We propose a new efficient pattern enumeration approach based on ideas from frequent pattern mining techniques. First, we construct a TRIE-like data structure called a consensus tree, which explores the space of all motifs and enables a highly parallelized search along the tree. The growth of the tree is restrained by providing additional mining constraints. The consensus tree is fixed and anchored by the symbol set size and the length of the input sequence. The construction of the consensus tree detects symbol, sequence, and segment patterns, without periodicity, within a subsection of the series. The additional constraints (namely the user-specified level and rule constraints) prune and eliminate redundant patterns. Second, the algorithm looks for all periods starting from all positions available in a particular node of the consensus tree. Every node of the consensus tree exists based on a confidence greater than or equal to the user-specified periodicity threshold. We make the f
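
The discretization step and the symbol-periodicity example from the introduction can be illustrated with a minimal Python sketch. It is not the CBPM algorithm itself. The function names `discretize` and `periodicity_confidence` are hypothetical, the 'e' bin is assumed to mean "more than 1200 transactions", and confidence is assumed here to be the fraction of expected positions (start, start + p, start + 2p, ...) at which the pattern actually occurs, since the text does not give an exact formula.

```python
from bisect import bisect_left

# Upper bounds of bins a, b, c, d as given in the text. Values above the
# last bound map to 'e' (assumption: 'e' means more than 1200).
BOUNDS = [0, 300, 600, 1200]
SYMBOLS = "abcde"


def discretize(values):
    """Map each non-negative transaction count to its bin symbol."""
    return "".join(SYMBOLS[bisect_left(BOUNDS, v)] for v in values)


def periodicity_confidence(series, pattern, period, start=0):
    """Fraction of expected positions start, start+period, ... where
    `pattern` occurs (assumed definition of confidence)."""
    positions = range(start, len(series) - len(pattern) + 1, period)
    if len(positions) == 0:
        return 0.0
    hits = sum(series.startswith(pattern, pos) for pos in positions)
    return hits / len(positions)


print(discretize([0, 212, 398, 0]))                         # abca
print(periodicity_confidence("abdacbabaabc", "a", 3))       # 1.0
print(periodicity_confidence("abdacbabaabc", "b", 3, 1))    # 0.75
```

In this sketch the symbol 'a' has perfect periodicity (confidence 1.0) with p = 3 from position zero, while 'b' with p = 3 from position one is only partially periodic. A user-specified periodicity threshold, as described in the text, would then decide which of these patterns are kept.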
