Email: jiangshengyi@163.com

[Figure: counts per year, 2000 to 2006: 2000: 1; 2001: 3; 2002: 6; 2003: 15; 2004: 15; 2005: 18; 2006: 33]

1 TriggerMan; OpenCQ (Georgia Tech.); Niagara-CQ (Wisconsin-Madison); CACQ; Aurora (MIT/Brown/Brandeis); Adaptive CQ; TelegraphCQ (U.C. Berkeley); STREAM (Stanford)

[Figure: Data Stream Management System (DSMS): User/Application, Register Query, Stream Query Processor, Results, Scratch Space (Memory and/or Disk)]

2 Sketch; exponential histogram (EH); sticky sampling / lossy counting; Counting Bloom filter (false positive)
3 K-means; Fayyad et al. / Guha et al.; Han et al.
  Hoeffding tree, VFDT; Gibbons et al.; VFDT, CVFDT; Gibbons et al.
  Ensemble; Han et al.
4 Time series; sequential pattern … Muthukrishnan et al.
  Change; Gehrke et al.
  Burst; Shasha et al., Kleinberg; Shasha et al. …

(1) Two projects: Stanford STREAM (Stanford Stream Data Manager) project, R. Motwani; UIUC MAIDS (Mining Alarming Incidents in Data Streams); C. Aggarwal; J. Han
(2) Counting Bloom filter [CIKM'2003]; false positive [VLDB'2004]; [DASFAA'2003]; [SDM'2006]; burst [DASFAA'2005]; [ICDE'2006]; toolkit
(3) UIUC MAIDS; ~hanj

Characteristics of Data Streams
- Data streams: continuous, ordered, changing, fast, huge in volume
- Traditional DBMS: data stored in finite, persistent data sets
- Characteristics
  - Huge volumes of continuous data, possibly infinite
  - Fast changing, requiring fast, real-time response
  - Random access is expensive: single linear scan algorithm (can only have one look)
  - Store only the summary of the data seen thus far
  - Most stream data are at a pretty low level or multi-dimensional in nature, needing multi-level and multi-dimensional processing

Stream Data Applications
- Telecommunication calling records
- Business: credit card transaction flows
- Network monitoring and traffic engineering
- Financial market: stock exchange
- Engineering & industrial processes: power supply & manufacturing
- Sensor, monitoring & surveillance: video streams
- Security monitoring
- Web logs and Web page click streams
- Massive data sets (even if saved, random access is too expensive)

DBMS versus DSMS
  DBMS                              | DSMS
  Persistent relations              | Transient streams
  One-time queries                  | Continuous queries
  Random access                     | Sequential access
  "Unbounded" disk store            | Bounded main memory
  Only current state matters        | Historical data is important
  No real-time services             | Real-time requirements
  Relatively low update rate        | Possibly multi-GB arrival rate
  Data at any granularity           | Data at fine granularity
  Assume precise data               | Data stale/imprecise
  Access plan determined by query   | Unpredictable/variable data arrival
    processor, physical DB design   |   and characteristics
Ack.: From Motwani's PODS tutorial slides

Challenges of Stream Data Processing
- Multiple, continuous, rapid, time-varying, ordered streams
- Main memory computations
- Queries are often continuous
  - Evaluated continuously as stream data arrives
  - Answer updated over time
- Queries are often complex
  - Beyond element-at-a-time processing
  - Beyond stream-at-a-time processing
  - Beyond relational queries (scientific, data mining, OLAP)
- Multi-level/multi-dimensional processing and data mining
  - Most stream data are at a pretty low level or multi-dimensional in nature

Processing Stream Queries
- Query types
  - One-time query vs. continuous query (evaluated continuously as the stream continues to arrive)
  - Predefined query vs. ad-hoc query (issued on-line)
- Unbounded memory requirements
  - For real-time response, a main memory algorithm should be used
  - The memory requirement is unbounded if one will join future tuples
- Approximate query answering
  - With bounded memory, it is not always possible to produce exact answers
  - High-quality approximate answers are desired
  - Data reduction and synopsis construction methods
    - Sketches, random sampling, histograms, wavelets, etc.

Methods for Approximate Query Answering
- Sliding windows
  - Answer queries only over sliding windows of recent stream data
  - An approximation, but often more desirable in applications
- Batched processing, sampling and synopses
  - Batched processing if updates are fast but computing is slow
    - Compute periodically; not very timely
  - Sampling if updates are slow but computing is fast
    - Compute using sample data; not good for joins, etc.
  - Synopsis data structures
    - Maintain a small synopsis or sketch of the data
    - Good for querying historical data
- Blocking operators, e.g., sorting, avg, min, etc.
  - An operator is blocking if it is unable to produce its first output until it has seen the entire input

Stream Data Mining vs. Stream Querying
- Stream mining: a more challenging task in many cases
  - It shares most of the difficulties of stream querying
  - But it often requires less "precision", e.g., no join, grouping, sorting
  - Patterns are hidden and more general than in querying
  - It may require exploratory analysis
    - Not necessarily continuous queries
- Stream data mining tasks
  - Multi-dimensional on-line analysis of streams
  - Mining outliers and unusual patterns in stream data
  - Clustering data streams
  - Classification of stream data

Challenges for Mining Dynamics in Data Streams
- Most stream data are at a pretty low level or multi-dimensional in nature: needs multi-level/multi-dimensional (ML/MD) processing
- Analysis requirements
  - Multi-dimensional trends and unusual patterns
  - Capturing important changes at multiple dimensions/levels
  - Fast, real-time detection and response
  - Comparing with the data cube: similarities and differences
- Stream (data) cube or stream OLAP: Is this feasible?
  - Can we implement it efficiently?

Multi-Dimensional Stream Analysis: Examples
- Analysis of Web click streams
  - Raw data at low levels: seconds, web page addresses, user IP addresses, …
  - Analysts want: changes, trends, unusual patterns, at reasonable levels of detail
  - E.g., Average clicking traffic in Nor