SPARK STREAMING


SPARK STREAMING OVERVIEW

Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph)

• Kafka provides seamless integration between producers and consumers of information without blocking the producers, and without letting producers know who the final consumers are.
• Each consumer keeps control of its own (read) offset.
• On-demand topic creation.

• ETL and ELT, wide catalog of sources and sinks.
• Flexible design of topologies and agent deployment strategies.
• Data transformation, thanks to interceptors.

readClob, readCSV, readLine, readMultiLine, readAvro, readJson, addCurrentTime, addLocalHost, geoIP, findReplace, Split, generateUUID, decompress, If, extractJsonPaths, detectMimeType, xquery, extractURIComponents, xslt, Grok (regular expressions), exec, spooling, logger

CASSANDRA, Kafka, STRATIO DEEP

Shark (SQL), Spark Streaming, MLlib (machine learning), GraphX (graph)

RDD, what is that?

Spark Streaming: Overall view

Discretized Stream or DStream.

Input DStreams and Receivers.
• Basic (distributed with Spark Streaming).
• Advanced (available as dependency).

Basic sources
• FileStream.
• Sockets.
• Actors (Akka).
• Queue RDDs (Testing).

Advanced sources

Do It Yourself
• Code onStart()
• Code onStop()
• Code receive()
• Custom Receiver ready!

• map(func), flatMap(func), filter(func), count()
• repartition(numPartitions)
• union(otherStream)
• reduce(func), countByValue(), reduceByKey(func, [numTasks])
• join(otherStream, [numTasks]), cogroup(otherStream, [numTasks])
• transform(func)
• updateStateByKey(func)
• window(windowLength, slideInterval)
• countByWindow(windowLength, slideInterval)
• reduceByWindow(func, windowLength, slideInterval)
• reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks])
• countByValueAndWindow(windowLength, slideInterval, [numTasks])

• print()
• foreachRDD(func)
• saveAsObjectFiles(prefix, [suffix])
• saveAsTextFiles(prefix, [suffix])
• saveAsHadoopFiles(prefix, [suffix])

• Stateful transformations (updateStateByKey, reduceByKeyAndWindow).
• As a fault-tolerance mechanism, when the driver crashes.
HDFS is mandatory if you are going to use operations that require checkpointing.

Configuration parameters
• spark.streaming.receiver.maxRate
• spark.streaming.concurrentJobs
• spark.streaming.receiver.writeAheadLog.enable
• spark.streaming.unpersist

Each node has mutable state, and for each record it has to update the state and send new records.
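The stateful operations listed above (updateStateByKey and reduceByKeyAndWindow) may be easier to follow with a small simulation. The sketch below is plain Python, not the Spark API. It treats a DStream as a list of micro-batches and mimics the semantics of the two operations under simplifying assumptions: the function names `update_state_by_key` and `reduce_by_key_and_window` are hypothetical helpers, and the window length and slide interval are expressed as a number of batches rather than as durations.

```python
from collections import deque


def update_state_by_key(batches, update_func):
    """Yield the per-key state after each batch (simulated updateStateByKey)."""
    state = {}
    for batch in batches:
        # Group the new values of this batch by key.
        grouped = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        # Call the update function for every key with new values or old state.
        for key in set(state) | set(grouped):
            new_state = update_func(grouped.get(key, []), state.get(key))
            if new_state is None:
                state.pop(key, None)  # returning None drops the key
            else:
                state[key] = new_state
        yield dict(state)


def reduce_by_key_and_window(batches, func, window_length, slide_interval):
    """Yield one reduced dict per slide (simulated reduceByKeyAndWindow).

    Assumption: window_length and slide_interval are counted in batches.
    """
    window = deque(maxlen=window_length)  # keeps only the newest batches
    for index, batch in enumerate(batches, start=1):
        window.append(batch)
        if index % slide_interval == 0:
            reduced = {}
            for windowed_batch in window:
                for key, value in windowed_batch:
                    reduced[key] = func(reduced[key], value) if key in reduced else value
            yield reduced


# Four micro-batches of (word, 1) pairs, made up for illustration.
batches = [
    [("a", 1), ("b", 1)],
    [("a", 1)],
    [("b", 1), ("b", 1)],
    [("c", 1)],
]

# Running count per key across all batches seen so far.
running_counts = list(
    update_state_by_key(batches, lambda new, old: sum(new) + (old or 0))
)

# Count per key over the last 3 batches, emitted every 2 batches.
windowed_counts = list(
    reduce_by_key_and_window(batches, lambda a, b: a + b, 3, 2)
)
```

In this simulation the running state keeps growing with every batch, while the windowed result only reflects the batches still inside the window. In Spark itself that accumulated state is what checkpointing has to persist.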
