Big Data Analytics Beyond Hadoop



Real-Time Applications with Storm, Spark, and More Hadoop Alternatives

Vijay Srinivas Agneeswaran, Ph.D.

Associate Publisher: Amy Neidlinger
Executive Editor: Jeanne Glasser Levine
Operations Specialist: Jodi Kemper
Cover Designer: Chuti Prasertsith
Managing Editor: Kristy Hart
Senior Project Editor: Lori Lyons
Copy Editor: Cheri Clark
Proofreader: Anne Goebel
Senior Indexer: Cheryl Lenser
Compositor: Nonie Ratcliff
Manufacturing Buyer: Dan Uhrig

© 2014 by Vijay Srinivas Agneeswaran
Pearson Education, Inc.
Upper Saddle River, New Jersey 07458

For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at corpsales@pearsoned.com or (800) 382-3419.

For government sales inquiries, please contact governmentsales@pearsoned.com.

For questions about sales outside the U.S., please contact international@pearsoned.com.

Company and product names mentioned herein are the trademarks or registered trademarks of their respective owners. Apache Hadoop is a trademark of the Apache Software Foundation.

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America

First Printing April 2014

ISBN-10: 0-13-383794-7
ISBN-13: 978-0-13-383794-0

Pearson Education LTD.
Pearson Education Australia PTY, Limited.
Pearson Education Singapore, Pte. Ltd.
Pearson Education Asia, Ltd.
Pearson Education Canada, Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education—Japan
Pearson Education Malaysia, Pte. Ltd.

Library of Congress Control Number: 2014933363

This book is dedicated at the feet of Lord Nataraja.

Contents

Foreword
About the Author
Chapter 1  Introduction: Why Look Beyond Hadoop Map-Reduce?
    Hadoop Suitability
    Big Data Analytics: Evolution of Machine Learning Realizations
    Closing Remarks
    References
Chapter 2  What Is the Berkeley Data Analytics Stack (BDAS)?
    Motivation for BDAS
    BDAS Design and Architecture
    Spark: Paradigm for Efficient Data Processing on a Cluster
    Shark: SQL Interface over a Distributed System
    Mesos: Cluster Scheduling and Management System
    Closing Remarks
    References
Chapter 3  Realizing Machine Learning Algorithms with Spark
    Basics of Machine Learning
    Logistic Regression: An Overview
    Logistic Regression Algorithm in Spark
    Support Vector Machine (SVM)
    PMML Support in Spark
    Machine Learning on Spark with MLbase
    References
Chapter 4  Realizing Machine Learning Algorithms in Real Time
    Introduction to Storm
    Design Patterns in Storm
    Implementing Logistic Regression Algorithm in Storm
    Implementing Support Vector Machine Algorithm in Storm
    Naive Bayes PMML Support in Storm
    Real-Time Analytic Applications
    Spark Streaming
    References
Chapter 5  Graph Processing Paradigms
    Pregel: Graph-Processing Framework Based on BSP
    Open Source Pregel Implementations
    GraphLab
    References
Chapter 6  Conclusions: Big Data Analytics Beyond Hadoop Map-Reduce
    Overview of Hadoop YARN
    Other Frameworks over YARN
    What Does the Future Hold for Big Data Analytics?
    References
Appendix A  Code Sketches
    Code for Naive Bayes PMML Scoring in Spark
    Code for Linear Regression PMML Support in Spark
    PageRank in GraphLab
    SGD in GraphLab
Index

Foreword

One point that I attempt to impress upon people learning about Big Data is that while Apache Hadoop is quite useful, and most certainly quite successful as a technology, the underlying premise has become dated. Consider the timeline: MapReduce implementation by Google came from work that dates back to 2002, published in 2004. Yahoo! began to sponsor the Hadoop project in 2006. MR is based on the economics of data centers from a decade ago. Since that time, so much has changed: multi-core processors, large memory spaces, 10G networks, SSDs, and s
