Serengeti: Virtualize Your Big Data Applications (VMware)


© 2009 VMware Inc. All rights reserved

Serengeti: Virtualize Your Big Data Applications
蔺永华 (Lin Yonghua)
VMware, Inc.

Agenda
• Today's big data system
• Why virtualize Hadoop?
• Serengeti introduction
• Common questions about virtualization
• Serengeti solution
• Deep insight into Serengeti
• Summary
• Q&A

Today's Big Data System
[Diagram: ETL; unstructured data (HDFS); real-time structured database; big SQL; data-parallel batch processing; real-time streams; real-time processing (S4, Storm); analytics]

Challenges to Using Hadoop in Physical Infrastructure
Deployment
• Difficult to deploy: takes several people several days, or even months
• Difficult to tune cluster performance
Low Efficiency
• Hadoop clusters are typically not 100% utilized across all hardware resources
• Difficult to share resources safely between different workloads
Single Point of Failure
• Single point of failure for NameNode and JobTracker
• No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get your Hadoop cluster in minutes
• 1/1000 of the human effort, with minimal Hadoop operations knowledge required
• Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
• Manual process, costing days: server preparation, OS installation, network configuration, Hadoop installation and configuration
• Automated by Serengeti on vSphere with best practices

Why Virtualize Hadoop? - Consolidate sprawling clusters
Cluster Sprawl
• Single-purpose clusters for various business applications lead to cluster sprawl
Cluster Consolidation
• Clusters share servers with strong isolation
Simplify
• Single hardware infrastructure
• Unified operations
Optimize
• Shared resources = higher utilization
• Elastic resources = faster on-demand access
• 30% CAPEX down
[Diagram: Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop and Portal Hadoop clusters consolidated onto a virtualization platform]

Why Virtualize Hadoop? - Utilize all your resources to solve the priority problem
• 50%+ of resources sit idle while a high-priority job is burning up its cluster
• Utilize all resources from the pool on demand
• Dynamic elastic scaling on a shared resource pool
• 3X faster to get analytic results

vSphere High Availability (HA) - protection against unplanned downtime
Overview
• Protection against host and VM failures
• Automatic failure detection (host, guest OS)
• Automatic virtual machine restart in minutes, on any available host in the cluster
• OS and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram: ZooKeeper (Coordination); Management Server; HDFS (Hadoop Distributed File System); HBase (Key-Value store); MapReduce (Job Scheduling/Execution System); Pig (Data Flow); Hive (SQL); BI Reporting; ETL Tools; RDBMS; JobTracker; NameNode; Hive MetaDB; HCatalog; HCatalog MDB; Server; HA]

vSphere Fault Tolerance (FT) provides continuous protection
Overview
• Single identical VMs running in lockstep on separate hosts
• Zero downtime, zero data loss failover for all virtual machines in case of hardware failures
• Integrated with VMware HA/DRS
• No complex clustering or specialized hardware required
• Single common mechanism for all applications and operating systems
Zero downtime for NameNode, JobTracker and other components in Hadoop clusters
[Diagram: App/OS virtual machines on two VMware ESX hosts]

Serengeti
Easy and rapid deployment and management
• Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is planned for June
• Toolkit that leverages virtualization to simplify Hadoop deployment and operations
• Deploy a cluster in 10 minutes, fully automated
• Customize Hadoop and HBase clusters
• Automated cluster operation
• Comes with ecosystem components
• Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti

Common questions about virtualization
• Local disk: Can local disk be used in a virtualized environment?
• Flexibility and scalability: How can resources be scheduled flexibly between clusters and different applications, as mentioned above?
• Data stability: In a virtual environment, how can we distribute data across hosts and racks?
• Data locality: Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?
• Performance: How does Hadoop perform in a virtual environment?

Can I use local disk easily?
Serengeti: Extend Virtual Storage Architecture to Include Local Disk
Shared Storage: SAN or NAS
• Easy to provision
• Automated cluster rebalancing
Hybrid Storage
• SAN for boot images, other workloads
• Local disk for Hadoop & HDFS
[Diagram: hosts running Hadoop VMs alongside other VMs]

How to flexibly scale in/scale out
How to flexibly schedule resources between clusters and different applications? - Compute
Current Hadoop: Combined Storage/Compute
Hadoop in VM
• VM lifecycle determined by DataNode
• Limited elasticity
Separate Storage
Separate Compute Clusters
• Separate compute from data
• Remove the elasticity constraint imposed by DataNode
• Elastic compute
• Raise utilization
• Separa
