结构大数据架构26

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

解構⼤大數據架構⼤大數據系統的伺服器與網路資源規劃“Howtoeatanelephant–onebyteatatime”CPLi李俊邦EnterpriseTechnologistEnterpriseSolutions&Alliances,GreaterChinaDell2議程1. 不同的伺服器⾓角⾊色1. Manager2. NameNodes3. EdgeNodes4. DataNodes2. HadoopCluster設計3. Etu+Dell4. Futures/Roadmap5. Questions?3ServerRoles-Manager• 系統安裝圖形介⾯面/主控台• ⼤大多安裝在EdgeNode• 常⾒見版本– ClouderaManager– ApacheAmbari4ServerRoles–NameNodes• 存放HDFS的metadata• JobManagerforYARNdata-processingframework• Primary– Heartbeatsfromdatanodes– 10thheartbeatisablockreportfromwhichitgeneratesmetadata• Standby– Checksineveryhourtomirrormetadata/blockmap– Notahot-spare–requiresmanualfail-over• HighAvailability(HA)canbeaddedinsomedistributions– ResultsinadedicatedHAnodethatactsasawitnesstotheNameNodecluster5ServerRoles-EdgeNodes• 資料進出Hadoop叢集的主要端⼝口• 可擴展• Hadoop叢集裡唯⼀一的多網段節點PowerEdge  R730  –  Name  NodePowerEdge  R730  –  Standby  Name  NodePowerEdge  R730  –  Edge  Node(s)PowerEdge  R730  –  HA  NodeCorporate  NetworkData  NetworkCorporateData  NetworkData  NetworkData  NetworkData  NetworkPowerEdge  R730XD  –  Data  NodesData  Network6ServerRoles-DataNode• HDFS的主要存放處• 執⾏行YARN資源管理所指定的資料處理• 主要屬性– 記憶體› 標配64GB› 更多服務(Impala/Spark)需要更多記憶體– 很多的本地硬碟(JBOD/Non-RAIDmode)› SFF(2.5”)forperformance-basedworkloads› LFF(3.5”)forcapacity-centricworkloads– CPUs–legacyrecommendationof1:1core:spindleratio› SSDs,fasterHDD(10K+),andin-memoryworkloadsmakethislessofanissue› 10and12corearethebestpracticedefaultHadoopClusterDesign8HadoopClusterDesign–HardwareConsiderations9HadoopClusterDeployment–InstallationBestPractices• Usepre-built,assembled&cabledracksfromvendor• ⾃自動佈署⼯工具(ex:OpenCrowbar)• Purchasenodesinstandardsizegroupsforeasycapacitygrowthandordering,notinsinglenodeincrements– Commonincrementsare½orfullrackforeasydeploymentandsizing• Foreachtypeofhardware,purchasesparecomponentstokeeponsiteforeasy,rapidrepair10CoreHadoopUseCases歸檔⾼高硬碟/CPU⽐比記憶體使⽤用低法規需求⻑⾧長期歸檔資料處理⾼高硬碟/CPU⽐比記憶體使⽤用中等DWoffloadETLoffloadEDH質量分析ITLog分析分析⾼高核⼼心數記憶體使⽤用⾼高市場分析詐欺預防網路分析11CommonHadoopUseCasetoEcosystemToolMapping12HadoopUseCasetoRatioMapping歸檔1:2:1資料處理1:4:1分析2:8:1CPU(Cores):Memory(GB):Disk(數量)–DataNode13NodeConsiderationsDellPowerEdgeR730DellPowerEdgeR730DellPowerEdgeR730DellPowerEdgeR730xd14NodeConsiderations15HDFSCapacity• HDFSprotectsinformationthroughreplicationofthedatabetweennodes,thedefaultReplicationFactoris3,butisconfigurable.• HDFSRawCapacity=NumberofComputeNodesxNumberofDrivesxCapacityofDrives• HDFSUsableCapacity=HDFSRawCapacity/ReplicationFactor16BigDataNetworkingBestPractices• TraditionalEthernetisusedsinceit’saffordableandalreadyprevalent.• 1GbEnetworkingwasusedinitiallyinearlydraftsofthesolutionbutwiththereductionincostit’smuchmoreefficienttogowith10GbE.• Multipleportsareteamedbothforredundancyandthroughput.LACPorsoftwarebondingarethemostcommonmethods.• IPv4ismostwidelyused.IPv6haslimitedsupportattheOSandHadooplevel.17AttributesofaGoodSwitchforBigData• Non-blockingbackplane• Deepper-portpacketbuffers(sharedbuffersdonotworkwell).Duringsort/shufflephasesofmap/reduceoperationsnetworktrafficissochaoticthatitcansaturateanyandallsharedbuffers,impactingmultiplehost’snetworkperformance.• Goodchoices:– 1GbE› S55› S60– 10GbE› S4810› S5000– 40GbE› Z9000› Z9500› S600018DellHadoopSolutionLogicalDiagram19Scale-outAggregationLayer20DellPointsofIntegration• VLT/VRRPisaveryaffordablewaytoteamswitchesbothattheToRandtheaggregationtiers.ThismakestheDellNetworkingForce10switchesagreatchoice.• ActiveFabricManager– SpeedsupthecreationandadministrationoftherequiredVLT/VRRPconfigurationontheswitches.– Helpswithcapacity-planningascustomerscale21BigDataNetworkingFutures• 40GbEonboardLOMswillbegintobeusedforhigh-volumeclusters.Rightnowthecost:benefitratioisn’tthereyet.• AsHPCandBigDataconverge,we’llstarttoseetheuseofIBfornode-to-nodeconnectivity.• In-memory(Spark/Impala)workloadsarereducingthebottlenecksthatusedtoexistatthediskandnowmovetotheprocessorandnetwork.Expectcustomerstobelookingtoincreasecorecountsandnetworkspeedtoovercomethis.@Dell_EnterpriseEnterpriseSolutionsEtu+Dell=completeHadoop/BigDatasolutionproviderBestofbreedClouderapartners-EtuAnalyticsoftwaresolutionsforBigDataDellProfessionalServicesforBigDataDellPowerEdge13GserversDellNetworkingsolutionsInstallationandconfigurationserviceCompleteend-to-endimplementationDiscoverPlanImplementInvestigate232.Store1.Integrate4.Act3.AnalyzeSolutionarchitectureAnalyticaloutputToadDataPointDesktop–integrate,cleanseDellBoomiCloud–integrate,correlateToadIntelligenceCentralDataaggregationandvirtualizationDellSTATISTICACustomerdataOrderdataEventsStockmarketdataAdvancedAnalyticsMarketingcampaignsDellStatisticaBigDataDesktop–crawl,saveSocialMedia24Futures• SpeedImprovementsinMap/Reduce• Morein-memoryworkloads– PossiblemovetoSparktoreplaceMap/Reduce• VirtualizedHadoop– VMWareBigDataExtensions– OpenstackSahara– MicrosoftHDInsights(Hortonworks)25DellIn-MemoryApplianceforClouderaEnterpriseConfigurationsataglanceMid-SizeConfiguration16NodeClusterPowerEegeR720-4InfrastructureNodeswithProSupportPowerEdgeR720XD-12DataNodesw

1 / 26
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功