Hadoop: the Intel Way


Software and Services Group

Hadoop: the Intel Way
Bring New Analytics Capabilities to Hadoop Stack
He Jingxiang (何京翔), General Manager, Intel Asia-Pacific Research and Development Ltd.

Cloud and IoT: More Users, More Devices, More Data
• Security & Trust
• Workload Consolidation
• Immersive Experiences
• Cloud Connectivity
• Data Analytics
• Open Cloud Architecture

Intel's Vision
This decade we will create and extend computing technology to connect and enrich the lives of every person on earth.

Our Big Data Goal: Make Hadoop the Foundation of Next-Gen Data Analytics Platform
• All of Your Big Data (Structured & Unstructured): Sensor Reading, Log, Table, Image, Document, …
• Existing IT & Data: RDBMS, EDW, Data Marts, Systems, BI, …
• Data Mining and Analytics: Business Intelligence, Statistical Modeling, Machine Learning, …

Hadoop in Telecom
[Diagram labels: Base Stations, 3G, HDFS, HBase, Hive, MapReduce, ETL]
• Instantaneous query of 3G records by subscribers
• User Segmentation
• Carrier Network Optimizations

Hadoop in Smart City
[Diagram labels: Hive, MapReduce, HBase, Legacy applications]
• Instantaneous query (e.g., road image)
• Stream processing (e.g., real-time road conditions)
• Data mining (e.g., vehicle tracking)

Hadoop: the Intel Way
Bring New Analytics Capabilities to Hadoop Stack
• Easier to use (Reduced Complexity)
• More efficient (Improved Efficiency)
• Instantaneous Analysis
Enterprise-Grade Solution: Intel Hadoop Distribution
• Stable, enterprise-grade software product
• Feature enhancements for vertical industries
Advanced Development: "Project Panthera"
• Advanced development and path-finding
• Open source and community driven

Intel Hadoop Distribution
An optimized software product for big data processing
• A stable, enterprise-grade Hadoop distribution
• Optimized using new hardware technologies
• Industry-specific feature enhancements to address the big data challenges of different industries
• Brings instantaneous data processing capability to Hadoop
Components:
• Intel Hadoop Manager: installation, deployment, configuration, monitoring, alerting, and access control
• Data analysis, statistics, and mining
• Mahout: machine learning
• R: statistics
• Data processing toolset from Revolution Analytics
• Hive: interactive data warehouse
• Pig: data flow processing language
• MapReduce: stable, efficient distributed computing framework
• HBase: distributed, high-dimensional database; improvements and innovations on HBase 0.94 that provide instantaneous data processing
• HDFS: reliable distributed file system
• Sqoop: relational data ETL tool
• Flume: log collection tool
• Zookeeper: distributed coordination service

"Project Panthera"
Open source initiatives to enable advanced analytics capabilities on Hadoop
SQL engine for Hive/MapReduce
• Better integration with existing infrastructure using SQL
HBase
• Document semantics & significantly speed up query processing on HBase
• Efficient utilization of new HW platform technologies
…

Instantaneous Analysis
Instantaneous analysis with greatly enhanced HBase
• Stream new data into HBase for analysis in real time
• Support high update rate workloads (to keep the system always up to date)
• Allow very low latency, online data serving
• Etc.

Interactive Query on HBase (Intel Hadoop Distribution)
10X faster than MapReduce for certain queries on HBase (e.g., group-by aggregation)
HBase Query Engine Layer
• Fast, distributed aggregations directly inside HBase
• Parallel scanning over multiple regions
• Advanced, distributed filtering (CRC32 comparator, fuzzy row filter, etc.)
HBase Query Engine as New Hive Backend
• Most "SELECT" queries are automatically optimized to use the HBase Query Engine: "WHERE" uses the advanced scanner/filter, and "GROUP-BY" uses distributed aggregations
• "JOIN" queries still go to MapReduce

A Document Store on HBase ("Project Panthera")
Up to 3x storage reduction and 3x query speedup for Hive/MapReduce query processing on HBase (See)
DOT (Document Oriented Table) on HBase
• Each row contains a collection of documents
• Each document contains a collection of fields
• A document is mapped to an HBase column and serialized using Avro
• Completely transparent to existing HBase applications

Easier to Use (Reduced Complexity)
• Better data mining and statistics capabilities: full-text indexing and search; statistical modeling with the R language
• Better integration with existing infrastructures: geo-distributed data centers; full SQL support for OLAP

Full-Text Indexing and Search (Intel Hadoop Distribution)
Full-text indexing and near real-time search for advanced data mining (e.g., log and clickstream analysis, healthcare record analysis, etc.)
Incremental full-text indexing on HBase
• Full-text indexing for semi-structured data (text, strings, numbers, etc.)
• Index built incrementally when records are inserted or updated
• Supports a very high data insertion/update rate
Near real-time search
• Distributed search based on keywords or logical expressions
• Zero delay in searching the latest data that was just inserted

Bring R Statistics into Hadoop (Intel Hadoop Distribution)
Distributed statistical modeling on Hadoop using the R language

Cross-Datacenter BigTable/HBase (Intel Hadoop Distribution)
A virtual BigTable overlaid over existing geo-distributed data centers
• Global table view
• Data stored in geo-distributed data centers
• Better locality & higher availability
• Data transfer eliminated through distributed aggregation
[Diagram labels: Virtual Big Table, Data Center A, Data Center B, Data Center C, Async Replication]

An Analytical SQL Engine for Hive/MapReduce ("Project Panthera")
Goal: Provide full SQL support for OLAP in Hadoop
Required by business users, enterprise applications, 3rd party tools (e.g., BI applications), etc. (See)
[Diagram labels: SQL, Query, (Open Source) SQL Parser*, SQL-AST, SQL-AST Analyzer & Translator, Subquery Unnesting, Multi-Table SELECT, INTERSECT Support, MINUS Support, …, HiveQL, Driver, Hive Parser, Hive-AST, Hive Semantic Analyzer, Hadoop MR]
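
Illustrative sketch (not part of the original deck): the HBase Query Engine slide mentions fast, distributed aggregations inside HBase with parallel scanning over multiple regions, used for queries such as group-by aggregation. The general pattern behind that kind of design can be sketched in Python as "aggregate partially per region, then merge the small partial results". Everything here is an assumption for illustration: the function names, the in-memory lists standing in for regions, and the sample data are hypothetical, no HBase API is used, and this is not Intel's implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(region_rows):
    """Aggregate one region locally: key -> [running sum, row count].

    In a real system this step would run next to the data, so only the
    small partial result leaves the region (hypothetical stand-in here).
    """
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in region_rows:
        acc = partial[key]
        acc[0] += value
        acc[1] += 1
    return dict(partial)


def merge_partials(partials):
    """Combine per-region partial results into final per-key aggregates."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {
        key: {"sum": total, "count": count, "avg": total / count}
        for key, (total, count) in merged.items()
    }


def group_by_aggregate(regions):
    """Scan all regions in parallel, then merge on the client side."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, regions))
    return merge_partials(partials)


# Hypothetical sample data: each inner list plays the role of one region,
# holding (group key, numeric value) pairs.
regions = [
    [("beijing", 10.0), ("shanghai", 4.0)],
    [("beijing", 6.0)],
    [("shanghai", 2.0), ("shenzhen", 5.0)],
]
result = group_by_aggregate(regions)
```

The average is derived from merged sums and counts rather than by averaging per-region averages, which would be wrong when regions hold different numbers of rows for the same key.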
