培乐园 (Peileyuan): Architecture and Processing of Massive Data, Part 5


2. Infrastructure: MapReduce

2. Infrastructure: HBASE

2. Infrastructure: design
• Scalability, Reliability, Performance, Throughput, Latency
• Design:
  – Partitioning/Sharding
  – Consistent hash
  – Consistency model
  – Data Models
  – Storage layouts
  – Log Structured Merge Tree (Bigtable)
• Notes:
  – Strict Consistency
  – Eventual Consistency
  – Timestamp and Vector Clocks
  – Gossip
  – Primary key-value / blob / structure / semi-structure
  – Secondary Indexes
  – Tables/Namespaces
  – Multi-version Storage
  – Row-based, Column-based storage
  – Bloom Filters

3. BigData

3. BigData: hypergrowth
• Reuters-21578: about 10K docs (ModApte)
• RCV1: about 807K docs
• LinkedIn job title data: about 100M docs
Sources: Bekkerman et al., SIGIR 2001; Bekkerman & Scholz, CIKM 2008; Bekkerman & Gavish, KDD 2011 (from KDD 2011)

3. BigData: hypergrowth
[Figure; surviving labels: hours, days, months, years]

3. BigData: hypergrowth
• Bigness
  – Volume, Velocity, Size
• Structure
  – Variety, Variability, Complexity

3. BigData: Machine Learning
• Thousand instances
  – Manually
• Million instances
  – preprocessing, modeling
• Billion instances
  – distributed storage/computing, modeling parallelization
• Trillion instances
  – ……

3. BigData: Google YouTube
• Data: X PB, trillion-row tables
• Query: Oracle-MySQL-ColumnIO
• ETL: Python-Sawzall + Tenzing + Python
• Reporting: MicroStrategy-ABI

3. BigData: Google
• Structure
  – Relational (Hosted SQL)
  – Record-oriented (Bigtable)
  – Nested (Protocol Buffer)
  – Graphs (Pregel)
• Analysis
  – Number crunching (MR, FlumeJava)
  – Ad hoc (Dremel, BigQuery)
  – Precise vs. Estimate (Sawzall)
  – Model generation & prediction (Prediction API)
• Core Features
  – RESTful
  – Partitions/Buckets
  – Access Control/Auth
  – Scalable, Fast, Simple

3. BigData: Teradata

3. BigData: warehouse
[Figure: warehouse architecture; surviving labels listed below]
• Storage: collection, HDFS, table, ..., storages; collection & backup
• Data: Raw Log, Cold Dataset, Hot Dataset, Pre-Processed, Fact Table, DW Marts
• Processing and tools: ETL, BI Tools, Ad hoc query, Reporting, Modeling, Dashboard, Data Discovery, Visualization, ...
• Applications: Cashflow analysis, Treasury, Marketing, Customer Service, Channel Management, Risk Management, Accounts/MIS, Exposure Analysis, Product Analysis, User Behavior, Competitor

3. BigData: Hive
• Data Model
  – Tables
  – Partitions
  – Buckets

3. BigData: practice

4. Cloud
• Software services & business models
  – SaaS ("Software as a Service")
    • Salesforce, Zoho, Evernote, Dropbox
  – PaaS ("Platform as a Service")
    • App Engine, Heroku
  – IaaS ("Infrastructure as a Service")
    • Amazon EC2, Rackspace Cloud Servers
  – BPaaS ("Business Process as a Service")
• Major Players
  – Users:
    • Google, Facebook, Microsoft
  – Services:
    • Amazon, Microsoft, Rackspace, HP, SAP, Oracle
    • Box.net, Dropbox
  – Infrastructure and equipment providers
    • Juniper, HP, Cisco, Intel, SAP, Oracle

4. Cloud
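
The "Consistent hash" item in the Infrastructure design list above names a partitioning technique without showing it. The following is a minimal sketch, not taken from the slides: the class name `ConsistentHashRing`, the `vnodes` parameter, the node names, and the choice of SHA-256 are all assumptions made for illustration. It shows the property that makes the technique attractive for sharding: when a node joins, only the keys that land on the new node change owner.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (SHA-256 is an illustrative choice)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual points per physical node (assumed value)
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place `vnodes` points for this node on the ring.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the (practically impossible) collision
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        # Remove every point that belongs to this node.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                del self._points[bisect.bisect_left(self._points, point)]

    def get_node(self, key: str) -> str:
        # A key belongs to the first point clockwise from its hash,
        # wrapping around to the start of the ring.
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    keys = [f"key-{i}" for i in range(1000)]
    before = {k: ring.get_node(k) for k in keys}
    ring.add_node("node-d")
    moved = [k for k in keys if ring.get_node(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

With plain modulo sharding (`hash(key) % N`), changing `N` would remap most keys; on the ring, the keys that move are exactly those now owned by the new node, and removing that node restores the previous assignment.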
