Cisco China Baidu Wenku: =ciscochina
Cisco Interactive Network home page:

Big Data Technology Seminar
Watch the concurrent online seminar: =9082&KeyCode=000238223&ad_id=bdc100

• The big data era
• Overview of big data technologies
• Cisco CPA big data architecture

IDC: "Big data refers not only to data itself but also to a set of technologies designed to collect, manage, mine, and analyze large collections of information to solve complex problems."

Amazon Web Services (AWS): At a recent Big Data and High Performance Computing Summit in Boston hosted by AWS, data scientist John Rauser mentioned a simple definition: any amount of data that's too big to be handled by one computer. Some say that's too simplistic. Others say it's spot on.

McKinsey Global Institute (MGI): "Big data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. MGI also presents strong evidence that big data can play a significant economic role, to the benefit not only of private commerce but also of national economies and their citizens. Data can create significant value for the world economy, enhancing the productivity and competitiveness of companies and the public sector and creating substantial economic surplus for consumers.
Source: McKinsey Global Institute Foundation Research and Analytics Team

Information technology permeates human life

• Fraud
• Offline auditing
• Public search keywords
• Suspected network leak incidents
• Behavior analysis
• Crime anticipation
• Fuel savings
Predict the future
• Daily cost estimation
• Redeveloped areas
• Unfamiliar products
• Reputation
Understand the present
Collect, store, process, analyze, act
Big Data

[Figure labels: in-memory/MPP databases; big data; GPU image processing; Applications, Server, Processing, I/O, Network, Storage; user interface; enterprise application systems; Business Functions; Virtual Server; Physical Server; intelligent resource scheduling mechanism; general-purpose platform resource pool; Stand-By]

"Big Data": Store and Analyze
"Big Data": Real-Time Capture, Read and Update Operations
NoSQL

Application: Sales, Products, Process, Inventory, Finance, Payroll, Shipping, Tracking, Authorization, Customers, Profile
Machine logs, sensor data, call data records, web clickstream data, satellite feeds, GPS data, sales data, blogs, emails, pictures, video
Structured data; unstructured data

Transaction: HBase, Oracle NoSQL DB, Cassandra, MongoDB, CouchDB, Redis, Membase, Neo4j
Transaction: Oracle, DB2, SQL Server, MySQL, SAP HANA
Analyze: GreenPlum, Netezza, SAP HANA
Analyze: Hadoop, MapR
(RDBMS) (NoSQL DB)

• Data is not centrally located
• Data is stored across all data nodes in the cluster
• Data is stored in large blocks (128 MB or larger)
• Data is stored reliably by replication
[Figure: a file split into Block 1 through Block 6, with each block stored on more than one data node]

Two main workload types: BI and ETL
• The complexity of a job (map and reduce) varies greatly depending on the use case and has a large impact on the network.
• Programmers write the map and reduce functions, and their complexity varies.
Challenges for the network: bursty I/O and stable connections

• (Similar to what Red Hat does for Linux): services and support model.
• Spin-out from Yahoo. Services and support model for Apache Hadoop. Its main business is supporting Yahoo.
• Rewrote Hadoop with many optimizations (rewrote HDFS as a C++ file system and distributed the metadata).
• EMC Greenplum Hadoop distribution. Uses MapR.
• Hadoop distribution and NoSQL-like offering to be announced at this year's Oracle OpenWorld. Very similar to HBase and other NoSQL offerings. Based on Berkeley DB.
• Other NoSQL-like offerings
• Various others

[Figure labels: …; primary database; application server zone; read/write splitting]

RDBMS compared with NoSQL
• RDBMS: high-value, high-density, complex data. NoSQL: low-value, simple data.
• RDBMS: complex data relationships. NoSQL: very simple data relationships.
• RDBMS: supports standard SQL syntax. NoSQL: typically no table joins and no standard for queries or updates; this depends entirely on the specific implementation.
• RDBMS: schema-centric. NoSQL: no fixed schema; supports unstructured or semi-structured data.
• RDBMS: designed for proportional scaling. NoSQL: distributed storage and processing.
• RDBMS: database-centric. NoSQL: application- and developer-centric.
• NoSQL is not intended to replace RDBMS.

Cisco Confidential © 2010 Cisco and/or its affiliates. All rights reserved.

• Segment servers: query processing and data storage
• Master servers: execution plan generation and scheduling
• Hadoop MapReduce
• Data sources: loading, streaming, etc.
• Interconnect network
• External files, URLs, Hadoop (HDFS), Web services (including from other DBs), O/S pipes (including fro