Internet-Scale Computing
Edward Chang
Director of Research, Google China

Outline
• Some Extremes and Scales
• Design Challenges
• Google Cloud Computing Building Blocks
• New Technologies

Sample Hierarchy
• Server
  – 16 GB DRAM; 160 GB SSD; 5 x 1 TB disk
• Rack
  – 40 servers
  – 48-port Gigabit Ethernet switch
• Warehouse
  – 10,000 servers (250 racks)
  – 2K-port Gigabit Ethernet switch

Storage --- One Server

Storage --- One Rack

Storage --- One Center

Outline
• Some Extremes and Scales
• Design Challenges
• Google Cloud Computing Building Blocks
• New Technologies

Design Challenges
• New programming models:
  – Parallel; Flash (SSD); GPUs?
• Use energy efficiently
  – Hardware, software, warehouse
  – Encode/compress/transmit data well
• Fault recovery
  – Deal with stragglers
  – Hardware/software faults
  – Heavy tail

The Eight Fallacies (Peter Deutsch)
• The network is reliable
• Latency is zero
• Bandwidth is infinite
• The network is secure
• Topology doesn't change
• There is one administrator
• Transport cost is zero
• The network is homogeneous

Map-Reduce Model
• Distributed, stateless computation
• Built-in failure recovery
• Built-in load balancing
• Network and storage optimizations
• Built-in sort of intermediate values
• Various interfaces (file system, etc.)
• Protocol buffers for structured data

Bigtable
• Sparse, distributed, multi-dimensional sorted map
• Column-oriented (roughly, columns for OLAP, rows for OLTP)
• Heavy use of compression
• Has locks, but designed for many queries, not for transactions

Pregel
• Graph processing
• Bulk synchronous parallel model
• Message passing to vertexes
• Billions of vertexes, edges
• "Think like a vertex"

Energy Consumption

Regional Growth

Energy

Energy Distribution

Fault Recovery
• 99.9% uptime = ~9 hours down/year
• A 10,000-server warehouse can expect:
  – 0.25 cooling/power failures (all down; a day)
  – 1 PDU failure (500 down; 6 hours)
  – 20 rack failures (40 down; 1 hour)
  – 3 router failures (1 hour)
  – 1,000 server failures
  – 1,000s of disk failures
  – etc., etc., etc.

Planning for Recovery
• Replication
• Sharding
• Checkpoints
• Monitors/heartbeats
• If possible:
  – Loose consistency
  – Approximate answers
  – Incomplete answers

GFS: The Basics
• Our first cluster-level file system (2001)
• Designed for batch applications with large files
• Single master for metadata and chunk management
• Chunks are typically replicated 3x for reliability
• GFS lessons:
  – Scaled to approximately 50M files, 10P
  – Large files increased upstream app. complexity
  – Not appropriate for latency-sensitive applications
  – Scaling limits added management overhead

New FS: Colossus
• Next-generation cluster-level file system
• Automatically sharded metadata layer
• Data typically written using Reed-Solomon (1.5x)
• Client-driven replication, encoding and recovery
• Metadata space has enabled availability analyses
• Why Reed-Solomon?
  – Cost, especially w/ cross-cluster replication
  – Field data and simulations show improved MTTF
  – More flexible cost vs. availability choices

More Techniques

Storage Software: Data Placement
• End-user latency really matters
• Application complexity is less if close to its data
• Countries have legal restrictions on locating data
• Things to think about:
  – How do we migrate code with data?
  – How do we forecast, plan and optimize data moves?
• Your computer is always closer than the cloud.

Win in Scale
• Google Translate
• Google Voice
• Trends and Preventive Actions

Acknowledgement
• Thanks to the public slides of
  – Peter Norvig
  – Stuart Feldman
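
To make the Map-Reduce model slide concrete, here is a minimal single-process sketch of the programming model: a map phase emits (key, value) pairs, the framework groups and sorts intermediate values by key (the "built-in sort" on the slide), and a reduce phase folds each key's values. Function names are illustrative, not Google's actual API, and the real system distributes each phase across thousands of machines with failure recovery.

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the user's map function to each input record, emitting (key, value) pairs."""
    for record in records:
        yield from map_fn(record)

def shuffle(pairs):
    """Group intermediate values by key; keys arrive at reducers in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(groups, reduce_fn):
    """Apply the user's reduce function to each key and its grouped values."""
    return {key: reduce_fn(key, values) for key, values in groups}

# The classic word-count example.
def wc_map(doc):
    for word in doc.split():
        yield word, 1

def wc_reduce(word, counts):
    return sum(counts)

docs = ["the quick brown fox", "the lazy dog"]
result = reduce_phase(shuffle(map_phase(docs, wc_map)), wc_reduce)
# result["the"] == 2
```

The user writes only `wc_map` and `wc_reduce`; everything between them (distribution, load balancing, retrying stragglers) is the framework's job, which is what makes the computation stateless from the programmer's point of view.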
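
The Bigtable slide's phrase "sparse, distributed, multi-dimensional sorted map" can be modeled in a few lines: a map from (row key, column, timestamp) to an uninterpreted value. This toy class is an assumption-laden sketch of the data model only; the real system shards the sorted keyspace across tablet servers and compresses it heavily.

```python
class ToyBigtable:
    """Toy in-memory model of Bigtable's data model:
    a sparse sorted map (row, column, timestamp) -> value."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, value, timestamp):
        self._cells[(row, column, timestamp)] = value

    def read(self, row, column):
        """Return the most recent value for (row, column), or None."""
        versions = [(ts, v) for (r, c, ts), v in self._cells.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None

    def scan(self, row_prefix):
        """Iterate cells whose row key starts with row_prefix, in sorted key order."""
        for key in sorted(self._cells):
            if key[0].startswith(row_prefix):
                yield key, self._cells[key]

t = ToyBigtable()
t.put("com.cnn.www", "contents:", "<html>v1", timestamp=1)
t.put("com.cnn.www", "contents:", "<html>v2", timestamp=2)
# t.read("com.cnn.www", "contents:") -> "<html>v2"
```

Because the map is sorted by row key, rows that should be scanned together (here, pages of one site under a reversed-domain key) are physically adjacent, which is the point of "rows for OLTP" access patterns.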
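
"Think like a vertex" from the Pregel slide means each vertex only consumes its incoming messages, updates local state, and sends messages along out-edges; a superstep barrier separates rounds, and the computation halts when no messages are in flight. A minimal sketch, using single-source shortest paths as the example (the function and its shapes are illustrative, not Pregel's API):

```python
import math

def pregel_sssp(graph, source):
    """Single-source shortest paths in a Pregel-style bulk-synchronous loop.

    graph: {vertex: [(neighbor, edge_weight), ...]}
    """
    distance = {v: math.inf for v in graph}
    inbox = {source: [0]}                         # initial message to the source
    while inbox:                                  # halt when no messages remain
        outbox = {}
        for vertex, messages in inbox.items():    # one superstep
            best = min(messages)
            if best < distance[vertex]:           # improved: update and propagate
                distance[vertex] = best
                for neighbor, weight in graph[vertex]:
                    outbox.setdefault(neighbor, []).append(best + weight)
        inbox = outbox                            # barrier: messages delivered next round
    return distance

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
# pregel_sssp(g, "a") -> {"a": 0, "b": 1, "c": 2}
```

Nothing in the per-vertex body looks at another vertex's state directly, which is exactly what lets the real system partition billions of vertexes across machines and exchange all messages at the superstep barrier.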
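
The numbers on the Fault Recovery and Colossus slides check out with a little arithmetic: 99.9% uptime leaves 0.1% of 8,760 hours, about 9 hours of downtime per year, and the 1.5x Reed-Solomon figure corresponds to a layout with 2 parity blocks per 4 data blocks (the (6, 4) split below is an illustrative assumption; the slide gives only the 1.5x ratio).

```python
HOURS_PER_YEAR = 365 * 24  # 8,760

# "Three nines": 0.1% of the year is downtime -- the slide's ~9 hours/year.
downtime_hours = (1 - 0.999) * HOURS_PER_YEAR   # ~8.76

# Storage overhead: 3x replication vs. a Reed-Solomon code.
# An RS(n=6, k=4) layout (4 data + 2 parity blocks) stores each byte at
# 6/4 = 1.5x raw cost yet tolerates any 2 block losses, while 3x
# replication costs twice as much and also survives 2 lost copies.
replication_overhead = 3.0
rs_overhead = 6 / 4   # 1.5x, matching the Colossus slide
```

This is the cost argument behind "Why Reed-Solomon?": comparable loss tolerance at half the raw storage, which matters most when data is also replicated across clusters.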