© Spinnaker Labs, Inc.

Google Cluster Computing Faculty Training Workshop
Module VI: Distributed Filesystems

This presentation includes course content © University of Washington
Some slides designed by Alex Moschuk, University of Washington
Redistributed under the Creative Commons Attribution 3.0 license
All the rest: © Spinnaker Labs, Inc.

Outline
• Filesystems overview
• NFS & AFS (Andrew File System)
• GFS

File Systems Overview
• System that permanently stores data
• Usually layered on top of a lower-level physical storage medium
• Divided into logical units called “files”
  – Addressable by a filename (“foo.txt”)
  – Usually supports hierarchical nesting (directories)

File Paths
• A file path joins file & directory names into a relative or absolute address to identify a file
  – Absolute: /home/aaron/foo.txt
  – Relative: docs/someFile.doc
• The shortest absolute path to a file is called its canonical path
• The set of all canonical paths establishes the namespace for the filesystem

What Gets Stored
• User data itself is the bulk of the filesystem's contents
• Also includes meta-data on a drive-wide and per-file basis:
  – Drive-wide: available space, formatting info, character set, ...
  – Per-file: name, owner, modification date, physical layout, ...

High-Level Organization
• Files are organized in a “tree” structure made of nested directories
• One directory acts as the “root”
• “Links” (symlinks, shortcuts, etc.) provide a simple means of giving multiple access paths to one file
• Other filesystems can be “mounted” and dropped in as sub-hierarchies (other drives, network shares)

Low-Level Organization (1/2)
• File data and meta-data are stored separately
• File descriptors + meta-data are stored in inodes
  – Large tree or table at a designated location on disk
  – Tells how to look up file contents
• Meta-data may be replicated to increase system reliability

Low-Level Organization (2/2)
• “Standard” read-write medium is a hard drive (other media: CD-ROM, tape, ...)
• Viewed as a sequential array of blocks
• Must address a ~1 KB chunk at a time
• Tree structure is “flattened” into blocks
• Overlapping reads/writes/deletes can cause fragmentation: files are often not stored with a linear layout
  – Inodes store all block IDs related to a file

Fragmentation
[Figure: disk block layouts of files A, B, C, and D interleaved with free space]

Design Considerations
• Smaller inode size reduces amount of wasted space
• Larger inode size increases speed of sequential reads (may not help random access)
• Should the filesystem be faster or more reliable?
• But faster at what: Large files? Small files? Lots of reading? Frequent writers, occasional readers?

Filesystem Security
• Filesystems in multi-user environments need to secure private data
  – Notion of username is heavily built into the FS
  – Different users have different access rights to files

UNIX Permission Bits
• World is divided into three scopes:
  – User: the person who owns (usually created) the file
  – Group: a list of particular users who have “group ownership” of the file
  – Other: everyone else
• “Read,” “write” and “execute” permissions applicable at each level

UNIX Permission Bits: Limits
• Only one group can be associated with a file
• No higher-order groups (groups of groups)
• Makes it difficult to express more complicated ownership sets

Access Control Lists
• More general permissions mechanism
• Implemented in Windows
• Richer notion of privileges than r/w/x
  – e.g., SetPrivilege, Delete, Copy…
• Allow for inheritance as well as deny lists
  – Can be complicated to reason about and lead to security gaps

Process Permissions
• Important note: processes running on behalf of user X have the permissions associated with X, not with the program file's owner Y
• So if root owns ls, user aaron cannot use ls to peek at other users’ files
• Exception: the special permission “setuid” sets the user ID associated with a running process to the owner of the program file

Disk Encryption
• Data storage medium is another security concern
  – Most filesystems store data in the clear and rely on runtime security to deny access
  – Assumes the physical disk won’t be stolen
• The disk itself can be encrypted
  – Hopefully by using separate passkeys for each user’s files
  – (Challenge: how do you implement read access for group members?)
  – Metadata encryption may be a separate concern

Distributed Filesystems
• Support access to files on remote servers
• Must support concurrency
  – Make varying guarantees about locking, who “wins” with concurrent writes, etc.
  – Must gracefully handle dropped connections
• Can offer support for replication and local caching
• Different implementations sit in different places on the complexity/feature scale

NFS
• First developed in the 1980s by Sun
• Presented with standard UNIX FS interface
• Network drives are mounted into the local directory hierarchy
  – Your home directory on attu is NFS-driven
  – Type 'mount' some time at the prompt if curious

NFS Protocol
• Initially completely stateless
  – Operated over UDP; did not use TCP streams
  – File locking, etc., implemented in higher-level protocols
• Modern implementations use TCP/IP and stateful protocols

Server-side Implementation
• NFS defines a virtual filesystem
  – Does not actually manage local disk layout on the server
• Server instantiates an NFS volume on top of the local filesystem
  – Local hard drives managed by concrete filesystems (EXT, ReiserFS, ...)
  – Other networked FS's mounted in by...?

[Figure labels: User-visible filesystem; NFS client; NFS server; Server filesystem; Hard Drive 1; Hard Drive 2; EXT2fs; ReiserFS; EXT3fs]

NFS Locking
• NFSv4 supports stateful locking of files
  – Cli