Sawmill: A Logging File System for a High-Performance RAID Disk Array

by

Kenneth William Shirriff

B.Math. (University of Waterloo) 1986
M.S. (University of California at Berkeley) 1990

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY of CALIFORNIA at BERKELEY

Committee in charge:
Professor John Ousterhout, Chair
Professor Randy Katz
Professor Ronald Wolff

1995

Copyright © 1995 by Kenneth William Shirriff
All rights reserved

Abstract

The widening disparity between processor speeds and disk performance is causing an increasing I/O performance gap. One method of increasing disk bandwidth is through arrays of multiple disks (RAIDs). In addition, to prevent the file server from limiting disk performance, new controller architectures connect the disks directly to the network so that data movement bypasses the file server. These developments raise two questions for file systems: how to get the best performance from a RAID, and how to use such a controller architecture.

This thesis describes the implementation of a high-bandwidth log-structured file system called “Sawmill” that uses a RAID disk array. Sawmill runs on the RAID-II storage system; this architecture provides a fast data path that moves data rapidly among the disks, high-speed controller memory, and the network.

By using a log-structured file system, Sawmill avoids the high cost of small writes to a RAID. Small writes through Sawmill are a factor of three faster than writes to the underlying RAID. Sawmill also uses new techniques to obtain better bandwidth from a log-structured file system. By performing disk layout “on-the-fly,” rather than through a block cache as in previous log-structured file systems, the CPU overhead of processing cache blocks is reduced and write transfers can take place in large, efficient units.

The thesis also examines how a file system can take advantage of the data path and controller memory of a storage system such as RAID-II. Sawmill uses a stream-based approach instead of a block cache to permit large, efficient transfers.

Sawmill can read at up to 21 MB/s and write at up to 15 MB/s while running on a fairly slow (9 SPECmarks) Sun-4 workstation. In comparison, existing file systems provide less than 1 MB/s on the RAID-II architecture because they perform inefficient small operations and don’t take advantage of the data path of RAID-II. In many cases, Sawmill performance is limited by the relatively slow server CPU, suggesting that the system would be able to handle larger and faster disk arrays simply by using a faster processor.

Table of Contents

CHAPTER 1. Introduction
  1.1. Contributions
  1.2. Outline of the dissertation
CHAPTER 2. Background
  2.1. Disk arrays
    2.1.1. Writes and parity update costs
    2.1.3. RAID summary
  2.2. High-performance storage systems
    2.2.1. Mainframe mass storage systems
    2.2.2. Multiprocessor file servers
    2.2.3. Replicated file servers
    2.2.4. Striping across servers
  2.3. The RAID-II storage architecture
  2.4. The file system
    2.4.1. The Unix file system model
    2.4.2. NFS
    2.4.3. Sprite
  2.5. File layout techniques
    2.5.1. Read-optimized file systems
    2.5.2. Write-optimized file systems
  2.6. The Log-structured File System (LFS)
    2.6.1. Reading and writing the log
    2.6.2. Managing free space and cleaning the log
    2.6.3. Crash recovery
    2.6.4. BSD-LFS