Does Code Decay? Assessing the Evidence from Change Management Data

Stephen G. Eick, Todd L. Graves, Alan F. Karr, J. S. Marron, and Audris Mockus

Technical Report Number 81
March, 1998

National Institute of Statistical Sciences
19 T. W. Alexander Drive
PO Box 14006
Research Triangle Park, NC 27709-4006

[...], which is necessary to add new functionality, accommodate new hardware and repair faults, becomes increasingly difficult over time. In this paper we approach this phenomenon, which we term code decay, scientifically and statistically. We define code decay, and propose a number of measurements (code decay indices) on software, and on the organizations that produce it, that serve as symptoms, risk factors and predictors of decay. Using an unusually rich data set (the fifteen-plus year change history of the millions of lines of software for a telephone switching system), we find mixed but on the whole persuasive statistical evidence of code decay, which is corroborated by developers of the code. Suggestive indications that perfective maintenance can retard code decay are also discussed.

S. G. Eick is with Bell Laboratories. T. L. Graves is with the National Institute of Statistical Sciences and Bell Laboratories. A. F. Karr is with the National Institute of Statistical Sciences. J. S. Marron is with the University of North Carolina at Chapel Hill. A. Mockus is with Bell Laboratories.

EICK, GRAVES, KARR, MARRON, AND MOCKUS: DOES CODE DECAY?

I. Introduction

Because the digital bits that define it are immutable, software does not age or "wear out" in the conventional sense. In the absence of change to its environment, software can function essentially forever as it was originally designed. However, change is not absent but ubiquitous, in two principal senses. First, the hardware and software environments surrounding a software product do change: for example, hardware is upgraded, or the operating system is updated. Second, and equally important, the required functionality (both features and performance) changes, sometimes abruptly. For example, a telephone system must, over time, offer new features, become more reliable and respond faster. Necessarily, then, the software itself must be changed, through an ongoing process of maintenance.

As part of our experience with the production of software for a large telecommunications system, we have observed a nearly unanimous feeling among developers of the software that the code degrades through time, and maintenance becomes increasingly difficult and expensive. Whether this code decay is real, how it can be characterized, and the extent to which it matters are the questions we address in this paper.

The research reported here is based on an uncommonly rich data set: the entire change management history of a large, fifteen-year old real-time software system for telephone switches. Currently, the system comprises 100,000,000¹ lines of source code (in C/C++ and a proprietary state description language) and 100,000,000 lines of header and make files, organized into some 50 major subsystems and 5,000 modules. (For our purposes, a module is a directory in the source code file system, so that a code module is a collection of several files. This terminology is not standard.) Each release of the system consists of some 20,000,000 lines of code. More than 10,000 software developers have participated.

We begin, in §II, with a brief discussion of the software change process and the change management data with which we work. The handling, exploration and visualization of these data are important issues in their own right, and are treated in [1]. In §III, we propose a conceptual model for code decay: a unit of code (in most cases, a module) is decayed if it is harder to change than it should be, measured in terms of effort, interval and quality. Associated with the model is a compelling medical metaphor of software as patient, which enables one to reason in terms of causes, symptoms, risk factors and prognoses. The scientific link between the model and the conclusions is a series of code decay indices (CDIs) presented in §IV, which quantify symptoms or risk factors (and so are like medical tests) or predict key responses (a prognosis). The indices introduced here are directly relevant to the statistical analyses that follow; many others could be formulated and investigated.

Our four principal results treat specific manifestations of decay. Three of these results are evidence that code does decay: (1) the span of changes, which is shown to increase over time (§V-A); (2) breakdown of modularity, which is exhibited by means of network-style visualizations (§V-B); (3) fault potential, the likelihood of changes to induce faults in the software system: in §V-C, we show that the distribution of faults is explained by the distribution of large, recent changes to the software. The fourth quantifies the impact of decay, in the form of (4) prediction of effort required to make a change, using code decay indices that encapsulate characteristics of changes (§V-D).

Related Work

Early investigations of aging in large software systems, by Belady and Lehman [2], [3], [4], reported the near-impossibility of adding new code to an aged system without introducing faults. Work such as [5] on software maintenance for Cobol programs running on an IBM online transaction processing system addressed program complexity, modularity and modification frequency as explanatory variables, but found that these variables accounted only for 12% of the variation in the repair maintenance rate.

¹ Numbers are approximate.

IEEE TRANSACTIONS ON