
Method-Level Code Clone Detection on Transformed Abstract Syntax Trees Using Sequence Matching Algorithms

Kevin Greenan
kmgreen@soe.ucsc.edu
Department of Computer Science
University of California, Santa Cruz
March 16, 2005

Abstract

Current research shows that a large fraction of the source code in many large-scale applications consists of code clones [4]. Code clones, in the form of unnecessary duplicates, can introduce many instabilities into a software application. These instabilities can over-complicate routine maintenance tasks, since a change in one method may require changes across many methods. In addition, unnecessary duplicates can spread bugs and prevent changes from being propagated [6]. Therefore, to prevent and treat these instabilities, we must determine where potential clones occur. A properly annotated abstract syntax tree supplies a great deal of information about the structure of an application's source code, and these trees can easily be transformed into a sequence of substrings. By borrowing a few ideas from biological sequence alignment, we can identify similarities between transformed subtrees. Once these similarities are identified, an application architect or maintainer can accept or reject each pair of similar code blocks as a clone. In this paper I present three code clone detection algorithms built on subtree transformation. The research presented here serves as a starting point for code clone detection using transformed subtrees and sequence similarity; my results confirm intuition and open a few doors to future work in code clone detection.

1 Introduction

What exactly is a code clone? Using a variation of the definition presented in [8], two methods are, semi-formally, said to be clones if they are identical or near-identical. The word "identical" is rather vague. In fact, most of the literature on code clone analysis gives it no concrete definition in this context. Basically, if the cardinality of the intersection of two program entities exceeds a prescribed threshold, the two entities are clone candidates. These candidates are usually accepted or rejected through some form of contextual analysis.

Various studies suggest that many programmers in the software development industry resort to copy and
paste techniques, generally as a form of reuse [6]. This form of reuse usually results in code clones. Unfortunately, copying and pasting can create very complicated maintenance tasks [9]. In addition, among many other factors, code clones can arise from design decisions, poor cohesion between modules, and poor communication between developers.

Code clones can be detected on an exact-match basis. However, exact-match criteria may not be sufficient in all cases. For instance, a programmer may copy a piece of code, paste it elsewhere in the system, and then modify the pasted code so that it remains syntactically similar, but not identical, to the original. Thus, some form of near-exact clone detection must be employed.

This paper presents one exact-match and two near-exact-match algorithms for code clone detection. These algorithms rely on an abstract syntax tree (AST) that stores attributes such as nonterminal production information, type information, parent file, parent class, and line number. All other terminal symbols are intentionally omitted from the AST. Thus, the algorithms are designed to match the structure of the code, with respect to types, rather than attributes such as variable names, method names, or literals. The effectiveness of the algorithms will be assessed through a high-level analysis of their output on selected modules of Eclipse and Hibernate. A more detailed analysis will be conducted on code written by hand specifically for analyzing the algorithms.

In the next two sections I will cover source code and AST transformation. Section 4 presents the code clone detection algorithms and defines their matching criteria. Then, in Sections 5 and 6, I determine the effectiveness of the algorithms and present the results produced when they are run against modules from Eclipse, Hibernate, and manually generated code. Finally, Sections 7 and 8 cover threats to validity, future work, and conclusions.

2 Code Transformation

To parse the code effectively within the context of my analysis, the source code text must first be transformed. These transformations include removal of comments, substitution of string literals containing the terminal symbol "//", addition of file name identifiers, and finally concatenation of all source files into one large source file. They are explained in the following paragraphs.

The substitution of string literals can be completed in one pass using a stream editor such as sed. I wrote a small C program to remove the comments without deleting the lines themselves. We do not want to delete lines, because line information must remain consistent throughout transformation and analysis: we want to report LOC per method and relative line numbers when reporting method matches. Figure 1 illustrates how a source file is transformed.

As a final step in the source code transformation phase, file names are added to the first line of each source file. Then, all of the files are concatenated into one large file for the parsing step. Because everything is in one file, all of the method subtrees can be extracted in a single search.

Figure 1: Example transformation of test.java (the figure is truncated in this copy). Recoverable fragments: the file-name marker --Begin test.java; the comments // Programmer: John Doe, /* My test class */, /* This class implements the function func1 */ and // Function: func1; and the declaration class TestClass {, which appears in both the original and the transformed version.
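
The four transformations described in this section can be sketched in Python. This is a minimal illustration, not the author's C program or sed script. The following are all hypothetical: the string-literal regular expression, the "__LIT__" placeholder, and the helper names (substitute_slash_literals, strip_comments, add_file_marker, concatenate, locate). The "--Begin <file>" marker follows Figure 1. As an assumption, it is prepended to the existing first line, so no line numbers shift.

```python
import re
from typing import List, Tuple

# Hypothetical placeholder for string literals that contain "//".
LITERAL_PLACEHOLDER = '"__LIT__"'
# A double-quoted literal on one line, allowing escapes such as \" inside it.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\\n])*"')


def substitute_slash_literals(src: str) -> str:
    """sed-style step: replace every string literal that contains '//'."""
    return STRING_LITERAL.sub(
        lambda m: LITERAL_PLACEHOLDER if "//" in m.group(0) else m.group(0), src)


def strip_comments(src: str) -> str:
    """Remove // and /* */ comments but keep every newline, so line numbers survive.

    Assumes substitute_slash_literals() has already run; literals are not tracked.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith("//", i):
            end = src.find("\n", i)
            i = n if end == -1 else end  # the newline itself is copied next iteration
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2
            out.append("\n" * src.count("\n", i, end))  # keep only the line breaks
            i = end
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


def add_file_marker(name: str, src: str) -> str:
    """Prepend '--Begin <name>' to the existing first line (adds no new line)."""
    return f"--Begin {name} {src}"


def concatenate(files: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[str, int]]]:
    """Transform each (name, source) pair and join them into one text.

    Also returns (name, first line in the combined text) for every file.
    """
    parts, starts, line = [], [], 1
    for name, src in files:
        text = add_file_marker(name, strip_comments(substitute_slash_literals(src)))
        if not text.endswith("\n"):
            text += "\n"
        starts.append((name, line))
        line += text.count("\n")
        parts.append(text)
    return "".join(parts), starts


def locate(starts: List[Tuple[str, int]], global_line: int) -> Tuple[str, int]:
    """Map a line of the combined text back to (file name, line within that file)."""
    name, first = starts[0]
    for candidate, start in starts:
        if start <= global_line:
            name, first = candidate, start
    return name, global_line - first + 1


files = [
    ("A.java", 'class A { // comment\n  String url = "http://x";\n}\n'),
    ("B.java", "/* doc\n */\nclass B {}\n"),
]
combined, starts = concatenate(files)
print(locate(starts, 6))  # ('B.java', 3)
```

Block comments are replaced only by their newlines, and the marker adds no line. As a result, locate() can map any line of the concatenated file back to its original file and line, which per-method LOC and relative line numbers require. As in the described pipeline, only literals containing "//" are substituted, so a literal containing "/*" would still confuse this simple stripper.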

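The introduction describes an AST that keeps nonterminal productions and types but drops identifiers and literals. The abstract describes flattening subtrees into sequences and comparing them with ideas from sequence alignment. The sketch below illustrates that idea under stated assumptions; it is not one of the three algorithms of Section 4. The Node class, the preorder "production:type" tokens, and the 0.8 threshold are hypothetical. A longest common subsequence (LCS) stands in for a full alignment score, and one way to read the clone-candidate definition is to let the LCS length play the role of the "cardinality of the intersection".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: production and type only, no names or literals."""
    production: str
    type_name: str = ""
    children: List["Node"] = field(default_factory=list)


def to_sequence(node: Node) -> List[str]:
    """Flatten a subtree into preorder 'production:type' tokens."""
    seq = [f"{node.production}:{node.type_name}"]
    for child in node.children:
        seq.extend(to_sequence(child))
    return seq


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (dynamic programming, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a: List[str], b: List[str]) -> float:
    """LCS length relative to the longer sequence; 1.0 means identical sequences."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def is_clone_candidate(a: List[str], b: List[str], threshold: float = 0.8) -> bool:
    """Hypothetical threshold test on the similarity score."""
    return similarity(a, b) >= threshold


# int add(int a, int b) { return a + b; } and int sum(int x, int y) { return x + y; }
# differ only in names, which the AST does not store, so they yield the same tree.
def binary_method() -> Node:
    return Node("MethodDecl", "int", [
        Node("Param", "int"), Node("Param", "int"),
        Node("Return", "int", [
            Node("BinaryExpr", "int", [Node("Name", "int"), Node("Name", "int")])]),
    ])


add_seq = to_sequence(binary_method())
sum_seq = to_sequence(binary_method())

# A pasted-and-edited copy with one extra statement (a void call) before the return.
edited = binary_method()
edited.children.insert(2, Node("ExprStmt", "void", [Node("Call", "void")]))
edited_seq = to_sequence(edited)

print(similarity(add_seq, sum_seq))     # 1.0: exact structural match
print(similarity(add_seq, edited_seq))  # 7/9, about 0.78: near-exact match
```

With the hypothetical 0.8 threshold, the edited copy falls just below the cut-off, while 0.75 would accept it; choosing this value is the tuning question the informal definition leaves open. Note that a flat preorder sequence compares token order, not full tree shape.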