Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe
Computer Science Department
University of British Columbia
Vancouver, B.C., Canada
lowe@cs.ubc.ca

January 5, 2004

Abstract

This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Accepted for publication in the International Journal of Computer Vision, 2004.

1 Introduction

Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.

The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:

1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation (a minimal sketch of this step follows the list).

2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.

3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.

4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
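As a rough illustration of stage 1, the sketch below builds a single-octave Gaussian stack, forms difference-of-Gaussian images, and keeps pixels that are extrema of their 3x3x3 scale-space neighbourhood. This is a minimal sketch, assuming NumPy and SciPy; the scale sampling (three scales per octave, base sigma of 1.6) follows common SIFT conventions, while the brute-force scan, single octave, and absence of stability tests are simplifications of the full method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, scales_per_octave=3, sigma=1.6):
    """Return (row, col, scale_index) triples that are local extrema of the
    difference-of-Gaussian stack over a 3x3x3 scale-space neighbourhood.
    Covers one octave only; the full method repeats this over a pyramid of
    downsampled octaves and then applies the stage-2 stability tests."""
    k = 2.0 ** (1.0 / scales_per_octave)                    # scale multiplier per level
    sigmas = [sigma * k ** i for i in range(scales_per_octave + 3)]
    gaussians = np.stack([gaussian_filter(image.astype(float), s) for s in sigmas])
    dog = gaussians[1:] - gaussians[:-1]                    # difference-of-Gaussian images
    keypoints = []
    for s in range(1, dog.shape[0] - 1):                    # compare to scales above and below
        for r in range(1, dog.shape[1] - 1):
            for c in range(1, dog.shape[2] - 1):
                patch = dog[s - 1:s + 2, r - 1:r + 2, c - 1:c + 2]
                v = dog[s, r, c]
                if v == patch.max() or v == patch.min():
                    keypoints.append((r, c, s))
    return keypoints
```

In practice this scan would be vectorized or cut short by a contrast threshold; the point of the sketch is only to show how a pixel becomes a candidate keypoint by being larger or smaller than all 26 of its scale-space neighbours.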
This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features.

An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500x500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.

For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases.

The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of keypoints that agree on the object and its location, scale, and orientation in the new image. The probability that several features will agree on these parameters by chance is much lower than the probability that any individual feature match will be in error. The determination of these consistent clusters can be performed rapidly by using an efficient hash table implementation of the generalized Hough transform.

Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed verification. First, a least-squares estimate is made for an affine approximation to the object pose. Any other
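The Hough-transform clustering described above can be sketched with a plain hash table that accumulates votes over pose bins. This is a minimal sketch, not the paper's implementation: the Match fields, bin widths, and single-bin voting are illustrative assumptions; the idea is only that each candidate match votes for an (object, orientation, scale, location) bin, and bins collecting at least 3 consistent matches are handed to the least-squares pose verification.

```python
from collections import defaultdict
from dataclasses import dataclass
import math

@dataclass
class Match:
    # Pose that this single feature match predicts for its object (fields are
    # illustrative; in SIFT they come from the keypoint's location, scale, and
    # orientation relative to the matched training keypoint).
    object_id: int
    d_theta: float      # image orientation minus model orientation (radians)
    scale_ratio: float  # image scale divided by model scale
    x: float            # predicted model origin in image coordinates
    y: float

def hough_clusters(matches, angle_bin=math.radians(30), loc_bin=64.0, min_votes=3):
    votes = defaultdict(list)                               # hash table of pose bins
    for m in matches:
        key = (m.object_id,
               int(m.d_theta // angle_bin),                 # orientation bin
               int(round(math.log2(m.scale_ratio))),        # scale bin (factors of 2)
               int(m.x // loc_bin),                         # location bins
               int(m.y // loc_bin))
        votes[key].append(m)
    # Bins with at least `min_votes` mutually consistent matches become candidate
    # object hypotheses and proceed to detailed geometric verification.
    return [cluster for cluster in votes.values() if len(cluster) >= min_votes]
```

Because false matches scatter their votes across many bins while correct matches on the same object concentrate in a few, even a coarse binning like this removes most background clutter before the more expensive verification step.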