Object-Detection-with-Discriminatively-Trained-Par

小胖葱
1 ℃
2020-03-27

整理文档很辛苦，赏杯茶钱您下走！

还剩 ... 页未读，继续阅读 >>

免费阅读已结束，点击下载阅读编辑剩下 ... 页

阅读已结束，您可以下载文档离线阅读编辑

资源描述

ObjectDetectionwithDiscriminativelyTrainedPart-BasedModelsPedroF.Felzenszwalb,Member,IEEEComputerSociety,RossB.Girshick,StudentMember,IEEE,DavidMcAllester,andDevaRamanan,Member,IEEEAbstract—Wedescribeanobjectdetectionsystembasedonmixturesofmultiscaledeformablepartmodels.Oursystemisabletorepresenthighlyvariableobjectclassesandachievesstate-of-the-artresultsinthePASCALobjectdetectionchallenges.Whiledeformablepartmodelshavebecomequitepopular,theirvaluehadnotbeendemonstratedondifficultbenchmarkssuchasthePASCALdatasets.Oursystemreliesonnewmethodsfordiscriminativetrainingwithpartiallylabeleddata.Wecombineamargin-sensitiveapproachfordata-mininghardnegativeexampleswithaformalismwecalllatentSVM.AlatentSVMisareformulationofMI-SVMintermsoflatentvariables.AlatentSVMissemiconvex,andthetrainingproblembecomesconvexoncelatentinformationisspecifiedforthepositiveexamples.ThisleadstoaniterativetrainingalgorithmthatalternatesbetweenfixinglatentvaluesforpositiveexamplesandoptimizingthelatentSVMobjectivefunction.IndexTerms—Objectrecognition,deformablemodels,pictorialstructures,discriminativetraining,latentSVM.Ç1INTRODUCTIONOBJECTrecognitionisoneofthefundamentalchallengesincomputervision.Inthispaper,weconsidertheproblemofdetectingandlocalizinggenericobjectsfromcategoriessuchaspeopleorcarsinstaticimages.Thisisadifficultproblembecauseobjectsinsuchcategoriescanvarygreatlyinappearance.Variationsarisenotonlyfromchangesinilluminationandviewpoint,butalsoduetononrigiddeformationsandintraclassvariabilityinshapeandothervisualproperties.Forexample,peopleweardifferentclothesandtakeavarietyofposes,whilecarscomeinvariousshapesandcolors.Wedescribeanobjectdetectionsystemthatrepresentshighlyvariableobjectsusingmixturesofmultiscaledeformablepartmodels.Thesemodelsaretrainedusingadiscriminativeprocedurethatonlyrequiresboundingboxesfortheobjectsinasetofimages.Theresultingsystemisbothefficientandaccurate,achievingstate-of-the-artresultsonthePASCALVOCbenchmarks[11],[12],[13]andtheINRIAPersondataset[10].Ourapproachbuildsonthepictorialstructuresframe-work[15],[20].Pictorialstructuresrepresentobjectsbyacollectionofpartsarrangedinadeformableconfiguration.Eachpartcaptureslocalappearancepropertiesofanobjectwhilethedeformableconfigurationischaracterizedbyspring-likeconnectionsbetweencertainpairsofparts.Deformablepartmodelssuchaspictorialstructuresprovideanelegantframeworkforobjectdetection.Yetithasbeendifficulttoestablishtheirvalueinpractice.Ondifficultdatasets,deformablepartmodelsareoftenoutperformedbysimplermodelssuchasrigidtemplates[10]orbag-of-features[44].Oneofthegoalsofourworkistoaddressthisperformancegap.Whiledeformablemodelscancapturesignificantvaria-tionsinappearance,asingledeformablemodelisoftennotexpressiveenoughtorepresentarichobjectcategory.Considertheproblemofmodelingtheappearanceofbicyclesinphotographs.Peoplebuildbicyclesofdifferenttypes(e.g.,mountainbikes,tandems,and19th-centurycycleswithonebigwheelandasmallone)andviewtheminvariousposes(e.g.,frontalversussideviews).Thesystemdescribedhereusesmixturemodelstodealwiththesemoresignificantvariations.Weareultimatelyinterestedinmodelingobjectsusing“visualgrammars.”Grammar-basedmodels(e.g.,[16],[24],[45])generalizedeformablepartmodelsbyrepresentingobjectsusingvariablehierarchicalstructures.Eachpartinagrammar-basedmodelcanbedefineddirectlyorintermsofotherparts.Moreover,grammar-basedmodelsallowfor,andexplicitlymodel,structuralvariations.Thesemodelsalsoprovideanaturalframeworkforsharinginformationandcomputationbetweendifferentobjectclasses.Forexample,differentmodelsmightsharereusableparts.Althoughgrammar-basedmodelsareourultimategoal,wehaveadoptedaresearchmethodologyunderwhichwegraduallymovetowardrichermodelswhilemaintainingahighlevelofperformance.Improvingperformancebyenrichedmodelsissurprisinglydifficult.Simplemodelshavehistoricallyoutperformedsophisticatedmodelsincomputervision,speechrecognition,machinetranslation,andinformationretrieval.Forexample,untilrecentlyspeechrecognitionandmachinetranslationsystemsbasedonn-gramlanguagemodelsoutperformedsystemsbasedongrammarsandphrasestructure.Inourexperience,IEEETRANSACTIONSONPATTERNANALYSISANDMACHINEINTELLIGENCE,VOL.32,NO.9,SEPTEMBER20101627.P.F.FelzenszwalbandR.B.GirshickarewiththeDepartmentofComputerScience,UniversityofChicago,1100E.58thStreet,Chicago,IL60637.E-mail:{pff,rbg}@cs.uchicago.edu..D.McAllesteriswiththeToyotaTechnologicalInstituteatChicago,6045S.KenwoodAve.,Chicago,IL60637.E-mail:mcallester@tti-c.org..D.RamananiswiththeDepartmentofComputerScience,UniversityofCalifornia,Irvine,3019DonaldBrenHall,Irvine,CA92697.E-mail:dramanan@ics.uci.edu.Manuscriptreceived29May2009;accepted25July2009;publishedonline10Sept.2009.RecommendedforacceptancebyB.Triggs.Forinformationonobtainingreprintsofthisarticle,pleasesende-mailto:tpami@computer.org,andreferenceIEEECSLogNumberTPAMI-2009-05-0336.DigitalObjectIdentifierno.10.1109/TPAMI.2009.167.0162-8828/10/$26.002010IEEEPublishedbytheIEEEComputerSocietymaintainingperformanceseemstorequiregradualenrich-mentofthemodel.One