SUBMISSION TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Pedestrian Detection: An Evaluation of the State of the Art

Piotr Dollár, Christian Wojek, Bernt Schiele, and Pietro Perona

P. Dollár and P. Perona are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA. C. Wojek and B. Schiele are with MPI Informatics, Saarbrücken, Germany.

Abstract: Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple datasets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: (1) we put together a large, well-annotated and realistic monocular pedestrian detection dataset and study the statistics of the size, position and occlusion patterns of pedestrians in urban scenes, (2) we propose a refined per-frame evaluation methodology that allows us to carry out probing and informative comparisons, including measuring performance in relation to scale and occlusion, and (3) we evaluate the performance of sixteen pre-trained state-of-the-art detectors across six datasets. Our study allows us to assess the state of the art and provides a framework for gauging future efforts. Our experiments show that despite significant progress, performance still has much room for improvement. In particular, detection is disappointing at low resolutions and for partially occluded pedestrians.

Index Terms: pedestrian detection, object detection, benchmark, evaluation, dataset, Caltech Pedestrian Dataset

1 INTRODUCTION

People are among the most important components of a machine's environment, and endowing machines with the ability to interact with people is one of the most interesting and potentially useful challenges for modern engineering. Detecting and tracking people is thus an important area of research, and machine vision is bound to play a key role. Applications include robotics, entertainment, surveillance, care for the elderly and disabled, and content-based indexing. Just in the US, nearly 5,000 of the 35,000 annual traffic crash fatalities involve pedestrians [1], hence the considerable interest in building automated vision systems for detecting pedestrians [2].

While there is much ongoing research in machine vision approaches for detecting pedestrians, varying evaluation protocols and use of different datasets make direct comparisons difficult. Basic questions such as "Do current detectors work well?", "What is the best approach?", "What are the main failure modes?" and "What are the most productive research directions?" are not easily answered.

Our study aims to address these questions. We focus on methods for detecting pedestrians in individual monocular images; for an overview of how detectors are incorporated into full systems we refer readers to [2]. Our approach is three-pronged: we collect, annotate and study a large dataset of pedestrian images collected from a vehicle navigating in urban traffic; we develop informative evaluation methodologies and point out pitfalls in previous experimental procedures; finally, we compare the performance of sixteen pre-trained pedestrian detectors on six publicly available datasets, including our own. Our study allows us to assess the state of the art and suggests directions for future research. All results of this study, and the data and tools for reproducing them, are posted on the project website.

Fig. 1. Example images (cropped) and annotations from six pedestrian detection datasets: (a) Caltech [3], (b) Caltech-Japan [3], (c) ETH [4], (d) TUD-Brussels [5], (e) Daimler [6], (f) INRIA [7]. We perform an extensive evaluation of pedestrian detection, benchmarking sixteen detectors on each of these six datasets. By using multiple datasets and a unified evaluation framework we can draw broad conclusions about the state of the art and suggest future research directions.

Dataset: In earlier work [3], we introduced the Caltech Pedestrian Dataset, which includes 350,000 pedestrian bounding boxes labeled in 250,000 frames and remains the largest such dataset to date. Occlusions and temporal correspondences are also annotated. Using the extensive ground truth, we analyze the statistics of pedestrian scale, occlusion, and location and help establish conditions under which detection systems must operate.

Evaluation Methodology: We aim to quantify and rank detector performance in a realistic and unbiased manner. To this effect, we explore a number of choices in the evaluation protocol and their effect on reported performance. Overall, the methodology has changed substantially since [3], resulting in a more accurate and informative benchmark.

Evaluation: We evaluate sixteen representative state-of-the-art pedestrian detectors (previously we evaluated seven [3]). Our goal was to choose diverse detectors that were most promising in terms of originally reported performance. We avoid retraining or modifying the detectors to ensure each method was optimized by its authors. In addition to overall performance, we explore detection rates under varying levels of scale and occlusion and on clearly visible pedestrians. Moreover, we measure localization accuracy and analyze runtime.

To increase the scope of our analysis, we also benchmark the sixteen detectors using a unified evaluation framework on six additional pedestrian detection da