2D Human Pose Estimation: New Benchmark and State of the Art Analysis

Mykhaylo Andriluka¹,³, Leonid Pishchulin¹, Peter Gehler², and Bernt Schiele¹
¹ Max Planck Institute for Informatics, Germany
² Max Planck Institute for Intelligent Systems, Germany
³ Stanford University, USA

Abstract

Human pose estimation has made significant progress in recent years. However, current datasets are limited in their coverage of the overall pose estimation challenges. Still, they serve as the common sources for training, evaluating and comparing different models. In this paper we introduce a novel benchmark “MPII Human Pose” (available at human-pose.mpi-inf.mpg.de) that makes a significant advance in terms of diversity and difficulty, a contribution that we feel is required for future developments in human body models. This comprehensive dataset was collected using an established taxonomy of over 800 human activities [1]. The collected images cover a wider variety of human activities than previous datasets, including various recreational, occupational and household activities, and capture people from a wider range of viewpoints. We provide a rich set of labels including positions of body joints, full 3D torso and head orientation, occlusion labels for joints and body parts, and activity labels. For each image we provide adjacent video frames to facilitate the use of motion information. Given these rich annotations we perform a detailed analysis of leading human pose estimation approaches and gain insights into the successes and failures of these methods.

1. Introduction

Recent pose estimation methods employ complex appearance models [2, 9, 15] and rely on learning algorithms to estimate model parameters from the training data. The performance of these approaches crucially depends on the availability of annotated training images that are representative of the appearance of people's clothing, strong articulation, partial (self-)occlusions and truncation at image borders. Although there exist training sets for special scenarios such as sport scenes [12, 13] and upright people [17, 2], these benchmarks are still limited in their scope and in the variability of represented activities. Sport scene datasets typically include highly articulated poses, but are limited with respect to variability of appearance since people are typically wearing tight sports outfits. In turn, datasets such as “FashionPose” [2] and “Armlets” [9] aim to collect images of people wearing a variety of different clothing types, and include occlusions and truncation, but are dominated by people in simple upright standing poses.

To the best of our knowledge no attempt has been made to establish a more representative benchmark aiming to cover a wide palette of challenges for human pose estimation. We believe that this hinders further development on this topic and propose a new benchmark “MPII Human Pose”. Our benchmark significantly advances the state of the art in terms of appearance variability and complexity, and includes more than 40,000 images of people. We used YouTube as a data source and collected images and image sequences using queries based on the descriptions of more than 800 activities. This results in a diverse set of images covering not only different activities, but also indoor and outdoor scenes, a variety of imaging conditions, as well as both amateur and professional recordings (cf. Fig. 1). This allows us to study existing body pose estimation techniques and identify their individual failure modes.

Related work

The commonly used publicly available datasets for evaluation of 2D human pose estimation are summarized in Tab. 1 according to the year of the corresponding publication. Both full body and upper body datasets are included. Existing benchmarks cover aspects of the human pose estimation task such as sport scenes [12, 21], frontal-facing people [8, 3, 17], people interacting with objects [23], pose estimation in group photos [5] and pose estimation of people performing synchronized activities [4].

Earlier datasets such as “Parse” [16] and “Buffy” [8] are still commonly found in evaluations [22, 15]. However, the small training sets included in these datasets make them unsuitable for training models with complex appearance representations and multiple components [13, 17, 2], which have been shown to perform best.

Figure 1. Randomly chosen images from each of 20 activity categories of the proposed “MPII Human Pose” dataset. Image captions indicate activity category (1st row) and activity (2nd row). To view the full dataset visit human-pose.mpi-inf.mpg.de.
[Image labels, given as activity category: activity. bicycling: bicycling, BMX; conditioning exercise: ski machine; dancing: ballroom; fishing and hunting: fish. from river bank; home activities: tanning hides; home repair: carpentry; inactivity quiet: sitting quietly; lawn and garden: driving tractor; miscellaneous: standing; music playing: violin, sitting; occupation: horse grooming; religious activities: sit., playing instrum.; running: running, stairs, up; self care: taking medication; sports: soccer; transportation: riding in a bus; volunteer activities: playing with children; walking: bird watching; water activities: snorkeling; winter activities: skating, ice dancing.]

Some efforts have been made to collect larger sets of images. For example, [13] extends the LSP dataset to 10,000 images of people performing gymnastics, athletics and parkour. [2] proposes a large “FashionPose” dataset collected from fashion blogs. This dataset aims to cover a wide variety of people's clothing. The LSP and FashionPose datasets are complementary and focus on two different challenges for human pose estimation: pose variability and variability of people's appearance. However, since they are collected with a specific focus in mind, these datasets do not cover real-life challenges such as truncation, occlusions by scene objects and variability of imaging c