arXiv:1506.02640v4

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, University of Washington, pjreddie@cs.washington.edu
Santosh Divvala, Allen Institute for Artificial Intelligence, santoshd@allenai.org
Ross Girshick, Facebook AI Research, rbg@fb.com
Ali Farhadi, University of Washington, ali@cs.washington.edu

arXiv:1506.02640v4 [cs.CV] 12 Nov 2015

Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.

1. Introduction

Humans glance at an image and instantly know what objects are in the image, where they are, and how they interact. The human visual system is fast and accurate, allowing us to perform complex tasks like driving with little conscious thought. Fast, accurate algorithms for object detection would allow computers to drive cars in any weather without specialized sensors, enable assistive devices to convey real-time scene information to human users, and unlock the potential for general-purpose, responsive robotic systems.

Current detection systems repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image [10].

More recent approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene [13]. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.

[Figure 1: The YOLO Detection System. Panels: 1. Resize image. 2. Run convolutional network. 3. Non-max suppression. Example detections: Person 0.64, Dog 0.30, Horse 0.28.] Processing images with YOLO is simple and straightforward. Our system (1) resizes the input image to 448 × 448, (2) runs a single convolutional network on the image, and (3) thresholds the resulting detections by the model's confidence.

We reframe object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. Using our system, you only look once (YOLO) at an image to predict what objects are present and where they are.

YOLO is refreshingly simple: see Figure 1. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. This unified model has several benefits over traditional methods of object detection.

First, YOLO is extremely fast. Since we frame detection as a regression problem, we don't need a complex pipeline. We simply run our neural network on a new image at test time to predict detections. Our base network runs at 45 frames per second with no batch processing on a Titan X GPU, and a fast version runs at more than 150 fps. This means we can process streaming video in real-time with less than 25 milliseconds of latency. Furthermore, YOLO achieves more than twice the mean average precision of other real-time systems. For a demo of our system running in real-time on a webcam, please see our (anonymous) YouTube channel: […]

[…] [14] mistakes background patches in an image for objects because it can't see the larger context. YOLO makes fewer than half as many background errors as Fast R-CNN.

Third, YOLO learns generalizable representations of objects. When trained on natural images and tested on artwork, YOLO outperforms top detection methods like DPM and R-CNN by a wide margin. Since YOLO is highly generalizable, it is less likely to break down when applied to new domains or unexpected inputs.

All of our training and testing code is open source and available online at [removed for review]. A variety of pretrained models are also available to download.

2. Unified Detection

We unify the separate components of object detection into a single neural network. Our network uses features from the entire image to predict each bounding box. It also predicts all bounding boxes for an image simultaneously. This means our network reasons globally about the full image and all the objects in the image. The YOLO design enables end-to-end training and real-time speeds while maintaining high […]
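The last stage of the Figure 1 pipeline (thresholding detections by the model's confidence, then non-max suppression) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Detection` class, the `(x1, y1, x2, y2)` corner box format, the per-class greedy suppression rule, and both threshold values are assumptions made here. The scores reuse the Figure 1 example; the box coordinates are invented.

```python
from dataclasses import dataclass


# Hypothetical container for one raw prediction; boxes are assumed to be
# (x1, y1, x2, y2) corners in the resized 448 x 448 input's pixel space.
@dataclass(frozen=True)
class Detection:
    label: str
    score: float
    box: tuple[float, float, float, float]


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def postprocess(dets, conf_thresh=0.2, iou_thresh=0.5):
    """Confidence threshold, then greedy per-class non-max suppression.

    Both threshold defaults are illustrative, not values from the paper.
    """
    # Drop low-confidence boxes, then visit the rest from highest score down.
    kept = sorted((d for d in dets if d.score >= conf_thresh),
                  key=lambda d: d.score, reverse=True)
    result = []
    for d in kept:
        # Accept d unless an already-accepted box of the same class overlaps it too much.
        if all(r.label != d.label or iou(r.box, d.box) <= iou_thresh
               for r in result):
            result.append(d)
    return result


# Scores echo the Figure 1 example; coordinates are made up for illustration.
raw = [
    Detection("person", 0.64, (180, 80, 280, 400)),
    Detection("dog", 0.30, (50, 200, 200, 400)),
    Detection("dog", 0.22, (55, 205, 205, 395)),   # near-duplicate dog box
    Detection("horse", 0.28, (250, 150, 440, 400)),
    Detection("horse", 0.05, (0, 0, 30, 30)),      # low-confidence noise
]

for d in postprocess(raw):
    print(d.label, d.score)
```

With these inputs the sketch keeps the person, dog, and horse boxes: the 0.05 horse falls below the confidence threshold, and the 0.22 dog box is suppressed because it overlaps the higher-scoring dog box with an IoU of roughly 0.89.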
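The introduction's latency figure follows from the stated frame rates. A rough check, assuming per-frame latency is approximately one network evaluation at the reported throughput (no batching, as the text states) and ignoring camera capture and preprocessing overhead:

```latex
t_{\text{frame}} = \frac{1}{f}, \qquad
t_{\text{YOLO}} = \frac{1}{45\ \text{s}^{-1}} \approx 22.2\ \text{ms} < 25\ \text{ms}, \qquad
t_{\text{Fast YOLO}} = \frac{1}{155\ \text{s}^{-1}} \approx 6.5\ \text{ms}.
```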
