ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, University of Toronto, kriz@cs.utoronto.ca
Ilya Sutskever, University of Toronto, ilya@cs.utoronto.ca
Geoffrey E. Hinton, University of Toronto, hinton@cs.utoronto.ca

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

1 Introduction

Current approaches to object recognition make essential use of machine learning methods. To improve their performance, we can collect larger datasets, learn more powerful models, and use better techniques for preventing overfitting. Until recently, datasets of labeled images were relatively small, on the order of tens of thousands of images (e.g., NORB [16], Caltech-101/256 [8, 9], and CIFAR-10/100 [12]). Simple recognition tasks can be solved quite well with datasets of this size, especially if they are augmented with label-preserving transformations. For example, the current-best error rate on the MNIST digit-recognition task (<0.3%) approaches human performance [4]. But objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets. And indeed, the shortcomings of small image datasets have been widely recognized (e.g., Pinto et al. [21]), but it has only recently become possible to collect labeled datasets with millions of images. The new larger datasets include
LabelMe [23], which consists of hundreds of thousands of fully-segmented images, and ImageNet [6], which consists of over 15 million labeled high-resolution images in over 22,000 categories.

To learn about thousands of objects from millions of images, we need a model with a large learning capacity. However, the immense complexity of the object recognition task means that this problem cannot be specified even by a dataset as large as ImageNet, so our model should also have lots of prior knowledge to compensate for all the data we don't have. Convolutional neural networks (CNNs) constitute one such class of models [16, 11, 13, 18, 15, 22, 26]. Their capacity can be controlled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies). Thus, compared to standard feedforward neural networks with similarly-sized layers, CNNs have much fewer connections and parameters and so they are easier to train, while their theoretically-best performance is likely to be only slightly worse.

Despite the attractive qualities of CNNs, and despite the relative efficiency of their local architecture, they have still been prohibitively expensive to apply in large scale to high-resolution images. Luckily, current GPUs, paired with a highly-optimized implementation of 2D convolution, are powerful enough to facilitate the training of interestingly-large CNNs, and recent datasets such as ImageNet contain enough labeled examples to train such models without severe overfitting.

The specific contributions of this paper are as follows: we trained one of the largest convolutional neural networks to date on the subsets of ImageNet used in the ILSVRC-2010 and ILSVRC-2012 competitions [2] and achieved by far the best results ever reported on these datasets. We wrote a highly-optimized GPU implementation of 2D convolution and all the other operations inherent in training convolutional neural networks, which we make available publicly¹. Our network contains a number of new and unusual features which improve its performance and reduce its training time, which are detailed in Section 3. The size of our network made overfitting a significant problem, even with 1.2 million labeled training examples, so we used several effective techniques for preventing overfitting, which are described in Section 4. Our final network contains five convolutional and three fully-connected layers, and this depth seems to be important: we found that removing any convolutional layer (each of which contains no more than 1% of the model's parameters) resulted in inferior performance.

In the end, the network's size is limited mainly by the amount of memory available on current GPUs and by the amount of training time that we are willing to tolerate. Our network takes between five and six days to train on two GTX 580 3GB GPUs. All of our experiments suggest that our results can be improved simply by waiting for faster GPUs and bigger datasets to become available.

2 The Dataset

ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The images were collected from the web and labeled by human labelers using Amazon's Mechanical Turk crowd-sourcing tool. Starting in 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held. ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories.
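The abstract quotes top-1 and top-5 error rates, the standard ILSVRC metrics: an example counts as correct under top-k if the true label is among the k classes the model scores highest. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code; the function name, toy scores, and labels are invented for illustration, and the toy uses 3 classes with top-2 in place of top-5.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k
    highest-scoring classes. `scores` is a list of per-class score
    lists (e.g. softmax outputs); `labels` holds the true class indices."""
    errors = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            errors += 1
    return errors / len(labels)

# Toy run: 4 images, 3 classes (hypothetical numbers).
scores = [
    [0.1, 0.7, 0.2],   # highest score: class 1
    [0.5, 0.2, 0.3],   # highest score: class 0
    [0.2, 0.2, 0.6],   # highest score: class 2
    [0.6, 0.3, 0.1],   # highest score: class 0
]
labels = [1, 2, 2, 0]
print(top_k_error(scores, labels, 1))  # 0.25: image 2 is misclassified
print(top_k_error(scores, labels, 2))  # 0.0: its true class is ranked 2nd
```

Note the gap between the two numbers: a model can place the correct class just below its top guess, which is why the paper reports 37.5% top-1 but only 17.0% top-5 on LSVRC-2010.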
