1FacebookAIResearch,770Broadway,NewYork,NewYork10003USA.2NewYorkUniversity,715Broadway,NewYork,NewYork10003,USA.3DepartmentofComputerScienceandOperationsResearchUniversitédeMontréal,PavillonAndré-Aisenstadt,POBox6128Centre-VilleSTNMontréal,QuebecH3C3J7,Canada.4Google,1600AmphitheatreParkway,MountainView,California94043,USA.5DepartmentofComputerScience,UniversityofToronto,6King’sCollegeRoad,Toronto,OntarioM5S3G4,Canada.Machine-learningtechnologypowersmanyaspectsofmodernsociety:fromwebsearchestocontentfilteringonsocialnet-workstorecommendationsone-commercewebsites,anditisincreasinglypresentinconsumerproductssuchascamerasandsmartphones.Machine-learningsystemsareusedtoidentifyobjectsinimages,transcribespeechintotext,matchnewsitems,postsorproductswithusers’interests,andselectrelevantresultsofsearch.Increasingly,theseapplicationsmakeuseofaclassoftechniquescalleddeeplearning.Conventionalmachine-learningtechniqueswerelimitedintheirabilitytoprocessnaturaldataintheirrawform.Fordecades,con-structingapattern-recognitionormachine-learningsystemrequiredcarefulengineeringandconsiderabledomainexpertisetodesignafea-tureextractorthattransformedtherawdata(suchasthepixelvaluesofanimage)intoasuitableinternalrepresentationorfeaturevectorfromwhichthelearningsubsystem,oftenaclassifier,coulddetectorclassifypatternsintheinput.Representationlearningisasetofmethodsthatallowsamachinetobefedwithrawdataandtoautomaticallydiscovertherepresentationsneededfordetectionorclassification.Deep-learningmethodsarerepresentation-learningmethodswithmultiplelevelsofrepresenta-tion,obtainedbycomposingsimplebutnon-linearmodulesthateachtransformtherepresentationatonelevel(startingwiththerawinput)intoarepresentationatahigher,slightlymoreabstractlevel.Withthecompositionofenoughsuchtransformations,verycomplexfunctionscanbelearned.Forclassificationtasks,higherlayersofrepresentationamplifyaspectsoftheinputthatareimportantfordiscriminationandsuppressirrelevantvariations.Animage,forexample,comesintheformofanarrayofpixelvalues,andthelearnedfeaturesinthefirstlayerofrepresentationtypicallyrepresentthepresenceorabsenceofedgesatparticularorientationsandlocationsintheimage.Thesecondlayertypicallydetectsmotifsbyspottingparticulararrangementsofedges,regardlessofsmallvariationsintheedgepositions.Thethirdlayermayassemblemotifsintolargercombinationsthatcorrespondtopartsoffamiliarobjects,andsubsequentlayerswoulddetectobjectsascombinationsoftheseparts.Thekeyaspectofdeeplearningisthattheselayersoffeaturesarenotdesignedbyhumanengineers:theyarelearnedfromdatausingageneral-purposelearningprocedure.Deeplearningismakingmajoradvancesinsolvingproblemsthathaveresistedthebestattemptsoftheartificialintelligencecommu-nityformanyyears.Ithasturnedouttobeverygoodatdiscoveringintricatestructuresinhigh-dimensionaldataandisthereforeapplica-bletomanydomainsofscience,businessandgovernment.Inadditiontobeatingrecordsinimagerecognition1–4andspeechrecognition5–7,ithasbeatenothermachine-learningtechniquesatpredictingtheactiv-ityofpotentialdrugmolecules8,analysingparticleacceleratordata9,10,reconstructingbraincircuits11,andpredictingtheeffectsofmutationsinnon-codingDNAongeneexpressionanddisease12,13.Perhapsmoresurprisingly,deeplearninghasproducedextremelypromisingresultsforvarioustasksinnaturallanguageunderstanding14,particularlytopicclassification,sentimentanalysis,questionanswering15andlan-guagetranslation16,17.Wethinkthatdeeplearningwillhavemanymoresuccessesinthenearfuturebecauseitrequiresverylittleengineeringbyhand,soitcaneasilytakeadvantageofincreasesintheamountofavailablecom-putationanddata.Newlearningalgorithmsandarchitecturesthatarecurrentlybeingdevelopedfordeepneuralnetworkswillonlyacceler-atethisprogress.SupervisedlearningThemostcommonformofmachinelearning,deepornot,issuper-visedlearning.Imaginethatwewanttobuildasystemthatcanclassifyimagesascontaining,say,ahouse,acar,apersonorapet.Wefirstcollectalargedatasetofimagesofhouses,cars,peopleandpets,eachlabelledwithitscategory.Duringtraining,themachineisshownanimageandproducesanoutputintheformofavectorofscores,oneforeachcategory.Wewantthedesiredcategorytohavethehighestscoreofallcategories,butthisisunlikelytohappenbeforetraining.Wecomputeanobjectivefunctionthatmeasurestheerror(ordis-tance)betweentheoutputscoresandthedesiredpatternofscores.Themachinethenmodifiesitsinternaladjustableparameterstoreducethiserror.Theseadjustableparameters,oftencalledweights,arerealnumbersthatcanbeseenas‘knobs’thatdefinetheinput–outputfunc-tionofthemachine.Inatypicaldeep-learningsystem,theremaybehundredsofmillionsoftheseadjustableweights,andhundredsofmillionsoflabelledexampleswithwhichtotrainthemachine.Toproperlyadjusttheweightvector,thelearningalgorithmcom-putesagradientvectorthat,foreachweight,indicatesbywhatamounttheerrorwouldincreaseordecreaseiftheweightwereincreasedbyatinyamount.Theweightvectoristhenadjustedintheoppositedirec-tiontothegradientvector.Theobjectivefunction,averagedoverallthetrainingexamples,canDeeplearningallowscomputationalmodelsthatarecomposedofmult