NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning
Tutorial on Deep Learning and Applications
Honglak Lee, University of Michigan
Co-organizers: Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew Ng, and Marc'Aurelio Ranzato
*Includes slide material sourced from the co-organizers

Outline
• Deep learning
– Greedy layer-wise training (for supervised learning)
– Deep belief nets
– Stacked denoising auto-encoders
– Stacked predictive sparse coding
– Deep Boltzmann machines
• Applications
– Vision
– Audio
– Language

Motivation: why go deep?
• Deep architectures can be representationally efficient
– Fewer computational units for the same function
• Deep representations might allow for a hierarchy of representations
– Allows non-local generalization
– Comprehensibility
• Multiple levels of latent variables allow combinatorial sharing of statistical strength
• Deep architectures work well (vision, audio, NLP, etc.)!

Different Levels of Abstraction
• Hierarchical learning
– Natural progression from low-level to high-level structure, as seen in natural complexity
– Easier to monitor what is being learnt and to guide the machine to better subspaces
– A good lower-level representation can be used for many distinct tasks

Generalizable Learning
• Shared low-level representations
– Multi-task learning
– Unsupervised training
(Figure: outputs for tasks 1..N computed from a shared intermediate representation of the raw input; high-level features built on low-level features.)
• Partial feature sharing
– Mixed-mode learning
– Composition of functions

A Neural Network
• Forward propagation:
– Sum inputs, produce activation, feed forward

A Neural Network
• Training: back-propagation of error
– Calculate the total error at the top
– Calculate contributions to the error at each step going backwards

Deep Neural Networks
• Simple to construct
– Sigmoid nonlinearity for hidden layers
– Softmax for the output layer
• But back-propagation does not work well (if randomly initialized)
– Deep networks trained with back-propagation (without unsupervised pretraining) perform worse than shallow networks (Bengio et al., NIPS 2007)

Problems with Back-Propagation
• The gradient becomes progressively more dilute
– Below the top few layers, the correction signal is minimal
• Gets stuck in local minima
– Especially since they start out far from 'good' regions (i.e., random initialization)
• In the usual setting, we can use only labeled data
– Almost all data is unlabeled!
– The brain can learn from unlabeled data

Deep Network Training (that actually works)
• Use unsupervised learning (greedy layer-wise training)
– Allows abstraction to develop naturally from one layer to another
– Helps the network initialize with good parameters
• Perform supervised top-down training as the final step
– Refine the features (intermediate layers) so that they become more relevant for the task

Deep Belief Networks (DBNs) (Hinton et al., 2006)
• Probabilistic generative model
• Deep architecture: multiple layers
• Unsupervised pre-learning provides a good initialization of the network
– maximizing the lower bound of the log-likelihood of the data
• Supervised fine-tuning
– Generative: up-down algorithm
– Discriminative: back-propagation

DBN structure (Hinton et al., 2006)
• Visible layer v and hidden layers h^1, h^2, h^3: the top two layers form an RBM, and the layers below form a directed belief net
• Joint distribution:
P(v, h^1, h^2, ..., h^l) = P(v | h^1) P(h^1 | h^2) ... P(h^{l-2} | h^{l-1}) P(h^{l-1}, h^l)

DBN greedy training (Hinton et al., 2006)
• First step:
– Construct an RBM with an input layer v and a hidden layer h
– Train the RBM

DBN greedy training (Hinton et al., 2006)
• Second step:
– Stack another hidden layer on top of the RBM to form a new RBM
– Fix W^1, sample h^1 from Q(h^1 | v) as input. Train W^2 as an RBM. (A minimal code sketch of these first two steps follows below.)
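The following is a minimal NumPy sketch of the two steps just described: train an RBM on the data v, then fix its weights, sample h^1 from Q(h^1 | v), and train a second RBM on those samples. The slides do not specify the training procedure here, so the CD-1 (one-step contrastive divergence) updates, the function names (train_rbm, sample_hidden), and the toy data are illustrative assumptions, not the tutorial's implementation.

```python
# Sketch only: greedy layer-wise stacking of two binary RBMs trained with CD-1.
# Shapes, hyperparameters, and CD-1 itself are assumptions, not from the slides.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, n_epochs=5, lr=0.1):
    """Train a binary RBM on `data` (n_samples x n_visible) with CD-1."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible bias
    c = np.zeros(n_hidden)    # hidden bias
    for _ in range(n_epochs):
        for v0 in data:
            # Positive phase: Q(h | v) for the data vector
            ph0 = sigmoid(v0 @ W + c)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            # Negative phase: one step of Gibbs sampling
            pv1 = sigmoid(h0 @ W.T + b)
            ph1 = sigmoid(pv1 @ W + c)
            # CD-1 parameter updates
            W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
            b += lr * (v0 - pv1)
            c += lr * (ph0 - ph1)
    return W, b, c

def sample_hidden(data, W, c):
    """Sample h ~ Q(h | v), used as the 'data' for the next RBM."""
    ph = sigmoid(data @ W + c)
    return (rng.random(ph.shape) < ph).astype(float)

# Toy binary data standing in for v
v = (rng.random((200, 20)) < 0.3).astype(float)

# First step: RBM with input layer v and hidden layer h^1
W1, b1, c1 = train_rbm(v, n_hidden=16)

# Second step: fix W^1, sample h^1 ~ Q(h^1 | v), train a new RBM (W^2) on it
h1 = sample_hidden(v, W1, c1)
W2, b2, c2 = train_rbm(h1, n_hidden=8)
```

The third step below simply repeats the same recipe one level higher.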
DBN greedy training (Hinton et al., 2006)
• Third step:
– Continue to stack layers on top of the network, training each new RBM as in the previous step, on samples h^2 drawn from Q(h^2 | h^1)
• And so on…

Why greedy training works (Hinton et al., 2006)
• An RBM specifies P(v, h) from P(v | h) and P(h | v)
– Implicitly defines P(v) and P(h)
• Key idea of stacking
– Keep P(v | h) from the 1st RBM
– Replace P(h) by the distribution generated by the 2nd-level RBM

Why greedy training works (Hinton et al., 2006)
• Easy approximate inference
– P(h^{k+1} | h^k) is approximated from the associated RBM
– Approximation, because P(h^{k+1}) differs between the RBM and the DBN
• Training
– A variational bound justifies the greedy layer-wise training of RBMs
(Figure: stacked layers W^1, W^2, W^3 with inference distributions Q(h^1 | v) and Q(h^2 | h^1); the prior over the top layer is trained by the second-layer RBM.)

Denoising Auto-Encoder (Vincent et al., 2008)
• Corrupt the input (e.g., set 25% of the inputs to 0)
• Reconstruct the uncorrupted input
• Use the uncorrupted encoding as input to the next level
(Figure: raw input → corrupted input → hidden code (representation) → reconstruction, with a KL(reconstruction | raw input) loss.)

Denoising Auto-Encoder (Vincent et al., 2008)
• Learns a vector field pointing towards higher-probability regions
• Minimizes a variational lower bound on a generative model
• Corresponds to regularized score matching on an RBM

Stacked (Denoising) Auto-Encoders
• Greedy layer-wise learning
– Start with the lowest level and stack upwards
– Train each layer of auto-encoder on the intermediate code (features) from the layer below (a minimal sketch of a single denoising auto-encoder layer follows)
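Below is a minimal NumPy sketch of one denoising auto-encoder layer as described above: roughly 25% of the input components are zeroed out, the corrupted vector is encoded, and a decoder is trained to reconstruct the raw (uncorrupted) input; the clean encoding then serves as input to the next stacked layer. The tied weights, cross-entropy reconstruction loss, plain SGD updates, and all names are illustrative assumptions, not the reference implementation of Vincent et al.

```python
# Sketch only: a single tied-weight denoising auto-encoder layer.
# Corruption level, loss, and hyperparameters are assumptions, not from the slides.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_denoising_autoencoder(data, n_hidden, corruption=0.25,
                                n_epochs=10, lr=0.1):
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # tied encoder/decoder weights
    b = np.zeros(n_visible)  # decoder bias
    c = np.zeros(n_hidden)   # encoder bias
    for _ in range(n_epochs):
        for x in data:
            # Corrupt the input: zero out ~25% of the components
            mask = (rng.random(n_visible) >= corruption).astype(float)
            x_tilde = x * mask
            # Encode the corrupted input, decode to a reconstruction
            h = sigmoid(x_tilde @ W + c)
            z = sigmoid(h @ W.T + b)
            # Cross-entropy gradient of the reconstruction against the *raw* input x
            dz = z - x
            dh = (dz @ W) * h * (1.0 - h)
            W -= lr * (np.outer(dz, h) + np.outer(x_tilde, dh))
            b -= lr * dz
            c -= lr * dh
    return W, b, c

def encode(data, W, c):
    """Clean (uncorrupted) encoding, used as input to the next stacked layer."""
    return sigmoid(data @ W + c)

# Toy binary data standing in for the raw input
x = (rng.random((200, 20)) < 0.3).astype(float)
W1, b1, c1 = train_denoising_autoencoder(x, n_hidden=16)
h1 = encode(x, W1, c1)  # the next auto-encoder in the stack would be trained on h1
```

Stacking then proceeds greedily, exactly as in the DBN case: each new layer is trained on the clean encodings produced by the layer below.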