Random Matrix Theory Models of Deep Learning
邱才明
2/25/2020

Deep Learning Theory - A Review

Forward Propagation in Multilayer Neural Networks
▪ A neural network connects many individual neurons together.
▪ The output of one neuron serves as the input of another.
▪ A multilayer neural network can therefore be read as a "nesting" of nonlinear functions:
  $f_n(\cdots f_2(f_1(z, w_1, b_1), w_2, b_2) \cdots, w_n, b_n)$
▪ The layers of a multilayer network can be stacked without limit.
▪ In principle this gives unbounded modeling capacity: such networks can approximate arbitrary functions.
(A short code sketch of this nested composition, using the activation functions below, appears after the early-results overview.)

Commonly Used Activation Functions
▪ Sigmoid: $f(z) = \dfrac{1}{1 + e^{-z}}$
▪ Tanh: $f(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
▪ Rectified linear unit (ReLU): $f(z) = \begin{cases} z, & z \ge 0 \\ 0, & z < 0 \end{cases}$

Network Depth Has Grown Year by Year
[Figure: number of layers over the years; today up to roughly 1,000 layers.]

Why Does Deep Learning Perform So Well?
▪ Features are learned rather than hand-crafted.
▪ More layers capture more invariances.
▪ More data is available to train deeper networks.
▪ More computing power (GPUs).
▪ Better regularization: Dropout.
▪ New nonlinearities: max pooling, rectified linear units (ReLU).
▪ Yet the theoretical understanding of deep networks remains shallow.

[1] Razavian, Azizpour, Sullivan, Carlsson. CNN Features off-the-shelf: an Astounding Baseline for Recognition. CVPRW 2014.
[2] Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[3] Ioffe, Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456, 2015.

Insights from Neuroscience
▪ Experimental neuroscience uncovered:
  ▪ the neural architecture of the retina / LGN / V1 / V2 / V3 / etc.,
  ▪ the existence of neurons with weights and activation functions (simple cells),
  ▪ pooling neurons (complex cells).
▪ All of these features are, in some form, present in deep learning systems.

  Neuroscience        Deep network
  Simple cells        First layer
  Complex cells       Pooling layer
  Grandmother cells   Last layer

Olshausen and Field's Work (Nature, 1996)
▪ Olshausen and Field demonstrated that receptive fields can be learned from image patches.
▪ They showed that an optimization process can drive the learning of image representations.

Harmonic Analysis
▪ The Olshausen-Field representations bear a strong resemblance to well-defined mathematical objects from harmonic analysis: wavelets, ridgelets, curvelets.
▪ Harmonic analysis has a long history of deriving optimal representations via optimization.
▪ Research in the 1990s showed that wavelets and related transforms are optimal sparsifying transforms for certain classes of images.

Approximation Theory
▪ A class prediction rule can be viewed as a function f(x) of a high-dimensional argument.
▪ Curse of dimensionality: the traditional theoretical obstacle to high-dimensional approximation.
▪ Functions of a high-dimensional x can wiggle in too many dimensions to be learned from finite datasets.

Early Theoretical Results on Deep Learning
▪ Approximation theory
  ▪ Perceptrons and multilayer feedforward networks are universal approximators: Cybenko '89, Hornik '89, Hornik '91, Barron '93.
▪ Optimization theory
  ▪ No spurious local optima for linear networks: Baldi & Hornik '89.
  ▪ Stuck in local minima: Brady '89.
  ▪ Stuck in local minima, but convergence guarantees for linearly separable data: Gori & Tesi '92.
  ▪ Manifold of spurious local optima: Frasconi '97.

[1] Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989.
[2] Hornik, Stinchcombe, White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(3):359–366, 1989.
[3] Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.
[4] Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993.
[5] Baldi, Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 1989.
[6] Brady, Raghavan, Slawny. Backpropagation fails to separate where perceptrons succeed. IEEE Transactions on Circuits and Systems, 36(5):665–674, 1989.
[7] Gori, Tesi. On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1):76–86, 1992.
[8] Frasconi, Gori, Tesi. Successes and failures of backpropagation: A theoretical investigation. Progress in Neural Networks: Architecture, 5:205, 1997.
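To make the nested-composition view of forward propagation and the activation functions above concrete, here is a minimal NumPy sketch. It is an illustration only, not part of the original slides: the layer sizes, random weights, and function names are assumptions chosen for the example.

```python
import numpy as np

# Common activation functions from the slides.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # f(z) = 1 / (1 + e^{-z})

def tanh(z):
    return np.tanh(z)                 # f(z) = (e^z - e^{-z}) / (e^z + e^{-z})

def relu(z):
    return np.maximum(z, 0.0)         # f(z) = z if z >= 0, else 0

def forward(z, layers):
    """Forward propagation: f_n(... f_2(f_1(z, w_1, b_1), w_2, b_2) ..., w_n, b_n).

    `layers` is a list of (W, b, activation) triples, one per layer.
    """
    for W, b, activation in layers:
        z = activation(W @ z + b)     # each layer nests one more nonlinear map
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative 3-layer network: 4 -> 8 -> 8 -> 2 (sizes are arbitrary).
    sizes = [4, 8, 8, 2]
    acts = [relu, tanh, sigmoid]
    layers = [(rng.standard_normal((m, n)), np.zeros(m), a)
              for n, m, a in zip(sizes[:-1], sizes[1:], acts)]
    x = rng.standard_normal(4)
    print(forward(x, layers))         # network output for a random input
```

Each pass through the loop applies one more layer, so with three layers the returned value is exactly the nested expression $f_3(f_2(f_1(x, W_1, b_1), W_2, b_2), W_3, b_3)$.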
Recent Theoretical Results on Deep Learning
▪ Invariance, stability, and learning theory
  ▪ Scattering networks: Bruna '11, Bruna '13, Mallat '13.
  ▪ Deformation stability for Lipschitz non-linearities: Wiatowski '15.
  ▪ Distance- and margin-preserving embeddings: Giryes '15, Sokolic '16.
  ▪ Geometry, generalization bounds, and depth efficiency: Montufar '15, Neyshabur '15, Shashua '14, '15, '16.
  ▪ ……

[1] Bruna, Mallat. Classification with scattering operators. CVPR 2011. Invariant scattering convolution networks. arXiv 2012. Mallat, Waldspurger. Deep Learning by Scattering. arXiv 2013.
[2] Wiatowski, Bölcskei. A mathematical theory of deep convolutional neural networks for feature extraction. arXiv 2015.
[3] Giryes, Sapiro, Bronstein. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? arXiv:1504.08291.
[4] Sokolic. Margin Preservation of Deep Neural Networks. 2015.
[5] Montufar. Geometric and Combinatorial Perspectives on Deep Neural Networks. 2015.
[6] Neyshabur. The Geometry of Optimization and Generalization in Neural Networks: A Path-based Approach. 2015.

Recent Theoretical Results on Deep Learning (continued)
▪ Optimization theory and algorithms
  ▪ Learning low-degree polynomials from random initialization: Andoni '14.
  ▪ Characterizing the loss surface and attacking the saddle point problem: Dauphin '14, Choromanska '15, Chaudhuri '15.
  ▪ Global optimality in neural network training: Haeffele '15.
  ▪ Non-convex optimization: Dauphin '14.
  ▪ Training NNs using tensor methods: Janzamin '15.
  ▪ ……

[7] Andoni, Panigrahy, Valiant, Zhang. Learning Polynomials with Neural Networks. ICML 2014.
[8] Dauphin, Pascanu, Gulcehre, Cho, Ganguli, Bengio. Identifying and att