Convolutional Neural Networks (CNN) and Their Variants

山世光 (Shan Shiguang)
Institute of Computing Technology, Chinese Academy of Sciences

Early History of CNN

Convolutional neural networks (CNN):
- K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, pp. 193–202, 1980.
- Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

CNN Extensions in the Deep Learning Era

- A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet classification with deep convolutional neural networks," NIPS 2012.
- Y. Jia et al., "Caffe: Convolutional Architecture for Fast Feature Embedding," ACM MM 2014.
- K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, "Going deeper with convolutions," CVPR 2015 (and arXiv:1409.4842, 2014).

Convolution: An Example

[Figure: worked example of a kernel sliding over a 2D input.]

Convolution: Formal Definition

- Integral form: $s(t) = \int x(a)\, w(t-a)\, da$
- Common shorthand: $s(t) = (x * w)(t)$
- Discrete form, one dimension:
  $s(t) = (x * w)(t) = \sum_{a=-\infty}^{+\infty} x(a)\, w(t-a)$
- Discrete form, two dimensions:
  $s[i,j] = (I * K)[i,j] = \sum_m \sum_n I[m,n]\, K[i-m,\, j-n]$
  or equivalently, since convolution is commutative,
  $s[i,j] = (I * K)[i,j] = \sum_m \sum_n I[i-m,\, j-n]\, K[m,n]$
- $K$ is called the kernel.
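To make the discrete 2D formula concrete, here is a minimal NumPy sketch (our illustration, not from the slides; the helper name `conv2d_valid` is made up). It flips the kernel in both axes and slides it over the image, which computes exactly $\sum_m \sum_n I[i-m,\, j-n]\, K[m,n]$ restricted to the "valid" region:

```python
import numpy as np

def conv2d_valid(I, K):
    """True 2D convolution over the 'valid' region:
    s[i, j] = sum_{m,n} I[i - m, j - n] * K[m, n],
    computed as cross-correlation with the flipped kernel."""
    Kf = np.flipud(np.fliplr(K))          # flip kernel in both axes
    H, W = I.shape
    kh, kw = Kf.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + kh, j:j + kw] * Kf)
    return out

# Example: a 3x3 vertical-edge kernel applied to a 5x5 image.
I = np.arange(25, dtype=float).reshape(5, 5)
K = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])
print(conv2d_valid(I, K))                 # 3x3 output map
```

Note that most deep learning libraries actually implement cross-correlation (no kernel flip) yet still call the operation "convolution"; since the kernel is learned, the flip makes no practical difference.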
Why Convolution? 1. Sparse Interactions

- Limited (sparse) connectivity: the kernel is smaller than the input, so connections are local.
- Far fewer connections: easier learning and lower computational cost.
  - Fully connecting m nodes to n nodes costs O(m·n) connections.
  - Restricting each of the n nodes to k inputs (k ≪ m) costs only O(k·n).
- Hierarchical receptive fields (biologically inspired): the higher the layer, the larger the receptive field of its neurons.

Why Convolution? 2. Parameter Sharing; 3. Equivariance

- Parameter sharing (tied weights) further reduces the number of parameters dramatically.
- Equivariant representations: together with pooling, convolution yields invariance to translation.
  - The same does not hold for scale or rotation.

Basic Structure of a CNN

Three steps:
- Convolution: produces the pre-synaptic activation ("net").
- Nonlinear activation: the detector stage.
- Pooling.

Two ways to define a "layer":
- Complex definition: the three stages together form one layer.
- Simple definition: each stage is a layer of its own; some layers then have no parameters.

Pooling

Definition (there are no parameters to learn): pooling "replaces the output of the net at a certain location with a summary statistic of the nearby outputs."

Types:
- max pooling
- (weighted) average pooling

(A runnable sketch of both pooling types appears at the end of this section.)

Why Pooling?

- To obtain invariance.
- Invariance to small translations: what matters is that the feature is present, not exactly where it is.
- This is a very strong prior: "the function the layer learns must be invariant to small translations."

Why Pooling? What About Rotation Invariance?

- Use 9 kernels (templates) with different orientations and pool over their responses: whichever template matches, the pooled response is large.
- [Figure: for two differently rotated inputs, the 9 oriented kernels respond with different strengths (0.2, 0.6, 1, 0.1, ... versus 0.5, 0.3, 0.02, 1, ...), yet the max-pooled response is 1 in both cases.]

Pooling Combined with Downsampling

- Better translation invariance.
- Higher computational efficiency (fewer neurons in the next layer).

From Full Connectivity to Limited Connectivity

- Some connection weights are forced to be 0.
- Typically, connections between non-adjacent neurons are dropped; only neighboring neurons stay connected.
- A convolutional network is thus a special case of a fully connected network in which a large fraction of the weights are 0.

Why Convolution & Pooling?

- "A prior probability distribution over the parameters of a model that encodes our beliefs about what models are reasonable, before we have seen any data."
- In other words (cf. "no free lunch"): before seeing any data, our beliefs and experience tell us which model parameters are reasonable.
- Local connections, invariance to translation, and tied weights are exactly such priors, inspired by biological nervous systems.

Origins: Neocognitron (1980)

- Simple and complex cells; lower-order features are combined into higher-order ones.
- Local connections.
- K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, pp. 193–202, 1980.

[Figure: Neocognitron architecture.]

Origins: Neocognitron (1980): Training

- Layer-by-layer self-organization via competitive learning (unsupervised).
- The output layer is trained independently (supervised).

LeCun's CNN (1989) for Character Recognition

- Simplified the Neocognitron architecture.
- Training: supervised, with the backpropagation (BP) algorithm; tanh activation (converges faster than the sigmoid); loss minimized by SGD.
- Applied to zip code recognition, and widely deployed afterwards.

LeCun's CNN (1989): Architecture

- Input: 16×16 image.
- L1 (H1): 12 kernels of size 5×5, 8×8 neurons per feature map.
- L2 (H2): 12 kernels of size 5×5×8, 4×4 neurons per feature map.
- L3 (H3): 30 neurons.
- L4 (output): 10 neurons.
- Total connections: 5·5·12·64 + 5·5·8·12·16 + 192·30, about 66,000.

LeCun's CNN (1989): Tied Weights

- Within one feature map, the kernel is identical at every position!

[Figure: weight sharing across positions of one feature map.]

LeNet (1998) for Digit/Character Recognition

- LeNet-5.
- Feature map: "a set of units whose weights are constrained to be identical."
- Example: the C3 layer has (3·6 + 4·9 + 6·1)·25 + 16 = 1516 parameters (verified in the short sketch at the end of this section).

Later: CNN for Object Detection and Recognition

AlexNet for ImageNet (2012)

- A large-scale CNN: 650K neurons, 60M parameters.
- Used a whole toolbox of tricks:
  - Dropout
  - Data augmentation
  - ReLU
  - Local Response Normalization
  - Contrast normalization
  - ...
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks," NIPS 2012.
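The pooling sketch promised above: a minimal NumPy illustration (ours, not from the slides; `pool2d` is a made-up helper) of non-overlapping 2×2 max and average pooling, i.e., pooling combined with downsampling:

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling with stride == window size,
    i.e. pooling combined with downsampling."""
    H, W = x.shape
    # Reshape into (H//size, size, W//size, size) blocks, then
    # summarize each size x size block with one statistic.
    blocks = x[:H // size * size, :W // size * size] \
        .reshape(H // size, size, W // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))    # max pooling
    return blocks.mean(axis=(1, 3))       # average pooling

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 1.],
              [0., 1., 5., 6.],
              [2., 2., 7., 8.]])
print(pool2d(x, mode="max"))   # [[4. 2.] [2. 8.]]
print(pool2d(x, mode="avg"))   # [[2.5 1.] [1.25 6.5]]
```

Because the max of each 2×2 block ignores where inside the block the large value sits, small translations of a feature leave the pooled map unchanged, which is precisely the invariance argument made in the "Why Pooling?" slides.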
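Likewise, the C3 parameter count from the LeNet-5 slide can be checked mechanically. In LeNet-5, the 16 feature maps of C3 are connected to subsets of S2's 6 maps (6 maps see 3 inputs, 9 see 4, and 1 sees all 6); each input map contributes one 5×5 kernel, plus one bias per output map:

```python
# LeNet-5 C3 connection table: 6 maps with 3 inputs, 9 with 4, 1 with 6.
# Each input map contributes a 5x5 kernel; each of the 16 maps adds a bias.
kernels = 3 * 6 + 4 * 9 + 6 * 1        # 60 incoming map connections
params_c3 = kernels * 5 * 5 + 16       # 60 * 25 + 16
print(params_c3)                       # -> 1516
```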