Convolutional network training

Multilayer perceptrons
• Key idea: build complex functions by composing simple functions
f(x) = Wx
g(x) = max(x, 0)
[Figure: x → z → y; 1 row of z plotted for every value of x, 1 row of y plotted for every value of x]

Linear function + translation invariance = convolution
• Local connectivity determines kernel size
[Figure: 3 x 3 grid of example values (5.4 0.1 3.6 / 1.8 2.3 4.5 / 1.1 3.4 7.2); output labeled "Feature map"]

Convolution with multiple filters
[Figure: the same 3 x 3 grid of example values; output labeled "Feature map"]

Convolution over multiple channels
[Figure: per-channel convolutions (*) combined by addition (+)]

Convolution as a primitive
[Figure: w x h x c input → Convolution → w x h x c' output]

The convolution unit
• Each convolutional unit takes a collection of feature maps as input, and produces a collection of feature maps as output
• Parameters: Filters (+ bias)
• If c_in input feature maps and c_out output feature maps
  • Each filter is k x k x c_in
  • There are c_out such filters
• Other hyperparameters: padding

Convolution vs Linear unit
[Figure: 100 x 100 x 10 input (100K values) → 100 x 100 x 10 output (100K values)]
Linear unit: 100K x 100K = 10 Billion parameters
3 x 3 convolutional unit: 3 x 3 x 10 x 10 = 900 parameters

Invariance to distortions: Pooling
• Each window is reduced to a single value, example max or average
[Figure: grid of input values divided into windows]

Invariance to distortions: Max Pooling
[Figure: the same grid of input values with the maximum of each window as output]

Invariance to distortions: Average Pooling
[Figure: the same grid of input values with the average of each window as output]

Global average pooling
[Figure: w x h x c → 1 x 1 x c = c dimensional vector]

The pooling unit
• Each pooling unit takes a collection of feature maps as input and produces a collection of feature maps as output
• Output feature maps are usually smaller in height / width
• Parameters: None

Invariance to distortions: Subsampling
[Figure: Convolution → subsampling → convolution]
Small neighborhoods on subsampled feature map = large neighborhood on original image
• Convolution in earlier steps detects more local patterns less resilient to distortion
• Convolution in later steps detects more global patterns more resilient to distortion
• Subsampling allows capture of larger, more invariant patterns

Strided convolution
• Convolution with stride s = standard convolution + subsampling by picking 1 value every s values
• Example: convolution with stride 2 = standard convolution + subsampling by a factor of 2

Convolutional networks
[Figure: network classifying an image as "Horse", with visualizations of intermediate layers]
Visualizations from: M. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. In ECCV 2014.

Convolutional Networks and the Brain
Slide credit: Jitendra Malik

Receptive fields of simple cells (discovered by Hubel & Wiesel)
Slide credit: Jitendra Malik

Convolutional networks
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86.11 (1998): 2278-2324.

Convolutional networks
[Figure: conv (filters) → subsample → conv (filters) → subsample → linear (weights)]

Last time
• Linear classifiers on pixels bad, need non-linear classifiers
• Multi-layer perceptrons overparametrized
• Reduce parameters by local connections and shift invariance = Convolution
• Intersperse subsampling to capture ever larger deformations
• Stick a final classifier

Empirical Risk Minimization
[Equation: empirical risk objective, with the hypothesis labeled "Convolutional network"]
[Equation: labeled "Gradient descent update"]

Computing the gradient of the loss

The gradient of convnets
[Figure: chain of functions f_1, ..., f_5 with input x, weights w_1, ..., w_5, and intermediate outputs z_1, ..., z_5 = z]
Recurrence going backward!!

Backpropagation for a sequence of functions
[Equation: recurrence with its two factors labeled "Previous term" and "Function derivative"]
• Assume we can compute partial derivatives of each function
• Use g(z_i) to store gradient of z w.r.t. z_i, g(w_i) for w_i
• Calculate g_i by iterating backwards
• Use g_i to compute gradient of parameters

Backpropagation for a sequence of functions
• Each "function" has a "forward" and "backward" module
• Forward module for f_i
  • takes z_{i-1} and weight w_i as input
  • produces z_i as output
• Backward module for f_i
  • takes g(z_i) as input
  • produces g(z_{i-1}) and g(w_i) as output
[Figure: forward module f_i with inputs z_{i-1}, w_i and output z_i]
[Figure: backward module f_i with input g(z_i) and outputs g(z_{i-1}), g(w_i)]

Chain rule for vectors
[Equation: labeled "Jacobian"]

Loss as a function
[Figure: conv (filters) → subsample → conv (filters) → subsample → linear (weights) → loss, with the label as a second input to the loss]

Beyond sequences: computation graphs
• Arbitrary graphs of functions
• No distinction between intermediate outputs and parameters
[Figure: graph with function nodes f, h, g, k, l and variables x, y, w, u, z]

Computation graph - Functions
• Each node implements two functions
• A