arXiv:0802.3152v1 [math.ST] 21 Feb 2008

Efficient Estimation of Multidimensional Regression Model using Multilayer Perceptrons

Joseph Rynkiewicz
Université Paris I, SAMOS/MATISSE, 90 rue de Tolbiac, Paris, France

February 21, 2008

Abstract

This work concerns the estimation of multidimensional nonlinear regression models using multilayer perceptrons (MLPs). The main problem with such models is that we need to know the covariance matrix of the noise to get an optimal estimator. However, we show in this paper that if we choose as the cost function the logarithm of the determinant of the empirical error covariance matrix, then we get an asymptotically optimal estimator. Moreover, under suitable assumptions, we show that this cost function leads to a very simple asymptotic law for testing the number of parameters of an identifiable MLP. Numerical experiments confirm the theoretical results.

Keywords: non-linear regression, multivariate regression, multilayer perceptrons, asymptotic normality

1 Introduction

Let us consider a sequence $(Y_t, Z_t)_{t \in \mathbb{N}}$ of i.i.d. (i.e. independent, identically distributed) random vectors, with $Y_t$ a $d$-dimensional vector. Each couple $(Y_t, Z_t)$ has the same law as a generic variable $(Y, Z)$, but it is not hard to generalize all that we show in this paper to stationary mixing variables and therefore to time series. We assume that the model can be written as
$$Y_t = F_{W^0}(Z_t) + \varepsilon_t$$
where

• $F_{W^0}$ is a function represented by an MLP with parameters, or weights, $W^0$;
• $(\varepsilon_t)$ is an i.i.d. centered noise with unknown invertible covariance matrix $\Gamma_0$.

This corresponds to the multivariate non-linear least squares model, as in chapters 3.1 and 5.1 of Gallant [5]. Indeed, an MLP function can be seen as a parametric non-linear function; for example, a one-hidden-layer MLP using the hyperbolic tangent (tanh) as transfer function can be written
$$F_{W^0}(Z_t) = \left( F^1_{W^0}(Z_t), \cdots, F^d_{W^0}(Z_t) \right)^T,$$
where $T$ denotes transposition, with
$$F^i_{W^0}(z) = \sum_{j=1}^{H} a_{ij} \tanh\left( \sum_{k=1}^{L} w_{jk} z_k + w_{j0} \right) + a_{i0},$$
where $H$ is the number of hidden units and $L$ is the dimension of the input $z$. The parameter vector is then
$$(a_{10}, \cdots, a_{dH}, w_{10}, \cdots, w_{HL}) \in \mathbb{R}^{(H+1) \times d + (L+1) \times H}.$$

There are some obvious transformations that can be applied to an MLP without changing its input-output map. For instance, suppose we pick a hidden node $j$ and change the sign of all its weights $w_{jk}$ for $k = 0, \cdots, L$, and also the sign of all $a_{ij}$ for $i = 1, \cdots, d$. Since tanh is odd, this will not alter the contribution of this node to the total net output. Another possibility is to interchange two hidden nodes, that is, to take two hidden nodes $j_1$ and $j_2$ and relabel $j_1$ as $j_2$ and $j_2$ as $j_1$, taking care to also relabel the corresponding weights. These transformations form a finite group (see Sussmann [10]). We will consider equivalence classes of one-hidden-layer MLPs: two MLPs are in the same class if the first one is the image of the second one by such a transformation; the considered set of parameters is then the quotient of the parameter space by this finite group. In this space, we assume that the model is identifiable: this means that the true model belongs to the considered family of models and that we consider MLPs without redundant units. This is a very strong assumption, but it is known that the estimated weights of an MLP with redundant units can have a very strange asymptotic behavior (see Fukumizu [4]), because the Hessian matrix is then singular. A consequence of the identifiability of the model is that the Hessian matrix computed in the sequel will be positive definite (see Fukumizu [3]). In the sequel we will always assume that we are under the assumptions making the Hessian matrix positive definite.

1.1 Efficient estimation

A popular choice for the associated cost function is the mean square error:
$$\frac{1}{n} \sum_{t=1}^{n} \| Y_t - F_W(Z_t) \|^2 \quad (1)$$
where $\| \cdot \|$ denotes the Euclidean norm on $\mathbb{R}^d$. Although this cost function is widely used, it is easy to show that it yields a suboptimal estimator, with a larger asymptotic variance than the estimator minimizing the generalized mean square error:
$$\frac{1}{n} \sum_{t=1}^{n} (Y_t - F_W(Z_t))^T \Gamma_0^{-1} (Y_t - F_W(Z_t)) \quad (2)$$
However, we need to know the true covariance matrix of the noise to use this cost function. A possible solution is to use an approximation $\Gamma$ of the error covariance matrix $\Gamma_0$ to compute the generalized least squares estimator:
$$\frac{1}{n} \sum_{t=1}^{n} (Y_t - F_W(Z_t))^T \Gamma^{-1} (Y_t - F_W(Z_t)) \quad (3)$$
A way to construct a sequence $(\Gamma_k)_{k \in \mathbb{N}^*}$ yielding a good approximation of $\Gamma_0$ is the following: using the ordinary least squares estimator $\hat{W}^1_n$, the noise covariance can be approximated by
$$\Gamma_1 := \Gamma_{\hat{W}^1_n} := \frac{1}{n} \sum_{t=1}^{n} (Y_t - F_{\hat{W}^1_n}(Z_t)) (Y_t - F_{\hat{W}^1_n}(Z_t))^T. \quad (4)$$
Then, we can use this new covariance matrix to find a generalized least squares estimator $\hat{W}^2_n$:
$$\hat{W}^2_n = \arg\min_W \frac{1}{n} \sum_{t=1}^{n} (Y_t - F_W(Z_t))^T (\Gamma_1)^{-1} (Y_t - F_W(Z_t)) \quad (5)$$
and calculate again a new covariance matrix
$$\Gamma_2 := \Gamma_{\hat{W}^2_n} = \frac{1}{n} \sum_{t=1}^{n} (Y_t - F_{\hat{W}^2_n}(Z_t)) (Y_t - F_{\hat{W}^2_n}(Z_t))^T.$$
It can be shown that this procedure gives a sequence of parameters $\hat{W}^1_n \to \Gamma_1 \to \hat{W}^2_n \to \Gamma_2 \to \cdots$ minimizing the logarithm of the determinant of the empirical covariance matrix (see chapter 5 in Gallant [5]):
$$U_n(W) := \log \det \left( \frac{1}{n} \sum_{t=1}^{n} (Y_t - F_W(Z_t)) (Y_t - F_W(Z_t))^T \right) \quad (6)$$
The use of this cost function for neural networks was introduced by Williams in 1996 [12]; however, its theoretical and practical properties have not yet been studied. Here, the calculation of the asymptotic properties of $U_n(W)$ will show that this cost function leads to an asymptotically optimal estimator, with the same asymptotic variance as the estimator minimizing (2); we then say that the estimator is "efficient".
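As a concrete illustration of the procedure (4)-(5) and of the cost (6), here is a minimal numerical sketch. It assumes a generic regression function `f(w, Z)` returning an $n \times d$ matrix of predictions and uses an off-the-shelf optimizer; these choices, and all helper names, are illustrative assumptions rather than the implementation used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def gls_cost(w, f, Y, Z, Gamma_inv):
    # Generalized mean square error, as in (3): (1/n) sum_t r_t^T Gamma^{-1} r_t
    R = Y - f(w, Z)                          # residuals, shape (n, d)
    return np.mean(np.einsum('ti,ij,tj->t', R, Gamma_inv, R))

def empirical_cov(w, f, Y, Z):
    # Empirical error covariance, as in (4): (1/n) sum_t r_t r_t^T
    R = Y - f(w, Z)
    return R.T @ R / len(Y)

def log_det_cost(w, f, Y, Z):
    # The cost U_n(W) of (6): log-determinant of the empirical covariance
    _, logdet = np.linalg.slogdet(empirical_cov(w, f, Y, Z))
    return logdet

def iterated_gls(f, Y, Z, w0, n_iter=5):
    # Sequence W^1 -> Gamma_1 -> W^2 -> Gamma_2 -> ...: each pass refits the
    # weights using the inverse of the previous empirical covariance.
    Gamma_inv = np.eye(Y.shape[1])           # identity: first pass is plain OLS
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        w = minimize(gls_cost, w, args=(f, Y, Z, Gamma_inv)).x
        Gamma_inv = np.linalg.inv(empirical_cov(w, f, Y, Z))
    return w
```

In this sketch the first pass uses the identity weighting (ordinary least squares) and each subsequent pass reuses the inverse of the previous empirical covariance, mirroring the sequence $\hat{W}^1_n \to \Gamma_1 \to \hat{W}^2_n \to \Gamma_2 \to \cdots$; minimizing `log_det_cost` directly targets $U_n(W)$ of (6) without any plug-in estimate of $\Gamma_0$.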
1.2 Testing the number of parameters

Let $q$ be an integer smaller than $s$. We want to test "$H_0: W \in \Theta_q \subset \mathbb{R}^q$" against "$H_1: W \in \Theta_s \subset \mathbb{R}^s$", where the sets $\Theta_q$ and $\Theta_s$ are compact and $\Theta_q \subset \Theta_s$. $H_0$ expresses the fact that $W$ belongs to a subset $\Theta_q$ of $\Theta_s$ with a parametric dimension smaller than $s$ or, equivalently, that $s - q$ weights of the MLP in $\Theta_s$ are null. If we consider the classical mean square error cost function $V_n(W) = \sum_{t=1}^{n} \| Y_t - F_W(Z_t) \|^2$, we get the following test statistic
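The source breaks off before the statistic itself is written out, so the sketch below only sets up the two quantities such a nested test compares: the minimum of $V_n$ over the full model (all $s$ weights free) and over the sub-model in which $s - q$ designated weights are held at zero. The indexing convention for the null weights and the helper names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def V_n(w, f, Y, Z):
    # Classical mean square error cost V_n(W) = sum_t ||Y_t - F_W(Z_t)||^2
    return np.sum((Y - f(w, Z)) ** 2)

def min_V_n(f, Y, Z, w0, null_idx=()):
    # Minimize V_n with the weights listed in `null_idx` constrained to zero
    # (under H0, s - q such weights are null); an empty `null_idx` gives the
    # unconstrained fit over Theta_s.
    w0 = np.asarray(w0, dtype=float)
    free = np.setdiff1d(np.arange(len(w0)), null_idx)

    def cost(w_free):
        w = np.zeros(len(w0))
        w[free] = w_free
        return V_n(w, f, Y, Z)

    return minimize(cost, w0[free]).fun

# A test of H0 against H1 compares, for instance,
# min_V_n(f, Y, Z, w0)                        # fit over Theta_s
# min_V_n(f, Y, Z, w0, null_idx=range(q, s))  # fit over Theta_q: last s - q weights zeroed
```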