Deep Sparse Rectifier Neural Networks


Xavier Glorot (DIRO, Université de Montréal, Montreal, QC, Canada, glorotxa@iro.umontreal.ca)
Antoine Bordes (Heudiasyc, UMR CNRS 6599, UTC, Compiègne, France, and DIRO, Université de Montréal, Montreal, QC, Canada, antoine.bordes@hds.utc.fr)
Yoshua Bengio (DIRO, Université de Montréal, Montreal, QC, Canada, bengioy@iro.umontreal.ca)

Appearing in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) 2011, Fort Lauderdale, FL, USA. Volume 15 of JMLR: W&CP. Copyright 2011 by the authors.

Abstract

While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks, in spite of the hard non-linearity and non-differentiability at zero. They create sparse representations with true zeros, which seem remarkably suitable for naturally sparse data. Even though they can take advantage of semi-supervised setups with extra unlabeled data, deep rectifier networks can reach their best performance without requiring any unsupervised pre-training on purely supervised tasks with large labeled datasets. Hence, these results can be seen as a new milestone in the attempts at understanding the difficulty of training deep but purely supervised neural networks, and at closing the performance gap between neural networks learnt with and without unsupervised pre-training.

1 Introduction

Many differences exist between the neural network models used by machine learning researchers and those used by computational neuroscientists. This is in part because the objective of the former is to obtain computationally efficient learners that generalize well to new examples, whereas the objective of the latter is to abstract out neuroscientific data while obtaining explanations of the principles involved, providing predictions and guidance for future biological experiments. Areas where both objectives coincide are therefore particularly worthy of investigation, as they point towards computationally motivated principles of operation in the brain that can also enhance research in artificial intelligence. In this paper we show that two common gaps between computational neuroscience models and machine learning neural network models can be bridged by using the piecewise-linear activation max(0, x), called the rectifier (or hinge) activation function. Experimental results will show engaging training behavior of this activation function, especially for deep architectures (see Bengio (2009) for a review), i.e., where the number of hidden layers in the neural network is 3 or more.
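To make the activation concrete, here is a minimal NumPy sketch of the rectifier, together with one common subgradient convention at the non-differentiable point x = 0 (taking the derivative to be 0 there is our illustrative assumption; the paper itself only notes the non-differentiability):

    import numpy as np

    def rectifier(x):
        # Rectifier (hinge) activation: max(0, x), applied element-wise.
        return np.maximum(0.0, x)

    def rectifier_subgradient(x):
        # Subgradient used for backpropagation: 1 where x > 0, else 0.
        # The value at exactly x == 0 is a convention, not fixed by the paper.
        return (x > 0).astype(float)

    h = rectifier(np.array([-2.0, 0.0, 3.5]))   # -> [0.  0.  3.5]

Note how negative pre-activations map to exact zeros, which is what produces the sparse representations with true zeros mentioned in the abstract.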
Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures. This is in part inspired by observations of the mammalian visual cortex, which consists of a chain of processing elements, each of which is associated with a different representation of the raw visual input. This is particularly clear in the primate visual system (Serre et al., 2007), with its sequence of processing stages: detection of edges, then primitive shapes, moving up to gradually more complex visual shapes. Interestingly, it was found that the features learned in deep architectures resemble those observed in the first two of these stages (in areas V1 and V2 of visual cortex) (Lee et al., 2008), and that they become increasingly invariant to factors of variation (such as camera movement) in higher layers (Goodfellow et al., 2009).

Regarding the training of deep networks, something that can be considered a breakthrough happened in 2006, with the introduction of Deep Belief Networks (Hinton et al., 2006), and more generally the idea of initializing each layer by unsupervised learning (Bengio et al., 2007; Ranzato et al., 2007). Some authors have tried to understand why this unsupervised procedure helps (Erhan et al., 2010), while others have investigated why the original training procedure for deep neural networks failed (Bengio and Glorot, 2010). From the machine learning point of view, this paper brings additional results to these lines of investigation.

We propose to explore the use of rectifying non-linearities as alternatives to the hyperbolic tangent or sigmoid in deep artificial neural networks, in addition to using an L1 regularizer on the activation values to promote sparsity and to prevent potential numerical problems with unbounded activations (a minimal sketch of this penalty appears at the end of this section). Nair and Hinton (2010) present promising results on the influence of such units in the context of Restricted Boltzmann Machines, compared to logistic sigmoid activations, on image classification tasks. Our work extends this to the case of pre-training with denoising auto-encoders (Vincent et al., 2008) and provides an extensive empirical comparison of the rectifying activation function against the hyperbolic tangent on image classification benchmarks, as well as an original derivation for the text application of sentiment analysis.

Our experiments on image and text data indicate that training proceeds better when the artificial neurons are either off or operating mostly in a linear regime. Surprisingly, rectifying activation allows deep networks to achieve their best performance without unsupervised pre-training. Hence, our work is a new contribution to the trend of understanding and closing the performance gap between deep networks learnt with and without unsupervised pre-training (Erhan et al., 2010; Bengio and Glorot, 2010). Still, rectifier networks can benefit from unsupervised pre-training in the context of semi-supervised learning.
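As a rough illustration of how the proposed L1 activation penalty combines with a supervised loss, here is a small self-contained sketch of one rectifier layer. The layer sizes, the coefficient name lam, and its value are hypothetical choices of ours for the example, not taken from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(784, 256))   # illustrative dense layer
    b = np.zeros(256)

    def forward(x):
        # Hidden representation with rectifier units.
        return np.maximum(0.0, x @ W + b)

    def l1_activation_penalty(h, lam=1e-5):
        # L1 regularizer on the activation values, added to the task loss
        # to promote sparsity and discourage unboundedly large activations.
        return lam * np.abs(h).sum()

    x = rng.normal(size=(32, 784))        # dummy mini-batch
    h = forward(x)
    penalty = l1_activation_penalty(h)    # add this to the supervised loss
    sparsity = float((h == 0).mean())     # fraction of true zeros in h

In a full training loop, the gradient of this penalty with respect to an active (positive) unit is simply lam, so it steadily pushes activations toward zero, reinforcing the sparsity that the rectifier already produces.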
