Theory of the Backpropagation Neural Network

Robert Hecht-Nielsen
HNC, Inc.
5501 Oberlin Drive
San Diego, CA 92121
619-546-8877
and
Department of Electrical and Computer Engineering
University of California at San Diego
La Jolla, CA 92139

Abstract

Backpropagation is currently the most widely applied neural network architecture. The information processing operation that it carries out is the approximation of a mapping or function f : A ⊂ R^n → R^m, from a bounded subset A of n-dimensional Euclidean space to a bounded subset f[A] of m-dimensional Euclidean space, by means of training on examples (x_1, y_1), (x_2, y_2), ..., (x_k, y_k), ... of the mapping's action, where y_k = f(x_k). It is assumed that such examples are generated by selecting the x_k vectors randomly from A in accordance with a fixed probability density function ρ(x). This paper presents a survey of the basic theory of the backpropagation neural network architecture, covering the areas of: architectural design, performance measurement, function approximation capability, and learning. The survey includes previously known material, as well as some new results: a formulation of the backpropagation neural network architecture to make it a valid neural network (past formulations violated the locality of processing restriction) and a proof that the backpropagation mean squared error function exists and is differentiable. Also included is a theorem showing that any L_2 function from [0,1]^n to R^m can be implemented to any desired degree of accuracy with a three-layer backpropagation neural network. Finally, an Appendix presents a speculative neurophysiological model illustrating how the backpropagation neural network architecture might plausibly be implemented in the mammalian brain for cortico-cortical learning between nearby regions of cerebral cortex.

1 Introduction

Without question, backpropagation is currently the most widely applied neural network architecture. This popularity primarily revolves around the ability of backpropagation networks to learn complicated multidimensional mappings. One way to look at this ability is that, in the words of Werbos [49, 50, 52], backpropagation goes "Beyond Regression".

Backpropagation has a colorful history. Apparently, it was originally introduced by Bryson and Ho in 1969 [5] and independently rediscovered by Werbos in 1974 [52], by Parker in the mid 1980's [41, 39, 40], and by Rumelhart, Williams, and other members of the PDP group in 1985 [46, 44, 1]. Although the PDP group became aware of Parker's work shortly after their discovery (they cited Parker's 1985 report in their first papers on backpropagation [59, 46]), Werbos' work was not widely appreciated until mid-1987, when it was found by Parker. The work of Bryson and Ho was discovered in 1988 by le Cun [34]. Even earlier incarnations may yet emerge. Notwithstanding its checkered history, there is no question that credit for developing backpropagation into a usable technique, as well as promulgation of the architecture to a large audience, rests entirely with Rumelhart and the other members of the PDP group [45]. Before their work, backpropagation was unappreciated and obscure. Today, it is a mainstay of neurocomputing.

One of the crucial decisions in the design of the backpropagation architecture is the selection of a sigmoidal activation function (see Section 2 below). Historically, sigmoidal activation functions have been used by a number of investigators. Grossberg was the first advocate of the use of sigmoid functions in neural networks [18], although his reasons for using them are not closely related to their role in backpropagation (see [8] for a discussion of the relationship between these two bodies of work). Sejnowski, Hinton, and Ackley [21, 45] and Hopfield [23] provided still other reasons for using sigmoidal activation functions, but again, these are not directly related to backpropagation. The choice of sigmoid activation for backpropagation (at least for the PDP group reincarnation of the architecture) was made quite consciously by Williams, based upon his 1983 study of activation functions [60]. As we shall see, it was a propitious choice.
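Why the sigmoid is such a propitious choice is easiest to see concretely. The sketch below is not from the paper; it assumes the common logistic form s(z) = 1/(1 + e^{-z}), whose derivative can be written entirely in terms of its own output, s'(z) = s(z)(1 - s(z)), a property the backward pass exploits.

```python
import math

def sigmoid(z):
    # Logistic sigmoid (assumed form): smooth, monotone, bounded in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv_from_output(s):
    # Derivative expressed via the forward-pass output s = sigmoid(z):
    # s'(z) = s * (1 - s).  No second function evaluation is needed,
    # which is one practical reason the sigmoid suits backpropagation.
    return s * (1.0 - s)
```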
2 Backpropagation Neural Network Architecture

This Section reviews the architecture of the backpropagation neural network. The transfer function equations for each processing element are provided for both the forward and backward passes. First, we recall the definition of a neural network:

Definition 1 A neural network is a parallel, distributed information processing structure consisting of processing elements (which can possess a local memory and can carry out localized information processing operations) interconnected together with unidirectional signal channels called connections. Each processing element has a single output connection which branches ("fans out") into as many collateral connections as desired (each carrying the same signal, the processing element output signal). The processing element output signal can be of any mathematical type desired. All of the processing that goes on within each processing element must be completely local: i.e., it must depend only upon the current values of the input signals arriving at the processing element via impinging connections and upon values stored in the processing element's local memory.

The importance of restating the neural network definition relates to the fact that (as pointed out by Carpenter and Grossberg [8]) traditional forms of the backpropagation architecture are, in fact, not neural networks. They violate the locality of processing restriction. The new backpropagation neural network architecture presented below eliminates this objection, while retaining the traditional mathematical form of the architecture.

The backpropagation neural network architecture is a hierarchical design consisting of fully interconnected layers or rows of processing elements.
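Before the transfer equations are given, the following minimal sketch makes the hierarchical, fully interconnected structure concrete. It assumes the standard forward-pass rule (each processing element forms a weighted sum of the previous row's output signals plus a bias and applies the sigmoid); the function name, layer sizes, and weight values are hypothetical illustrations, not the paper's own specification.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_pass(x, layers):
    """Propagate an input vector x through successive rows of processing elements.

    `layers` is a list of (weights, biases) pairs, one per row beyond the input
    row; weights[j][i] is the connection from element i of the previous row to
    element j of the current row (full interconnection between adjacent rows).
    """
    signal = x
    for weights, biases in layers:
        signal = [sigmoid(sum(w * s for w, s in zip(row, signal)) + b)
                  for row, b in zip(weights, biases)]
    return signal

# Illustrative 2-3-1 network with arbitrary (hypothetical) weights.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.6, 0.3]], [0.0, 0.1, -0.1]),  # hidden row
    ([[0.7, -0.4, 0.2]], [0.05]),                                # output row
]
print(forward_pass([1.0, 0.5], layers))
```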