
Technical Report No. 9607, Department of Statistics, University of Toronto

Factor Analysis Using Delta-Rule Wake-Sleep Learning

Radford M. Neal
Department of Statistics and Department of Computer Science
University of Toronto
radford@stat.utoronto.ca

Peter Dayan
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
dayan@ai.mit.edu

24 July 1996

We describe a linear network that models correlations between real-valued visible variables using one or more real-valued hidden variables -- a factor analysis model. This model can be seen as a linear version of the "Helmholtz machine", and its parameters can be learned using the "wake-sleep" method, in which learning of the primary "generative" model is assisted by a "recognition" model, whose role is to fill in the values of hidden variables based on the values of visible variables. The generative and recognition models are jointly learned in "wake" and "sleep" phases, using just the delta rule. This learning procedure is comparable in simplicity to Oja's version of Hebbian learning, which produces a somewhat different representation of correlations in terms of principal components. We argue that the simplicity of wake-sleep learning makes factor analysis a plausible alternative to Hebbian learning as a model of activity-dependent cortical plasticity.

1 Introduction

Activity-dependent plasticity in the vertebrate brain has typically been modeled in terms of Hebbian learning (Hebb 1949), in which weight changes are based on the covariance of pre-synaptic and post-synaptic activity (e.g., von der Malsburg 1973; Linsker 1986; Miller, Keller, and Stryker 1989). These models derive support from neurobiological evidence of long-term potentiation (see, for example, Collingridge and Bliss (1987), and for a recent review, Baudry and Davis (1994)). They have also been seen as performing a reasonable function, namely extracting the statistical structure amongst a collection of inputs in terms of principal components (Linsker 1988). In this paper, we suggest the statistical technique of factor analysis as an interesting alternative to principal components analysis, and show how to implement it using an algorithm whose demands on synaptic plasticity are as local as those of the Hebb rule.

Factor analysis is a model for real-valued data in which correlations are "explained" by postulating the presence of one or more underlying "factors". These factors play the role of "latent" or "hidden" variables, which are not directly observable, but which allow the dependencies between the "visible" variables to be expressed in a convenient way. Everitt (1984) gives a good introduction to latent variable models in general, and to factor analysis in particular. These models are widely used in psychology and the social sciences as a way of exploring whether observed patterns in data might be explainable in terms of a small number of unobserved factors. Our interest in these models stems from their potential as a way of building high-level representations from sensory data.

Oja's version of Hebbian learning (Oja and Karhunen 1985; Oja 1989, 1992) is a particularly convenient counterpoint. This rule applies to a linear unit with weight vector w that computes an output y = w^T x when presented with a real-valued input vector x (which, for convenience, is assumed to have mean zero). After each presentation of an input vector, the weights for the unit are changed by an amount given by the following proportionality:

    Δw ∝ y(x − yw) = yx − y^2 w                                    (1)

The first term in this weight increment, yx, is of Hebbian form. The second term, −y^2 w, tends to push the weights towards zero, balancing the positive feedback in plain Hebbian learning, which would otherwise increase the magnitude of the weights without bound. Wyatt and Elfadel (1995) give an explicit analysis of learning based on equation (1), showing that with reasonable starting conditions, w converges to the principal eigenvector of the covariance matrix of the inputs -- that is, it converges to a unit vector pointing in the direction of highest variance in the input space. Extracting the subsidiary eigenvectors of the covariance matrix of the inputs is somewhat more challenging, requiring some form of inhibition between successive output units (Sanger 1989; Földiák 1989; Plumbley 1993).
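To make equation (1) concrete, the following is a minimal sketch of Oja's rule in Python with NumPy. Everything about the setup (the synthetic data, learning rate, and number of presentations) is our illustrative choice rather than anything from the paper; only the weight update itself is equation (1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic zero-mean inputs whose first coordinate has the largest variance,
# so the principal eigenvector of the covariance matrix is near that axis.
n, dim = 5000, 5
scales = np.array([3.0, 1.0, 0.5, 0.2, 0.1])
X = rng.standard_normal((n, dim)) * scales
X -= X.mean(axis=0)                 # the rule assumes mean-zero inputs

w = 0.1 * rng.standard_normal(dim)  # small random initial weights
eta = 0.01                          # learning rate (the proportionality constant)

for x in X:
    y = w @ x                       # the unit's output, y = w^T x
    w += eta * y * (x - y * w)      # Oja's rule, equation (1)

# Compare the learned weights with the principal eigenvector of the
# input covariance matrix (aligning signs, since -w is equally valid).
evals, evecs = np.linalg.eigh(np.cov(X.T))
principal = evecs[:, np.argmax(evals)]
print("learned w :", np.round(w, 3))
print("principal :", np.round(np.sign(principal @ w) * principal, 3))
```

Note that the change to each weight involves only the unit's output, the corresponding input, and the current weight value; it is this locality that makes the rule plausible as a model of synaptic plasticity.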
Linsker (1988) views Hebbian learning as a way of maximising the information retained by y about x. Under the simplifying assumption that the distribution of the inputs is Gaussian, setting the output of a unit to the projection of its input onto the first principal component of the input covariance matrix conveys as much information as possible on average (see also Plumbley 1993). This goal seems reasonable for the very early stages of sensory processing, where information bottlenecks such as the optic nerve may plausibly be present. Note, however, that it implicitly assumes that all information is equally important. Maximizing information transfer seems less compelling as a goal for subsequent levels of processing, once sensory signals have reached cortex. Several other computational goals have been suggested from this stage upwards, including factorial coding (Barlow 1989), sparsification (Olshausen and Field 1995), and various methods for encouraging the cortex to respect reasonable invariances, such as translation or scale invariance for visual processing (Li and Atick 1994).

In this paper, we pursue the suggestion of Hinton and Zemel (1994) (see also Grenander 1976-1981; Mumford 1994; Dayan, Hinton, Neal, and Zemel 1995) that the cortex might be constructing a hierarchical stochastic "generative" model of its input in the top-down connections, while implementing in the bottom-up connections a "recognition" model that in a sense is the inverse of the generative model. The recognition model provides high-
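The abstract states that the generative and recognition models of the linear Helmholtz machine are learned jointly in wake and sleep phases using just the delta rule. As a hedged sketch of what this can look like for a single-factor model, the code below pairs a generative model x = gy + Gaussian noise with a recognition model y = r·x + Gaussian noise. All specifics (the names, the learning rate, and in particular the choice to hold the noise standard deviations fixed rather than learn them) are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 5

# Training data from a "true" one-factor model: x = g_true * y + sensor noise.
g_true = np.array([2.0, -1.0, 0.5, 1.5, 0.0])
def sample_data(n):
    y = rng.standard_normal(n)
    return y[:, None] * g_true + 0.3 * rng.standard_normal((n, dim))

g = 0.1 * rng.standard_normal(dim)  # generative (top-down) weights
r = 0.1 * rng.standard_normal(dim)  # recognition (bottom-up) weights
eta = 0.01
s_rec, s_gen = 0.1, 0.3             # noise std devs, held fixed in this sketch

for x in sample_data(20000):
    # Wake phase: fill in a hidden value for a real input using the recognition
    # model, then use the delta rule to train the generative weights to
    # reconstruct the input from that hidden value.
    y = r @ x + s_rec * rng.standard_normal()
    g += eta * y * (x - y * g)

    # Sleep phase: "dream" a hidden value and a fantasy input from the
    # generative model, then use the delta rule to train the recognition
    # weights to recover the hidden value from the fantasy input.
    y_d = rng.standard_normal()
    x_d = y_d * g + s_gen * rng.standard_normal(dim)
    r += eta * (y_d - r @ x_d) * x_d

print("true loadings:", g_true)
print("learned g    :", np.round(g, 2))
```

With these settings the generative weights should end up close to the true factor loadings, up to an overall sign. As with Oja's rule, each weight change depends only on the activities at the two ends of a connection and the current weight, which is the sense in which the demands on synaptic plasticity are as local as those of the Hebb rule.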
