Convolutional Neural Networks for Speech Recognition


Convolutional Neural Networks for Speech Recognition
Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 10, OCTOBER 2014

Organization of the paper
• Introduction
• Deep Neural Networks
• Convolutional Neural Networks
• CNN with limited weight sharing
• Experiments
• Conclusions

Deep Neural Networks
• Generally speaking, a deep neural network (DNN) refers to a feedforward neural network with more than one hidden layer. Each hidden layer has a number of units (or neurons), each of which takes all outputs of the lower layer as input, multiplies them by a weight vector, sums the result, and passes it through a non-linear activation function such as sigmoid or tanh.
• All neuron activations in each layer can be represented in the following matrix form: h^(l) = σ(W^(l) h^(l−1) + b^(l)), where σ(·) is the element-wise activation function and W^(l), b^(l) are the weight matrix and bias vector of layer l.
• For a multi-class classification problem, the posterior probability of each class can be estimated using an output softmax layer: p(i | x) = exp(a_i) / Σ_j exp(a_j), where a_i is the activation of the i-th output unit.
• In the hybrid DNN-HMM model, the DNN output layer computes the state posterior probabilities, which are divided by the states' priors to estimate the observation likelihoods.
• Because of the increased model complexity of DNNs, a pretraining algorithm is often needed, which initializes all weight matrices prior to the above back-propagation algorithm.
• One popular method to pretrain DNNs uses the restricted Boltzmann machine (RBM). An RBM has a set of hidden units that are used to compute a better feature representation of the input data. After learning, all RBM weights can be used as a good initialization for one DNN layer.

Convolutional Neural Networks
• The convolutional neural network (CNN) can be regarded as a variant of the standard neural network. Instead of using fully connected hidden layers as described in the preceding section, the CNN introduces a special network structure, which consists of convolution and pooling layers.
• This section covers:
  • Organization of the Input Data to the CNN
  • Convolution Ply
  • Pooling Ply
  • Learning Weights in the CNN
  • Treatment of Energy Features
  • The Overall CNN Architecture
  • Benefits of CNNs for ASR

Organization of the Input Data to the CNN
• In using the CNN for pattern recognition, the input data need to be organized as a number of feature maps to be fed into the CNN.
• We need to use inputs that preserve locality along both the frequency and time axes.
• MFSC features will be used to represent each speech frame, along with their deltas and delta-deltas, in order to describe the acoustic energy distribution in each of several different frequency bands.
• As for time, a single window of input to the CNN will consist of a wide amount of context. As for frequency, the conventional use of MFCCs does present a major problem because the discrete cosine transform projects the spectral energies onto a new basis that may not maintain locality. In this paper, we shall use the log-energy computed directly from the mel-frequency spectral coefficients (i.e., with no DCT), which we will denote as MFSC features.
(Figure: the input can be arranged either as a number of one-dimensional (1-D) feature maps or as three 2-D feature maps.)

Convolution Ply
• Every input feature map is connected to many feature maps in the convolution ply based on a number of local weight matrices. The mapping can be represented as the well-known convolution operation in signal processing.
• Each unit of one feature map in the convolution ply can be computed as q_{j,m} = σ(Σ_i Σ_{n=1..F} o_{i,n+m−1} w_{i,j,n} + w_{0,j}), where o_i is the i-th input feature map, w_{i,j} is the local weight matrix (filter) of size F connecting it to the j-th convolution feature map, and w_{0,j} is a bias.
• Written in a more concise matrix form: q_j = σ(Σ_i o_i ∗ w_{i,j} + w_{0,j}), where ∗ denotes the convolution operation.

Pooling Ply
• The pooling ply is also organized into feature maps, with the same number of feature maps as its convolution ply, but each map is smaller.
• This reduction is achieved by applying a pooling function to several units in a local region. It is usually a simple function such as maximization or averaging.

Learning Weights in the CNN
• All weights in the convolution ply can be learned using the same error back-propagation algorithm, but some special modifications are needed to take care of the sparse connections and weight sharing.
• Let us first represent the convolution operation in eq. (9) in the same mathematical form as a fully connected ANN layer, so that the same learning algorithm can be similarly applied.
• Since the pooling ply has no weights, no learning is needed there. However, the error signals should be back-propagated to lower plies through the pooling function: each pooling unit's error is routed back to the convolution units in its pooling region (to the maximal unit for max pooling, or split among all units for average pooling).

Treatment of Energy Features
• In ASR, log-energy is usually calculated per frame and appended to other spectral features. In a CNN, it is not suitable to treat energy the same way, since it is the sum of the energy in all frequency bands and so does not depend on frequency. Instead, the log-energy features should be appended as extra inputs to all convolution units.

The Overall CNN Architecture
• In this paper, we follow the hybrid ANN-HMM framework, where we use a softmax output layer on top of the topmost layer of the CNN to compute the posterior probabilities for all HMM states. These posteriors are used to estimate the likelihood of each HMM state per frame by dividing by the state's prior probability. Finally, the likelihoods of all HMM states are sent to a Viterbi decoder to recognize the continuous stream of speech units.

Benefits of CNNs for ASR
• The CNN has three key properties: locality, weight sharing, and pooling.
• Locality in the units of the convolution ply allows more robustness against non-white noise, where some bands are cleaner than others: good features can be computed locally from the cleaner parts of the spectrum, and only a smaller number of features are affected by the noise.
• Weight sharing can also improve model robustness and reduce overfitting, as each weight is learned from multiple frequency bands in the input instead of just from one single location.
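The convolution and pooling plies described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the map counts, number of bands, and filter size below are made-up examples:

```python
import numpy as np

def convolution_ply(inputs, weights, bias, sigma=np.tanh):
    """One 1-D convolution ply along the frequency axis.

    inputs:  (I, B)    -- I input feature maps over B frequency bands
    weights: (I, J, F) -- local weight matrices (filters of size F)
    bias:    (J,)      -- one bias per output feature map
    Returns  (J, B - F + 1) convolution feature maps.
    """
    I, B = inputs.shape
    _, J, F = weights.shape
    out = np.empty((J, B - F + 1))
    for j in range(J):
        for m in range(B - F + 1):
            # sum over all input maps and a local window of F bands
            out[j, m] = np.sum(inputs[:, m:m + F] * weights[:, j, :]) + bias[j]
    return sigma(out)

def max_pooling_ply(conv_maps, pool_size, shift):
    """Max pooling over local regions; each output map is smaller."""
    J, M = conv_maps.shape
    n_out = (M - pool_size) // shift + 1
    pooled = np.empty((J, n_out))
    for m in range(n_out):
        pooled[:, m] = conv_maps[:, m * shift:m * shift + pool_size].max(axis=1)
    return pooled

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 40))        # e.g. static, delta, delta-delta maps over 40 bands
w = rng.standard_normal((3, 8, 5)) * 0.1
b = np.zeros(8)
conv = convolution_ply(x, w, b)                      # shape (8, 36)
pool = max_pooling_ply(conv, pool_size=3, shift=3)   # shape (8, 12)
```

Note how each filter w[:, j, :] slides over frequency, so the same weights are applied at every band position (weight sharing), and pooling then shrinks each map.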
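The back-propagation rule for the pooling ply mentioned above (no weights to learn, errors routed through the pooling function) can be sketched for max pooling as follows; the array sizes are illustrative:

```python
import numpy as np

def max_pool_backprop(conv_maps, pooled_errors, pool_size, shift):
    """Route each pooling unit's error to the argmax unit of its region.

    For max pooling, only the convolution unit that produced the maximum
    receives the error signal; all other units in the region get zero.
    """
    grad = np.zeros_like(conv_maps)
    J, n_out = pooled_errors.shape
    for j in range(J):
        for m in range(n_out):
            region = conv_maps[j, m * shift:m * shift + pool_size]
            grad[j, m * shift + int(np.argmax(region))] += pooled_errors[j, m]
    return grad

# One feature map of 6 units, pooled in regions of 3 with shift 3:
conv = np.array([[1.0, 3.0, 2.0, 0.0, 5.0, 4.0]])
err = np.array([[0.5, -1.0]])            # error at the two pooling units
g = max_pool_backprop(conv, err, pool_size=3, shift=3)
# g == [[0., 0.5, 0., 0., -1., 0.]] -- errors land on the two maxima
```

For average pooling the analogous rule would divide each pooling unit's error evenly among the units of its region.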
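The hybrid decoding step above (softmax posteriors divided by state priors to obtain scaled observation likelihoods for the Viterbi decoder) can be sketched as follows; the three-state setup and the prior values are hypothetical:

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def scaled_likelihoods(output_activations, state_priors):
    """Convert network output activations to pseudo-likelihoods.

    The softmax gives p(s | x); dividing by the prior p(s) yields
    p(x | s) / p(x), which differs from the observation likelihood
    only by a per-frame constant that does not affect Viterbi decoding.
    """
    posteriors = softmax(output_activations)
    return posteriors / state_priors

a = np.array([[2.0, 1.0, 0.1]])        # activations for 3 HMM states, 1 frame
priors = np.array([0.5, 0.3, 0.2])     # state priors, e.g. from training alignments
lik = scaled_likelihoods(a, priors)    # scaled likelihoods fed to the decoder
```

Dividing by the priors matters because frequent states would otherwise dominate decoding purely through their high posterior mass.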
