Speech Recognition
By Terry

Speech recognition is the technology of converting a voice signal into the corresponding text or commands through machine recognition and understanding. It draws on signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, and artificial intelligence. A speech recognition system mainly consists of three modules: feature extraction, pattern matching, and model training.

The History of Speech Recognition Development
  1950s: Audrey system, Bell Labs
  1959: Ten-phoneme recognition system
  Late 1960s to early 1970s: LPC, DTW, VQ, HMM
  1980s: Sphinx system (Carnegie Mellon University); ANN, HMM
  1990s: IBM, Apple, AT&T, and NTT
  Nowadays: a hot area in AI, with more processing methods

Categories of Methods
  Isolated word recognition
  Connected word recognition
  Continuous speech recognition

  Speaker-dependent (specific person) recognition
  Speaker-independent (non-specific person) recognition

  Small vocabulary
  Medium vocabulary
  Large vocabulary
  Infinite vocabulary

Main Methods
  Template Matching
    DTW (Dynamic Time Warping)
    VQ (Vector Quantization)
  HMM
    DHMM (Discrete Hidden Markov Model)
    CHMM (Continuous Hidden Markov Model)
    SCHMM (Semi-Continuous Hidden Markov Model)
  ANN (Artificial Neural Network)

Signal Pre-processing
  Framing - 5 ms to 50 ms per frame
  Endpoint detection - detect the starting point and the terminal point of the speech
  Speech enhancement - suppress noise and improve speech quality
  ICA - Independent Component Analysis

Feature Extraction
  LPC - Linear Prediction Coefficient
  LPCC - Linear Prediction Cepstrum Coefficient
  MFCC - Mel Frequency Cepstrum Coefficient
  Cepstrum:

  $$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$$

  $$c(m) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\left|X(e^{j\omega})\right| e^{j\omega m}\, d\omega$$

CRS Introduction
  Matlab GUI

Procedure
  Pre-Processing
  Feature Extraction
  DTW + VQ

Pre-Processing
  Pre-emphasis:

  $$H(z) = 1 - \mu z^{-1}$$

  Windowing - speech is a non-stationary signal, so it is analyzed in short windows
    Rectangular window
    Hanning window
    Hamming window:

  $$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$

Feature Extraction
  Endpoint detection
    Short-time energy
    Zero-crossing rate (double thresholds)
  MFCC - based on an auditory model
  Cepstrum computation: signal -> DFT -> spectrum -> logarithm -> inverse DFT -> cepstrum

Template Matching
  Selecting the template sets
    Single Optimal Selection Method
    SFS (Sequential Forward Selection)
    SBS (Sequential Backward Selection)
    GRNN (General Regression Neural Network)
  Template subsets (our own)
    Classified according to the frame size:
      A. 20
      B. 30
      C. Else

Template Matching: DTW Algorithm
  The DTW distance is the minimum weighted accumulated distance over all warping paths C:

  $$D = \min_{C} \left[ \frac{\sum_{n=1}^{N} d\big(x(i_n), y(j_n)\big)\, W_n}{\sum_{n=1}^{N} W_n} \right]$$

  Boundary condition:

  $$w(1) = 1, \quad w(N) = M$$

  Continuity condition:

  $$w(n+1) - w(n) = \begin{cases} 0, 1, 2 & w(n) \ne w(n-1) \\ 1, 2 & w(n) = w(n-1) \end{cases}$$

  Recurrence:

  $$D[n+1, m] = d[n+1, m] + \min\big[ D(n, m)\, g(n, m),\; D(n, m-1),\; D(n, m-2) \big]$$

  $$g(n, m) = \begin{cases} 1 & w(n) \ne w(n-1) \\ \infty & w(n) = w(n-1) \end{cases}$$

  Here g(n, m) is chosen so that the values of n and m satisfy the constraint on w(n).

Template Matching: DTW Algorithm, DP (Dynamic Programming)
  for i = 2 to n do
    for j = 1 to m do
      D1 = D(i-1, j);
      D2 = (j > 1) ? D(i-1, j-1) : RealMax;
      D3 = (j > 2) ? D(i-1, j-2) : RealMax;
      D(i, j) = d(i, j) + min([D1, D2, D3]);
    end
  end

Classic K-NN
  Sort the distance sequence from smallest to largest
  Take the first K distance elements
  The best match (the result) is the digit with the largest share among the K elements
  Our own: Weighted K-NN

VQ: Training the Codebook
  Average distortion of codebook C_i over T training frames:

  $$D(C_i) = \frac{1}{T} \sum_{t=1}^{T} d\big(x_t, \hat{x}_t^{\,i}\big), \quad \hat{x}_t^{\,i} \in C_i$$

  Nearest code vector for frame x_t:

  $$\hat{x}_t^{\,i} = \arg\min_{y_j \in C_i} d(x_t, y_j)$$

  Recognition picks the codebook with the smallest distortion:

  $$D(C_k) = \min_{i} D(C_i)$$

Recognition Experiment
  Performance: 70% to 90%
  Most robust digit: 5
  Confused digits: 0 and 6, 2 and 8
  Digit with the worst performance: 3
  [Figure: two waveforms of the digit 3]

That's all.
Thanks!
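
The dynamic-programming loop on the DTW slides can be sketched in Python as follows. This is only a minimal illustration, not the Matlab code behind the CRS system: the function name `dtw_distance`, the representation of each frame as a tuple of floats, and the Euclidean local distance are all assumptions made for the example. It follows the simplified DP loop (predecessors m, m-1, m-2) and leaves out the g(n, m) factor and the path weights.

```python
import math


def dtw_distance(x, y):
    """Accumulated DTW distance between frame sequences x (length N) and y (length M).

    Hypothetical helper: frames are equal-length tuples of floats and the
    local distance d is Euclidean. Returns math.inf when no path satisfying
    the boundary condition w(1) = 1, w(N) = M exists (for example M > 2N - 1).
    """
    n, m = len(x), len(y)
    inf = math.inf  # plays the role of RealMax in the slide pseudocode

    # D[i][j] holds the best accumulated distance ending at frame pair (i, j).
    D = [[inf] * m for _ in range(n)]
    D[0][0] = math.dist(x[0], y[0])  # boundary condition w(1) = 1

    for i in range(1, n):
        for j in range(m):
            d1 = D[i - 1][j]                         # w advances by 0
            d2 = D[i - 1][j - 1] if j >= 1 else inf  # w advances by 1
            d3 = D[i - 1][j - 2] if j >= 2 else inf  # w advances by 2
            best = min(d1, d2, d3)
            if best < inf:
                D[i][j] = math.dist(x[i], y[j]) + best

    return D[n - 1][m - 1]  # boundary condition w(N) = M
```

In a template-matching recognizer, this distance would be computed between the input utterance and every stored template, and the resulting distances passed to the K-NN vote described on the K-NN slide.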