Applications of MATLAB in Speech Recognition

天下第一城
2021-04-22


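1. GUI-Based Audio Acquisition and Processing System

Note: this experiment recognizes the isolated characters “东、北、大、学、中、荷、学、院”.

The first step is to build the GUI: drag the required controls onto the layout, double-click each control, and edit its properties. The most important are String and Tag (the Tag determines the name of the callback function). Other properties such as Value and Style also need attention and should not be overlooked in practice.

One point needs explaining: the buttons shown in the figure all sit inside one button group and are child controls of that group. Their callback is therefore added on the button group. Right-click the border surrounding the three buttons and choose View Callbacks > SelectionChangeFcn; the callback for the buttons then appears in the main function file:

    function uipanel1_SelectionChangeFcn(hObject, eventdata, handles)

The code is explained using the first button, "Record", as an example:

[code listing]

Below is the code for "Play" and "Save":

[code listing]

That is all of the code for audio acquisition. When the program runs, the following interface appears:

[figure]

Click the Record button; when recording ends, the corresponding waveform is displayed:

[figure]

Click Save to save the sound in .wav format. This completes the sound acquisition.

2. Sound Processing and Recognition

2.1 Opening a file

Speech processing starts by opening a file with the .wav extension. This part uses standalone buttons rather than a button group. The callback function of the "Open" button is:

    function pushbutton1_Callback(hObject, eventdata, handles)

where pushbutton1 is the Tag of the "Open" button. The following code is added under the callback:

[code listing]

The result is shown in the figure:

[figure]

2.2 Preprocessing

The callback function is:

    function pushbutton2_Callback(hObject, eventdata, handles)

The result is shown in the figure:

[figure]

2.3 Short-time energy

The callback function for short-time energy is:

    function pushbutton3_Callback(hObject, eventdata, handles)

The code under this callback is:

[code listing]

2.4 Endpoint detection

One caveat first: so that later callbacks are not left without access to variables computed in earlier ones, each later function actually repeats the code of the preceding steps. This obviously makes the program long-winded, and it is something to revise in the future.

    function pushbutton4_Callback(hObject, eventdata, handles)

[code listing]

2.5 Generating templates

The parts of this function that repeat the code above are omitted; only the added code is shown:

[code listing]

2.6 Speech recognition

The opened recording is matched against a speech library recorded in advance, using the DTW algorithm. When recognition finishes, the recognized character is displayed in the corresponding text box. The code is:

[code listing]

Comparison before and after the program runs:

[figure]

Overall view of the GUI:

[figure]

Summary

The experiment recognizes the characters “东、北、大、学、中、荷、学、院”, provided that the template recordings themselves are used as the test samples against the speech library. In that case the accuracy is 100%, which shows that the algorithm is correct and only needs optimization. When a live recording is matched against the templates, however, a high accuracy cannot be guaranteed, which shows that the feature extraction is not yet good enough. The guiding principle of feature extraction is to make the within-class distance as small as possible and the between-class distance as large as possible; this is an area for future improvement. The GUI also needs optimization: the template library should be generated once and kept separate, and the test speech then matched against it, so that the library does not have to be regenerated for every test and computation is faster. Given the opportunity, continuous speech recognition could be implemented in the future.

Appendix

These are all of the code files. The file mfcc.mat is generated while the program runs; the test folder holds the recorded templates:

[figure]

There are six .m files, listed below:

1. WienerScalart96.m

    function output=WienerScalart96(signal,fs,IS)

    % output=WIENERSCALART96(signal,fs,IS)
    % Wiener filter based on tracking a priori SNR using Decision-Directed
    % method, proposed by Scalart et al 96. In this method it is assumed that
    % SNRpost=SNRprior+1. Based on this the Wiener Filter can be adapted to a
    % model like Ephraim's model in which we have a gain function which is a
    % function of a priori SNR and a priori SNR is being tracked using Decision
    % Directed method.
    % Author: Esfandiar Zavarehei
    % Created: MAR-05

    if (nargin<3 || isstruct(IS))
        IS=.25; % Initial Silence or Noise Only part in seconds
    end
    W=fix(.025*fs); % Window length is 25 ms
    SP=.4; % Shift percentage is 40% (10 ms)
           % Overlap-Add method works well with this value (.4)
    wnd=hamming(W);

    % IGNORE FROM HERE ...............................
    if (nargin>=3 && isstruct(IS)) % This option is for compatibility with another programme
        W=IS.windowsize;
        SP=IS.shiftsize/W;
        %nfft=IS.nfft;
        wnd=IS.window;
        if isfield(IS,'IS')
            IS=IS.IS;
        else
            IS=.25;
        end
    end
    % ......................................UP TO HERE

    pre_emph=0;
    signal=filter([1 -pre_emph],1,signal);

    NIS=fix((IS*fs-W)/(SP*W)+1); % number of initial silence segments

    y=segment(signal,W,SP,wnd); % This function chops the signal into frames
    Y=fft(y);
    YPhase=angle(Y(1:fix(end/2)+1,:)); % Noisy Speech Phase
    Y=abs(Y(1:fix(end/2)+1,:)); % Spectrogram
    numberOfFrames=size(Y,2);
    FreqResol=size(Y,1);

    N=mean(Y(:,1:NIS)')'; % initial Noise Power Spectrum mean
    LambdaD=mean((Y(:,1:NIS)').^2)'; % initial Noise Power Spectrum variance
    alpha=.99; % used in smoothing xi (for Decision Directed method for estimation of A Priori SNR)
    NoiseCounter=0;
    NoiseLength=9; % This is a smoothing factor for the noise updating
    G=ones(size(N)); % Initial Gain used in calculation of the new xi
    Gamma=G;

    X=zeros(size(Y)); % Initialize X (memory allocation)
    h=waitbar(0,'Wait...');

    for i=1:numberOfFrames
        %%%%%%%%%%%%%%%% VAD and Noise Estimation START
        if i<=NIS % If initial silence ignore VAD
            SpeechFlag=0;
            NoiseCounter=100;
        else % Else Do VAD
            [NoiseFlag,SpeechFlag,NoiseCounter,Dist]=vad(Y(:,i),N,NoiseCounter); % Magnitude Spectrum Distance VAD
        end

        if SpeechFlag==0 % If not Speech Update Noise Parameters
            N=(NoiseLength*N+Y(:,i))/(NoiseLength+1); % Update and smooth noise mean
            LambdaD=(NoiseLength*LambdaD+(Y(:,i).^2))./(1+NoiseLength); % Update and smooth noise variance
        end
        %%%%%%%%%%%%%%%% VAD and Noise Estimation END

        gammaNew=(Y(:,i).^2)./LambdaD; % A posteriori SNR
        xi=alpha*(G.^2).*Gamma+(1-alpha).*max(gammaNew-1,0); % Decision Directed Method for A Priori SNR
        Gamma=gammaNew;

        G=(xi./(xi+1));

        X(:,i)=G.*Y(:,i); % Obtain the new Cleaned value

        waitbar(i/numberOfFrames,h,num2str(fix(100*i/numberOfFrames)));
    end
    close(h);

    output=OverlapAdd2(X,YPhase,W,SP*W); % Overlap-add Synthesis of speech
    output=filter(1,[1 -pre_emph],output); % Undo the effect of Pre-emphasis

    function ReconstructedSignal=OverlapAdd2(XNEW,yphase,windowLen,ShiftLen)

    % Y=OverlapAdd(X,A,W,S);
    % Y is the signal reconstructed from its spectrogram. X is a matrix
    % with each column being the fft of a segment of signal. A is the phase
    % angle of the spectrum which should have the same dimension as X. If it is
    % not given the phase angle of X is used which in the case of real values is
    % zero (assuming that it is the magnitude). W is the window length of time
    % domain segments; if not given the length is assumed to be twice as long as
    % fft window length. S is the shift length of the segmentation process (for
    % example in the case of non-overlapping signals it is equal to W and in the
    % case of 50% overlap is equal to W/2). If not given W/2 is used. Y is the
    % reconstructed time domain signal.
    % Sep-04
    % Esfandiar Zavarehei

    if nargin<2
        yphase=angle(XNEW);
    end
    if nargin<3
        windowLen=size(XNEW,1)*2;
    end
    if nargin<4
        ShiftLen=windowLen/2;
    end
    if fix(ShiftLen)~=ShiftLen
        ShiftLen=fix(ShiftLen);
        disp('The shift length has to be an integer as it is the number of samples.')
        disp(['shift length is fixed to ' num2str(ShiftLen)])
    end

    [FreqRes,FrameNum]=size(XNEW);

    Spec=XNEW.*exp(j*yphase);

    if mod(windowLen,2) % if FreqResol is odd
        Spec=[Spec;flipud(conj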