Automatic Spoken Language Recognition with Neural Networks (IJITCS-V10-N8-2)


I.J. Information Technology and Computer Science, 2018, 8, 11-17
Published Online August 2018 in MECS
DOI: 10.5815/ijitcs.2018.08.02
Copyright © 2018 MECS

Automatic Spoken Language Recognition with Neural Networks

Valentin Gazeau
Department of Computer Science at Sam Houston State University, Huntsville, TX, USA
E-mail: vcg006@shsu.edu

Cihan Varol
Department of Computer Science at Sam Houston State University, Huntsville, TX, USA
E-mail: cxv007@shsu.edu

Received: 22 June 2018; Accepted: 15 July 2018; Published: 08 August 2018

Abstract: Translation has become very important in our generation as people with completely different cultures and languages are networked together through the Internet. Nowadays one can easily communicate with anyone in the world with the services of Google Translate and/or other translation applications. Humans can already recognize languages that they have previously been exposed to. Even though they might not be able to translate, they can have a good idea of what the spoken language is. This paper demonstrates how different Neural Network models can be trained to recognize different languages such as French, English, Spanish, and German. For the training dataset, voice samples were chosen from Shtooka, VoxForge, and YouTube. For testing purposes, not only data from these websites but also personally recorded voices were used. At the end, this research provides the accuracy and confidence level of multiple Neural Network architectures, Support Vector Machine and Hidden Markov Model, with the Hidden Markov Model yielding the best results, reaching almost 70 percent accuracy for all languages.

Index Terms: Hidden Markov Model, Language Identification, Language Translation, Neural Networks, Support Vector Machine.

I. INTRODUCTION

There are roughly 6,500 spoken languages in the world today. While some of them are popular, e.g. Chinese, spoken by over 1 billion people, there are others such as Liki and Njerep which are used by fewer than 1,000 people. Understanding the spoken language and responding accordingly when needed are vital to creating communication between people of different backgrounds and languages. Spoken language recognition refers to the automatic process that determines the identity of the language spoken in a speech sample. This technology can be used for a wide range of multilingual speech processing applications, such as spoken language translation and/or multilingual speech recognition [1]. In practice, spoken language recognition is far more challenging than text-based language recognition because there is no guarantee that a machine is able to transcribe speech to text without errors. We know that humans recognize languages through a perceptual or psychoacoustic process that is inherent to the auditory system. Therefore, the type of speech perception that human listeners use is always the source of inspiration for automatic spoken language recognition [2].

This paper outlines how Neural Networks, as well as a Support Vector Machine and a Hidden Markov Model, can be trained to automatically identify the language directly from speech using libraries such as TensorFlow [3]. This can be done in many different ways, as the results depend on a number of factors: for example, the diversity and size of the training data and/or the Neural Network model. The paper also demonstrates how the training data was gathered, the problems faced, and how testing was conducted.

The paper is organized as follows: Section II introduces previous attempts at automatic spoken language identification using phoneme classifiers, statistical and recurrent Neural Networks, and deep learning approaches. Section III covers which models were chosen to implement the Neural Network as well as how the training data was gathered and how the training was conducted. Section IV details how the testing was performed and how the performance of the multiple models was evaluated, as well as the results. The conclusion is drawn in Section V along with the future improvements that can be made.

II. RELATED WORKS

Several attempts have already been made to recognize spoken languages with different datasets. While Neural Networks have been used for a variety of identification/classification problems [4, 5], this section covers some of the most promising recent works in automatic spoken language identification using Neural Networks:

Srivastava et al. [6] use a language-independent phoneme classifier that extracts the sequence of phonemes from an audio file. The phoneme sequence is then classified using statistical and recurrent Neural Network models (RNNs). With phoneme classification of three languages (Turkish, Uzbek and Mandarin), the authors achieved an average accuracy of 58%.

Montavon's [7] attempt uses a deep neural network featuring three convolutional layers as well as a smaller attempt with a single convolutional layer time-delay model. Trying to classify English, German, and French, the author achieves 91.3% accuracy for known speakers (used in training) and 80% for unknown speakers.

Lei et al. proposed two novel frontends for robust language identification (LID) using a convolutional neural network (CNN) trained for automatic speech recognition (ASR) [8]. In the CNN/i-vector frontend, the CNN is used for getting the posterior probabilities for i-vector training and extraction. The authors evaluated their approach on very low-quality speech data, but still they we
