Finite-State Transducers in Language and Speech Pr

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

Finite-StateTransducersinLanguageandSpeechProcessingMehryarMohriAT&TLabs-ResearchFinite-statemachineshavebeenusedinvariousdomainsofnaturallanguageprocessing.Weconsiderheretheuseofatypeoftransducersthatsupportsveryefficientprograms:sequentialtransducers.Werecallclassicaltheoremsandgivenewonescharacterizingsequentialstring-to-stringtransducers.Transducersthatoutputweightsalsoplayanimportantroleinlanguageandspeechprocessing.Wegiveaspecificstudyofstring-to-weighttransducers,includingalgorithmsfordeterminizingandminimizingthesetransducersveryefficiently,andcharacterizationsofthetransducersadmittingdeterminizationandthecorrespondingalgorithms.Someapplicationsofthesealgorithmsinspeechrecognitionaredescribedandillustrated.1.IntroductionFinite-statemachineshavebeenusedinmanyareasofcomputationallinguistics.Theirusecanbejustifiedbybothlinguisticandcomputationalarguments.Linguistically,finiteautomataareconvenientsincetheyallowonetodescribeeasilymostoftherelevantlocalphenomenaencounteredintheempiricalstudyoflanguage.Theyoftenleadtoacompactrepresentationoflexicalrules,oridiomsandclich´es,thatappearsasnaturaltolinguists(Gross,1989).Graphictoolsalsoallowonetovisualizeandmodifyautomata.Thishelpsincorrectingandcompletingagrammar.Othermoregeneralphenomenasuchasparsingcontext-freegrammarscanalsobedealtwithusingfinite-statemachinessuchasRTN’s(Woods,1970).Moreover,theunderlyingmechanismsinmostofthemethodsusedinparsingarerelatedtoautomata.Fromthecomputationalpointofview,theuseoffinite-statemachinesismainlymo-tivatedbyconsiderationsoftimeandspaceefficiency.Timeefficiencyisusuallyachievedbyusingdeterministicautomata.Theoutputofdeterministicmachinesdepends,ingen-erallinearly,onlyontheinputsizeandcanthereforebeconsideredasoptimalfromthispointofview.Spaceefficiencyisachievedwithclassicalminimizationalgorithms(Aho,Hopcroft,andUllman,1974)fordeterministicautomata.Applicationssuchascompilerconstructionhaveshowndeterministicfiniteautomatatobeveryefficientinpractice(Aho,Sethi,andUllman,1986).Finiteautomatanowalsoconstitutearichchapteroftheoreticalcomputerscience(Perrin,1990).Theirrecentapplicationsinnaturallanguageprocessingwhichrangefromthecon-structionoflexicalanalyzers(Silberztein,1993)andthecompilationofmorphologicalandphonologicalrules(KaplanandKay,1994;Karttunen,Kaplan,andZaenen,1992)tospeechprocessing(Mohri,Pereira,andRiley,1996)showtheusefulnessoffinite-statemachinesinmanyareas.Weprovideheretheoreticalandalgorithmicbasesfortheuseandapplicationofthedevicesthatsupportveryefficientprograms:sequentialtransducers.600MountainAvenue,MurrayHill,NJ07974,USA.c1997AssociationforComputationalLinguisticsComputationalLinguisticsVolume23,NumberWeextendtheideaofdeterministicautomatatotransducerswithdeterministicin-put,thatismachinesthatproduceoutputstringsorweightsinadditiontoaccepting(deterministically)input.Thus,wedescribemethodsconsistentwiththeinitialrea-sonsforusingfinite-statemachines,inparticularthetimeefficiencyofdeterministicmachines,andthespaceefficiencyachievablewithnewminimizationalgorithmsforsequentialtransducers.Bothtimeandspaceconcernsareimportantwhendealingwithlanguage.Indeed,oneofthetrendswhichclearlycomeoutofthenewstudiesoflanguageisalargeincreaseinthesizeofdata.Lexicalapproacheshaveshowntobethemostappropriateinmanyareasofcomputationallinguisticsrangingfromlarge-scaledictionariesinmorphologytolargelexicalgrammarsinsyntax.Theeffectofthesizeincreaseontimeandspaceefficiencyisprobablythemaincomputationalproblemoneneedstofaceinlanguageprocessing.Theuseoffinite-statemachinesinnaturallanguageprocessingiscertainlynotnew.Limitationsofthecorrespondingtechniques,however,areveryoftenpointedoutmorethantheiradvantages.Thereasonforthatisprobablythatrecentworkinthisfieldarenotyetdescribedincomputersciencetextbooks.Sequentialfinite-statetransducersarenowusedinallareasofcomputationallinguistics.Inthefollowingwegiveanextendeddescriptionofthesedevices.Wefirstconsiderthecaseofstring-to-stringtransducers.Thesetransducershavebeensuccessfullyusedintherepresentationoflarge-scaledictionaries,computationalmorphology,andlocalgrammarsandsyntax.Wedescribethetheoreticalbasesfortheuseofthesetransduc-ers.Inparticular,werecallclassicaltheoremsandgivenewonescharacterizingthesetransducers.Wethenconsiderthecaseofsequentialstring-to-weighttransducers.Thesetrans-ducersappearasveryinterestinginspeechprocessing.Languagemodels,phonelatticesandwordlatticesareamongtheobjectsthatcanberepresentedbythesetransducers.Wegivenewtheoremsextendingthecharacterizationsknownforusualtransducerstothesetransducers.Wedefineanalgorithmfordeterminizingstring-to-weighttransduc-ers.Webothcharacterizetheunambiguoustransducersadmittingdeterminizationanddescribeanalgorithmtotestdeterminizability.Wealsogiveanalgorithmtominimizesequentialtransducerswhichhasacomplexityequivalenttothatofclassicalautomataminimizationandwhichisveryefficientinpractice.Undercertainrestrictions,themin-imizationofsequentialstring-to-weighttransducerscanalsobeperformedusingthedeterminizationalgorithm.Wedescribethecorrespondingalgorithmandgivet

1 / 42
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功