Parsing and Learning

International Workshop on Parsing Technology
Mark Johnson
Brown University / Macquarie University
October 2009


Parsing and Learning

- Parsing and grammar induction traditionally viewed as distinct
  - Unknown words and phrases ⇒ parser must be prepared to adapt
- Most grammar learning systems are error-driven
  - parsing is a central component of most learning algorithms
- Traditional statistical learning algorithms learn rule probabilities
- Learning grammar rules
  - via "generate and test"
  - via non-parametric Bayes ⇒ adaptor grammars


Talk outline

- Learning PCFG rule probabilities from terminal strings
  - collapsed Gibbs samplers explicitly sample parses, but don't explicitly represent rule probabilities
- Non-parametric extensions to PCFGs with unboundedly many possible rules
  - adaptor grammars learn probabilities of arbitrary subtrees
  - a collapsed Gibbs sampler only needs to represent sample parses, and not the probabilities of infinitely many rules
- Adaptor grammars can be used to learn:
  - simple concatenative morphology
  - unsupervised word segmentation
  - collocations in topic models
  - structure in names


Outline

- Learning rule probabilities
- Learning grammar rules (not just probabilities)
- Chinese Restaurant Processes
- Adaptor grammars
- Bayesian inference for adaptor grammars
- Adaptor grammars for unsupervised word segmentation
- Topic collocation models using adaptor grammars
- Adaptor grammars for named entities
- Conclusion
- Extending Adaptor Grammars


Probabilistic context-free grammars

- Rules in Context-Free Grammars (CFGs) expand nonterminals into sequences of terminals and nonterminals
- A Probabilistic CFG (PCFG) associates each nonterminal with a multinomial distribution over the rules that expand it
- The probability of a tree is the product of the probabilities of the rules used to construct it

  Rule r         θ_r
  S → NP VP      1.0
  NP → Sam       0.75
  NP → Sandy     0.25
  VP → barks     0.6
  VP → snores    0.4

  P((S (NP Sam) (VP barks)))    = 1.0 × 0.75 × 0.6 = 0.45
  P((S (NP Sandy) (VP snores))) = 1.0 × 0.25 × 0.4 = 0.1


Maximum likelihood estimation from visible parses

- Each rule expansion is sampled from its parent's multinomial ⇒ the Maximum Likelihood Estimator (MLE) is the rule's relative frequency

  Parses: (S (NP Sam) (VP barks)), (S (NP Sam) (VP snores)), (S (NP Sandy) (VP snores))

  Rule r         n_r   θ_r
  S → NP VP      3     1.0
  NP → Sam       2     0.66
  NP → Sandy     1     0.33
  VP → barks     1     0.33
  VP → snores    2     0.66

- But the MLE is often overly certain, especially with sparse data
  - E.g., "accidental zeros": n_r = 0 ⇒ θ_r = 0


Bayesian estimation from visible parses

- Bayesian estimators estimate a distribution over rule probabilities θ:

  P(θ | n) ∝ P(n | θ) P(θ)
  (posterior ∝ likelihood × prior)

- Dirichlet distributions are conjugate priors for multinomials
  - A Dirichlet distribution over (θ_1, ..., θ_m) is specified by positive parameters α = (α_1, ..., α_m)
  - If Prior = Dir(α) then Posterior = Dir(α + n)

  [Figure: Dirichlet densities P(θ_1 | α) as a function of rule probability θ_1, for α = (1,1), (3,2), (21,11)]


Sparse Dirichlet priors

- As α → 0, Dirichlet distributions become peaked around 0
  - "The grammar includes some of these rules, but we don't know which!"

  [Figure: Dirichlet densities P(θ_1 | α) as a function of rule probability θ_1, for α = (1,1), (0.5,0.5), (0.25,0.25), (0.1,0.1)]


Estimating rule probabilities from strings alone

- Input: terminal strings and grammar rules
- Output: rule probabilities θ
- In general, there is no closed-form solution for θ
  - iterative algorithms, usually involving repeatedly reparsing the training data
- The Expectation Maximization (EM) procedure generalizes visible-data ML estimators to hidden-data problems
- The Inside-Outside algorithm is a cubic-time EM algorithm for PCFGs
- Bayesian estimation of θ via:
  - Variational Bayes, or
  - Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling


Gibbs sampler for parse trees and rule probabilities

- Input: terminal strings (x_1, ..., x_n), grammar rules and Dirichlet prior parameters α
- Output: stream of sample rule probabilities θ and parse trees t = (t_1, ..., t_n)
- Algorithm (see the sketch after this slide):
  - Assign parse trees to the strings somehow (e.g., randomly)
  - Repeat forever:
    - Compute rule counts n from t
    - Sample θ from Dir(α + n)
    - For each string x_i: replace t_i with a tree sampled from P(t | x_i, θ)
- After burn-in, (θ, t) are distributed according to the Bayesian posterior
- Sampling a parse tree from P(t | x_i, θ) involves parsing the string x_i
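A minimal runnable sketch of this Gibbs sampler, to make the two alternating sampling steps concrete. The toy CNF grammar (a PP-attachment ambiguity), the sentence, the symmetric hyperparameter ALPHA, and the function names (sample_theta, inside, sample_tree, rule_counts) are my own illustrative assumptions, not code from the talk. Sampling from P(t | x_i, θ) is implemented by building an inside (CKY) chart and then sampling the tree top-down in proportion to inside probabilities.

import numpy as np

rng = np.random.default_rng(0)

# Toy CNF grammar with a PP-attachment ambiguity (an assumption, not the
# grammar from the slides).  Rules are grouped by left-hand side; a
# right-hand side is a pair of nonterminals or a single terminal word.
RULES = {
    "S":  [("NP", "VP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "NP": [("NP", "PP"), ("Sam",), ("dogs",), ("telescopes",)],
    "PP": [("P", "NP")],
    "V":  [("saw",)],
    "P":  [("with",)],
}
ALPHA = 1.0  # symmetric Dirichlet hyperparameter (also an assumption)

def sample_theta(counts):
    # theta ~ Dir(alpha + n), one multinomial per nonterminal
    return {lhs: rng.dirichlet([ALPHA + counts.get((lhs, rhs), 0) for rhs in rhss])
            for lhs, rhss in RULES.items()}

def inside(words, theta):
    # CKY inside chart: chart[i, j][A] = P(A derives words[i:j] | theta)
    n = len(words)
    chart = {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}
    for i, w in enumerate(words):
        for lhs, rhss in RULES.items():
            for k, rhs in enumerate(rhss):
                if rhs == (w,):
                    chart[i, i + 1][lhs] = theta[lhs][k]
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for lhs, rhss in RULES.items():
                total = sum(theta[lhs][k]
                            * chart[i, m].get(rhs[0], 0.0)
                            * chart[m, j].get(rhs[1], 0.0)
                            for k, rhs in enumerate(rhss) if len(rhs) == 2
                            for m in range(i + 1, j))
                if total > 0.0:
                    chart[i, j][lhs] = total
    return chart

def sample_tree(words, theta, chart, lhs="S", i=0, j=None):
    # Sample a parse of words[i:j] rooted in lhs, top-down, in proportion
    # to inside probabilities: this samples from P(t | x, theta)
    j = len(words) if j is None else j
    options, weights = [], []
    for k, rhs in enumerate(RULES[lhs]):
        if len(rhs) == 1 and j == i + 1 and rhs == (words[i],):
            options.append((rhs, None)); weights.append(theta[lhs][k])
        elif len(rhs) == 2:
            for m in range(i + 1, j):
                p = (theta[lhs][k] * chart[i, m].get(rhs[0], 0.0)
                                   * chart[m, j].get(rhs[1], 0.0))
                if p > 0.0:
                    options.append((rhs, m)); weights.append(p)
    rhs, m = options[rng.choice(len(options), p=np.array(weights) / sum(weights))]
    if m is None:
        return (lhs, rhs[0])                       # leaf: (A, word)
    return (lhs, sample_tree(words, theta, chart, rhs[0], i, m),
                 sample_tree(words, theta, chart, rhs[1], m, j))

def rule_counts(tree, counts):
    # Accumulate (lhs, rhs) usage counts from a sampled tree
    lhs, *children = tree
    if isinstance(children[0], str):               # leaf
        rhs = (children[0],)
    else:
        rhs = tuple(child[0] for child in children)
        for child in children:
            rule_counts(child, counts)
    counts[(lhs, rhs)] = counts.get((lhs, rhs), 0) + 1
    return counts

corpus = ["Sam saw dogs with telescopes".split()]
theta = sample_theta({})                           # draw theta from the prior
trees = [sample_tree(x, theta, inside(x, theta)) for x in corpus]
for _ in range(100):                               # the Gibbs loop
    counts = {}
    for t in trees:
        rule_counts(t, counts)
    theta = sample_theta(counts)                   # theta ~ Dir(alpha + n)
    trees = [sample_tree(x, theta, inside(x, theta)) for x in corpus]
print(trees[0])

With one ambiguous sentence the sampler simply wanders between the two attachments; the point of the sketch is the structure of the loop, in which θ and the trees t are resampled from their respective conditionals.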
Collapsed Gibbs samplers

- Integrate out the rule probabilities θ to obtain the predictive distribution P(t_i | x_i, t_{-i}) of parse t_i for sentence x_i given the other parses t_{-i}
- Collapsed Gibbs sampler:
  - For each sentence x_i in the training data:
    - Replace t_i with a sample from P(t | x_i, t_{-i})
- A problem: P(t_i | x_i, t_{-i}) is not a PCFG distribution ⇒ no dynamic-programming sampler (AFAIK)

  Example parses: (S (NP cats) (VP (V chase) (NP dogs))), (S (NP dogs) (VP (V chase) (NP dogs)))


Metropolis-Hastings samplers

- The Metropolis-Hastings (MH) acceptance-rejection procedure uses samples from a proposal distribution to produce samples from a target distribution
- When sentence size ≪ training data size, P(t_i | x_i, t_{-i}) is almost a PCFG distribution
  - use a PCFG approximation based on t_{-i} as the proposal distribution
  - apply MH to transform proposals into samples from P(t_i | x_i, t_{-i})
- To construct a Metropolis-Hastings sampler you need to be able to:
  - efficiently sample from the proposal distribution
  - calculate ratios of parse probabilities under the proposal distribution
  - calculate ratios of parse probabilities under the target distribution


Collapsed Metropolis-within-Gibbs sampler for PCFGs

- Input: terminal strings (x_1, ..., x_n), grammar rules and Dirichlet prior parameters α
- Output: stream of sample parse trees t = (t_1, ..., t_n)
- Algorithm (the accept-reject step is sketched below):
  - Assign parse trees to the strings somehow (e.g., randomly)
  - Repeat forever:
    - For each sentence x_i in the training data:
      - Compute rule counts n_{-i} from t_{-i}
      - Compute proposal grammar probabilities θ from n_{-i}
      - Sample a tree t from P(t | x_i, θ)
      - Replace t_i with t according to the Metropolis-Hastings accept-reject formula


Q: Are there efficient local-move tree samplers?

DP PCFGs a
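A sketch of the accept-reject step of the collapsed Metropolis-within-Gibbs sampler above, reusing RULES, ALPHA, inside, sample_tree, and rule_counts from the earlier sketch. The names proposal_theta, collapsed_logprob, pcfg_logprob, and mh_step are hypothetical, and the concrete choices here are my own assumptions: the proposal PCFG is taken to be the mean of Dir(α + n_{-i}), and the target ratio is computed from the Dirichlet-multinomial predictive P(t | t_{-i}), scoring the tree's rule uses sequentially. The proposal ratio is the tree's probability under the proposal PCFG; the inside-chart normalizer is the same for both trees and cancels.

import math
import random

def tree_rules(tree):
    # Flatten a tree into the sequence of (lhs, rhs) rules it uses
    lhs, *children = tree
    if isinstance(children[0], str):
        return [(lhs, (children[0],))]
    uses = [(lhs, tuple(child[0] for child in children))]
    for child in children:
        uses.extend(tree_rules(child))
    return uses

def proposal_theta(counts, rules, alpha):
    # One natural proposal PCFG (an assumption): the mean of Dir(alpha + n_{-i})
    theta = {}
    for lhs, rhss in rules.items():
        w = [alpha + counts.get((lhs, rhs), 0) for rhs in rhss]
        z = sum(w)
        theta[lhs] = [v / z for v in w]
    return theta

def collapsed_logprob(uses, counts, rules, alpha):
    # log P(t | t_{-i}) under the Dirichlet-multinomial predictive: score
    # the tree's rule uses one at a time, adding each scored use to the
    # counts before scoring the next
    logp, seen, seen_lhs = 0.0, {}, {}
    for lhs, rhs in uses:
        num = alpha + counts.get((lhs, rhs), 0) + seen.get((lhs, rhs), 0)
        den = (alpha * len(rules[lhs])
               + sum(counts.get((lhs, r), 0) for r in rules[lhs])
               + seen_lhs.get(lhs, 0))
        logp += math.log(num) - math.log(den)
        seen[(lhs, rhs)] = seen.get((lhs, rhs), 0) + 1
        seen_lhs[lhs] = seen_lhs.get(lhs, 0) + 1
    return logp

def pcfg_logprob(uses, theta, rule_index):
    # log of the tree's probability under the proposal PCFG
    return sum(math.log(theta[lhs][rule_index[lhs, rhs]]) for lhs, rhs in uses)

def mh_step(t_old, t_new, counts_minus_i, theta, rules, rule_index, alpha):
    # MH accept-reject: target ratio times inverse proposal ratio;
    # accept the proposed tree t_new with probability min(1, a)
    u_old, u_new = tree_rules(t_old), tree_rules(t_new)
    log_a = (collapsed_logprob(u_new, counts_minus_i, rules, alpha)
             - collapsed_logprob(u_old, counts_minus_i, rules, alpha)
             + pcfg_logprob(u_old, theta, rule_index)
             - pcfg_logprob(u_new, theta, rule_index))
    return t_new if math.log(random.random()) < log_a else t_old

# Plugging into the earlier sketch, one collapsed Gibbs sweep becomes:
#   rule_index = {(lhs, rhs): k
#                 for lhs, rhss in RULES.items() for k, rhs in enumerate(rhss)}
#   for i, x in enumerate(corpus):
#       counts = {}                          # n_{-i}: rule counts from t_{-i}
#       for j, t in enumerate(trees):
#           if j != i:
#               rule_counts(t, counts)
#       theta = proposal_theta(counts, RULES, ALPHA)
#       prop = sample_tree(x, theta, inside(x, theta))
#       trees[i] = mh_step(trees[i], prop, counts, theta, RULES, rule_index, ALPHA)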