CFGsandPCFGs(Probabilistic)Context-FreeGrammarsChristopherManningAphrasestructuregrammarSNPVPVPVNPVPVNPPPNPNPNPNPNPPPNPNNPePPPNPpeoplefishtankspeoplefishwithrodsNpeopleNfishNtanksNrodsVpeopleVfishVtanksPwithChristopherManningPhrasestructuregrammars=context-freegrammars(CFGs)•G=(T,N,S,R)•Tisasetofterminalsymbols•Nisasetofnonterminalsymbols•Sisthestartsymbol(S∈N)•Risasetofrules/productionsoftheformX•X∈Nand∈(N∪T)*•AgrammarGgeneratesalanguageL.ChristopherManningPhrasestructuregrammarsinNLP•G=(T,C,N,S,L,R)•Tisasetofterminalsymbols•Cisasetofpreterminalsymbols•Nisasetofnonterminalsymbols•Sisthestartsymbol(S∈N)•Listhelexicon,asetofitemsoftheformXx•X∈Pandx∈T•Risthegrammar,asetofitemsoftheformX•X∈Nand∈(N∪C)*•Byusualconvention,Sisthestartsymbol,butinstatisticalNLP,weusuallyhaveanextranodeatthetop(ROOT,TOP)•Weusuallywriteeforanemptysequence,ratherthannothingChristopherManningAphrasestructuregrammarSNPVPVPVNPVPVNPPPNPNPNPNPNPPPNPNNPePPPNPpeoplefishtankspeoplefishwithrodsNpeopleNfishNtanksNrodsVpeopleVfishVtanksPwithChristopherManningProbabilistic–orstochastic–context-freegrammars(PCFGs)•G=(T,N,S,R,P)•Tisasetofterminalsymbols•Nisasetofnonterminalsymbols•Sisthestartsymbol(S∈N)•Risasetofrules/productionsoftheformX•Pisaprobabilityfunction•P:R[0,1]••AgrammarGgeneratesalanguagemodelL. P(g)=1gÎT*åChristopherManningAPCFGSNPVP1.0VPVNP0.6VPVNPPP0.4NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0[WithemptyNPremovedsolessambiguous]ChristopherManningTheprobabilityoftreesandstrings•P(t)–Theprobabilityofatreetistheproductoftheprobabilitiesoftherulesusedtogenerateit.•P(s)–TheprobabilityofthestringsisthesumoftheprobabilitiesofthetreeswhichhavethatstringastheiryieldP(s)=ΣjP(s,t)wheretisaparseofs=ΣjP(t)ChristopherManningChristopherManningChristopherManningTreeandStringProbabilities•s=peoplefishtankswithrods•P(t1)=1.0×0.7×0.4×0.5×0.6×0.7×1.0×0.2×1.0×0.7×0.1=0.0008232•P(t2)=1.0×0.7×0.6×0.5×0.6×0.2×0.7×1.0×0.2×1.0×0.7×0.1=0.00024696•P(s)=P(t1)+P(t2)=0.0008232+0.00024696=0.00107016VerbattachNounattachChristopherManningChristopherManningCFGsandPCFGs(Probabilistic)Context-FreeGrammarsGrammarTransformsRestrictingthegrammarformforefficientparsingChristopherManningChomskyNormalForm•AllrulesareoftheformXYZorXw•X,Y,Z∈Nandw∈T•Atransformationtothisformdoesn’tchangetheweakgenerativecapacityofaCFG•Thatis,itrecognizesthesamelanguage•Butmaybewithdifferenttrees•Emptiesandunariesareremovedrecursively•n-aryrulesaredividedbyintroducingnewnonterminals(n2)ChristopherManningAphrasestructuregrammarSNPVPVPVNPVPVNPPPNPNPNPNPNPPPNPNNPePPPNPNpeopleNfishNtanksNrodsVpeopleVfishVtanksPwithChristopherManningChomskyNormalFormstepsSNPVPSVPVPVNPVPVVPVNPPPVPVPPNPNPNPNPNPNPNPPPNPPPNPNPPPNPPPPNpeopleNfishNtanksNrodsVpeopleVfishVtanksPwithChristopherManningChomskyNormalFormstepsSNPVPVPVNPSVNPVPVSVVPVNPPPSVNPPPVPVPPSVPPNPNPNPNPNPNPNPPPNPPPNPNPPPNPPPPNpeopleNfishNtanksNrodsVpeopleVfishVtanksPwithChristopherManningChomskyNormalFormstepsSNPVPVPVNPSVNPVPVVPVNPPPSVNPPPVPVPPSVPPNPNPNPNPNPNPNPPPNPPPNPNPPPNPPPPNpeopleNfishNtanksNrodsVpeopleSpeopleVfishSfishVtanksStanksPwithChristopherManningChomskyNormalFormstepsSNPVPVPVNPSVNPVPVNPPPSVNPPPVPVPPSVPPNPNPNPNPNPNPNPPPNPPPNPNPPPNPPPPNpeopleNfishNtanksNrodsVpeopleSpeopleVPpeopleVfishSfishVPfishVtanksStanksVPtanksPwithChristopherManningChomskyNormalFormstepsSNPVPVPVNPSVNPVPVNPPPSVNPPPVPVPPSVPPNPNPNPNPNPPPNPPNPPPPNPNPpeopleNPfishNPtanksNProdsVpeopleSpeopleVPpeopleVfishSfishVPfishVtanksStanksVPtanksPwithPPwithChristopherManningChomskyNormalFormstepsSNPVPVPVNPSVNPVPV@VP_V@VP_VNPPPSV@S_V@S_VNPPPVPVPPSVPPNPNPNPNPNPPPNPPNPPPPNPNPpeopleNPfishNPtanksNProdsVpeopleSpeopleVPpeopleVfishSfishVPfishVtanksStanksVPtanksPwithPPwithChristopherManningAphrasestructuregrammarSNPVPVPVNPVPVNPPPNPNPNPNPNPPPNPNNPePPPNPNpeopleNfishNtanksNrodsVpeopleVfishVtanksPwithChristopherManningChomskyNormalFormstepsSNPVPVPVNPSVNPVPV@VP_V@VP_VNPPPSV@S_V@S_VNPPPVPVPPSVPPNPNPNPNPNPPPNPPNPPPPNPNPpeopleNPfishNPtanksNProdsVpeopleSpeopleVPpeopleVfishSfishVPfishVtanksStanksVPtanksPwithPPwithChristopherManningChomskyNormalForm•Youshouldthinkofthisasatransformationforefficientparsing•Withsomeextrabook-keepinginsymbolnames,youcanevenreconstructthesametreeswithadetransform•InpracticefullChomskyNormalFormisapain•Reconstructingn-ariesiseasy•Reconstructingunaries/emptiesistrickier•BinarizationiscrucialforcubictimeCFGparsing•Therestisn’tnecessary;itjustmakesthealgorithmscleanerandabitquickerChristopherManningROOTSNPVPNpeopleVNPPPPNProdswithtanksfishNNAnexample:beforebinarization…ChristopherManningPNProdsNwithNPNpeopletanksfishNVPVNPPP@VP_VROOTSAfterbinarization…ChristopherManningTreebank:emptiesandunariesROOTS-HLNNP-SUBJVPVB-NONE-eAtonePTBTreeROOTSNPVPVB-NONE-eAtoneNoFuncTags