Parameter estimation for text analysis

Gregor Heinrich
Technical Note
vsonix GmbH + University of Leipzig, Germany
gregor@vsonix.com

Abstract. Presents parameter estimation methods common with discrete probability distributions, which is of particular interest in text modeling. Starting with maximum likelihood, a posteriori and Bayesian estimation, central concepts like conjugate distributions and Bayesian networks are reviewed. As an application, the model of latent Dirichlet allocation (LDA) is explained in detail with a full derivation of an approximate inference algorithm based on Gibbs sampling, including a discussion of Dirichlet hyperparameter estimation.

History: version 1: May 2005, version 2.4: August 2008.

1 Introduction

This technical note is intended to review the foundations of Bayesian parameter estimation in the discrete domain, which is necessary to understand the inner workings of topic-based text analysis approaches like probabilistic latent semantic analysis (PLSA) [Hofm99], latent Dirichlet allocation (LDA) [BNJ02] and other mixture models of count data. Despite their general acceptance in the research community, it appears that there is no common book or introductory paper that fills this role: Most known texts use examples from the Gaussian domain, where formulations appear to be rather different. Other very good introductory work on topic models (e.g., [StGr07]) skips details of algorithms and other background for clarity of presentation.

We therefore will systematically introduce the basic concepts of parameter estimation with a couple of simple examples on binary data in Section 2. We then will introduce the concept of conjugacy along with a review of the most common probability distributions needed in the text domain in Section 3. The joint presentation of conjugacy with associated real-world conjugate pairs directly justifies the choice of distributions introduced. Section 4 will introduce Bayesian networks as a graphical language to describe systems via their probabilistic models.

With these basic concepts, we present the idea of latent Dirichlet allocation (LDA) in Section 5, a flexible model to estimate the properties of text. Using LDA as an example, Gibbs sampling is shown as a straightforward means of approximate inference in Bayesian networks. Two other important aspects of LDA are discussed afterwards: In Section 6, the influence of LDA hyperparameters is discussed and an estimation method proposed, and in Section 7, methods are presented to analyse LDA models for querying and evaluation.

2 Parameter estimation approaches

We face two inference problems, (1) to estimate values for a set of distribution parameters $\vartheta$ that can best explain a set of observations $\mathcal{X}$ and (2) to calculate the probability of new observations $\tilde{x}$ given previous observations, i.e., to find $p(\tilde{x} \mid \mathcal{X})$. We will refer to the former problem as the estimation problem and to the latter as the prediction or regression problem.

The data set $\mathcal{X} \triangleq \{x_i\}_{i=1}^{|\mathcal{X}|}$ can be considered a sequence of independent and identically distributed (i.i.d.) realisations of a random variable (r.v.) $X$. The parameters $\vartheta$ are dependent on the distributions considered, e.g., for a Gaussian, $\vartheta = \{\mu, \sigma^2\}$.

For these data and parameters, a couple of probability functions are ubiquitous in Bayesian statistics. They are best introduced as parts of Bayes' rule, which is:

$$p(\vartheta \mid \mathcal{X}) = \frac{p(\mathcal{X} \mid \vartheta)\, p(\vartheta)}{p(\mathcal{X})}, \tag{1}$$

(Derivation: $p(\vartheta \mid \mathcal{X})\, p(\mathcal{X}) = p(\mathcal{X}, \vartheta) = p(\mathcal{X} \mid \vartheta)\, p(\vartheta)$.)

and we define the corresponding terminology:

$$\text{posterior} = \frac{\text{likelihood} \cdot \text{prior}}{\text{evidence}}. \tag{2}$$

In the next paragraphs, we will show different estimation methods that start from simple maximisation of the likelihood, then show how prior belief on parameters can be incorporated by maximising the posterior and finally use Bayes' rule to infer a complete posterior distribution.

2.1 Maximum likelihood estimation

Maximum likelihood (ML) estimation tries to find parameters that maximise the likelihood,

$$L(\vartheta \mid \mathcal{X}) \triangleq p(\mathcal{X} \mid \vartheta) = \bigcap_{x \in \mathcal{X}} \{X = x \mid \vartheta\} = \prod_{x \in \mathcal{X}} p(x \mid \vartheta), \tag{3}$$

i.e., the probability of the joint event that $X$ generates the data $\mathcal{X}$. Because of the product in Eq. 3, it is often simpler to use the log likelihood, $\mathcal{L} \triangleq \log L$. The ML estimation problem then can be written as:

$$\hat{\vartheta}_{\mathrm{ML}} = \operatorname*{argmax}_{\vartheta} \mathcal{L}(\vartheta \mid \mathcal{X}) = \operatorname*{argmax}_{\vartheta} \sum_{x \in \mathcal{X}} \log p(x \mid \vartheta). \tag{4}$$

The common way to obtain the parameter estimates is to solve the system:

$$\frac{\partial \mathcal{L}(\vartheta \mid \mathcal{X})}{\partial \vartheta_k} \overset{!}{=} 0 \quad \forall\, \vartheta_k \in \vartheta. \tag{5}$$

The probability of a new observation $\tilde{x}$ given the data $\mathcal{X}$ can now be found using the approximation:

$$p(\tilde{x} \mid \mathcal{X}) = \int_{\vartheta \in \Theta} p(\tilde{x} \mid \vartheta)\, p(\vartheta \mid \mathcal{X})\, \mathrm{d}\vartheta \tag{6}$$

$$\approx \int_{\vartheta \in \Theta} p(\tilde{x} \mid \hat{\vartheta}_{\mathrm{ML}})\, p(\vartheta \mid \mathcal{X})\, \mathrm{d}\vartheta = p(\tilde{x} \mid \hat{\vartheta}_{\mathrm{ML}}), \tag{7}$$

that is, the next sample is anticipated to be distributed with the estimated parameters $\hat{\vartheta}_{\mathrm{ML}}$.

As an example, consider a set $C$ of $N$ Bernoulli experiments with unknown parameter $p$, e.g., realised by tossing a deformed coin. The Bernoulli density function for the r.v. $C$ for one experiment is:

$$p(C = c \mid p) = p^{c} (1 - p)^{1 - c} \triangleq \mathrm{Bern}(c \mid p) \tag{8}$$

where we define $c = 1$ for heads and $c = 0$ for tails. Building an ML estimator for the parameter $p$ can be done by expressing the (log) likelihood as a function of the data:

$$\mathcal{L} = \log \prod_{i=1}^{N} p(C = c_i \mid p) = \sum_{i=1}^{N} \log p(C = c_i \mid p) \tag{9}$$

$$= n^{(1)} \log p(C = 1 \mid p) + n^{(0)} \log p(C = 0 \mid p) = n^{(1)} \log p + n^{(0)} \log(1 - p) \tag{10}$$

where $n^{(c)}$ is the number of times a Bernoulli experiment yielded event $c$. Differentiating with respect to (w.r.t.) the parameter $p$ yields:

$$\frac{\partial \mathcal{L}}{\partial p} = \frac{n^{(1)}}{p} - \frac{n^{(0)}}{1 - p} \overset{!}{=} 0 \;\Leftrightarrow\; \hat{p}_{\mathrm{ML}} = \frac{n^{(1)}}{n^{(1)} + n^{(0)}} = \frac{n^{(1)}}{N}, \tag{11}$$

which is simply the ratio of heads results to the total number of samples. To put some numbers into the example, we could imagine that our coin is strongly deformed, and after 20 trials, we have $n^{(1)} = 12$ times heads and $n^{(0)} = 8$ times tails. This results in an ML estimate of $\hat{p}_{\mathrm{ML}} = 12/20 = 0.6$.

2.2 Maximum a posteriori estimation

Maximum a posteriori (MAP) estimation is very similar to ML estimation but allows us to include some a priori belief on the parameters by weighting them with a prior distribution $p(\vartheta)$. The name derives from the obj
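
As an illustration of the coin example in Section 2.1, the following minimal Python sketch computes the closed-form ML estimate of Eq. 11 and cross-checks it by maximising the log likelihood of Eq. 10 over a grid. The function names and the grid resolution are assumptions made for this illustration and are not part of the original note.

```python
import math


def bernoulli_log_likelihood(p, n_heads, n_tails):
    """Log likelihood of Eq. 10: n(1) * log p + n(0) * log(1 - p)."""
    return n_heads * math.log(p) + n_tails * math.log(1.0 - p)


def bernoulli_ml_estimate(n_heads, n_tails):
    """Closed-form ML estimate of Eq. 11: n(1) / N."""
    total = n_heads + n_tails
    if total == 0:
        raise ValueError("need at least one observation")
    return n_heads / total


# Counts from the example in the text: 20 trials, 12 heads, 8 tails.
n_heads, n_tails = 12, 8
p_ml = bernoulli_ml_estimate(n_heads, n_tails)

# Cross-check (illustrative only): maximise Eq. 10 over a grid on (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, n_heads, n_tails))

print(p_ml, p_grid)
```

The grid search is only a sanity check on the closed-form result. For the Bernoulli model the log likelihood is strictly concave in $p$ when both counts are positive, so the stationary point of Eq. 11 is the unique maximum.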
