Text Mining Techniques 12: Sentiment


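1. Text Sentiment Analysis Techniques
Yang Jianwu (杨建武)
Email: yangjianwu@icst.pku.edu.cn
Chapter 12
Institute of Computer Science and Technology, Peking University
Text Mining Techniques (2009)

2. The Concept of Affective Computing
- Affective computing
  - Uses computer technology to automatically analyze the sentiment orientation, and its strength, contained in objects such as text, images, audio, or video
    - E.g., positive or negative, like or dislike, happy or sad, anger and fear
  - Categories of affective computing
    - Subjectivity: subjective, objective, and neutral
    - Orientation: positive (commendatory), negative (derogatory), and neutral

3. Applications of Affective Computing
- Businesses and organizations: product and service benchmarking, market intelligence.
  - Businesses spend a huge amount of money to find consumer sentiments and opinions.
    - Consultants, surveys, focus groups, etc.
- Individuals: interested in others' opinions when
  - purchasing a product or using a service,
  - finding opinions on political topics.
- Ad placement: placing ads in user-generated content
  - Place an ad when one praises a product.
  - Place an ad from a competitor if one criticizes a product.
- Opinion retrieval/search: providing general search for opinions.

4. Text Sentiment Computing
- Sentiment orientation of words or phrases
- Sentiment orientation of documents and sentences
- Opinion mining
  - Feature-based opinion mining
  - Comparative opinion mining

5. Sentiment Orientation of Words
- Opinion words or phrases (also called polar words, opinion-bearing words, etc.). E.g.,
  - Positive: beautiful, wonderful, good, amazing
  - Negative: bad, poor, terrible
- Important to note:
  - Some opinion words are context independent (e.g., good).
  - Some are context dependent (e.g., long).
- Three main ways to compile such a list:
  - Manual approach: not a bad idea, only a one-time effort
  - Corpus-based approaches
  - Dictionary-based approaches

6. Sentiment Orientation of Words
- 1997: Hatzivassiloglou et al. computed the sentiment orientation of adjectives from the semantic constraints imposed by conjunctions.
- 2002: Turney et al. proposed using the pointwise mutual information (PMI) between words, estimated from search engine queries: the NEAR operator of AltaVista, with "excellent" and "poor".
- 2003: Turney et al. further proposed computing the semantic orientation of words with latent semantic analysis (LSA).
- 2004: Kamps et al. proposed a WordNet-based method that labels a word by its semantic distance to "good" and "bad".

7. SO-PMI
- Measuring Praise and Criticism: Inference of Semantic Orientation from Association (Turney 2003)
- SO-PMI (Semantic Orientation from Pointwise Mutual Information)

8. SO-PMI

9. Dictionary-based approaches
- Typically use WordNet's synsets and hierarchies to acquire opinion words
  - Start with a small seed set of opinion words.
  - Use the set to search for synonyms and antonyms in WordNet (Hu and Liu, KDD-04; Kim and Hovy, COLING-04).
  - Manual inspection may be used afterward.
- Use additional information (e.g., glosses) from WordNet and learning
  - (Andreevskaia and Bergler, EACL-06)
  - (Esuli and Sebastiani, CIKM-05)
- Weakness of the approach
  - Does not find context-dependent opinion words,
    - e.g., small, long, fast.
- Chinese resources: HowNet, Tongyici Cilin (同义词词林)

10. Document Sentiment Classification
- Classify documents (e.g., reviews) based on the overall sentiments expressed by opinion holders (authors),
  - Positive, negative, and (possibly) neutral
- Similar to, but different from, topic-based text classification.
  - In topic-based text classification, topic words are important.
  - In sentiment classification, sentiment words are more important, e.g., great, excellent, horrible, bad, worst, etc.

11. Document-Level Orientation Analysis
- 2003: Turney used the average orientation of the words appearing in a review to represent the orientation of the whole review.
- 2003: Dave et al. used word orientation to represent document orientation, taking the strength of each word's orientation into account.
- 2002: Bo Pang et al. were the first to bring machine learning methods into sentiment analysis, using Naïve Bayes, maximum entropy, and SVM classifiers for automatic document-level sentiment classification. (The authors collected labeled movie reviews from IMDB.)
- 2004: Bo Pang et al. proposed judging the subjectivity of the sentences in a document by combining machine learning with minimum cuts in graphs.
- 2005: Bo Pang et al. extended their work further, using machine learning to rate movie reviews on a 3-level or 4-level scale.

12. Unsupervised review classification
- (Turney, ACL-02)
- Data: reviews from epinions.com on automobiles, banks, movies, and travel destinations.
- The approach: three steps
- Step 1:
  - Part-of-speech tagging
  - Extract two consecutive words (two-word phrases) from reviews if their tags conform to some given patterns, e.g., (1) JJ, (2) NN.

13. Unsupervised review classification
- Step 2: Estimate the semantic orientation (SO) of the extracted phrases
  - Use pointwise mutual information (PMI)
  - Semantic orientation (SO)
  - Use the AltaVista NEAR operator to search and find the number of hits needed to compute PMI and SO.

14. Unsupervised review classification
- Step 3: Compute the average SO of all phrases
  - Classify the review as recommended if the average SO is positive, not recommended otherwise.
- Final classification accuracy:
  - automobiles: 84%
  - banks: 80%
  - movies: 65.83%
  - travel destinations: 70.53%

15. Sentiment classification using machine learning methods
- (Pang et al., EMNLP-02)
- This paper directly applied several machine learning techniques to classify movie reviews as positive or negative.
- Three classification techniques were tried:
  - Naïve Bayes
  - Maximum entropy
  - Support vector machine
- Pre-processing settings: negation tag, unigram (single words), bigram, POS tag, position.
  - SVM: the best accuracy, 83% (unigram)

16. Sentence-level sentiment analysis
- Document-level sentiment classification is too coarse for most applications.
- Much of the work on sentence-level sentiment analysis focuses on identifying subjective sentences in news articles.
  - Classification: objective and subjective.
  - All techniques use some form of machine learning.
  - E.g., using a naïve Bayesian classifier with a set of data features/attributes extracted from training sentences (Wiebe et al., ACL-99).

17. Let us go further?
- Sentiment classification at both the document and sentence (or clause) levels is useful, but
  - it does not find what the opinion holder likes and dislikes.
- A negative sentiment on an object
  - does not mean that the opinion holder dislikes everything about the object.
- A positive sentiment on an object
  - does not mean that the opinion holder likes everything about the object.
- We need to go to the feature level.

18. Opinion Mining
- Opinion mining
  - Goal: mine the objects being reviewed, and the opinions on those objects, from a document or a collection of documents.
  - Compared with document-level sentiment classification: opinion mining has to analyze the sentiment of a document at a finer granularity.
  - Opinion mining usually relies on closely related information extraction techniques to discover the objects in a document and their corresponding opinions.

19. Opinion Mining
- Bing Liu: the prototype system Opinion Observer

20. Opinion mining
- (Hu and Liu, KDD-04; Liu, Web Data Mining book 2007)
- Basic components of an opinion
  - Opinion holder: the person or organization that holds a specific opinion on a particular object.
  - Object: on which an opinion is expressed
  - Opinion: a view, attitude, or ap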
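
The SO-PMI measure and steps 2 and 3 of the unsupervised review classification (Turney, ACL-02) described above can be illustrated with a small Python sketch. It assumes the usual definitions, PMI(a, b) = log2(P(a, b) / (P(a) P(b))) and SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor"), with probabilities estimated from search hit counts. The function names, the smoothing constant, and all hit counts below are hypothetical and chosen only for illustration; no real search engine is queried.

```python
import math


def so_pmi(hits_near_pos, hits_near_neg, hits_pos, hits_neg, smoothing=0.01):
    """Semantic orientation of a phrase from hit counts (illustrative sketch).

    Expanding SO = PMI(phrase, pos) - PMI(phrase, neg), the hit count of the
    phrase itself and the corpus size cancel, leaving:
        SO = log2( hits(phrase NEAR pos) * hits(neg)
                   / (hits(phrase NEAR neg) * hits(pos)) )
    The smoothing term (an assumption here) avoids division by zero when a
    phrase never co-occurs with one of the reference words.
    """
    numerator = (hits_near_pos + smoothing) * hits_neg
    denominator = (hits_near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


def classify_review(phrase_scores):
    """Step 3: average the SO of all extracted phrases and threshold at zero."""
    if not phrase_scores:
        raise ValueError("no phrases were extracted from the review")
    average = sum(phrase_scores) / len(phrase_scores)
    label = "recommended" if average > 0 else "not recommended"
    return label, average


# Hypothetical hit counts for the two reference words.
HITS_EXCELLENT = 1000
HITS_POOR = 1000

# Hypothetical (hits NEAR "excellent", hits NEAR "poor") for extracted phrases.
phrase_hits = {
    "low fees": (80, 20),
    "rude staff": (10, 40),
    "online service": (60, 30),
}

scores = {
    phrase: so_pmi(near_pos, near_neg, HITS_EXCELLENT, HITS_POOR)
    for phrase, (near_pos, near_neg) in phrase_hits.items()
}
label, average = classify_review(list(scores.values()))

if __name__ == "__main__":
    for phrase, score in scores.items():
        print(f"{phrase}: SO = {score:+.3f}")
    print(f"average SO = {average:+.3f} -> {label}")
```

In a real system the hit counts would come from a corpus or a search index that supports proximity queries, and the phrases would come from the part-of-speech patterns of step 1; both are replaced by fixed toy values here.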
