TextMining12 - Sentiment


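Text Sentiment Analysis Techniques
Chapter 12, Text Mining Techniques (Spring 2012)
Yang Jianwu, Email: yangjw@pku.edu.cn
Institute of Computer Science and Technology, Peking University

The Concept of Affective Computing
Affective computing uses computer technology to automatically analyze the sentiment orientation, and its strength, contained in objects such as text, images, audio, or video.
• Examples: positive or negative, like or dislike, happy or sad, anger and fear, etc.
Categories of affective computing:
• Subjectivity
– subjective, objective, and neutral
• Orientation
– positive (commendatory), negative (derogatory), and neutral

Applications of Affective Computing
• 81% of Internet users (or 60% of Americans) have done online research on a product at least once;
• Among readers of online reviews of restaurants, hotels, and various services (e.g., travel agencies or doctors), between 73% and 87% report that reviews had a significant influence on their purchase;
• Consumers report being willing to pay from 20% to 99% more for a 5-star-rated item than a 4-star-rated item (the variance stems from what type of item or service is considered).

Applications of Affective Computing
• Businesses and organizations: product and service benchmarking. Market intelligence. Business spends a huge amount of money to find consumer sentiments and opinions.
– Consultants, surveys and focus groups, etc.
• Individuals: interested in others' opinions when
– purchasing a product or using a service,
– finding opinions on political topics.
• Ads placements: placing ads in the user-generated content.
– Place an ad when one praises a product.
– Place an ad from a competitor if one criticizes a product.
• Opinion retrieval/search: providing general search for opinions.

Challenges
Determine whether a document or portion (e.g. paragraph or statement) is subjective.
Example: "the battery lasts 2 hours" vs. "the battery only lasts 2 hours"

Challenges
The difficulty lies in the richness of human language use.
Example:
1. This is a great camera.
2. A great amount of money was spent for promoting this camera.
3. One might think this is a great camera. Well think again, because .....
A single keyword can be used to convey three different opinions: positive, neutral, and negative respectively.

Text Sentiment Computing
• Sentiment orientation of words or phrases
• Sentiment orientation of documents and sentences
• Opinion mining
– Feature-based opinion mining
– Comparative opinion mining

Sentiment Orientation of Words
Opinion words or phrases (also called polar words, opinion bearing words, etc). E.g.,
• Positive: beautiful, wonderful, good, amazing
• Negative: bad, poor, terrible
Important to note:
• Some opinion words are context independent (e.g., good).
• Some are context dependent (e.g., long).
Three main ways to compile such a list:
• Manual approach: not a bad idea, only a one-time effort
• Corpus-based approaches
• Dictionary-based approaches

Sentiment Orientation of Words
• 1997: Hatzivassiloglou et al. computed the sentiment orientation of adjectives from the semantic constraints imposed by conjunctions.
• 2002: Turney et al. proposed using a search engine to estimate the pointwise mutual information (PMI) between query words: the AltaVista NEAR operator, with "excellent" and "poor" as reference words.
• 2003: Turney et al. further proposed computing the semantic orientation of words based on latent semantic analysis (LSA).
• 2004: Kamps et al. proposed a WordNet-based method that classifies a word by its semantic distance to "good" and "bad".

Labeling the Sentiment Orientation of Words
nice, handsome, terrible, comfortable, painful, expensive, fun, scenic, slow

SO-PMI
Measuring Praise and Criticism: Inference of Semantic Orientation from Association (Turney, 2003)
SO-PMI (Semantic Orientation from Pointwise Mutual Information)

Dictionary-Based Approaches
Typically use WordNet's synsets and hierarchies to acquire opinion words.
• Start with a small seed set of opinion words.
• Use the set to search for synonyms and antonyms in WordNet (Hu and Liu, KDD-04; Kim and Hovy, COLING-04).
• Manual inspection may be used afterward.
Use additional information (e.g., glosses) from WordNet and learning (Andreevskaia and Bergler, EACL-06) (Esuli and Sebastiani, CIKM-05).

Dictionary-Based Approaches
Weakness of the approach: it does not find context dependent opinion words,
• e.g., small, long, fast.
Chinese resources: HowNet, Tongyici Cilin (同义词词林).

Document-Level Sentiment Orientation Analysis
Classify documents (e.g., reviews) based on the overall sentiments expressed by opinion holders (authors):
• Positive, negative, and (possibly) neutral
Similar to, but different from, topic-based text classification.
• In topic-based text classification, topic words are important.
• In sentiment classification, sentiment words are more important, e.g., great, excellent, horrible, bad, worst, etc.

Document-Level Sentiment Orientation Analysis
• 2002: Turney used the average orientation of the phrases appearing in a review to represent the orientation of the whole review.
• 2003: Dave et al. used word orientation to represent document orientation, taking the strength of each word's orientation into account.
• 2002: Bo Pang et al. were the first to introduce machine learning into sentiment analysis, using Naïve Bayes, maximum entropy, and SVM classifiers to classify sentiment automatically at the document level. (The authors collected labeled movie reviews from IMDB.)
• 2004: Bo Pang et al. proposed judging the subjectivity of the sentences in a document by combining machine learning with minimum cuts in graphs.
• 2005: Bo Pang et al. extended their work further, using machine learning to rate movie reviews on a 3-point or 4-point scale.

Unsupervised Review Classification (Turney, ACL-02)
Data: reviews from epinions.com on automobiles, banks, movies, and travel destinations.
The approach: three steps.
Step 1:
• Part-of-speech tagging
• Extract two consecutive words (two-word phrases) from reviews if their tags conform to some given patterns, e.g., (1) JJ, (2) NN.

Unsupervised Review Classification
Step 2: Estimate the semantic orientation (SO) of the extracted phrases.
• Use pointwise mutual information:
$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{P(word_1 \wedge word_2)}{P(word_1)\,P(word_2)}$$
• Semantic orientation (SO):
$$\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"})$$
• Use the AltaVista NEAR operator to run searches and obtain the hit counts needed to compute PMI and SO.

Unsupervised Review Classification
Step 3: Compute the average SO of all phrases.
• Classify the review as recommended if the average SO is positive, not recommended otherwise.
Final classification accuracy:
• automobiles: 84%
• banks: 80%
• movies: 65.83%
• travel destinations: 70.53%

Sentiment Classification Using Machine Learning Methods (Pang et al, EMNLP-02)
This paper directly applied several machine learning techniques to classify movie reviews into positive and negative.
Three classification techniques were tried:
• Naïve Bayes
• Maximum entropy
• Support vector machine
Pre-processing settings: negation tag, unigram (single words), bigram, POS tag, position.
SVM: the best accuracy, 83% (unigram)

Sentence-Level Sentiment Orientation Analysis
Document-level sentiment classification is too coarse for most applications.
Much of the work on sentence-level sentiment analysis focuses on identifying subjective sentences in news articles.
• Classification: objective and subjective.
• All techniques use some form of machine learning.
• E.g.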
