07 Review Mining for BI (Data Mining)


Review mining for business intelligence

Outline
- Review mining from online media
- Objectives and challenges
- Our solutions
  - S-PLSA
  - ARSA
  - HelpMeter
  - CF with ***ect-based opinion mining
- Summary

Background
- Significance of online reviews
  - Valuable source of information
  - Excellent indicator of public opinions
  - Objective:
    - Yes: opinions, feelings and attitudes
    - No: facts
- Research on review mining
  - Sentiment classification: thumbs up or thumbs down?
  - Opinion extraction: how to automatically summarize the review?

Opinion mining for business intelligence
- Our question: How does sentiment in the text affect product demand?

Our solution: sentiment PLSA
- A probabilistic approach to sentiment mining
  - Traditional form: PLSA
  - Feature selection: appraisal words
- Assume a review is generated under the influence of a number of hidden sentiment factors
  - Hidden factors: accommodate the intricate nature of sentiments
  - Probabilistic generative model: deals with sentiment analysis in a principled way
  - Posterior probabilities: provide a summarization of a blog in terms of sentiments

S-PLSA
- Generative process
  1. Pick a review b from B with P(b)
  2. Choose a hidden factor z from Z with P(z|b)
  3. Choose a word w from W with P(w|z)
- Joint probability and Bayes rule

$$P(b, w) = P(b) \sum_{z \in Z} P(w \mid z)\, P(z \mid b)$$

$$P(b, w) = \sum_{z \in Z} P(z)\, P(b \mid z)\, P(w \mid z)$$

Sentiment Representation
- Posterior probability with Bayes rule

$$P(z \mid b) = \frac{P(b \mid z)\, P(z)}{\sum_{z' \in Z} P(b \mid z')\, P(z')}$$

- How much a hidden sentiment factor z contributes to the review b
- Summarization of b in terms of sentiments

          z1    z2    z3
  Blog 1  0.2   0.3   0.5
  Blog 2  0.8   0.1   0.1
  Blog 3  0.3   0.3   0.4

Existing work on prediction with blogs
- Blog volumes and interlink structures [Gruhl et al. 2004 and 2005]
  - Information diffusion through blogspace
  - Use topic bursting to predict the trend of product sales
  - Correlation between blog mentions and sales spikes
- Fails to consider the effect of sentiment
(Figure: sales rank and blog mentions)

Characteristics of online discussions
- Intuitively: hot discussions = outstanding sales performance
- A real example from the movie sector

  Movie               User rating
  The Da Vinci Code   6.5
  Over the Hedge      7.1

Our solution: Product sales prediction with word-of-mouth
- ARSA: a sentiment-aware model
- Two factors:
  - Box office revenue of the preceding days
  - People's sentiment about the movie

$$x_t = \sum_{i=1}^{p} \phi_i\, x_{t-i} + \sum_{i=1}^{q} \sum_{j=1}^{K} \rho_{i,j}\, \omega_{t-i,j} + \epsilon_t$$

  - First sum: influence of past BO (box office) revenue
  - Second sum: effect of sentiment

Experiment settings for sales prediction
- Datasets
  - Blog entries on movies, from May 1, 2006 to August 8, 2006 (Google blog search)
  - Box office revenue data
- Evaluation metric

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|\mathrm{Pred}_i - \mathrm{True}_i|}{\mathrm{True}_i}$$

Empirical study for sales prediction
- Parameter selections: K, P, Q
- Trend and real cases
- Comparison with alternative methods
  - Without sentiment: autoregressive model
  - With volume: replace sentiment with a volume scalar
- Comparison with feature selection methods
  - Bag-of-words

Adaptive sentiment analysis
- S-PLSA+: adaptive sentiment analysis
  - Capture the hidden sentiment factors in the reviews
  - Incrementally update parameters as more data become available
  - Quasi-Bayesian estimation
- Batch training, followed by incremental change
- Application to sales performance prediction

Review quality mining
- Popular solution: aggregated score

Problems of existing solutions
- Few votes for new posts
  - Valuable reviews are buried in the large number of low-quality reviews
- Monopoly
  - Reviews on the top receive more attention
- Spam voting
  - Motivated by some interests

A non-linear model for mining review quality
- HelpMeter: a non-linear regression model for predicting the helpfulness of online reviews
- Features and contributions:
  - Support helpfulness prediction
  - Detect review quality irrespective of publishing time
  - Integrate the most influential factors that may affect the helpfulness value
  - Analyze each factor according to its nature with a non-linear mathematical model
  - Conduct extensive experiments on real datasets

Problem definition
- Objective
  - Predict the helpfulness of a review
  - H ∈ [0, 1]: fraction of people who find the review helpful
- Gold standard
  - Tally attached to the review in the training data
  - "x out of y people found the following review helpful", e.g., the IMDB dataset

$$H = \frac{x}{y}$$

Observations on simple contextual features
- Preliminary experiments on IMDB data
- Result analysis
  - A difficult text mining task
  - One-dimensional features are too simple

Observation on expertise
(Figure: panels for the genres family, adventure, music, comedy)

Observation on writing style
- Lengthy and complicated: "Jane Austen's books madden me so that I can't conceal my frenzy from the reader. Every time I read 'Pride and Prejudice' I want to dig her up and beat her over the skull with her own shin-bone."
- Short and firm: "I HATE IT!!!"

Observation on timeliness
- Helpfulness declines as time passes
- Two movies
  - Pirates of the Caribbean
  - Casino Royale
- Helpfulness vs. days
- 14-day moving average

Our solution
- Approximate each factor according to its nature
  - Expertise: RBF
  - Writing style: RBF
  - Timeliness: exponential
- A complete model that accounts for all major factors
  - Nonlinear regression model

Modeling expertise
(Diagram: Review, Movie Genre, Helpfulness. Reviews: Review 1: Star Wars; Review 2: Iron Man; Review 3: Matrix; Review 4: American Zeitgeist. Genres: Action, Sci-fi, Adventure, Drama, Thriller, Documentary, Fantasy, …)

Modeling expertise
- Feature selection
  - Each movie
  - Similarity of a given movie to other movies
- Feature vector and helpfulness score

$$x = (x_1, x_2, \ldots, x_m) \in \mathbb{R}^m$$

$$\hat{H}_1(x) = \sum_{i=1}^{k_1} u_i\, \Phi(x \mid \mu_i, \sigma_i)$$

  - $k_1$: number of centers in the RBF
  - $\mu_i$: center of the i-th RBF
  - $\sigma_i$: spread of the i-th RBF
  - $u_i$: weight of the i-th RBF

Modeling writing style
- Feature selection: syntactical features (POS tags)
  - Qualifiers (quite, rather, enough)
  - Modal auxiliaries (can, should, will)
  - Comparative and superlative adjectives (top, largest, bigger)
  - ……
- Model the relation between the feature vector and the helpfulness …
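
As a rough illustration of the RBF-based expertise model above (helpfulness estimated as a weighted sum of radial basis functions, each with a center, a spread, and a weight), the following is a minimal Python sketch. The slides do not state which kernel is used, so the Gaussian form here is an assumption. The names `gaussian_rbf` and `predict_helpfulness` and all numeric values are hypothetical, chosen only for the example, and fitting the parameters from data is not shown.

```python
import math


def gaussian_rbf(x, center, spread):
    """Gaussian radial basis function (assumed kernel; not confirmed by the slides)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * spread ** 2))


def predict_helpfulness(x, centers, spreads, weights):
    """Weighted sum of RBF activations: sum_i u_i * Phi(x | center_i, spread_i)."""
    return sum(
        u * gaussian_rbf(x, c, s)
        for c, s, u in zip(centers, spreads, weights)
    )


# Hypothetical parameters: two RBF centers in a 2-dimensional feature space.
centers = [(1.0, 0.0), (0.0, 1.0)]
spreads = [0.5, 0.5]
weights = [0.8, 0.3]

# A feature vector sitting exactly on the first center is dominated by its weight.
h_at_center = predict_helpfulness((1.0, 0.0), centers, spreads, weights)

# A feature vector far from both centers gets an estimate close to zero.
h_far = predict_helpfulness((10.0, 10.0), centers, spreads, weights)
```

In this sketch, a review whose feature vector lies near a center with a large weight receives a high estimate, and the estimate decays smoothly with distance. That is the sense in which an RBF can capture a non-linear relation between features and helpfulness.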
