Collaborative Filtering: Translated Foreign-Language Article

Original (English): Introduction to Recommender System Approaches of Collaborative Filtering: Nearest Neighborhood and Matrix Factorization

"We are leaving the age of information and entering the age of recommendation."

Like many machine learning techniques, a recommender system makes predictions based on users' historical behaviors. Specifically, it predicts user preference for a set of items based on past experience. To build a recommender system, the two most popular approaches are Content-based and Collaborative Filtering.

A content-based approach requires a good amount of information about the items' own features, rather than using users' interactions and feedback. For example, it can use movie attributes such as genre, year, director, actor, etc., or the textual content of articles extracted by applying Natural Language Processing. Collaborative Filtering, on the other hand, doesn't need anything except users' historical preferences on a set of items. Because it is based on historical data, the core assumption is that users who have agreed in the past tend to also agree in the future.

User preference is usually expressed in two categories. An Explicit Rating is a rating given by a user to an item on a sliding scale, like 5 stars for Titanic. This is the most direct feedback from users showing how much they like an item. An Implicit Rating suggests user preference indirectly, such as page views, clicks, purchase records, whether or not a user listens to a music track, and so on. In this article, I will take a close look at collaborative filtering, a traditional and powerful tool for recommender systems.

Nearest Neighborhood

The standard method of Collaborative Filtering is known as the Nearest Neighborhood algorithm. There are user-based CF and item-based CF. Let's first look at user-based CF. We have an n × m matrix of ratings, with users uᵢ, i = 1, ..., n and items pⱼ, j = 1, ..., m. Now we want to predict the rating rᵢⱼ when the target user i did not watch/rate item j. The process is to calculate the similarities between the target user i and all other users, select the top X most similar users, and take the weighted average of the ratings from these X users, with the similarities as weights.

Different people may have different baselines when giving ratings: some people tend to give high scores in general, while others are pretty strict even when they are satisfied with an item. To avoid this bias, we can subtract each user's average rating of all items when computing the weighted average, and add it back for the target user, as shown below. Two ways to calculate similarity are Pearson Correlation and Cosine Similarity.
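The prediction rule and the two similarity measures referenced above appeared as images in the original article and did not survive this text extraction; the following is a standard reconstruction in LaTeX, where r̄ᵢ is user i's average rating, N(i) is the set of the top X neighbors of user i, and sim(i, i′) is the similarity weight (N(i) and r̄ are notation introduced here, not the article's).

```latex
% Mean-centered weighted-average prediction for target user i on item j
\hat{r}_{ij} = \bar{r}_i
  + \frac{\sum_{i' \in N(i)} \mathrm{sim}(i, i')\,\bigl(r_{i'j} - \bar{r}_{i'}\bigr)}
         {\sum_{i' \in N(i)} \bigl|\mathrm{sim}(i, i')\bigr|}

% Pearson correlation (sums run over the items j rated by both users)
\mathrm{sim}(i, i') =
  \frac{\sum_j (r_{ij} - \bar{r}_i)(r_{i'j} - \bar{r}_{i'})}
       {\sqrt{\sum_j (r_{ij} - \bar{r}_i)^2}\,\sqrt{\sum_j (r_{i'j} - \bar{r}_{i'})^2}}

% Cosine similarity
\mathrm{sim}(i, i') =
  \frac{\sum_j r_{ij}\, r_{i'j}}
       {\sqrt{\sum_j r_{ij}^2}\,\sqrt{\sum_j r_{i'j}^2}}
```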
Basically, the idea is to find the users most similar to your target user (the nearest neighbors) and weight their ratings of an item as the prediction of the rating of this item for the target user. Without knowing anything about the items and users themselves, we consider two users similar when they give the same items similar ratings. Analogously, for item-based CF, we say two items are similar when they receive similar ratings from the same user. Then, we make a prediction for a target user on an item by calculating the weighted average of this user's ratings on the X most similar items. One key advantage of item-based CF is stability: the ratings on a given item do not change significantly over time, unlike the tastes of human beings.

There are quite a few limitations of this method. It doesn't handle sparsity well: no one in the neighborhood may have rated the item you are trying to predict for the target user. It is also not computationally efficient as the number of users and products grows.

Matrix Factorization

Since sparsity and scalability are the two biggest challenges for the standard CF method, a more advanced method has emerged that decomposes the original sparse matrix into low-dimensional matrices with latent factors/features and less sparsity. That is Matrix Factorization.

Besides solving the issues of sparsity and scalability, there is an intuitive explanation of why we need low-dimensional matrices to represent users' preferences. Suppose a user gave good ratings to the movies Avatar, Gravity, and Inception. These are not necessarily three separate opinions; rather, they suggest that this user might be in favor of Sci-Fi movies, and there may be many more Sci-Fi movies that this user would like. Unlike specific movies, latent features are expressed by higher-level attributes, and the Sci-Fi category is one of the latent features in this case. What matrix factorization eventually gives us is how much a user is aligned with a set of latent features, and how much a movie fits into this set of latent features. Its advantage over the standard nearest neighborhood approach is that even if two users haven't rated any of the same movies, it is still possible to find the similarity between them if they share similar underlying tastes, again the latent features.

To see how a matrix is factorized, the first thing to understand is Singular Value Decomposition (SVD). From linear algebra, any real matrix R can be decomposed into three matrices U, Σ, and V. Continuing with the movie example, U is an n × r user-latent-feature matrix and V is an m × r movie-latent-feature matrix. Σ is an r × r diagonal matrix containing the singular values of the original matrix, which simply represent how important a specific feature is for predicting user preference. By sorting the values of Σ in decreasing absolute value and truncating Σ to its first k dimensions (k singular values), we can reconstruct the matrix as matrix A. The selection of k should make sure that A is able to capture most of the variance within the original matrix R, so that A is an approximation of R, A ≈ R. The difference between A and R is the error that is expected to be minimized. This is exactly the idea behind Principal Component Analysis.

When the matrix R is dense, U and V can easily be obtained analytically. However, a matrix of movie ratings is super sparse. Although there are some imputation methods to fill ...
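To make the nearest-neighborhood procedure described above concrete, here is a minimal sketch of user-based CF in Python under a few assumptions that are mine rather than the article's: the ratings live in a small dense NumPy array where 0 stands for "not rated", similarity is cosine similarity on mean-centered rows, and the helper name predict_rating is illustrative.

```python
import numpy as np

def predict_rating(R, target_user, item, k=2):
    """User-based CF: mean-centered weighted average over the k most similar users.
    R is an n x m ratings matrix where 0 means 'not rated' (an assumption of this sketch)."""
    rated = R > 0
    # Each user's average rating over the items they actually rated.
    user_means = np.where(rated.any(axis=1),
                          R.sum(axis=1) / np.maximum(rated.sum(axis=1), 1), 0.0)
    centered = np.where(rated, R - user_means[:, None], 0.0)

    # Cosine similarity between the target user's centered ratings and every other user's.
    target_vec = centered[target_user]
    norms = np.linalg.norm(centered, axis=1) * np.linalg.norm(target_vec) + 1e-9
    sims = centered @ target_vec / norms
    sims[target_user] = -np.inf          # exclude the target user from the neighborhood
    sims[~rated[:, item]] = -np.inf      # only neighbors who rated this item can contribute

    neighbors = np.argsort(sims)[-k:]    # indices of the top-k most similar users
    neighbors = neighbors[np.isfinite(sims[neighbors])]
    if len(neighbors) == 0:              # the sparsity problem mentioned in the text
        return user_means[target_user]

    weights = sims[neighbors]
    deviations = R[neighbors, item] - user_means[neighbors]
    # Add the target user's own baseline back, as described above.
    return user_means[target_user] + weights @ deviations / (np.abs(weights).sum() + 1e-9)

# Toy 4-users x 5-items example; predict user 0's rating for item 4.
R = np.array([[5, 4, 0, 1, 0],
              [4, 5, 1, 0, 2],
              [1, 1, 5, 4, 5],
              [2, 0, 4, 5, 4]], dtype=float)
print(predict_rating(R, target_user=0, item=4, k=2))
```

Item-based CF follows the same pattern with the matrix transposed, computing similarities between item columns instead of user rows.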
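The truncated-SVD reconstruction A ≈ R described in the Matrix Factorization section can likewise be sketched in a few lines of NumPy. The toy matrix and the choice of k = 2 are arbitrary illustrations, and the zeros standing in for missing ratings are exactly the sparsity issue the article turns to next.

```python
import numpy as np

# A toy ratings matrix (0 = missing); a real ratings matrix would be far sparser.
R = np.array([[5, 4, 0, 1, 0],
              [4, 5, 1, 0, 2],
              [1, 1, 5, 4, 5],
              [2, 0, 4, 5, 4]], dtype=float)

# Full SVD: R = U * diag(s) * Vt, with the singular values s sorted in decreasing order.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2  # keep only the k largest singular values (the most important latent features)
A = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation A of R

print("approximation error (Frobenius norm):", np.linalg.norm(R - A))
print(np.round(A, 2))
```

Because np.linalg.svd already returns the singular values in decreasing order, keeping the first k columns of U, the first k values of s, and the first k rows of Vt gives the rank-k approximation directly.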
