The PageRank Citation Ranking - Redone

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

ThePageRankCitationRanking:BringingOrdertotheWebPageL.,BrinS.,MotwaniR.,WinogradT.StanfordDigitalLibraryTechnologiesProject:Dr.GautamDasDatabasesandInformationRetrievalUniversityofTexasatArlingtonOutlinePaperCitationsandtheWeb:MotivationPageRank:Whyitshouldbeconsidered?MorePageRank:NutsandboltsPageRankUnleashed:LookingunderthehoodConvergenceandRandomWalks:Whydoesitwork?Implementation:GettingyourhandsdirtyPersonalizedPageRank:TheinvisiblesourceApplications:Whatwasn’tapparentalreadyConclusionsUniversityofTexasatArlingtonPaperCitationsandtheWeb:MotivationAcademicCitationslinktootherwellknownpapersButtheyarepeerreviewedandhavequalitycontrolWebofacademicdocumentsarehomogeneousintheirquality,usage,citation&lengthMostwebpageslinktowebpagesaswellQualitymeasureofawebpageissubjectivetotheuserthoughImportanceofapageisaquantitythatisn’tintuitivelypossibletocaptureUniversityofTexasatArlingtonContd.Anuserwantstoseewhatismostapplicabletoherneedsfirst.Thejoboftheretrievalsystemistopresentthemorerelevantdocumentsupfront.ThenotionofqualityorrelativeimportanceofawebpagemagnifiesTheaveragequalityexperiencedbyanuserishigherthantheaveragequalityoftheaveragewebpage.NotationsUsed:•Backlinks(inedges):Linksthatpointtoacertainpage•ForwardLinks(outedges):LinksthatemanatefromthatpageUniversityofTexasatArlingtonPageRank:Whyitshouldbeconsidered?Thinkofacolorpalette–Colorsareformedbythemixtureofoneormorecolors–Theamountandintensityofeachcoloryoumixultimatelygovernsthecolorofthefinalmixturenotthenumberofcolors!!!NowthinkofaWebPage–Anumberofbacklinks(inedges)pointtothiswebpage–SayacertainbacklinkcamefromYahoo!andanothercamefromanobscurehomepage.ThinkoftheimportanceoftheYahoo!Pageasopposedtotheimportanceofthe‘homepage’.NowsaytheimportanceoftheYahoo!Pagewasmappedtotheamount(intensity)ofonecolorandthe‘homepage’toanothercolorImportanceofbacklinksratherthantheirnumber.++UniversityofTexasatArlingtonMorePageRank:NutsandboltsSayforanyWebPageuthenumberofforwardlinksisgivenbyFuandthenumberofbacklinksbeBuandNu=|Fu|R()=Rankofpageu;c=NormalizationConstant–Note:c1tocoverforpageswithnooutgoinglinksUniversityofTexasatArlingtonContd..Sowhatdoestheoverallpicturelooklike?Aisdesignatedtobeamatrix,uandvcorrespondtothecolumnsofthismatrixUniversityofTexasatArlingtonContd..(MatricesRevisited)EigenvectorsandeigenvaluesGiventhatAisamatrix,andRbeavectoroveralltheWebpages,thedominanteigenvectoristheoneassociatedwiththemaximaleigenvalue.Itcanbefoundoutbyrecursingthepreviousequationtilltherecurrenceconverges.–Asetofeigenvaluesformwhatiscalledtheeigenspace.UniversityofTexasatArlingtonContd..(AWalkThroughExample)LetstakeanexampleAT=UniversityofTexasatArlingtonContd..MatrixNotationR=cAR=MRc:eigenvalueR:eigenvectorofAAx=λx|A-λI|x=0A=R=Normalized=UniversityofTexasatArlingtonContd..(MarkovChains)Randomsurfermodel–DescriptionofarandomwalkthroughtheWebgraph–Interpretedasatransitionmatrixwithasymptoticprobabilitythatasurferiscurrentlybrowsingthatpage–TheabovenotionisfundamentaltoanyMarkovianSystem.Foradiscretenotionoftheabove,thefollowingisassumed.Rt=MRt-1M:transitionmatrixforafirst-orderMarkovchain(stochastic)Thequestionisdoesitconvergetosomesensiblesolution(ast)regardlessoftheinitialranks?UniversityofTexasatArlingtonContd..(Issues..)TheaboveequationwouldconvergewereitnotforalittleproblemThisproblemiscalledthe‘RankSink’Problem.–Thesinkaccumulatesrank,butneverdistributesit!UniversityofTexasatArlingtonContd..()IngeneralmanyWebpagesdon’thaveeitherbacklinksorforwardlinks.Resultsindanglingedgesofthegraphnoparentrank0–MTconvergestoamatrixwhoselastcolumnisallzeronochildrennosolution–MTconvergestozeromatrixUniversityofTexasatArlingtonContd..(MoreRandomSurfer)Howdoweescapefromthis?–A:Weactually‘escape’fromit.Sayasurferisrandomlyclickingandhoppingfromonepagetotheother.Ifthissurferkeepsgoingbacktothe‘same’setofpages,shewillgetbored(inrealitytoo)andtryand‘escape’fromthissetofpages.Hence,weassociatean‘escape’factorEtoaccountforthis‘boredom’.–HowdowemodelthisescapeprobabilityWetermthisEtobeavectoroverallthewebpagesthataccountsforeachpage’sescapeprobability.UniversityofTexasatArlingtonContd..GiventhisEscapevector,howdoweassociatethiswiththeoriginalmodelInmatrixnotationwhereItcanberewrittenasHenceUniversityofTexasatArlingtonPageRankUnleashed:Lookingunderthehood•Whatcanwesayaboutdand?•d1iscalledtheeigengapanditcontrolstherateofconvergence•istheconvergencethresholdThemainalgorithm:UniversityofTexasatArlingtonConvergenceandRandomWalks:Whydoesitwork?IrreducibleAperiodicMarkovChainswithaPrimitivetransitionprobabilitymatrixWhatistheissueallabout?–Weneedatransitionmatrixmodelthatisguaranteedconvergenceanddoesindeedconvergetoauniquestationarydistributionvector.UniversityofTexasatArlingtonContd..AdditionoftheescapevectorE,allowsustomaketheoriginalmatrixAbebothprimitiveandstochastic–ThisguaranteesconvergenceWhatabouttheadditionofnewlinks–Whetherthe

1 / 23
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功