朱飞达-BDTC__大数据与金融创新:从研究到实战26

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

AssistantProfessorDSLeeFoundationFellowSchoolofInformationSystemsSingaporeManagementUniversityDec.11,2015FeidaZhuFoundingDirectorPinnacleLabforAnalyticsDBS-SMULabforLifeAnalyticsSingaporeManagementUniversity    – Insight  from  scale  • What  can  big  data  tell  us  that  small  data  cannot?  – Knowledge  from  enrichment  • What  important  knowledge  can  we  learn  from  enriching  small  data  with  big  data?  – Agility  from  real-­‐?me  responsiveness  • What  are  the  values  of  being  real-­‐?me?        VOLUMEVARIETYVELOCITY企业内部数据通常只以交易纪录为基础  Transac?on-­‐based  总量和覆盖有限  Limited  coverage    只反映用户生活的局部和侧面  Fragmented  par?al  perspec?ve  静态,低频  Sta?c,  low  frequency  孤立单一的客户视图,只见人  Isolated  view  of  individual  user    外部社交媒体大数据能展现交易行为的上下文场景  Context-­‐based  海量的社会级覆盖  Societal  scale  提供用户的多角度全景式洞察  Mul?-­‐facet  insight  动态实时,高频  Dynamic,  high  frequency  能综合考虑丰富真实社交关系  Network-­‐embedded  user  view  123123§ § § § § 我今天要联系50个潜在客户卖“少儿险”,告诉我打哪些电话?  为什么是这些人?  你今天最应该打这些人的电话!  因为这些人最关心孩子,而且最近有增长趋势,现在是最好时机!  § § § § § § § !!#!$!%!&!!#$%&'()'$%*+',-,-(.'$/0,0*%.0-1.$+,2#%#*3,('*)4*5.3.%1'()*+,-.(*./0)11-23+401(34+,5'()*+,-.(*./0)+23+401(34+,5Figure1.MutualReachability.!!!#!!$!!%!!&!!'!!(!!!&!!!!!&!!!#$%&#'()*+&'+,#$$*-.(/$&./()01.2/#$3*,#4(/.%,5#&,##)Figure2.FriendshipRetainability.!#$%$&%!!'(!')!')(!')!')(!'))!'))(!'&!'&(!'&)!'&)(!'*!'*(!'*)!'*)(!'#!'#(!'#)!'#)(!'+!'+(!'+)!'+)($'!!#$%&'!Figure3.CommunityAffinity.nitymemberswhodonothavedirecttwo-wayfollowlinkswiththetargetuseryetdohavestrongconnec-tionswithotheroff-linecommunitymembers.Thefol-lowingexperimentfurtherillustratestheprinciple.Foreachofour65Twittertargetuseru,weexamineeachuservinu’sneighborhood,andcounthowmanyoff-linefriendsofuhavedirecttwo-wayfollowlinkswithv.WerankthembythecountandcomputeAUC(AreaUnderROCCurve)oftheranklistbasedontheground-truthoff-linefriendsofu.Figure3showsthatformostusers(52outof65),theAUCvalueisgreaterthan0.8.Thismeansoff-linefriendsindeedsharemoredirecttwo-wayfollowlinkswithotheroff-linefriends,exhibitingmuchstrongercommunityaffinitythanonlinefriends.Hereweusedirecttwo-wayfollowlinksasanindicationofgreaterconnectionstrength.Principle3.CommunityAffinity.Givenatar-getuseru,forauserv∈Nku,letS={w|w∈Cu!Nkv},thelargerthecardinalityofS,themorelikelywehavev∈CuwithrespecttoNku.ALGORITHMWithincorporatingthethreeprinciples,weproposeouralgorithmbasedontheideaofrandomwalkwithrestart(RWR).Itisdefinedin[6]withthefollowingequation.⃗ri=(1−c)˜W⃗ri+c⃗ei(1)Inourproblemsetting,giventheTwitternetworkG=(V,E),atargetuseru∈Vandanumberk,wefocusonG’ssubgraphGkuinducedbyNku{u},whichissim-plifiedasGuwhenkisfixed.AprobabilitytransitionmatrixWisdefinedforV(Gu)suchthat,fortwonodesv,w∈V(Gu),theentryW(v,w)denotestheprobabil-ityofvtransmittingtowatanystep.InaccordancewithPrinciple(II),wedefineW(v,w)asW(v,w)=#1|F1v→|ifw∈F1v→0ifw̸∈F1v→(2)InEquation1,˜WisthetransposeoftheprobabilitytransitionmatrixWasdefinedabove.⃗eiisthestartingindicatorvectorsuchthatei,i=1andei,j=0wherei̸=j.⃗riistheprobabilityvectorfornodeisuchthatri,jistheprobabilityoftransmittingtonodejfromi.cisrestartprobability.Ithasbeenshownthat⃗ricanbecomputediterativelyanditfinallyconverges[6].Whenitconverges,thesteady-stateprobabilityvector⃗rire-flectsthebandwidthofinformationfloworiginatedfromuseritouserjforeveryj∈V(Gu).Weusethissteady-stateprobabilitytodefinetheclosenessscoreci,jfortwousersiandj:ci,j=ri,j∗rj,i(3)TheclosenessscorethusdefinedsatisfiesPrinciple(I).Wenextexplorehowtotakeadvantageoftheoff-linecommunitytoidentifyotherunknownmembers,imple-mentingPrinciple(III).Theideaistodiscovertheoff-linecommunityiteratively,addingnewmembersintotheknownsetineachround.Forthatpurpose,weintroduceanauxiliarydummynode,ˆv,toprovideathresholdtocutthenewoff-linecommunitybound-aryforeachround.ˆvisconstructedasavirtualnodesuchthat(I).ˆvandthetargetuserufolloweachother,i.e.,ˆv∈F1u←!F1u→,(II).ˆvonlyassociateswithu,i.e.,foreachv∈(Nku\{u}),ˆv̸∈(F1v←F1v→),and(III).thenumberoffollowersofˆvissettobetheme-dianofthenumberoffollowersofallusersinu’sk-hopnetworkwiththehubusersexcluded,i.e.,|F1ˆv→|=medianv∈(Nku\H){|F1v→|}.Hubusers,denotedasH,re-fertothoseaccountswithmorethan2000followers,whichtypicallybelongtocelebrities,newsmedia,etc.Thedummynodeisdefinedinsuchawayastosetthelower-boundcaseforanoff-linefriend.Itsimulatesthescenarioinwhichthetargetuserufindsbychancethisrandomuserˆvwhohasnoconnectionswithu’soff-linecommunity.Findinghim/herinteresting,ufollowsˆv,whothenalsofollowsbacksomehow.Assuch,ˆvrepre-sentsaconnectiontoualmostasweakasanyoff-linereal-lifefriendshouldbe.Onahighlevel,thealgorithmworksiniterationsasfollows.Givenatargetuseru,computetheclosenessscorebetweenuandalltheotherusersaswellasˆv.Arankinglistofalltheuserstogetherwithˆvindecreas-ingorderoftheclosenessscoreisthusgenerated.AllProblem:    Given  a  TwiNer  follow  network  of  a  target  user,  iden?fy  the  user’s  offline  community  by  examining  the  follow  linkage  alone.    Informa.on  should  be  able  to  flow  in  both  direc.ons  within  a  small  distance

1 / 26
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功