A Python-Based Data Crawler for Sina Weibo


2014-06-05; 2017-08-27. 913301162011310812002211510500300. 1989-, CCF; 1981-, CCF; 1971-, CCF.

Article ID: 1001-9081(2014)11-3131-04    doi: 10.11772/j.issn.1001-9081.2014.11.3131
CLC number: TP391; TP311    Document code: A
Journal of Computer Applications, 2014, 34(11): 3131-3134. ISSN 1001-9081. CODEN JYIIDU. 2014-11-10. .joca.cn

Data crawler for Sina Weibo based on Python

ZHOU Zhonghua, ZHANG Huiran, XIE Jiang*
(School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China)
(* Corresponding author, e-mail: jiangx@shu.edu.cn)

Abstract: Nowadays, most research on social networks uses data from foreign social network platforms. However, Sina Weibo, the largest social network platform in China, has no data interfaces for investors. A Sina Weibo data crawler combined with parallelization technology was put forward. It got fans information and Weibo data content of different Weibo users in real time. It also supported keyword matching and parallelization. The serial data crawler and its parallel version were compared, and an experiment about flu was conducted on some Weibo data. The results indicate that, with parallelization, this tool has linear speedup and that all the fetched data are timely and accurate.

Keywords: Sina Weibo; crawler; Python; parallel; big data

0
[...] Twitter, Facebook [1-5] [...] [6-9] [...] Python [...]

1
[...] [10-12] [...]

1.1
[...] session [...]

Begin
    Step 1: Send Login Request
    Step 2: Get Response From Server
            Get Encrypt Information From Response
    Step 3: Encrypt User Information
            Send Encrypt Information
    Step 4: Get Login Status
End

1.2
[...] HTTP [...] HTML [...] ID [...]

Begin
    Initialize WaitingQueue
    Initialize FinishedQueue
    Push SeedUser Into WaitingQueue
    While length(FinishedQueue) < MaxNum
    Begin
        Pop User From WaitingQueue
        Scan User Information
        Push User Into FinishedQueue
        If NewUser Not In WaitingQueue and NewUser Not In FinishedQueue
        Begin
            Push NewUser Into WaitingQueue
        End
    End
End

1.3
[...]

1.4
[...]

Begin
    For i = 0; i < len(keys); i++
    Begin
        If match(content, keys[i])
        Begin
            return True
        End
    End
    return False
End

2
[...] MPI [...]

Begin
    If IsMaster
    Begin
        Initialize WaitingQueue
        Initialize FinishedQueue
        Load Some User From Disk Into WaitingQueue
        Pop 50*N Users From WaitingQueue
        For i = 0; i < N; i++
        Begin
            Send 50 Users to Slaver i
            Push 50 Users Into FinishedQueue
        End
        While length(FinishedQueue) < MaxNum
        Begin
            If Receive NewUsers From Slaver j
            Begin
                For each User in NewUsers
                Begin
                    If User Not In WaitingQueue and User Not In FinishedQueue
                    Begin
                        Push User Into WaitingQueue
                    End
                End
                Pop 50 Users From WaitingQueue
                Send 50 Users to Slaver j
                Push 50 Users Into FinishedQueue
            End
        End
    End
    Else
    Begin
        Initialize JobQueue
        Initialize NewUserQueue
        While True
        Begin
            Receive Users From Master
            Push Users Into JobQueue
            Clear NewUserQueue
            While length(JobQueue) > 0
            Begin
                Pop User From JobQueue
                Scan User Information
                Push NewUsers into NewUserQueue
            End
            Send NewUserQueue To Master
        End
    End
End

3
[...]

3.1
[...] 50 [...] 100~200 [...]

CPUs    Time/s        CPUs    Time/s
1       1361.0        8       226.2
2       1271.1        10      197.2
4       512.7         12      168.5
6       314.4         14      1310.512 (digits as extracted)

3.2
[...] Gephi [13] [...]

3.3
[...]

4
[...]

References

[1] TUMASJAN A, SPRENGER T O, SANDNER P G, et al. Predicting elections with Twitter: what 140 characters reveal about political sentiment [C]// Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. Madison: AAAI Press, 2010, 10: 178-185.
[2] WELCH M J, SCHONFELD U, HE D, et al. Topical semantics of twitter links [C]// Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. New York: ACM Press, 2011: 327-336.
[3] CARLISLE J E, PATTON R C. Is social media changing how we understand political engagement? An analysis of Facebook and the 2008 presidential election [J]. Political Research Quarterly, 2013, 66(4): 883-895.
[4] CUNLIFFE D, MORRIS D, PRYS C. Young bilinguals' language behaviour in social networking sites: the use of Welsh on Facebook [J]. Journal of Computer-Mediated Communication, 2013, 18(3): 339-361.
[5] STRAFLING N, KRAMER N C. Learning together on Facebook et al. The influence of social aspects and personality on the usage of social media for study related exchange [J]. Gruppendynamik und Organisationsberatung, 2013, 44(4): 409-428.
[6] DUAN J Y, DHOLAKIA N. The reshaping of Chinese consumer values in the social media era: exploring the impact of Weibo [J]. Journal of Macromarketing, 2013, 33(4): 402-403.
[7] HUANG R, SUN X. Weibo network, information diffusion and implications for collective action in China [J]. Information Communication and Society, 2014, 17(1): 86-104.
[8] MAZO J. Blocked on Weibo: what gets suppressed on China's version of Twitter (and why) [J]. Survival, 2013, 55(6): 191-192.
[9] POELL T, de KLOET J, ZENG G, et al. Will the real Weibo please stand up? Chinese online contention and actor-network theory [J]. Chinese Journal of Communication, 2014, 7(1): 1-18.
[10] PINKERTON B. Finding what people want: experiences with the WebCrawler [EB/OL]. [2010-10-10]. .webir.org/resources/phd/pinkerton_2000.pdf.
[11] AHMADI-ABKENARI F, SELAMAT A. An architecture for a focused trend parallel Web crawler with the application of clickstream analysis [J]. Information Sciences, 2012, 184(1): 266-281.
[12] ZHOU L, LIN L. Survey on the research of focused crawling technique [J]. Computer Applications, 2005, 25(9): 1965-1969.
[13] BASTIAN M, HEYMANN S, JACOMY M. Gephi: an open source software for exploring and manipulating networks [EB/OL]. [2010-10-10]. .org/publications/gephi-bastian-feb09.pdf.
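
The serial user-crawling loop and the keyword filter given as pseudocode in the paper can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation: `crawl_users`, `matches_any`, and the `fetch_followees` callback are hypothetical names, the callback stands in for the paper's "Scan User Information" step (no real Sina Weibo request is made), and plain substring matching is assumed for `match`.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl_users(seed: str,
                fetch_followees: Callable[[str], Iterable[str]],
                max_users: int) -> List[str]:
    """Breadth-first user discovery with a waiting and a finished queue.

    fetch_followees is a hypothetical stand-in for scanning one user's
    page and returning the user IDs found there.
    """
    waiting = deque([seed])      # users still to be scanned
    seen = {seed}                # every user ever put into either queue
    finished: List[str] = []     # users already scanned, in scan order
    while waiting and len(finished) < max_users:
        user = waiting.popleft()
        finished.append(user)
        for new_user in fetch_followees(user):
            # Equivalent to "not in waiting queue and not in finished queue".
            if new_user not in seen:
                seen.add(new_user)
                waiting.append(new_user)
    return finished


def matches_any(content: str, keys: Iterable[str]) -> bool:
    """Return True if any keyword occurs in the post content.

    Substring matching is an assumption; the paper does not say how
    match() is defined.
    """
    return any(key in content for key in keys)


if __name__ == "__main__":
    # Toy follow graph used instead of live Weibo data.
    toy_graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
    print(crawl_users("a", lambda u: toy_graph.get(u, []), max_users=3))
    print(matches_any("I caught the flu", ["flu", "cold"]))
```

A single `seen` set replaces the two membership checks in the pseudocode, so each check is constant time instead of a scan over both queues.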
