Analysis and Calculation of Massive Logs Based on Hadoop
2011

Abstract

With the development of science and technology, the transistor circuit has been gradually approaching its physical performance limits. Moore's Law has ceased to hold since 2005: doubling the computing power of a single CPU every 18 months is no longer possible. Meanwhile, the number of people online has exploded, and companies that provide services over the network have to analyze massive log records every day in order to adapt their products to customers' requirements in time. Some critical product data must therefore be processed within a given time. Traditional database technology cannot provide enough computing power and storage to meet these data-processing needs. The concept of cloud computing was proposed to solve this problem and is becoming the direction for the near future. Today, IT industry giants such as Google, IBM, Facebook, Yahoo and Microsoft have adopted their own cloud computing platforms to process massive data and provide computing power.

In this paper, the Hadoop cloud computing platform was selected to improve the capacity for processing large volumes of logs. Hadoop is an open-source distributed computing framework. It offers good scalability, lower operating costs, higher efficiency and better stability. Furthermore, the MapReduce programming model is well suited to text-processing applications. In addition, Hadoop handles all the low-level message passing for programmers during parallel computing. Programmers only need to deal with the data logic and do not have to consider the messages exchanged between the parallel computers on the Hadoop cloud platform, so they can focus on the critical issues and speed up program development. As a result, the Hadoop platform has been widely used since its release.

This paper studies Hadoop's HDFS and MapReduce models in depth. Following Hadoop's data-processing model, we design a data-processing model that fits our business requirements. This model is applied in practice to massive log processing and cuts down the data-processing time. Most importantly, the Hadoop cloud platform removes the data-processing bottleneck of a single server.

In this paper, a Hadoop cloud computing platform was designed and implemented. On this platform, a data-processing model was designed and implemented to compute log statistics and improve the speed of massive log processing. Programs that compute statistics for several products were written on our own Hadoop cloud platform, and performance tests were carried out. By analyzing the relationship between computing power and the number of worker nodes, and by comparing the computing power of multiple nodes with that of a single database, the experimental data show that Hadoop has a strong advantage in processing massive data.

Keywords: Hadoop; HDFS; MapReduce; cloud computing; massive data processing and analysis

Contents
Abstract
1
1.1
1.2
1.3
2
2.1 HDFS
2.2 HDFS
2.3
2.4
3 Hadoop MapReduce
3.1 MapReduce
3.2 MapReduce
3.3
3.4
4 Hadoop