Running Hadoop On Ubuntu Linux (Single-Node Cluster)

Author: Michael G. Noll
Published: Aug 05, 2007
Last updated: Nov 27, 2010

In this tutorial, I will describe how to set up a single-node Hadoop cluster.

Table of Contents:
- What we want to do
- Prerequisites
  - Sun Java 6
  - Adding a dedicated Hadoop system user
  - Configuring SSH
  - Disabling IPv6
    - Alternative
- Hadoop
  - Installation
    - Alternative
  - Excursus: Hadoop Distributed File System (HDFS)
  - Configuration
    - hadoop-env.sh
    - conf/*-site.xml
  - Formatting the name node
  - Starting your single-node cluster
  - Stopping your single-node cluster
  - Running a MapReduce job
    - Download example input data
    - Restart the Hadoop cluster
    - Copy local example data to HDFS
    - Run the MapReduce job
    - Retrieve the job result from HDFS
  - Hadoop Web Interfaces
    - MapReduce JobTracker Web Interface
    - TaskTracker Web Interface
    - HDFS NameNode Web Interface
- What’s next?
- Related Links
- Changelog

What we want to do

In this short tutorial, I will describe the required steps for setting up a single-node Hadoop cluster using the Hadoop Distributed File System (HDFS) on Ubuntu Linux.

Are you looking for the multi-node cluster tutorial? Just head over there.

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system and, like Hadoop, is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets.

[Figure: Cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)]

The main goal of this tutorial is to get a “simple” Hadoop installation up and running so that you can play around with the software and learn more about it.

This tutorial has been tested with the following software versions:
- Ubuntu Linux 10.04 LTS (deprecated: 8.10, 8.04 LTS, 7.10, 7.04)
- Hadoop 0.20.2, released February 2010 (deprecated: 0.13.x–0.19.x)

You can find the time of the last document update at the very bottom of this page.

Prerequisites

Sun Java 6

Hadoop requires a working Java 1.5.x (aka 5.0.x) installation. However, using Java 1.6.x (aka 6.0.x aka 6) is recommended for running Hadoop. For the sake of this tutorial, I will therefore describe the installation of Java 1.6.

In Ubuntu 10.04 LTS, the package sun-java6-jdk has been dropped from the Multiverse section of the Ubuntu archive. You have to perform the following four steps to install the package:

1. Add the Canonical Partner Repository to your apt repositories:

       $ sudo add-apt-repository deb

2. Update the source list.
3. Install sun-java6-jdk.
4. Select Sun’s Java as the default on your machine.

The full JDK will be placed in /usr/lib/jvm/java-6-sun (well, this directory is actually a symlink on Ubuntu).

After installation, make a quick check whether Sun’s JDK is correctly set up:

    user@ubuntu:~# java -version
    java version "1.6.0_20"
    Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
    Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)

Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop. While that’s not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc.).

    $ sudo addgroup hadoop
    $ sudo adduser --ingroup hadoop hadoop

This will add the user hadoop and the group hadoop to your local machine.

Configuring SSH

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it (which is what we want to do in this short tutorial). For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hadoop user we created in the previous section.

I assume that you have SSH up and running on your machine and have configured it to allow SSH public key authentication. If not, there are several guides available.

First, we have to generate an SSH key for the hadoop user:

    user@ubuntu:~$ su - hadoop
    hadoop@ubuntu:~$ ssh-keygen -t rsa -P ""
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
    Created directory '/home/hadoop/.ssh'.
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    The key fingerprint is:
    9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hadoop@ubuntu
    The key's randomart image is:
    [...snip...]
    hadoop@ubuntu:~$

The second line (ssh-keygen) will create an RSA key pair with an empty password. Generally, using an empty password is not recommended, but in this case it is needed to unlock the key without your interaction (you don’t want to enter the passphrase every time Hadoop interacts with its nodes).

Second, you have to enable SSH access to your local machine with this newly created key.

The final step is to test the SSH setup by connecting to your local machine with the hadoop user. This step is also needed to save your local machine’s host key fingerprint to the hadoop user’s known_hosts file.

If you have any special SSH configuration for your local machine, like a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information).

    hadoop@ubuntu:~$ cat $HOME/.ssh/i