Hadoop Study Notes (IT进行时; Email & MSN: zhengxianquan@hotmail.com)

1. Preparation

The latest Hadoop release is 0.20.1. From the Hadoop documentation, the extra requirement on Windows is:

Cygwin - Required for shell support in addition to the required software above.

1.2. What Hadoop needs

1) JDK 1.6
2) Cygwin

1.3. Notes on installing Cygwin

To run Hadoop on Windows you need Cygwin for shell support; it builds a Linux-like environment for Windows. The current version is 1.6. During installation, be sure to select the openssh package under the Net category (Default view), as shown in the installer screenshot. For the installation mode you must choose "For All Users" rather than "Just Me"; otherwise the SSH service in step 2.1 will not start.

2. Configuration on a single machine

2.1. Configuring SSH

2.1.1. Configuring the service

1. Open the Cygwin command prompt.
2. Execute the following command:
   ssh-host-config
3. When asked if privilege separation should be used, answer no.
4. When asked if sshd should be installed as a service, answer yes.
5. When asked about the value of the CYGWIN environment variable, enter ntsec.
6. Here is an example session of this command; note that the input typed by the user is shown in pink and output from the system is shown in gray.

2.1.2. Starting the Cygwin sshd service

Start the sshd service, e.g. with net start sshd from a Windows command prompt.

2.1.3. Setting up authorization keys

1. Open the Cygwin command prompt.
2. Execute the following command to generate keys:
   ssh-keygen
3. When prompted for filenames and passphrases, press ENTER to accept the default values.
4. After the command has finished generating the key, enter the following command to change into your .ssh directory:
   cd ~/.ssh
5. Check that the keys were indeed generated by executing:
   ls -l
   You should see two files, id_rsa.pub and id_rsa, with recent creation dates. These files contain the authorization keys.
6. To register the new authorization keys, enter the following command. Note the double angle brackets (>>); they are very important:
   cat id_rsa.pub >> authorized_keys
7. Now check that the keys were set up correctly by executing:
   ssh localhost
   Since it is a new ssh installation, you will be warned that the authenticity of the host could not be established and asked whether you really want to connect; answer yes and press ENTER. You should see the Cygwin prompt again, which means you have successfully connected.
8. Now execute the command again:
   ssh localhost
   This time you should not be prompted for anything.

2.2. Editing /conf/core-site.xml

The file is empty by default; change it to this:

<property>
  <name>fs.default.name</name>
  <value>hdfs://zhengxq:9000</value>
  <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>c:/temp/hadoop</value>
  <description>A base for other temporary directories. Multi path can split by ','.</description>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>

Following some documents, I once put the whole configuration in this file and neglected mapred-site.xml, which cost me quite a few detours; the version shown here is the correct one. Reportedly you can also configure multiple storage directories. I have not tried it, but the configuration would be added like this:

<property>
  <name>dfs.name.dir</name>
  <value>/disk2/hadoop/filesystem/name,/disk3/hadoop/filesystem/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/disk2/hadoop/filesystem/data,/disk3/hadoop/filesystem/data</value>
  <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>

Also, in a Windows environment, pay particular attention to the hadoop.tmp.dir setting.

2.3. Editing /conf/mapred-site.xml

Configure mapred-site.xml as follows:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

2.4. Editing \conf\masters and slaves

Add the hostnames of the data nodes to slaves and the hostname of the name node to masters; you can add several, one per line. Note that each hostname must be mapped in /etc/hosts on every server. I set this up only to verify things for myself, so I kept just the default localhost.

2.5. Editing \conf\hadoop-env.sh

Just add the line export JAVA_HOME=%JDK_HOME% (uncomment it and edit the path). My own %JDK_HOME% is D:/java/jdk1.6.0_14. I tried D:/java/jdk1.5.0_19 and it did not work; I do not know whether the source can be compiled with 1.5, but to run it directly the JDK must be 1.6.

2.6. Other data nodes are similar; just repeat

Repeat steps 1-7 above on every data node. I have not done this myself, since all I have is one battered IBM T61 with no room left for a virtual machine.

3. Formatting and starting

3.1. Startup commands

Administrator@zhengxq /cygdrive/e/download/java/hadoop/hadoop-0.20.1/bin
$ ./hadoop namenode -format
09/11/09 09:53:01 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = zhengxq/192.168.129.138
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.1
STARTUP_MSG:   build = :55:56 UTC 2009
************************************************************/
Re-format filesystem in c:\temp\hadoop\dfs\name ? (Y or N) Y
09/11/09 09:54:14 INFO namenode.FSNamesystem: fsOwner=Administrator,None,root,Administrators,Users,ORA_DBA
09/11/09 09:54:14 INFO namenode.FSNamesystem: supergroup=supergroup
09/11/09 09:54:14 INFO namenode.FSNamesystem: isPermissionEnabled=true
09/11/09 09:54:14 INFO common.Stor
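
The key setup in section 2.1.3 can be condensed into a non-interactive sketch. This assumes OpenSSH's ssh-keygen is on the PATH; the scratch directory is my own addition so that nothing in the real ~/.ssh is touched:

```shell
# Non-interactive sketch of section 2.1.3 (scratch dir stands in for ~/.ssh)
set -e
SSH_DIR=$(mktemp -d)
# -N "" gives an empty passphrase, equivalent to pressing ENTER at each prompt
ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"
# The ">>" append is the step the notes stress: register the public key
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
ls -l "$SSH_DIR"
```

Appending with >> (rather than overwriting with >) matters because authorized_keys may already hold keys for other hosts.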
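
All of the *-site.xml files edited in sections 2.2 and 2.3 share the same <property>/<name>/<value> layout. As a quick illustration of that structure (this script is my own sketch, not part of Hadoop; the embedded fragment mirrors the core-site.xml values above), a few lines of Python can flatten such a file into name/value pairs:

```python
import xml.etree.ElementTree as ET

# Fragment mirroring the core-site.xml configured in section 2.2
CORE_SITE = """\
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://zhengxq:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>c:/temp/hadoop</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
"""

def load_properties(xml_text):
    """Parse a Hadoop *-site.xml string into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.iter("property")}

props = load_properties(CORE_SITE)
print(props["fs.default.name"])  # hdfs://zhengxq:9000
```

A check like this is handy before formatting the namenode, since a mistyped tag in core-site.xml otherwise only surfaces as a startup failure.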