Configuring and Running Hadoop on Ubuntu

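Shanghai University of Electric Power (上海电力学院), School of Computer and Information Engineering
Author: Zhou Yaojun (周耀君)    QQ: 648134235

Hadoop Deployment, Configuration, and Operation

Preface: This document is a record of my own deployment process. After configuration, it demonstrates single-node operation, single-machine pseudo-distributed operation, and distributed operation across two machines; gives a preliminary comparison of pseudo-distributed and fully distributed modes to aid understanding; and finally walks through running Hadoop's bundled wordcount example under Eclipse.

=====================================================
System Configuration
=====================================================

(1) Required resources

➢ Linux Ubuntu 9.10
   // The latest version can be downloaded free from the official website, or a free installation CD can be requested from the Ubuntu community at shipit.ubuntu.com
➢ Hadoop 0.20.0 package
   // The latest version can be downloaded from the mirror servers provided by Apache
   // → → mirror server → hadoop
➢ sun-java6-jdk package
   // In a terminal, enter: apt-get install sun-java6-jdk
   // The system downloads the package and all of its dependencies automatically and installs them
➢ SSH package (provides a secure protocol for remote login sessions)
   // In a terminal, enter: apt-get install ssh
➢ Eclipse package
   // Download the latest version from the official site

(2) Configuration procedure

1. Install Ubuntu 9.04

2. Update the deb package list
   $ sudo apt-get update

3. Install system updates
   $ sudo apt-get upgrade

4. Install the JDK
   $ sudo apt-get install sun-java6-jdk
   // The default install path is /usr/lib/jvm; during installation, use the TAB key to select OK

5. Set java-6-sun as the default Java
   $ sudo update-alternatives --config java    // with only one JDK installed there is nothing to choose
   $ sudo update-java-alternatives -s java-6-sun

6. Set the CLASSPATH and JAVA_HOME system environment variables
   $ sudo gedit /etc/environment
   Add the following two lines:
   CLASSPATH=.:/usr/lib/jvm/java-6-sun/lib
   JAVA_HOME=/usr/lib/jvm/java-6-sun

7. Adjust the priority order of the system JVMs
   $ sudo gedit /etc/jvm
   Add the following line at the top of the file (create /etc/jvm yourself if it does not exist):
   /usr/lib/jvm/java-6-sun

8. Two prerequisites for a multi-node distributed environment
   a. Every node has the same user name, e.g. shiep205
   b. The Hadoop path is the same on every node, e.g. /home/shiep205/hadoop

9. Download hadoop-*.tar.gz to /home/shiep205/
   $ cd ~                                        // go to the home directory
   $ sudo tar xzf hadoop-0.20.0.tar.gz           // extract into the current directory
   $ mv hadoop-0.20.0 hadoop                     // rename to hadoop
   $ sudo chown -R shiep205:shiep205 hadoop      // give ownership to shiep205

10. Update the Hadoop environment variables
   $ gedit hadoop/conf/hadoop-env.sh
   Change
   # export JAVA_HOME=/usr/lib/jvm/java-6-sun
   to
   export JAVA_HOME=/usr/lib/jvm/java-6-sun

11. Configure SSH
   $ sudo apt-get install ssh
   $ sudo apt-get install rsync                  // remote sync; the latest version may already be installed
   $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
   $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
   $ ssh localhost                               // verify that the configuration works

=====================================================
Single-Node Configuration
=====================================================

With the preceding steps complete, Hadoop can run on a single node in non-distributed mode, as a single Java process.

Run the following command to view Hadoop's usage documentation:
   $ bin/hadoop

The following example copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the output directory.
   $ mkdir input
   $ cp conf/*.xml input
   $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
   $ cat output/*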
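
To see what the regular expression 'dfs[a-z.]+' picks out before running the Hadoop job, the same pattern can be tried with ordinary command-line tools. The sketch below is only a local stand-in for the grep example, not how Hadoop implements it: the function name count_matches and the sample file contents are made up for illustration, and the output layout (count first, then the matched string) is an approximation of what the job writes.

```shell
#!/bin/sh
# count_matches FILE...
# Local stand-in for the Hadoop grep example (hypothetical helper, not part
# of Hadoop): extract every match of 'dfs[a-z.]+', then count identical
# matches and list the most frequent first.
count_matches() {
    grep -ohE 'dfs[a-z.]+' "$@" | sort | uniq -c | sort -rn
}

# Hypothetical sample input standing in for conf/*.xml
workdir=$(mktemp -d)
cat > "$workdir/sample.xml" <<'EOF'
<name>dfs.replication</name>
<name>fs.default.name</name>
<description>dfs.replication sets the block replica count</description>
<name>dfs.name.dir</name>
EOF

# dfs.replication is matched twice, dfs.name.dir once;
# fs.default.name is not matched because it does not contain "dfs".
count_matches "$workdir/sample.xml"
rm -rf "$workdir"
```

The character class [a-z.] allows dots inside a match, which is why whole property names such as dfs.name.dir come out as single matches.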

=====================================================
Single-Machine Pseudo-Distributed Mode
=====================================================

Pseudo-distributed mode runs on a single machine, with each Hadoop daemon running as a separate Java process.

(1) Configure three files

conf/core-site.xml:
   <configuration>
     <property>
       <name>fs.default.name</name>
       <value>hdfs://localhost:9000</value>
     </property>
   </configuration>

conf/hdfs-site.xml:
   <configuration>
     <property>
       <name>dfs.replication</name>
       <value>1</value>
     </property>
   </configuration>

conf/mapred-site.xml:
   <configuration>
     <property>
       <name>mapred.job.tracker</name>
       <value>localhost:9001</value>
     </property>
   </configuration>

(2) Format HDFS

From the hadoop directory, run:
   $ sudo bin/hadoop namenode -format
   10/02/21 00:15:08 INFO namenode.NameNode: STARTUP_MSG:
   /************************************************************
   STARTUP_MSG: Starting NameNode
   STARTUP_MSG:   host = master/127.0.1.1
   STARTUP_MSG:   args = [-format]
   STARTUP_MSG:   version = 0.20.0
   STARTUP_MSG:   build = :18:40 UTC 2009
   ************************************************************/
   10/02/21 00:15:09 INFO namenode.FSNamesystem: fsOwner=root,root
   10/02/21 00:15:09 INFO namenode.FSNamesystem: supergroup=supergroup
   10/02/21 00:15:09 INFO namenode.FSNamesystem: isPermissionEnabled=true
   10/02/21 00:15:09 INFO common.Storage: Image file of size 94 saved in 0 seconds.
   10/02/21 00:15:09 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
   10/02/21 00:15:09 INFO namenode.NameNode: SHUTDOWN_MSG:
   /************************************************************
   SHUTDOWN_MSG: Shutting down NameNode at master/127.0.1.1
   ************************************************************/

(3) Start the Hadoop daemons

Command:
   $ bin/start-all.sh
   starting namenode, logging to /home/shiep205/hadoop/bin/../logs/hadoop-shiep205-namenode-master.out
   localhost: starting datanode, logging to /home/shiep205/hadoop/bin/../logs/hadoop-shiep205-datanode-master.out
   localhost: starting secondarynamenode, logging to /home/shiep205/hadoop/bin/../logs/hadoop-shiep205-secondarynamenode-master.out
   starting jobtracker, logging to /home/shiep205/hadoop/bin/../logs/hadoop-shiep205-jobtracker-master.out
   localhost: starting tasktracker, logging to /home/shiep205/hadoop/bin/../logs/hadoop-shiep205-tasktracker-master.out

(4) Copy the input files into HDFS

Command:
   $ bin/hadoop dfs -put conf input
   // Creates an input directory in HDFS and uploads the files under hadoop/conf into it
   // The directory's contents can be listed with bin/hadoop dfs -ls input

(5) Run the example

Command:
   $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
   10/02/21 00:06:13 INFO mapred.FileInputFormat: Total input paths to process : 19
   10/02/21 00:06:13 INFO mapred.JobClient: Running job: job_201002202351_0001
   10/02/21 00:06:14 INFO mapred.JobClient:  map 0% reduce 0%
   10/02/21 00:06:27 INFO mapred.JobClient:  map 10% reduce 0%
   10/02/21 00:06:33 INFO mapred.JobClient:  map 21% reduce 0%
   10/02/21 00:06:36 INFO mapred.JobClient:  map 31% reduce 7%
   10/02/21 00:06:39 INFO mapred.JobClient:  map 42% reduce 7%
   10/02/21 00:06:42 INFO mapred.JobClient:  map 52% reduce 7%
   10/02/21 00:06:
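
In pseudo-distributed mode the job only makes progress if all five daemons started by bin/start-all.sh are alive: namenode, datanode, secondarynamenode, jobtracker and tasktracker. One way to check, sketched below, is to read the output of the JDK's jps tool, which prints one "pid MainClass" line per running Java process. The helper name missing_daemons and the sample process IDs are made up for illustration; the class names are the ones these daemons normally show under in jps on Hadoop 0.20, which is an assumption worth confirming on your own machine.

```shell
#!/bin/sh
# missing_daemons
# Hypothetical helper: reads jps-style output ("<pid> <MainClass>" per line)
# on standard input and prints the name of every expected Hadoop daemon that
# does not appear in it. Prints nothing when all five are present.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -w matches whole words only, so "NameNode" is not satisfied
        # by a line that contains just "SecondaryNameNode".
        if ! printf '%s\n' "$listing" | grep -qw "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Demonstration on made-up jps output in which the tasktracker is absent.
sample='4321 NameNode
4400 DataNode
4477 SecondaryNameNode
4550 JobTracker
4700 Jps'
printf '%s\n' "$sample" | missing_daemons
```

On a real installation the function would be fed live data with jps | missing_daemons. If a daemon is reported missing, the matching file under the logs directory named in the start-all.sh output is the first place to look.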
