Hadoop cluster machines: 211, 212, 215, 216, mapped to s100 s101 s102 s103 s104:

    hostname                    ip    hostname  machine  node
    1. ubuntu204-server-64-2-00 100   s100      211      node1
    2. ubuntu204-server-64-2-01 101   s101      212      node2
    3. ubuntu204-server-64-2-02 102   s102      215      node3
    4. ubuntu204-server-64-2-03 103   s103      215      node3
    5. ubuntu204-server-64-2-04 104   s104      216      node4

a) Big data: massive data volumes
    1 byte = 8 bits; each unit below is 2^10 times the previous one:
    1024 B  = 1 KB
    1024 KB = 1 MB
    1024 MB = 1 GB
    1024 GB = 1 TB
    1024 TB = 1 PB
    1024 PB = 1 EB
    1024 EB = 1 ZB
    1024 ZB = 1 YB
    1024 YB = 1 NB

    Storage: distributed storage. Computation: distributed computation.

    Hadoop (named after a toy elephant), created by Doug Cutting.
    Hadoop is open-source software for reliable, scalable, distributed computing.
    HDFS: part of the "de-IOE" movement (replacing IBM + Oracle + EMC stacks).
    MapReduce (MR) // map and reduce, a programming model.

    The 4 Vs of big data:
    1) Volume   // large data volume
    2) Variety  // many data formats
    3) Velocity // high speed
    4) Value    // low value density

b) Installing Hadoop (on an Ubuntu system):
    Install the JDK:
        ln -s /soft/jdk-xxx jdk
    Configure environment variables:
        JAVA_HOME=/soft/jdk
        PATH="...:/soft/jdk/bin"
        source /etc/environment
        java -version
    Install Hadoop:
        extract hadoop.tar.gz
        hadoop version
    Configure environment variables: HADOOP_HOME, PATH.

    Hadoop has three configuration modes:
    1. standalone | local // standalone/local mode, uses the local file system.
       No daemons; all programs run in the same JVM. Used for test and debug.
       Check the file system: hadoop fs -ls /
    2. pseudo-distributed mode
    3. fully distributed mode

    Configure SSH:
    1) Install ssh
        $ sudo apt-get install ssh
    2) Generate a key pair
        ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
        cd ~/.ssh
    3) Import the public key into the authorized-keys file
        cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    4) Log in to localhost
        ssh localhost
    5) Format the HDFS file system
        hadoop namenode -format
    6) Start all daemons
        start-all.sh
    7) Check the processes
        jps // 5 processes: RM, NM, NN, DN, 2NN
    8) Browse the file system
        hadoop fs -ls
    9) Create a directory in the file system
        hadoop fs -mkdir -p /user/ubuntu/data
        hadoop fs -ls -R /

c) Hadoop comprises these modules:
    1) Hadoop Common: utility module that supports the other modules.
    2) Hadoop Distributed File System (HDFS): a distributed file system providing
       high-throughput access to application data.
       Daemons: NameNode (NN), DataNode (DN), SecondaryNameNode (2NN).
    3) Hadoop YARN: a framework for job scheduling and cluster resource management.
       Daemons: ResourceManager (RM), NodeManager (NM).
    4) Hadoop MapReduce: parallel processing of large data sets on top of YARN.

    Configure hadoop:
    1) standalone/local

d) Fully distributed installation:
    1) Prepare 5 client machines: 211, 212, 215, 216, mapped to s100 through s104
       (same ip/hostname table as at the top of these notes).
    2) Install ssh
        1) Install ssh
            $ sudo apt-get install ssh
        2) Generate a key pair
            ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
            cd ~/.ssh
        3) Import the public key into the authorized-keys file, then copy the
           keys to the other nodes:
            cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
            scp /root/.ssh/* root@node2:/root/.ssh/
            scp /root/.ssh/* root@node3:/root/.ssh/
            scp /root/.ssh/* root@node4:/root/.ssh/
        4) Log in to the other machines:
            ssh node1
            ifconfig
    3) Install the JDK
        1. rpm -ivh /opt/jdk-7u79-linux-x64.rpm
        2. ln -s /soft/jdk-xxx jdk
        3. Configure environment variables:
        4. JAVA_HOME=/soft/jdk
        5. PATH="...:/soft/jdk/bin"
        6. source /etc/profile
        7. java -version
    4) Install Hadoop
        1. tar -zxvf hadoop-2.7.3.tar.gz
        2. hadoop version
        3. Configure environment variables:
        4. HADOOP_HOME=/soft/hadoop-2.7.3
        5. PATH=...:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
        6. Configure hadoop. Check the file system: hadoop fs -ls /

        Configuration files under /etc/hadoop/:

        core-site.xml:
            <configuration>
              <property>
                <name>fs.default.name</name>
                <value>hdfs://node1:8020</value>
              </property>
            </configuration>
        hdfs-site.xml:
            <configuration>
              <property>
                <name>dfs.replication</name>
                <value>3</value>
              </property>
            </configuration>
        mapred-site.xml:
            <configuration>
              <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
              </property>
            </configuration>
        yarn-site.xml:
            <configuration>
              <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>localhost</value>
              </property>
              <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
              </property>
            </configuration>
    5) Configuration files
        -rw-r--r--. 1 root root 861 Jun  6 10:41 core-site.xml
        -rw-r--r--. 1 root root 950 Jun  6 10:41 hdfs-site.xml
        -rw-r--r--. 1 root root 844 Jun  6 10:41 mapred-site.xml
        -rw-r--r--. 1 root root 728 Jun  6 10:43 yarn-site.xml
        -rw-r--r--. 1 root root  12 Jun  6 10:43 slaves

        /soft/hadoop/etc/hadoop/core-site.xml    fs.defaultFS=hdfs://node1/
        /soft/hadoop/etc/hadoop/hdfs-site.xml    replication=3
                                                 dfs.namenode.secondary.http-address=node4:50090
        /soft/hadoop/etc/hadoop/mapred-site.xml  mapreduce.framework.name=yarn
        /soft/hadoop/etc/hadoop/yarn-site.xml    yarn.resourcemanager.hostname=node1
        /soft/hadoop/etc/hadoop/slaves           node2
                                                 node3

        Distribute these files across the cluster:
            cd /soft/hadoop/etc/hadoop
            xsync core-site.xml
            xsync yarn-site.xml
            xsync slaves
    6) Starting hadoop for the first time
        1) Format the file system: $ hadoop namenode -format
        2) Start all daemons: $ start-all.sh
        3) Check the processes: jps
        4) Stop all daemons: stop-all.sh
        Access hadoop via the web UI:
        1) hdfs: http://node1:50070
        2) datanode
        3) 2nn
        Delete hadoop's temporary directory data. Default: /tmp/hadoop-root.
        hadoop-root-datanode.pid (produced in pseudo-distributed mode)

e) Several scripts: scp, rsync, xsync, xcall
    1) scp
    2) rsync: remote sync tool, mainly for backup and mirroring; supports links
       and devices.
        rsync -rvl /soft/* ubuntu@s101:/soft
    3) Custom script xsync: distributes a file across the cluster, copying it in
       a loop to the same directory on every node.
        rsync -rvl /home/ubuntu ubuntu@s101:
        usage: xsync hello.txt
        [/usr/local/bin/xsync]
        [root@node1 bin]# vim xsync
        #!/bin/bash
        pcount=$#
        if (( pcount < 1 )); then
            echo no args;
            exit;
        fi
        p1=$1;
        # get the file name
        fname=`basename $p1`
        echo fname=$fname;
        # get the absolute path of the parent directory
        pdir=`cd -P $(dirname $p1); pwd`
        echo pdir=$pdir;
        #echo $p1;
        cuser=`whoami`
        for (( host=2; host<5; host=host+1 )); do
            echo ---------- node$host ---------
            #echo $pdir/$fname $cuser@node$host:$pdir
            scp $pdir/$fname $cuser@node$host:$pdir
        done
    4) Write the /usr/local/bin/xcall script, which runs the same command on all
       hosts.
        usage: xcall rm -rf /soft/jdk
        [/usr/local/bin/xcall]
        [root@node1 bin]# cd /usr/local/bin
        [root@node1 bin]# xcall ls -l /soft/
        [root@node1 bin]# xcall rm hello.txt
        [root@node1 bin]# vim xcall
        #!/bin/bash
        pcount=$#
        if (( pcount < 1 )); then
            echo no args;
            exit;
        fi
        echo -------- localhost -------
        $@
        for (( host=2; host<5; host=host+1 )); do
            echo ---------- node$host ---------
            #scp $pdir/$fname $cuser@node$host:$pdir
            ssh node$host $@
        done

f) Organizing Hadoop's class libraries and configuration files
    Extract hadoop-2.7.2.tar.gz into a directory and organize the jar packages.
    Extract all the default configuration files:
    [core-default.xml]  hadoop-common-2.7.2.jar/core-default.xml
    [hdfs-default.xml]  hadoop-hdfs-2.7.2.jar/hdfs-default.
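The binary storage units in section a) each advance by a factor of 2^10. As a quick sanity check, this small shell sketch (a hypothetical helper, not part of the notes) computes the byte count for any power of 1024:

```shell
# bytes_in N: print the number of bytes in 1024^N
# (N=1 -> KB, N=2 -> MB, N=3 -> GB, ... binary units as in the notes)
bytes_in() {
    p=$1
    n=1
    while [ "$p" -gt 0 ]; do
        n=$((n * 1024))
        p=$((p - 1))
    done
    echo "$n"
}

bytes_in 1   # 1024        (1 KB)
bytes_in 3   # 1073741824  (1 GB)
```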
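The map-and-reduce idea behind MapReduce (section a) can be illustrated with a plain Unix pipeline. This is only an analogy for the programming model, not Hadoop itself:

```shell
# Word count, MapReduce-style:
#   map:     split each input line into one word per line
#   shuffle: sort brings identical keys together
#   reduce:  count each group of identical words
printf 'hadoop hdfs\nhadoop yarn\n' |
    tr ' ' '\n' |
    sort |
    uniq -c
```

The output lists each distinct word with its count (here, hadoop appears twice), which is exactly the shape of the classic MapReduce word-count example.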
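The per-node scp commands used to spread SSH keys in section d) can be wrapped in a loop. A minimal sketch, assuming root logins and hosts node2..node4 as in the notes; `distribute_key` is a hypothetical helper, and its first argument is the copy command so a dry run can substitute echo for scp:

```shell
# distribute_key COPIER HOST...: copy the local public key to each host.
# COPIER is scp in real use; pass echo for a dry run.
distribute_key() {
    copier=$1
    shift
    for host in "$@"; do
        "$copier" /root/.ssh/id_rsa.pub "root@$host:/root/.ssh/"
    done
}

# real use:
#   distribute_key scp node2 node3 node4
# dry run (prints the scp arguments instead of copying):
distribute_key echo node2 node3 node4
```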
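Every entry in the *-site.xml files of section d) has the same `<property>` shape. A hypothetical helper (`hprop` is my name for it, not part of Hadoop) that prints one such block can cut down on hand-editing:

```shell
# hprop NAME VALUE: print one Hadoop <property> element
# in the shape used by core-site.xml, hdfs-site.xml, etc.
hprop() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

hprop dfs.replication 3
hprop mapreduce.framework.name yarn
```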
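After start-all.sh, the notes expect five daemons in the jps listing (RM, NM, NN, DN, 2NN). A sketch of a checker that reads a jps listing on stdin; the function name and the pass/fail messages are my own, not standard tooling:

```shell
# check_daemons: read `jps` output on stdin and verify that the five
# expected daemons are present (pseudo-distributed / small cluster).
check_daemons() {
    procs=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -w avoids "NameNode" matching inside "SecondaryNameNode"
        echo "$procs" | grep -qw "$d" || { echo "missing: $d"; return 1; }
    done
    echo "all daemons running"
}

# real use: jps | check_daemons
```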