Spark Big Data Analytics Platform
Week 1
Instructor: 冰风影
DATAGURU Professional Data Analysis Community

Legal Notice

[Notice] This video and these slides are teaching materials for a 炼数成金 online course. All materials may be used only within the course and must not be distributed outside it. Violators may be held legally and financially liable.
For course details, visit the 炼数成金 training website.

Spark Big Data Analytics Platform

Spark overview
Spark installation and configuration
Compiling the Spark source code
Running Spark Standalone
Spark Standalone HA installation
Spark tool

Spark Big Data Analytics Platform

On February 27, 2014, the Apache Software Foundation announced that Spark had become an Apache top-level project, marking Spark's entry into a period of rapid growth.

Spark Big Data Analytics Platform

Release history:
Spark 1.5.2 (2015-11-09)
Spark 1.5.1 (2015-10-02)
Spark 1.5.0 (2015-09-09)
Spark 1.4.1 (2015-07-15)
Spark 1.4.0 (2015-06-11)
Spark 1.3.1 (2015-04-17)
Spark 1.3.0 (2015-03-13)
Spark 1.2.2 (2015-04-17)
Spark 1.2.1 (2015-02-09)
Spark 1.2.0 (2014-12-18)
Spark 1.1.1 (2014-11-26)
Spark 1.1.0 (2014-09-11)
Spark 1.0.2 (2014-08-05)
Spark 1.0.1 (2014-07-11)
Spark 1.0.0 (2014-05-30)

Spark Big Data Analytics Platform

2015 Spark Summit
The largest Spark cluster is Tencent's: 8,000 nodes.
The largest single jobs were run by Alibaba and Databricks: 1 PB.
IBM will provide large-scale funding for Spark, the open-source real-time big data analytics project.

Spark Installation

1. Runtime environment setup

A. Download and set up the JDK, Scala, sbt, and Maven
JDK: jdk-7u79-linux-x64.gz
Scala: scala-2.10.5.tgz
Unpack:
    tar zxf jdk-7u79-linux-x64.gz
    tar zxf scala-2.10.5.tgz

B. Configure
Edit the profile with vi ~/.bash_profile and add:
    export JAVA_HOME=$HOME/jdk1.7.0_79
    export PATH=$JAVA_HOME/bin:$PATH
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    export SCALA_HOME=$HOME/scala/scala-2.10.5
    export PATH=$PATH:$SCALA_HOME/bin
Then reload it:
    source ~/.bash_profile

C. Test
    [jifeng@feng03 ~]$ java -version
    java version "1.7.0_79"
    Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
    [jifeng@feng03 ~]$ scala -version
    Scala code runner version 2.10.5 -- Copyright 2002-2013, LAMP/EPFL

D. Maven and sbt configuration
    export MAVEN_HOME=$HOME/apache-maven-3.2.5
    export SBT_HOME=$HOME/sbt
    export PATH=$PATH:$SCALA_HOME/bin:$MAVEN_HOME/bin:$SBT_HOME/bin

    [jifeng@feng03 ~]$ mvn --version
    Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T01:29:23+08:00)
    Maven home: /home/jifeng/apache-maven-3.2.5
    Java version: 1.7.0_79, vendor: Oracle Corporation
    Java home: /home/jifeng/jdk1.7.0_79/jre
    Default locale: en_US, platform encoding: UTF-8
    OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix"
    [jifeng@feng03 ~]$ sbt --version
    sbt launcher version 0.13.7

Spark Installation

2. Spark configuration

A. Download Hadoop and Spark
Spark: spark-1.4.0
Hadoop: mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Unpack:
    [jifeng@feng03 ~]$ tar zxf spark-1.4.0-bin-hadoop2.6.tgz
    [jifeng@feng03 ~]$ tar zxf hadoop-2.6.0.tar.gz
    [jifeng@feng03 ~]$ ls
    apache-maven-3.2.5  code  hadoop  hadoop-2.6.0  jdk1.7.0_79  sbt  scala  soft  spark-1.4.0-bin-hadoop2.6

B. Configure the Hadoop and Spark installation directories
Edit the profile with vi ~/.bash_profile so that it contains:
    export JAVA_HOME=$HOME/jdk1.7.0_79
    export PATH=$JAVA_HOME/bin:$PATH
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    export SCALA_HOME=$HOME/scala/scala-2.10.5
    export SPARK_HOME=$HOME/spark-1.4.0-bin-hadoop2.6
    export HADOOP_HOME=$HOME/hadoop-2.6.0
    export HADOOP_CONF_DIR=$HOME/hadoop-2.6.0/etc/hadoop
    export MAVEN_HOME=$HOME/apache-maven-3.2.5
    export SBT_HOME=$HOME/sbt
    export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$MAVEN_HOME/bin:$SBT_HOME/bin
Then reload it:
    source ~/.bash_profile

Spark Installation

2. Spark configuration files

A. Copy the configuration template
    cp spark-env.sh.template spark-env.sh
Append the following to the end of spark-env.sh:
    export SCALA_HOME=/home/jifeng/scala/scala-2.10.5
    export SPARK_MASTER_IP=feng03
    export SPARK_WORKER_MEMORY=2G
    export JAVA_HOME=/home/jifeng/jdk1.7.0_79

B. Configure slaves
Append the following to the end of the slaves file:
    feng03

3. Start the master
    ./sbin/start-master.sh

4. Start the worker
    ./sbin/start-slaves.sh spark://feng03:7077

5. Start the shell
    ./bin/spark-shell --master spark://feng03:7077

Compiling the Spark Source Code

1. Building with Maven
Download the code:
    wget [...]
    [...]=2.6.0 -DskipTests clean package
    ./make-distribution.sh --tgz -Phadoop-2.6 -Pyarn -DskipTests -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver clean package

2. Building with SBT
    build/sbt -Pyarn -Phadoop-2.6 assembly

Compiling the Spark Source Code

    [INFO] Spark Project Parent POM ........................... SUCCESS [06:15 min]
    [INFO] Spark Launcher Project ............................. SUCCESS [09:23 min]
    [INFO] Spark Project Networking ........................... SUCCESS [27.511 s]
    [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [9.344 s]
    [INFO] Spark Project Unsafe ............................... SUCCESS [7.916 s]
    [INFO] Spark Project Core ................................. SUCCESS [07:38 min]
    [INFO] Spark Project Bagel ................................ SUCCESS [12.237 s]
    [INFO] Spark Project GraphX ............................... SUCCESS [47.515 s]
    [INFO] Spark Project Streaming ............................ SUCCESS [01:01 min]
    [INFO] Spark Project Catalyst ............................. SUCCESS [01:11 min]
    [INFO] Spark Project SQL .................................. SUCCESS [02:33 min]
    [INFO] Spark Project ML Library ........................... SUCCESS [03:12 min]
    [INFO] Spark Projec