Setting Up a Local Spark Environment and Running Your First Spark Program

Source: cnblogs | Author: 小小小华 | Date: 2018/11/30 9:35:22

Setting Up a Local Spark Environment

Installing Java

(1) Download the JDK from the official website

Download link: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

(2) Extract the archive to the target directory (create the directory first if it does not exist)

  sudo mkdir -p /usr/lib/jdk
  sudo tar -zxvf jdk-8u91-linux-x64.tar.gz -C /usr/lib/jdk    # adjust the version to match your download

(3) Set the path and environment variables

  sudo vim /etc/profile

Append the following at the end of the file:

  export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_91
  export JRE_HOME=${JAVA_HOME}/jre
  export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
  export PATH=${JAVA_HOME}/bin:$PATH

(4) Apply the configuration

  source /etc/profile

(5) Verify the installation

  $ java -version
  java version "1.8.0_181"
  Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
  Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

 

Installing Scala

(1) Download the package from the official website

Download link: https://www.scala-lang.org/download/

(2) Extract the archive to the target directory

  sudo mkdir -p /usr/lib/scala
  sudo tar -zxvf scala-2.11.8.tgz -C /usr/lib/scala    # adjust the version to match your download

(3) Set the path and environment variables

  sudo vim /etc/profile

Append the following at the end of the file:

  export SCALA_HOME=/usr/lib/scala/scala-2.11.8    # adjust the version to match your download
  export PATH=${SCALA_HOME}/bin:$PATH

(4) Apply the configuration

  source /etc/profile

(5) Verify the installation

  $ scala
  Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181).
  Type in expressions for evaluation. Or try :help.
  scala>
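From the `scala>` prompt you can evaluate an expression to confirm the REPL works end to end (an illustrative session; the `res` numbering will vary):

```scala
scala> List(1, 2, 3).map(_ * 2).sum
res0: Int = 12

scala> :quit
```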

 

Installing Spark

(1) Download the package from the official website

Download link: http://spark.apache.org/downloads.html

(2) Extract the archive to the target directory

  sudo mkdir -p /usr/lib/spark
  sudo tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz -C /usr/lib/spark    # adjust the version to match your download

(3) Set the path and environment variables

  sudo vim /etc/profile

Append the following at the end of the file:

  export SPARK_HOME=/usr/lib/spark/spark-2.2.0-bin-hadoop2.7    # adjust the version to match your download
  export PATH=${SPARK_HOME}/bin:$PATH

(4) Apply the configuration

  source /etc/profile

(5) Verify the installation

  $ cd ~/spark-2.2.0-bin-hadoop2.7
  $ ./bin/spark-shell
  Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  Setting default log level to "WARN".
  To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
  18/09/30 20:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  18/09/30 20:59:32 WARN Utils: Your hostname, pxh resolves to a loopback address: 127.0.1.1; using 10.22.48.4 instead (on interface wlan0)
  18/09/30 20:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
  18/09/30 20:59:45 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
  Spark context Web UI available at http://10.22.48.4:4040
  Spark context available as 'sc' (master = local[*], app id = local-1538312374870).
  Spark session available as 'spark'.
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/ '_/
     /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
        /_/
  Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
  Type in expressions to have them evaluated.
  Type :help for more information.
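Inside spark-shell, the preconfigured SparkContext `sc` can be exercised right away as a quick sanity check (an illustrative session; the `res` numbering and RDD ids will vary):

```scala
scala> val nums = sc.parallelize(1 to 100)    // distribute a local range as an RDD

scala> nums.filter(_ % 2 == 0).count()        // count the even numbers
res0: Long = 50
```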

 

Installing sbt

(1) Download the package from the official website

Download link: https://www.scala-sbt.org/download.html

(2) Extract the archive to the target directory

  sudo mkdir -p /usr/local/sbt
  sudo tar -zxvf sbt-0.13.9.tgz -C /usr/local/sbt    # adjust the version to match your download

(3) Create an sbt launcher script in /usr/local/sbt with the following content

  $ cd /usr/local/sbt
  $ vim sbt

Add the following to the sbt file:

  #!/bin/bash
  SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
  java $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"

(4) After saving, make the sbt script executable

  $ chmod u+x sbt

(5) Set the path and environment variables

  sudo vim /etc/profile

Append the following at the end of the file:

  export PATH=/usr/local/sbt:$PATH

(6) Apply the configuration

  source /etc/profile

(7) Verify the installation

  $ sbt sbtVersion    # on sbt 0.13.x the task is spelled `sbt sbt-version`
  Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
  [info] Loading project definition from /home/pxh/project
  [info] Set current project to pxh (in build file:/home/pxh/)
  [info] 1.2.1

 

Writing a Scala Application

(1) In a terminal, create a sparkapp directory as the application root

  cd ~
  mkdir -p ./sparkapp/src/main/scala    # create the required directory structure

 

(2) Create a file named SimpleApp.scala under ./sparkapp/src/main/scala and add the following code

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._
  import org.apache.spark.SparkConf

  object SimpleApp {
    def main(args: Array[String]) {
      val logFile = "file:///home/pxh/hello.ts"    // any local text file will do
      val conf = new SparkConf().setAppName("Simple Application")
      val sc = new SparkContext(conf)
      val logData = sc.textFile(logFile, 2).cache()
      val numAs = logData.filter(line => line.contains("a")).count()
      println("Lines with a: %s".format(numAs))
      sc.stop()
    }
  }
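The filter-and-count step in SimpleApp mirrors ordinary Scala collection operations, which can make the RDD code easier to reason about. A Spark-free sketch of the same logic, using hypothetical sample lines in place of the input file:

```scala
// Plain-collections analogue of SimpleApp's counting logic (no Spark needed)
object CountDemo {
  def main(args: Array[String]): Unit = {
    val lines = Seq("alpha", "beta", "gamma", "delta")   // stand-in for the file's lines
    val numAs = lines.count(_.contains("a"))             // same predicate as logData.filter(...).count()
    println("Lines with a: %s".format(numAs))
  }
}
```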

 

(3) Declare the application's metadata and its Spark dependency

  vim ./sparkapp/simple.sbt

Add the following content (scalaVersion and the spark-core version must match your installation):

  name := "Simple Project"
  version := "1.0"
  scalaVersion := "2.11.8"
  libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"

 

(4) Check the application's file structure

  cd ~/sparkapp
  find .

The structure should look like this:

  .
  ./simple.sbt
  ./src
  ./src/main
  ./src/main/scala
  ./src/main/scala/SimpleApp.scala
 

(5) Package the application into a JAR (the first run downloads dependencies, which can take a while)

  ~/sparkapp$ /usr/local/sbt/sbt package
  Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
  [info] Loading project definition from /home/pxh/sparkapp/project
  [info] Loading settings for project sparkapp from simple.sbt ...
  [info] Set current project to Simple Project (in build file:/home/pxh/sparkapp/)
  [success] Total time: 2 s, completed 2018-10-1 0:04:59

 

(6) Run the generated JAR through spark-submit

  $ /home/pxh/spark-2.2.0-bin-hadoop2.7/bin/spark-submit --class "SimpleApp" /home/pxh/sparkapp/target/scala-2.11/simple-project_2.11-1.0.jar 2>&1 | grep "Lines with a:"
  Lines with a: 3

 


 
