Setting Up a Spark Development Environment on Windows (1.3.0) 2015-04-23 20:00

Version Pairing

Spark: 1.3.0
Scala: 2.10.6

Software Installation

1. Install the JDK

Manually set the JAVA_HOME environment variable, and add the JDK's bin directory to the Path environment variable.
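
Once the variables are set, you can check that they are visible to a JVM process. A minimal Scala script sketch (the variable names match the ones configured above; PATH is also checked since it always exists):

```scala
// Report whether an environment variable is visible to the JVM.
def envStatus(name: String): String =
  sys.env.get(name)
    .map(value => s"$name = $value")
    .getOrElse(s"$name is NOT set")

println(envStatus("JAVA_HOME"))
println(envStatus("PATH"))
```

Run it with `scala check-env.scala` from a newly opened console; environment-variable changes on Windows only take effect in consoles started after the change.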

2. Install Scala for Windows

Install via the .msi package. The installer automatically sets the SCALA_HOME environment variable and adds Scala's bin directory to the Path environment variable.

Download: http://www.scala-lang.org

3. Download the Spark package

Simply extract the archive.

4. Install IntelliJ IDEA

Download: http://www.jetbrains.com/idea/ — the free Community Edition is sufficient.

Configuration

  • Install the Scala plugin for IntelliJ.

  • Configure the Java SDK and Scala SDK

Create a new project and choose the Scala project type. Click New to add an SDK: select JDK, then select the JDK installation directory. Click New again to add a Scala SDK and select the Scala installation directory.

  • Configure Global Libraries

"File" -> "Project Structure…" -> "Global Libraries": select "scala-sdk-2.10.5", click "+", and add all the remaining jars from Scala's lib directory that are not yet included.

  • Configure project Libraries

"File" -> "Project Structure…" -> "Libraries": add a Spark SDK. For the library, select the spark-assembly-1.3.0-hadoop2.4.0.jar file from Spark's lib directory.

  • Configure the project's source directory

"File" -> "Project Structure…" -> "Modules" -> "Sources": create the src\main\scala directory, mark it as a source root, and unmark src as a source root.

  • Write the code

Create the package net.cheyo and a class SparkPi with the following code:

package net.cheyo

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
        val x = random * 2 - 1
        val y = random * 2 - 1
        if (x*x + y*y < 1) 1 else 0
      }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
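
The estimator works because the unit circle inscribed in the square [-1, 1] × [-1, 1] covers π/4 of its area, so 4 × hits / samples converges to π. The sampling logic can be tried without Spark in a plain Scala script:

```scala
import scala.util.Random

// Monte Carlo estimate of pi: sample points uniformly in [-1, 1] x [-1, 1]
// and count the fraction that falls inside the unit circle.
def estimatePi(samples: Int, seed: Long = 42L): Double = {
  val rng = new Random(seed)
  val hits = (1 to samples).count { _ =>
    val x = rng.nextDouble() * 2 - 1
    val y = rng.nextDouble() * 2 - 1
    x * x + y * y < 1
  }
  4.0 * hits / samples
}

println("Pi is roughly " + estimatePi(200000))
```

The Spark version above does the same thing, but distributes the sampling across `slices` partitions and sums the hit counts with `reduce(_ + _)`.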

  • Build the project

"Build" -> "Make Project"

  • Run the program

"Run" -> "Edit Configurations…": add an Application run configuration, set the main class to "net.cheyo.SparkPi", and set VM options to "-Dspark.master=local".
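
The `-Dspark.master=local` VM option works because SparkConf, when created with defaults, copies any `spark.*` JVM system properties into the configuration. The lookup itself can be illustrated without a Spark dependency:

```scala
// Simulate passing -Dspark.master=local on the JVM command line.
System.setProperty("spark.master", "local")

// new SparkConf() (with loadDefaults = true) performs an equivalent
// scan of spark.* system properties when the application starts.
val master = sys.props.getOrElse("spark.master", "<not set>")
println(s"spark.master = $master")
```

This is why the sample code can call `new SparkConf().setAppName("Spark Pi")` without hard-coding a master URL.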

Packaging into a jar

Select "File" -> "Project Structure" -> "Artifacts", click "+" -> "Jar" -> "From Modules with dependencies…", choose the main class, and click OK. In the dialog that appears, choose the output location for the jar and click OK.

When packaging the jar: after clicking OK, adjust the artifact configuration by deleting the dependency jars under Output Layout in week2.jar, keeping only the MySparkPi project itself.

Finally, select "Build" -> "Build Artifacts…" to build the jar.

Common Problems

Runtime error caused by a mismatched Scala version

  • Symptom

The following error appears at runtime:

Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
    at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
    at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
    at akka.actor.RootActorPath.$div(ActorPath.scala:159)
    at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:464)
    at akka.remote.RemoteActorRefProvider.<init>(RemoteActorRefProvider.scala:124)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
    at scala.util.Try$.apply(Try.scala:191)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
    at scala.util.Success.flatMap(Try.scala:230)
    at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
    at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:584)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:577)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:55)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1832)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1823)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:57)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:223)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:270)
    at SparkPi$.main(SparkPi.scala:12)
    at SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
  • Cause

The Scala and Spark versions are incompatible: Scala 2.10 and 2.11 are not binary compatible, and the prebuilt Spark 1.3.0 package is built against Scala 2.10, so running it with a Scala 2.11 library fails with the NoSuchMethodError above.

  • Fix

Switch to the Scala version that matches your Spark release: for Spark 1.3.0, replace Scala 2.11 with Scala 2.10.
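
To see which Scala library is actually on the runtime classpath, you can print its version from within a program; only the first two components (the binary version) have to match Spark's build, and for Spark 1.3.0 that must be 2.10:

```scala
// Version of the scala-library jar loaded at runtime, e.g. "2.10.4".
val version = scala.util.Properties.versionNumberString
println(s"Scala library version: $version")

// The binary version (major.minor) is what must match Spark's build;
// 2.10 and 2.11 are not binary compatible.
val binaryVersion = version.split('.').take(2).mkString(".")
println(s"Binary version: $binaryVersion")
```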

Tags: #Spark #Scala    Post on Spark