Spark Functions: map and flatMap 2015-08-21 21:07

map

Applies a one-to-one transformation to each element of the RDD: every input element produces exactly one output element.

scala> val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
a: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[41] at parallelize at <console>:21

scala> val b = a.map(_.length)
b: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[42] at map at <console>:23

scala> val c = a.zip(b)
c: org.apache.spark.rdd.RDD[(String, Int)] = ZippedPartitionsRDD2[43] at zip at <console>:25

scala> c.collect
res18: Array[(String, Int)] = Array((dog,3), (salmon,6), (salmon,6), (rat,3), (elephant,8))
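The `zip` step above can also be folded into a single `map` that builds the pairs directly. A minimal sketch on plain Scala collections, which share `map`'s semantics with RDDs, so it runs without a SparkContext:

```scala
object MapPairs extends App {
  val words = List("dog", "salmon", "salmon", "rat", "elephant")

  // map is one-to-one: each word becomes exactly one (word, length) pair
  val pairs = words.map(w => (w, w.length))

  println(pairs) // List((dog,3), (salmon,6), (salmon,6), (rat,3), (elephant,8))
}
```

On an RDD the equivalent would be `a.map(w => (w, w.length))`, avoiding the extra `zip` stage.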

flatMap

Similar to map, but a single element can be mapped to zero or more elements; the per-element results are then flattened into a single RDD.

scala> val a = sc.parallelize(10 to 12)
a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[39] at parallelize at <console>:21

scala> a.flatMap(8 to _).collect
res17: Array[Int] = Array(8, 9, 10, 8, 9, 10, 11, 8, 9, 10, 11, 12)
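The difference from map is easiest to see side by side. A minimal sketch using the same `8 to _` function on a plain Scala collection (no SparkContext needed; collections and RDDs share these semantics):

```scala
object MapVsFlatMap extends App {
  val xs = List(10, 11, 12)

  // map is one-to-one: each element becomes one Range, so the result is nested
  val mapped = xs.map(x => (8 to x).toList)
  println(mapped) // List(List(8, 9, 10), List(8, 9, 10, 11), List(8, 9, 10, 11, 12))

  // flatMap flattens the per-element ranges into a single list
  val flat = xs.flatMap(x => (8 to x).toList)
  println(flat)   // List(8, 9, 10, 8, 9, 10, 11, 8, 9, 10, 11, 12)
}
```

With map, the RDD result would be `RDD[Range]`; flatMap collapses the nesting to `RDD[Int]`, which is what the `res17` output above shows.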

Tags: #Spark    Post on Spark-API