Spark Functions: mapValues and flatMapValues 2015-08-21 21:08

mapValues

Applies a function to the value of each element in a key-value RDD, leaving the keys unchanged.

scala> val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
a: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[44] at parallelize at <console>:21

scala> val b = a.map(x => (x.length, x))
b: org.apache.spark.rdd.RDD[(Int, String)] = MapPartitionsRDD[45] at map at <console>:23

scala> b.mapValues("x" + _ + "x").collect
res19: Array[(Int, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))

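To make the behaviour above concrete without a cluster, here is a minimal sketch that models mapValues on an ordinary Scala List of pairs. The helper name mapValuesLike is hypothetical and is not part of the Spark API; it only illustrates that the function sees the value and never the key.

```scala
// Plain-Scala model of mapValues on a list of (key, value) pairs; no Spark required.
object MapValuesSketch {
  // Hypothetical helper (not a Spark API): apply f to each value, keep each key as-is.
  def mapValuesLike[K, V, U](pairs: List[(K, V)])(f: V => U): List[(K, U)] =
    pairs.map { case (k, v) => (k, f(v)) }

  def main(args: Array[String]): Unit = {
    val words = List("dog", "tiger", "lion", "cat", "panther", "eagle")
    // Same keying as the REPL session: word length as key, word as value.
    val byLength = words.map(w => (w.length, w))
    val wrapped = mapValuesLike(byLength)(v => "x" + v + "x")
    println(wrapped)
    // prints List((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))
  }
}
```

On a real RDD, b.mapValues(f) yields the same pairs as b.map { case (k, v) => (k, f(v)) }. Because mapValues cannot change the keys, Spark can retain the RDD's existing partitioner, which a general map does not do.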

flatMapValues

Similar to mapValues, but each value can be expanded into multiple values; every resulting value is paired with the original key.

val a = sc.parallelize(List(("fruit", "apple,banana,pear"), ("animal", "pig,cat,dog,tiger")))
a.flatMapValues(_.split(",")).collect
res23: Array[(String, String)] = Array((fruit,apple), (fruit,banana), (fruit,pear), 
    (animal,pig), (animal,cat), (animal,dog), (animal,tiger))
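
The expansion above can be sketched in plain Scala in the same way. The helper flatMapValuesLike is a hypothetical name, not a Spark API, and the sketch calls .toList on the split result to keep the types simple, whereas the RDD version accepts the Array directly.

```scala
// Plain-Scala model of flatMapValues on a list of (key, value) pairs; no Spark required.
object FlatMapValuesSketch {
  // Hypothetical helper (not a Spark API): expand each value into zero or more
  // values and pair every one of them with the original key.
  def flatMapValuesLike[K, V, U](pairs: List[(K, V)])(f: V => Iterable[U]): List[(K, U)] =
    pairs.flatMap { case (k, v) => f(v).map(u => (k, u)) }

  def main(args: Array[String]): Unit = {
    val groups = List(("fruit", "apple,banana,pear"), ("animal", "pig,cat,dog,tiger"))
    val expanded = flatMapValuesLike(groups)(_.split(",").toList)
    println(expanded)
    // prints List((fruit,apple), (fruit,banana), (fruit,pear), (animal,pig), (animal,cat), (animal,dog), (animal,tiger))

    // A value that expands to an empty collection removes its key from the result.
    val sparse = List(("fruit", "apple"), ("none", ""))
    val kept = flatMapValuesLike(sparse)(v => if (v.isEmpty) Nil else v.split(",").toList)
    println(kept) // prints List((fruit,apple))
  }
}
```

As with mapValues, the keys are never touched, so the number of output pairs per key depends only on how many values the function returns for each input value.
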
Tags: #Spark    Post on Spark-API