Spark函数之cogroup和groupWith 2015-08-20 21:06

说明

将两个RDD中key相同的值组合在一起形成一个新的RDD。新RDD中,原相同key的各个value会变成新RDD相应key的集合型的value。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
val a = sc.parallelize(List(1, 2, 1, 3), 1)
val b = a.map((_, "b"))
val c = a.map((_, "c"))
b.cogroup(c).collect
res40: Array[(Int, (Iterable[String], Iterable[String]))] = 
Array((1,(CompactBuffer(b, b),CompactBuffer(c, c))), 
(3,(CompactBuffer(b),CompactBuffer(c))), 
(2,(CompactBuffer(b),CompactBuffer(c))))

val d = a.map((_, "d"))
b.cogroup(c, d).collect
res41: Array[(Int, (Iterable[String], Iterable[String], Iterable[String]))] = 
Array((1,(CompactBuffer(b, b),CompactBuffer(c, c),CompactBuffer(d, d))), 
(3,(CompactBuffer(b),CompactBuffer(c),CompactBuffer(d))), 
(2,(CompactBuffer(b),CompactBuffer(c),CompactBuffer(d))))


val x = sc.parallelize(List((1, "apple"), (2, "banana"), (3, "orange"), (4, "kiwi")), 2)
val y = sc.parallelize(List((5, "computer"), (1, "laptop"), (1, "desktop"), (4, "iPad")), 2)
x.cogroup(y).collect
res42: Array[(Int, (Iterable[String], Iterable[String]))] = 
Array((4,(CompactBuffer(kiwi),CompactBuffer(iPad))), 
(2,(CompactBuffer(banana),CompactBuffer())), 
(1,(CompactBuffer(apple),CompactBuffer(laptop, desktop))), 
(3,(CompactBuffer(orange),CompactBuffer())), 
(5,(CompactBuffer(),CompactBuffer(computer))))
Tags: #Spark    Post on Spark-API