Spark函数之fold和foldByKey 2015-08-21 21:09

fold

fold是aggregate的简化,将aggregate中的seqOp和combOp使用同一个函数。

1
2
3
val a = sc.parallelize(1 to 6, 3)
a.fold(4)(_ + _)
res34: Int = 37

foldByKey

与fold类似,但是是基于相同的key进行计算。并且初始化只运用于Partition内的reduce操作。

1
2
3
4
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
b.foldByKey("=")(_ + _).collect
res43: Array[(Int, String)] = Array((4,=lion), (3,=dog=cat), (7,=panther), (5,=tiger=eagle))
Tags: #Spark    Post on Spark-API