countByKey
reduce
fold
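The subtle point with fold is how often the initial value is used. The sketch below is a plain-Python model, not Spark itself: the helper `fold` and the three-partition `layout` are assumptions made up for illustration. It mirrors the commonly documented behaviour that the zero value is applied once per partition and once more when the partial results are merged.

```python
from functools import reduce

def fold(partitions, zero, op):
    """Plain-Python stand-in for rdd.fold(zero, op) (hypothetical helper)."""
    # The zero value seeds the fold inside every partition...
    partials = [reduce(op, part, zero) for part in partitions]
    # ...and seeds the final merge of the per-partition results once more.
    return reduce(op, partials, zero)

# Assumed layout: 9 elements spread over 3 partitions.
layout = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

total_zero = fold(layout, 0, lambda a, b: a + b)
total_ten = fold(layout, 10, lambda a, b: a + b)
print(total_zero)  # 45
print(total_ten)   # 85 = 45 + 10 * (3 partitions + 1 merge)
```

With a neutral zero value (0 for addition) fold and reduce agree; a non-neutral value makes the result depend on the number of partitions.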
first
take: returns the first n elements (by position)
top
7. takeSample
8. takeOrdered
    # sc is assumed to be an existing SparkContext
    rdd = sc.parallelize([1, 3, 2, 4, 7, 9, 6], 1)
    print(rdd.takeOrdered(3))                # [1, 2, 3]
    print(rdd.takeOrdered(3, lambda x: -x))  # [9, 7, 6]
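Why does a key of `lambda x: -x` give the largest elements? The following is a plain-Python model, not Spark code: `take_ordered` and the partition layout are hypothetical names and data chosen for illustration. It assumes the usual two-step strategy in which each partition keeps its own n smallest elements by key and the candidates are then merged.

```python
import heapq

def take_ordered(partitions, num, key=None):
    """Plain-Python stand-in for rdd.takeOrdered(num, key) (hypothetical helper)."""
    # Step 1: each partition keeps only its own `num` smallest elements by key.
    candidates = [heapq.nsmallest(num, part, key=key) for part in partitions]
    # Step 2: merge the candidates and keep the `num` smallest overall.
    merged = [x for part in candidates for x in part]
    return heapq.nsmallest(num, merged, key=key)

# Assumed layout of the same 7 elements over 3 partitions.
data = [[1, 3, 2], [4, 7], [9, 6]]

ascending = take_ordered(data, 3)
descending = take_ordered(data, 3, key=lambda x: -x)
print(ascending)   # [1, 2, 3]
print(descending)  # [9, 7, 6]
```

The key only changes how elements are compared; the returned values are the original elements, not the negated ones.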
9. foreach
10. saveAsTextFile
11. foreachPartition
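    # sc is assumed to be an existing SparkContext
    rdd = sc.parallelize([1, 3, 2, 4, 7, 9, 6], 3)

    def rid10(data):
        # data is an iterator over the elements of one partition
        print("-------------------")
        result = list()
        for i in data:
            result.append(i * 10)
        print(result)

    rdd.foreachPartition(rid10)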
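To make the "once per partition" behaviour concrete, here is a plain-Python model that runs without Spark. The helper `foreach_partition`, the function `times10`, the `calls` list and the partition layout are all assumptions for illustration; how Spark actually splits the 7 elements into 3 partitions may differ.

```python
def foreach_partition(partitions, func):
    """Plain-Python stand-in for rdd.foreachPartition(func) (hypothetical helper)."""
    for part in partitions:
        # func receives an iterator over one whole partition, not a single element.
        func(iter(part))
    # Like the Spark action, nothing is returned to the caller.

calls = []  # records what each invocation saw, for illustration only

def times10(data):
    result = [i * 10 for i in data]
    calls.append(result)
    print(result)

# Assumed layout of the 7 elements over 3 partitions.
layout = [[1, 3], [2, 4], [7, 9, 6]]
returned = foreach_partition(layout, times10)
print(len(calls))  # 3: one call per partition
```

In real Spark the function runs inside the executors, so a driver-side list such as `calls` would not be filled; it is used here only to make the model observable.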
groupByKey vs. reduceByKey
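The usual argument for preferring reduceByKey over groupByKey followed by a reduction is that reduceByKey combines values inside each partition before the shuffle. The sketch below is a plain-Python model of that idea, not Spark code; both helper functions and the sample data are hypothetical, and "records moved" is simply a count of the pairs that would cross the shuffle in this simplified picture.

```python
from collections import defaultdict

def shuffle_group_by_key(partitions):
    """Model of groupByKey: every (key, value) pair crosses the shuffle."""
    shuffled = [pair for part in partitions for pair in part]
    groups = defaultdict(list)
    for k, v in shuffled:
        groups[k].append(v)
    return dict(groups), len(shuffled)

def shuffle_reduce_by_key(partitions, func):
    """Model of reduceByKey: combine inside each partition, then shuffle."""
    shuffled = []
    for part in partitions:
        local = {}
        for k, v in part:
            local[k] = func(local[k], v) if k in local else v
        shuffled.extend(local.items())  # one record per key per partition
    totals = {}
    for k, v in shuffled:
        totals[k] = func(totals[k], v) if k in totals else v
    return totals, len(shuffled)

# Assumed data: 6 pairs spread over 2 partitions.
parts = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("b", 1), ("b", 1)]]

groups, moved_group = shuffle_group_by_key(parts)
sums_from_groups = {k: sum(vs) for k, vs in groups.items()}
sums, moved_reduce = shuffle_reduce_by_key(parts, lambda x, y: x + y)

print(sums_from_groups, moved_group)  # {'a': 3, 'b': 3} 6
print(sums, moved_reduce)             # {'a': 3, 'b': 3} 4
```

Both routes give the same sums, but the pre-combined route moves fewer records, and the gap grows with the number of repeated keys per partition.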
Summary:
- partitionBy