
[spark] repartition() vs coalesce()

승가비 2022. 9. 26. 01:57

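### repartition: full shuffle

Starting layout, 4 partitions (repartition() would reshuffle all of this data across the network):

Node 1 = 1,2,3
Node 2 = 4,5,6
Node 3 = 7,8,9
Node 4 = 10,11,12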
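
To make the full-shuffle idea concrete, here is a toy model in plain Python. It is not Spark's implementation: `toy_repartition` and `count_moved` are hypothetical helpers, and round-robin dealing is only an assumed stand-in for Spark's real partitioner. The point is that a full shuffle can move records on every node, including the nodes that are kept.

```python
from typing import Dict, List

Layout = Dict[str, List[int]]


def toy_repartition(nodes: Layout, targets: List[str]) -> Layout:
    """Toy full shuffle: pool every record, then deal them out round-robin.

    Assumption: round-robin is a simplification, not Spark's actual partitioner.
    """
    records = [r for part in nodes.values() for r in part]
    result: Layout = {name: [] for name in targets}
    for i, record in enumerate(records):
        result[targets[i % len(targets)]].append(record)
    return result


def count_moved(before: Layout, after: Layout) -> int:
    """Count records whose node after the operation differs from the node before."""
    new_home = {r: name for name, part in after.items() for r in part}
    return sum(1 for name, part in before.items() for r in part if new_home[r] != name)


before = {
    "Node 1": [1, 2, 3],
    "Node 2": [4, 5, 6],
    "Node 3": [7, 8, 9],
    "Node 4": [10, 11, 12],
}
after = toy_repartition(before, ["Node 1", "Node 3"])
print(after)                       # {'Node 1': [1, 3, 5, 7, 9, 11], 'Node 3': [2, 4, 6, 8, 10, 12]}
print(count_moved(before, after))  # 9
```

In this toy run 9 of the 12 records change nodes, even though Node 1 and Node 3 both survive.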

 

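### coalesce: minimize data movement

Coalesce the 4 partitions down to 2. Only the data on the dropped nodes (Node 2, Node 4) moves; Node 1 and Node 3 keep their original data in place:

Node 1 = 1,2,3 + (10,11,12)
Node 3 = 7,8,9 + (4,5,6)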
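
The same layout can be reproduced with a toy model in plain Python. This is a sketch, not Spark's implementation: `toy_coalesce`, `count_moved`, and the `merge_into` mapping are hypothetical names, and the pairing of dropped nodes to kept nodes is simply copied from the example above.

```python
from typing import Dict, List

Layout = Dict[str, List[int]]


def toy_coalesce(nodes: Layout, merge_into: Dict[str, str]) -> Layout:
    """Toy coalesce: kept nodes keep their data; dropped nodes append theirs to a kept node.

    Assumption: merge_into (dropped node -> kept node) is chosen by hand here.
    """
    # Nodes that are not being dropped keep their records untouched.
    result: Layout = {name: list(part) for name, part in nodes.items() if name not in merge_into}
    # Only records on dropped nodes are moved.
    for source, target in merge_into.items():
        result[target].extend(nodes[source])
    return result


def count_moved(before: Layout, after: Layout) -> int:
    """Count records whose node after the operation differs from the node before."""
    new_home = {r: name for name, part in after.items() for r in part}
    return sum(1 for name, part in before.items() for r in part if new_home[r] != name)


before = {
    "Node 1": [1, 2, 3],
    "Node 2": [4, 5, 6],
    "Node 3": [7, 8, 9],
    "Node 4": [10, 11, 12],
}
after = toy_coalesce(before, {"Node 4": "Node 1", "Node 2": "Node 3"})
print(after)                       # {'Node 1': [1, 2, 3, 10, 11, 12], 'Node 3': [7, 8, 9, 4, 5, 6]}
print(count_moved(before, after))  # 6
```

Only the 6 records from Node 2 and Node 4 move; nothing that started on Node 1 or Node 3 is touched.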

https://stackoverflow.com/questions/31610971/spark-repartition-vs-coalesce

 

Source: "Spark - repartition() vs coalesce()" (stackoverflow.com)

According to Learning Spark: "Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoid..."

 
