
[spark] joins

승가비 2023. 2. 26. 15:28

https://towardsdatascience.com/demystifying-joins-in-apache-spark-38589701a88e

 

Demystifying Joins in Apache Spark

This story is exclusively dedicated to the Join operation in Apache Spark, giving you an overall perspective of the foundation on which…


https://yeo0.tistory.com/entry/Spark-BroadCast-Hash-JoinBHJ-Shuffle-Sort-Merge-JoinSMJ

 

[Spark] BroadCast Hash Join(BHJ) / Shuffle Sort Merge Join(SMJ)

0. Overview: A join in Spark causes a large amount of data to move between executors. This is described as a shuffle taking place, which involves what data is produced, which key-related data is written to disk, and how keys and

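The excerpt above says that joins move data between executors (a shuffle). The following plain-Python sketch makes the two strategies named in that post concrete. It is not Spark's implementation. Lists of `(key, value)` tuples stand in for DataFrames, `hash_partition` stands in for the shuffle, and all function names are hypothetical.

- **Broadcast hash join:** builds a hash table from the small side and probes it with the large side, which is never repartitioned.
- **Shuffle sort-merge join:** hash-partitions both sides by key so that matching keys land in the same partition. It then sorts each partition pair and merges them.

```python
from collections import defaultdict


def broadcast_hash_join(small, large):
    """Inner equi-join that builds a hash table on the small side.

    Sketch of the idea only: in Spark the small side would be broadcast
    to every executor, so the large side is probed where it already lives.
    """
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    # .get() avoids inserting empty entries for keys missing from the small side
    return [
        (key, small_value, large_value)
        for key, large_value in large
        for small_value in table.get(key, [])
    ]


def hash_partition(rows, num_partitions):
    """Simulated shuffle: route each row to a partition by hashing its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def merge_join(left, right):
    """Sort both inputs by key, then walk them together, emitting matching runs."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            result.extend(
                (lk, lv, rv) for _, lv in left[i:i_end] for _, rv in right[j:j_end]
            )
            i, j = i_end, j_end
    return result


def shuffle_sort_merge_join(left, right, num_partitions=4):
    """Shuffle both sides by key, then sort-merge each co-located partition pair."""
    left_parts = hash_partition(left, num_partitions)
    right_parts = hash_partition(right, num_partitions)
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(merge_join(left_part, right_part))
    return result


if __name__ == "__main__":
    orders = [(1, "order-a"), (2, "order-b"), (1, "order-c"), (3, "order-d")]
    customers = [(1, "kim"), (2, "lee")]
    print(sorted(broadcast_hash_join(customers, orders)))
    print(sorted(shuffle_sort_merge_join(customers, orders)))
```

For an inner equi-join, both functions return the same rows. What differs is the data movement: the broadcast version never repartitions the large side. This is why Spark favors a broadcast hash join when one side's estimated size falls below `spark.sql.autoBroadcastJoinThreshold`.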

https://spark.apache.org/docs/latest/sql-performance-tuning.html

 

Performance Tuning - Spark 3.3.2 Documentation

https://velog.io/@rymyung/Apache-Spark-Join-Strategy-f5csjtxo

 

Apache Spark - Join Strategy

Join strategies in Apache Spark
