[spark] joins
https://towardsdatascience.com/demystifying-joins-in-apache-spark-38589701a88e
Demystifying Joins in Apache Spark
This story is exclusively dedicated to the Join operation in Apache Spark, giving you an overall perspective of the foundation on which…
towardsdatascience.com
https://yeo0.tistory.com/entry/Spark-BroadCast-Hash-JoinBHJ-Shuffle-Sort-Merge-JoinSMJ
[Spark] BroadCast Hash Join(BHJ) / Shuffle Sort Merge Join(SMJ)
0. Overview Spark's Join operations cause massive data movement between Executors. This is described as a Shuffle taking place: what data is generated, which Key-related data is written to Disk, and how the Key and…
yeo0.tistory.com
https://spark.apache.org/docs/latest/sql-performance-tuning.html
Performance Tuning - Spark 3.3.2 Documentation
spark.apache.org
https://velog.io/@rymyung/Apache-Spark-Join-Strategy-f5csjtxo
Apache Spark - Join Strategy
Join strategies in Apache Spark
velog.io