[spark] broadcast nested loop join

승가비 2023. 2. 26. 14:51
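// Disable automatic broadcast joins (-1 turns the size threshold off)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
// NOT IN subquery; explain(true) prints the logical plans and the physical plan
sql("select * from table_withNull where id not in (select id from tblA_NoNull)").explain(true)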

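The NOT IN query above can still show BroadcastNestedLoopJoin in its plan even though autoBroadcastJoinThreshold is set to -1 (the case described in the Databricks KB article linked below). If the query is rewritten with NOT EXISTS, it runs with a SortMergeJoin instead.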
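One caveat when swapping NOT IN for NOT EXISTS: the two are not equivalent once NULLs are involved, which is, as far as I understand, part of why NOT IN cannot simply be planned as an ordinary equi-join anti join. Below is a minimal sketch in plain Scala (no Spark required) that models a nullable id as Option[Int]. The object name, the function names, and the sample ids are all hypothetical and only illustrate SQL three-valued logic, not Spark internals.

```scala
// None stands for SQL NULL; a None result of a comparison stands for UNKNOWN.
object NotInVsNotExists {
  // SQL equality: UNKNOWN (None) as soon as either side is NULL.
  def sqlEq(a: Option[Int], b: Option[Int]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // x NOT IN (values): FALSE on any match, UNKNOWN if there is no match
  // but some comparison was UNKNOWN, TRUE otherwise.
  def notIn(x: Option[Int], values: Seq[Option[Int]]): Option[Boolean] = {
    val cmps = values.map(v => sqlEq(x, v))
    if (cmps.contains(Some(true))) Some(false)
    else if (cmps.contains(None)) None
    else Some(true)
  }

  // NOT EXISTS (select ... where v = x): TRUE unless some comparison is TRUE.
  def notExists(x: Option[Int], values: Seq[Option[Int]]): Boolean =
    !values.exists(v => sqlEq(x, v).contains(true))

  def main(args: Array[String]): Unit = {
    // Hypothetical contents for the two tables used in the query above.
    val tableWithNull: Seq[Option[Int]] = Seq(Some(1), Some(2), None)
    val tblANoNull: Seq[Option[Int]] = Seq(Some(1), Some(3))

    // A WHERE clause keeps a row only when the predicate is TRUE.
    val keptByNotIn = tableWithNull.filter(x => notIn(x, tblANoNull).contains(true))
    val keptByNotExists = tableWithNull.filter(x => notExists(x, tblANoNull))

    println(keptByNotIn)     // List(Some(2))
    println(keptByNotExists) // List(Some(2), None)
  }
}
```

In this model the row with a NULL id is dropped by NOT IN but kept by NOT EXISTS, so it is worth checking that the rewrite returns the rows you expect before relying on it to change the join strategy.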

 

https://www.bigdatainrealworld.com/how-does-broadcast-nested-loop-join-work-in-spark/

 

How does Broadcast Nested Loop Join work in Spark?

Broadcast Nested Loop join works by broadcasting one of the entire datasets and performing a nested loop to join the data. So essentially every record from […]

www.bigdatainrealworld.com

https://learn.microsoft.com/ko-kr/azure/databricks/kb/sql/disable-broadcast-when-broadcastnestedloopjoin

 

Disable broadcast when query plan has BroadcastNestedLoopJoin - Azure Databricks

How to disable broadcast when the query plan contains BroadcastNestedLoopJoin

learn.microsoft.com

 
