[spark] sql repartition (by query hint) `SELECT /*+ COALESCE(1) */` is better than `SELECT /*+ REPARTITION(1) */`


승가비 2022. 9. 22. 22:08


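// Overwrites the given partition of a table with the rows selected from `target`.
// COALESCE(1) merges the result into one partition without the full shuffle
// that REPARTITION(1) would trigger.
// `execute`, `DB`, `Table`, `ConstUtil`, and `concat` are not shown in this post.
fun insertOverwrite(target: String, db: DB, table: Table, partition: String, columns: List<String>) {
    execute(
        """
        INSERT OVERWRITE TABLE ${db.name}.${table.name}
        PARTITION ($partition)
        SELECT /*+ COALESCE(1) */ ${columns.concat(ConstUtil.COMMA)}
        FROM $target
        """.trimIndent()
    )
}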
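The function above hard-codes the partitioning hint inside the SQL text. As a minimal sketch of how the two hints could be switched for comparison, the statement can be built by a pure function that takes the hint as a parameter. `PartitionHint`, `buildInsertOverwrite`, and the table and column names in `main` are hypothetical and not part of the original code; the standard `joinToString` stands in for the post's own `concat` helper.

```kotlin
// Hypothetical helper (not from the original post): the two partitioning hints being compared.
enum class PartitionHint { COALESCE, REPARTITION }

// Builds the INSERT OVERWRITE statement as a string; running it is left to the caller.
fun buildInsertOverwrite(
    target: String,
    qualifiedTable: String,
    partition: String,
    columns: List<String>,
    hint: PartitionHint = PartitionHint.COALESCE,
    numPartitions: Int = 1
): String {
    require(numPartitions > 0) { "numPartitions must be positive" }
    return """
        INSERT OVERWRITE TABLE $qualifiedTable
        PARTITION ($partition)
        SELECT /*+ ${hint.name}($numPartitions) */ ${columns.joinToString(", ")}
        FROM $target
    """.trimIndent()
}

fun main() {
    // Placeholder names, for illustration only.
    println(buildInsertOverwrite("staging_view", "db.events", "dt", listOf("id", "name", "dt")))
}
```

Keeping the string construction separate from execution makes it easy to print or unit-test the generated SQL before handing it to Spark.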

https://stackoverflow.com/questions/46932373/how-to-consolidate-results-of-a-spark-sql-query-to-avoid-lots-of-small-files-a

How to consolidate results of a spark SQL query to avoid lots of small files / avoid empty files


https://kontext.tech/article/1155/use-spark-sql-partitioning-hints

Use Spark SQL Partitioning Hints


https://github.com/dhkdn9192/data_engineer_should_know/blob/master/interview/hadoop/difference_between_repartition_and_coalesce_in_spark.md

GitHub - dhkdn9192/data_engineer_should_know: Things a data engineer should know



License: Attribution, NonCommercial
