[hive] `PARTITIONED BY` & `CLUSTERED BY`

승가비 2022. 12. 20. 22:37

PARTITIONED BY (dt string)
CLUSTERED BY (user_key) 
SORTED BY (user_key ASC) 
INTO 256 BUCKETS
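
For context, the clauses above sit inside a `CREATE TABLE` statement. The following is a minimal HiveQL sketch; the database, table name, the `event_name` column, and the ORC storage format are assumptions for illustration and do not come from the original table definition.

```sql
-- Hypothetical table: only the PARTITIONED BY / CLUSTERED BY / SORTED BY /
-- INTO ... BUCKETS clauses are taken from the note above.
CREATE TABLE IF NOT EXISTS example_db.user_events (
  user_key   STRING,   -- bucketing and sort column
  event_name STRING    -- assumed payload column
)
PARTITIONED BY (dt STRING)   -- one directory per dt value
CLUSTERED BY (user_key)      -- rows hashed on user_key into buckets
SORTED BY (user_key ASC)     -- rows sorted within each bucket file
INTO 256 BUCKETS
STORED AS ORC;               -- assumed storage format
```

`PARTITIONED BY` splits the data into directories by `dt`, while `CLUSTERED BY ... INTO 256 BUCKETS` splits each partition into a fixed number of files by the hash of `user_key`.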

Even with CLUSTERED BY ~ SORTED BY ~ INTO {size} BUCKETS in the table definition,
the partitioning step of the Spark SQL plan is unaffected.
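
One way to check this is to print the physical plan and look for an `Exchange` node with `hashpartitioning` on `user_key` ahead of the aggregation. The query below is a sketch for Spark SQL; `example_db.user_events` and its columns are hypothetical names, and `EXPLAIN FORMATTED` assumes Spark 3.0 or later (`EXPLAIN EXTENDED` can be used on older versions).

```sql
-- Hypothetical table and filter value, for illustration only.
-- If the plan still contains an Exchange (shuffle) on user_key,
-- the Hive bucketing spec is not being used by the Spark planner.
EXPLAIN FORMATTED
SELECT user_key, COUNT(*) AS cnt
FROM example_db.user_events
WHERE dt = '2022-12-20'
GROUP BY user_key;
```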

The high cost appears to come from the large size of the data being loaded.
Going forward, the cost can be reduced by merging small files.
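
A minimal sketch of small-file merging in Hive, assuming the data is rewritten with an `INSERT OVERWRITE` into a non-bucketed copy. The `hive.merge.*` properties are standard Hive settings, but the threshold values, the table names (`example_db.user_events`, `example_db.user_events_merged`), and the columns are illustrative assumptions.

```sql
-- Ask Hive to run an extra merge step when the job's output files are small.
SET hive.merge.mapfiles=true;                   -- merge after map-only jobs
SET hive.merge.mapredfiles=true;                -- merge after map-reduce jobs
SET hive.merge.tezfiles=true;                   -- merge after Tez jobs
SET hive.merge.smallfiles.avgsize=134217728;    -- merge if avg output file < 128 MB (assumed value)
SET hive.merge.size.per.task=268435456;         -- target about 256 MB per merged file (assumed value)

-- Rewrite one partition into a hypothetical non-bucketed target table.
INSERT OVERWRITE TABLE example_db.user_events_merged PARTITION (dt = '2022-12-20')
SELECT user_key, event_name
FROM example_db.user_events
WHERE dt = '2022-12-20';
```

Fewer, larger files generally mean fewer S3 requests per scan, which is the cost the pricing link below refers to.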

https://sparkbyexamples.com/apache-hive/hive-partitioning-vs-bucketing-with-examples/

Hive Partitioning vs Bucketing with Examples?

In this article, I will explain what is Hive Partitioning and Bucketing, the difference between Hive Partitioning vs Bucketing by exploring the advantages

https://medium.com/nerd-for-tech/hive-data-organization-partitioning-clustering-3e14ef6ab121

Hive data organization — Partitioning & Clustering

Data organization impacts the query performance of any warehouse system. Hive is no exception to that. This blog aim at discussing…

https://aws.amazon.com/ko/s3/pricing/?nc=sn&loc=4

Amazon S3 Simple Storage Service Pricing - Amazon Web Services

You pay for requests made against your S3 buckets and objects. S3 request charges are based on the request type and are billed by the number of requests, as shown in the table below. …

License: Attribution, NonCommercial
