Spark – Slow Load Into Partitioned Hive Table on S3 – Direct Writes, Output Committer Algorithms – Large-Scale Data Engine (cloudsqale.com)
I have a Spark job that transforms incoming data from compressed text files into Parquet format and loads it into a daily partition of a Hive table. This is a typical job in a data lake; it is quite simple, but in my case it was very slow.
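The article's title points at the usual culprit: on S3, the default FileOutputCommitter (algorithm v1) commits output with rename phases, and a "rename" on S3 is a copy plus delete, so the load crawls. A minimal PySpark sketch of the kind of settings involved; the config keys are standard Spark/Hadoop options, but the table, column, and path names are made up for illustration, and this is not the article's exact code:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("daily-partition-load")
    # v2 skips the second rename phase of the output commit; this matters
    # on S3, where a rename is really a copy followed by a delete
    .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
    # overwrite only the partitions present in the DataFrame,
    # not the whole table (Spark 2.3+)
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .enableHiveSupport()
    .getOrCreate()
)

# hypothetical input path; compressed text is decompressed transparently
df = spark.read.text("s3://bucket/incoming/2020-06-26/")
# ... parse the raw lines into typed columns, including the partition
#     column `dt`, then write into the (hypothetical) Hive table ...

(df.write
   .mode("overwrite")          # with dynamic mode, replaces only matching partitions
   .insertInto("default.events"))  # columns are matched by position, not name
```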
Small files in Hadoop (medium.com)
https://medium.com/arabamlabs/small-files-in-hadoop-88708e2f6a46
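The small-files problem usually comes down to many tasks each writing a tiny file into every partition. A hedged sketch of one common fix, clustering rows by the partition column before the write so each partition value is produced by a single task; the DataFrame, column, and path names here are assumptions for illustration:

```python
# `df` has a date partition column `dt` (hypothetical names).
# repartition("dt") shuffles so all rows for a given dt land in one task,
# yielding one Parquet file per partition instead of one tiny file
# per task per partition.
(df.repartition("dt")
   .write
   .mode("overwrite")
   .partitionBy("dt")
   .parquet("s3://bucket/warehouse/events"))
```

If single files per partition grow too large, `repartition(n, "dt")` with a small `n` per partition is the usual middle ground.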