`orc` vs `parquet` vs `avro`

승가비 2022. 9. 10. 10:43

https://qkqhxla1.tistory.com/1136

 

parquet vs orc vs avro (big data file format )

1. www.datanami.com/2018/05/16/big-data-file-formats-demystified/ Common ground: all three formats are optimized for storage on Hadoop. ORC, Parquet, and Avro are all machine-readable binary formats. orc, p..

qkqhxla1.tistory.com

https://www.quora.com/Why-is-parquet-best-for-Spark-and-not-ORC-although-both-are-columnar-based-file-formats

 

Why is parquet best for Spark and not ORC, although both are columnar-based file formats?

Answer: In order to understand why Parquet file format is best suited for your requirement when using Apache Spark (as the execution engine), we have to understand and appreciate the features of it to arrive at the answer. To begin with Apache Spark is opt

www.quora.com

https://medium.com/@dhareshwarganesh/benchmarking-parquet-vs-orc-d52c39849aef

 

Benchmarking PARQUET vs ORC

In this article, we conduct few experiments on Parquet and ORC file system and conclude the advantages and disadvantages over each other.

medium.com

 
