Study

[spark] zipWithIndex

승가비 2023. 2. 26. 14:54

https://stackoverflow.com/questions/60645256/how-do-you-get-batches-of-rows-from-spark-using-pyspark

 

How do you get batches of rows from Spark using pyspark

I have a Spark RDD of over 6 billion rows of data that I want to use to train a deep learning model, using train_on_batch. I can't fit all the rows into memory so I would like to get 10K or so at a...

stackoverflow.com
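
One way to approach the batching question above is to attach a position to every row with zipWithIndex and then pull out one index range at a time. The following is a minimal PySpark sketch of that idea, not the method from the linked answer. The names `iter_batches` and `batch_size` are made up for illustration, and the function only relies on standard RDD methods (`zipWithIndex`, `cache`, `count`, `filter`, `keys`, `collect`, `unpersist`).

```python
def iter_batches(rdd, batch_size):
    """Yield lists of at most `batch_size` rows, following zipWithIndex order.

    `rdd` is assumed to be a pyspark RDD; `iter_batches` is a hypothetical
    helper name, not part of the Spark API.
    """
    # zipWithIndex produces (row, index) pairs; cache them because every
    # batch below runs a separate filter over the same data.
    indexed = rdd.zipWithIndex().cache()
    try:
        total = indexed.count()
        for start in range(0, total, batch_size):
            end = start + batch_size
            # Bind start/end as default arguments so each lambda keeps
            # its own range.
            batch = (
                indexed
                .filter(lambda pair, s=start, e=end: s <= pair[1] < e)
                .keys()      # drop the index, keep the row
                .collect()   # bring only this batch to the driver
            )
            yield batch
    finally:
        indexed.unpersist()


# Usage with an existing SparkContext `sc` (assumed to be available):
#   rdd = sc.parallelize(range(25), 4)
#   for batch in iter_batches(rdd, 10):
#       model.train_on_batch(batch)   # hypothetical consumer
```

Each batch triggers its own filter pass over the cached RDD, so this shows the indexing idea rather than something tuned for billions of rows. For a single sequential pass, `RDD.toLocalIterator()` may be a better fit.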

https://www.tabnine.com/code/java/methods/org.apache.spark.api.java.JavaRDD/zipWithIndex

 

org.apache.spark.api.java.JavaRDD.zipWithIndex java code examples | Tabnine

.distinct().sortBy(s -> s, true, parsedRDD.getNumPartitions()) .zipWithIndex().mapValues(Long::intValue)

www.tabnine.com
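
The Java fragment above chains distinct, sortBy and zipWithIndex, which reads like building a lookup from each distinct value to a dense integer id. Below is a rough PySpark counterpart of that chain, assuming the values are sortable and the resulting map fits on the driver. `build_vocabulary` is a hypothetical name. Python integers do not overflow, so there is no equivalent of the `Long::intValue` narrowing step.

```python
def build_vocabulary(rdd):
    """Map each distinct value to a dense id that follows sorted order.

    `rdd` is assumed to be a pyspark RDD of sortable values;
    `build_vocabulary` is a hypothetical helper, not a Spark API.
    """
    return (
        rdd.distinct()                  # one entry per value
        .sortBy(lambda value: value)    # global ascending order
        .zipWithIndex()                 # (value, index) pairs
        .collectAsMap()                 # dict on the driver: value -> index
    )


# Usage with an existing SparkContext `sc` (assumed to be available):
#   vocab = build_vocabulary(sc.parallelize(["b", "a", "c", "a"]))
#   vocab["a"]  # smallest value gets index 0
```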

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.zipWithIndex.html

 

pyspark.RDD.zipWithIndex — PySpark 3.3.2 documentation

Zips this RDD with its element indices. The ordering is first based on the partition index and then the ordering of items within each partition. So the first item in the first partition gets index 0, and the last item in the last partition receives the lar

spark.apache.org
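
The ordering rule quoted above (partition index first, then position inside the partition) can be modeled in plain Python without a cluster. This is only a sketch of the semantics, not Spark's implementation. It represents an RDD as a list of partitions, and `zip_with_index` is a made-up name.

```python
from itertools import accumulate


def zip_with_index(partitions):
    """Model RDD.zipWithIndex on a list of partitions (each a list of items)."""
    sizes = [len(part) for part in partitions]
    # A partition's first index equals the number of items in all
    # earlier partitions.
    starts = [0] + list(accumulate(sizes))[:-1]
    return [
        [(item, start + offset) for offset, item in enumerate(part)]
        for part, start in zip(partitions, starts)
    ]


parts = [["a", "b"], ["c"], [], ["d", "e"]]
indexed = zip_with_index(parts)
print(indexed)
# [[('a', 0), ('b', 1)], [('c', 2)], [], [('d', 3), ('e', 4)]]
```

The start offsets depend on the size of every earlier partition, which is why the real method has to run a Spark job to count elements when the RDD has more than one partition.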

 
