[Pyspark] functions

승가비 2020. 11. 7. 11:41
# (number of rows, number of columns): PySpark's equivalent of pandas df.shape
print((df.count(), len(df.columns)))
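
For context, a minimal self-contained sketch (the SparkSession setup and the toy data are assumptions for illustration, not part of the original snippet):

from pyspark.sql import SparkSession

# Assumed setup for illustration; in a notebook or shell, `spark` usually already exists.
spark = SparkSession.builder.appName("shape-example").getOrCreate()

df = spark.createDataFrame(
    [(1, "a"), (2, "b"), (3, "c")],
    ["id", "letter"],
)

# PySpark has no df.shape, so combine a row count with the column-list length.
rows = df.count()       # action: runs a job over the data
cols = len(df.columns)  # metadata only: no job is triggered
print((rows, cols))     # (3, 2)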

 

https://jaeyung1001.tistory.com/59

[Pyspark] pyspark 함수 정리(1) (PySpark function summary, part 1): reading CSV and Parquet files with spark.read.csv(...) and spark.read.parquet(...), then inspecting them with printSchema() and show().
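
A short sketch of what that post covers (the file paths and options here are placeholders, not taken from the post):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-example").getOrCreate()

# Read a CSV file; header/inferSchema are common options for files with a header row.
df = spark.read.csv("/path/to/data.csv", header=True, inferSchema=True)
df.printSchema()
df.show()

# Read a Parquet file; the schema is carried by the file itself.
df2 = spark.read.parquet("/path/to/data.parquet")
df2.printSchema()
df2.show()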

https://stackoverflow.com/questions/39652767/pyspark-2-0-the-size-or-shape-of-a-dataframe

PySpark 2.0: The size or shape of a DataFrame. "I am trying to find out the size/shape of a DataFrame in PySpark. I do not see a single function that can do this. In Python I can do data.shape(). Is there a similar function in PySpark?"

https://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=foreach

pyspark package — PySpark 3.0.1 documentation. "Compute a histogram using the provided buckets. The buckets are all open to the right except for the last, which is closed; e.g. [1,10,20,50] means the buckets are [1,10), [10,20), [20,50], which means 1<=x<10, 10<=x<20, 20<=x<=50."
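
The bucket semantics quoted above can be checked with a tiny RDD (a sketch; the sample values are made up):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("histogram-example").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([1, 5, 12, 20, 50])

# Buckets [1, 10, 20, 50] define the intervals [1,10), [10,20), [20,50]; only the last is closed.
buckets, counts = rdd.histogram([1, 10, 20, 50])
print(buckets, counts)  # [1, 10, 20, 50] [2, 1, 2]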

 
