[PySpark] df.repartition(10).write.format('jdbc')...

seunggabi 승가비 2020. 12. 13. 01:00
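# df, psql_url_spark, spark_env, schema, table and mode are defined elsewhere.
(
    df
    .repartition(10)  # number of concurrent connections from Spark to PostgreSQL
    .write.format('jdbc')
    .options(
        url=psql_url_spark,
        driver=spark_env['PSQL_DRIVER'],
        dbtable="{schema}.{table}".format(schema=schema, table=table),
        user=spark_env['PSQL_USER'],
        password=spark_env['PSQL_PASS'],
        batchsize=2000000,  # rows sent per JDBC batch insert
        queryTimeout=690,   # seconds before a statement times out
    )
    .mode(mode)
    .save()
)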
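
The same write can be wrapped in two small helpers so the connection count and JDBC options are set in one place. This is only a rough sketch: `build_jdbc_options` and `write_to_postgres` are hypothetical names that do not come from the original snippet, and the defaults simply reuse the values shown above. It assumes `df` is a Spark DataFrame and that the PostgreSQL JDBC driver is on the classpath.

```python
def build_jdbc_options(url, driver, schema, table, user, password,
                       batchsize=2000000, query_timeout=690):
    """Collect the JDBC writer options in one dict (hypothetical helper)."""
    return {
        "url": url,
        "driver": driver,
        "dbtable": "{schema}.{table}".format(schema=schema, table=table),
        "user": user,
        "password": password,
        # Values are passed as strings; Spark parses them on its side.
        "batchsize": str(batchsize),
        "queryTimeout": str(query_timeout),
    }


def write_to_postgres(df, options, mode="append", num_connections=10):
    """Write df over JDBC; each partition is written over its own connection."""
    (
        df.repartition(num_connections)
        .write.format("jdbc")
        .options(**options)
        .mode(mode)
        .save()
    )
```

Usage would look like `write_to_postgres(df, build_jdbc_options(psql_url_spark, spark_env['PSQL_DRIVER'], schema, table, spark_env['PSQL_USER'], spark_env['PSQL_PASS']), mode=mode)`. Keeping `num_connections` as a parameter makes it easy to lower it if PostgreSQL runs out of connection slots.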

https://stackoverflow.com/questions/58676909/how-to-speed-up-spark-df-write-jdbc-to-postgres-database

 

How to speed up spark df.write jdbc to postgres database?


 
