Dataproc Serverless – Slow writes to GCS
I have a Dataproc Serverless app using PySpark. The job is fairly straightforward: it reads from GCS, from a structure like this:

    bucket
    ├── _d_date_sale=2023-11-23
    │   └── xx.parquet
    ├── d_date_sale=2023-11-24
    │   ├── 1.parquet
    │   └── 2.parquet
    │   …
    ├── d_date_sale=2023-11-25
    │   └── xx.parquet
    └── d_date_sale=2023-11-26
        └── xx.parquet
    …

Overall, the structure might include 800+ Parquet …
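Some context on the layout: directory names such as `d_date_sale=2023-11-24` follow the Hive-style `key=value` partitioning convention. When the root is read with something like `spark.read.parquet("gs://bucket/")`, Spark's partition discovery typically exposes `d_date_sale` as a column. The sketch below is a hypothetical, standard-library-only helper (`count_files_per_partition` is not part of any library). It parses object paths and counts files per partition, which can help gauge how the Parquet files are spread across dates. It does not call GCS or Spark, and the sample paths are assumed rather than taken from a real listing.

```python
from collections import Counter


def count_files_per_partition(paths):
    """Count data files per Hive-style partition (hypothetical helper).

    Each path is split on "/", the last segment is treated as the file name,
    and every directory segment of the form key=value becomes one
    (key, value) pair of the partition.
    """
    counts = Counter()
    for path in paths:
        directories = path.split("/")[:-1]  # drop the file name
        partition = tuple(
            # "d_date_sale=2023-11-24" -> ("d_date_sale", "2023-11-24")
            tuple(segment.split("=", 1))
            for segment in directories
            if "=" in segment
        )
        counts[partition] += 1
    return counts


if __name__ == "__main__":
    # Sample object paths modelled on the layout above (assumed, not a real listing).
    sample_paths = [
        "bucket/d_date_sale=2023-11-24/1.parquet",
        "bucket/d_date_sale=2023-11-24/2.parquet",
        "bucket/d_date_sale=2023-11-25/xx.parquet",
        "bucket/d_date_sale=2023-11-26/xx.parquet",
    ]
    for partition, n_files in sorted(count_files_per_partition(sample_paths).items()):
        print(dict(partition), n_files)
```

With the sample paths, this prints one line per date: two files for `2023-11-24` and one each for `2023-11-25` and `2023-11-26`. In a real job the path list would come from a bucket listing. That listing step is deliberately left out here so the sketch stays self-contained.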