How can we optimize a Snowpark procedure to parse around 30k XML files?

We need to parse more than 30k XML files in one batch. Right now the procedure takes 26 minutes to parse 2.6k files. When we tried a Snowpark-optimized warehouse with `max_concurrency_level=1`, it took 30 minutes. What are the best practices or ways to optimize memory usage and performance?
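As an aside on the warehouse setting: `max_concurrency_level` caps how many queries the warehouse accepts side by side; it does not add parallelism inside a single procedure call, which is consistent with it not helping here. A minimal sketch of parallelizing the per-file work inside the handler itself with a thread pool (the `count_items` function and its `<item>` schema are invented for illustration; threads mainly overlap the I/O of fetching files from the stage, since the GIL limits the gain for the CPU-bound parse, so measure before committing):

```python
import concurrent.futures as cf
import xml.etree.ElementTree as ET

def count_items(xml_text):
    # Hypothetical per-file work: count <item> elements in one document.
    return sum(1 for _ in ET.fromstring(xml_text).iter("item"))

def parse_many(docs, workers=8):
    # Fan the per-file work out across a thread pool; results come back
    # in input order because pool.map preserves ordering.
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_items, docs))
```

In a real procedure, `docs` would be the staged file contents (or paths read inside the worker), and the per-file function would return rows instead of a count.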

We are using the standard `xml.etree.ElementTree` library for XML parsing. Each file is parsed and its results are appended to a DataFrame immediately, but the performance is not good.

  • “optimize memory usage and performance”: very often you have to choose one of them.


  • Is it feasible to replace xml.etree.ElementTree?

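Before swapping libraries, note that `ElementTree` itself offers a lower-memory mode: `iterparse` streams the document and lets you discard each element as soon as it is consumed, so peak memory stays flat even for large files. A sketch, again assuming a hypothetical `<item id>/<value>` layout:

```python
import io
import xml.etree.ElementTree as ET

def parse_streaming(source):
    # Stream-parse one document from a file-like object or path,
    # clearing each <item> after use so memory does not grow with file size.
    rows = []
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            rows.append((elem.get("id"), elem.findtext("value")))
            elem.clear()  # release already-processed children
    return rows
```

If a replacement is still wanted, `lxml.etree` exposes a near-identical API (including `iterparse`) with a faster C parser, so the call sites mostly carry over.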
