I have a function and a list of dictionaries. Each dictionary represents a configuration, and the function takes one configuration and processes it.
I want to distribute these tasks/jobs (each task consists of the function plus one configuration) over, say, n
servers, and was wondering whether this can be done using Apache Spark.
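To make the setup concrete, here is a minimal sketch of what I have today; `process_config` and `configs` are placeholders for my actual function and configuration list:

```python
# Minimal sketch of the current, single-machine setup.
# `process_config` and `configs` stand in for the real function and data.

def process_config(config: dict) -> None:
    # ... some CPU-heavy processing of one configuration ...
    print(f"processing {config['name']}")

configs = [
    {"name": "job-1", "param": 10},
    {"name": "job-2", "param": 20},
    # ... many more ...
]

# Today this runs sequentially on one machine:
for config in configs:
    process_config(config)
```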
I googled for resources, but they were mostly about Spark SQL, DataFrame processing, etc.
Question:
- Can this be done using Apache Spark? If yes, can we also control the number of cores each task gets on a server? (A rough sketch of what I have in mind is below this list.)
- Is Apache Spark the right tool for this kind of setup?
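For what it's worth, here is roughly what I was imagining, assuming PySpark: `sc.parallelize` would spread the configurations across executors, and the `spark.task.cpus` setting (if I understand it correctly) would control how many cores each task reserves. Whether this is the right or idiomatic way to do it is exactly what I'm asking.

```python
# Rough sketch of what I was hoping Spark could do (assuming PySpark).
# spark.task.cpus is an existing Spark setting, but whether this is the
# right way to use it for my case is part of the question.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("config-processing")
    .config("spark.task.cpus", "2")   # cores reserved per task (my guess)
    .getOrCreate()
)
sc = spark.sparkContext

def process_config(config: dict) -> None:
    # placeholder for the real processing
    pass

configs = [{"name": f"job-{i}", "param": i} for i in range(100)]

# One configuration per partition, so each configuration runs as its own task
rdd = sc.parallelize(configs, numSlices=len(configs))
rdd.foreach(process_config)
```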
Edit: It seems Kubernetes is the right choice for this kind of setup.