I have a function and a list of dictionaries. Each dictionary represents a configuration, and the function takes one configuration and processes it.
I want to distribute these tasks/jobs (each task consists of the function plus one configuration) over, say, n
servers, and was wondering whether this can be done using Apache Spark.
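To make the setup concrete, here is a minimal sketch of what I have today; `process_config` and `configs` are placeholders for my actual function and configuration list:

```python
# Minimal sketch of the current, single-machine setup.
# `process_config` and `configs` stand in for the real function and data.

def process_config(config: dict) -> None:
    # ... some CPU-heavy processing of one configuration ...
    print(f"processing {config['name']}")

configs = [
    {"name": "job-1", "param": 10},
    {"name": "job-2", "param": 20},
    # ... many more ...
]

# Today this runs sequentially on one machine:
for config in configs:
    process_config(config)
```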
I googled for resources, but they were mostly about Spark SQL, DataFrame processing, etc.
Question:
- Can this be done using Apache Spark? If yes, can we also control the number of cores each task gets on a server? (A rough sketch of what I have in mind is below this list.)
- Is Apache Spark the right tool for this kind of setup?
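For what it's worth, here is roughly what I was imagining, assuming PySpark: `sc.parallelize` would spread the configurations across executors, and the `spark.task.cpus` setting (if I understand it correctly) would control how many cores each task reserves. Whether this is the right or idiomatic way to do it is exactly what I'm asking.

```python
# Rough sketch of what I was hoping Spark could do (assuming PySpark).
# spark.task.cpus is an existing Spark setting, but whether this is the
# right way to use it for my case is part of the question.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("config-processing")
    .config("spark.task.cpus", "2")   # cores reserved per task (my guess)
    .getOrCreate()
)
sc = spark.sparkContext

def process_config(config: dict) -> None:
    # placeholder for the real processing
    pass

configs = [{"name": f"job-{i}", "param": i} for i in range(100)]

# One configuration per partition, so each configuration runs as its own task
rdd = sc.parallelize(configs, numSlices=len(configs))
rdd.foreach(process_config)
```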
Edit: It seems Kubernetes is the right choice for this kind of setup.