I’m using asyncio and ProcessPoolExecutor to run a piece of blocking code like this:
result = await loop.run_in_executor(pool, long_running_function)
I don’t want long_running_function to last more than 2 seconds and I can’t do proper timeout handling within it because that function comes from a third-party library.
What are my options?
- Add a timeout decorator to long_running_function? https://pypi.org/project/timeout-decorator/
- Use Pebble? https://pebble.readthedocs.io/en/latest/#pools-and-asyncio
Which is the best option?
This question is very close to “Timeout handling while using run_in_executor and asyncio”, for which no workaround was reported.
It’s been a few years. Is there a workaround?
If you have no control over the long-running target code, this is tricky. If it is Python, however, it is possible to modify the called function by decorating it (i.e. having a wrapper that calls the original function). The problem is that a wrapper can’t add timeout-checking code inside the wrapped function, and it can’t “kill” it after a timeout either, even if it is called in another thread.
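To make that limitation concrete, here is a minimal sketch using a thread executor. The sleeping long_running_function and the Event are stand-ins I made up for the demo; the point is only that asyncio.wait_for stops waiting, while the function itself keeps running to the end.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the third-party call; the Event only exists so
# the demo can observe that the function ran to completion.
finished = threading.Event()


def long_running_function():
    time.sleep(0.5)
    finished.set()


async def demo():
    loop = asyncio.get_running_loop()
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        await asyncio.wait_for(
            loop.run_in_executor(pool, long_running_function), timeout=0.1
        )
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True  # the await gave up, the thread did not
    # Blocks until the worker thread is done (acceptable in a demo only).
    pool.shutdown(wait=True)
    return timed_out, finished.is_set()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both values should come back true: the caller saw a timeout, and the function still finished its work afterwards.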
But you are running things in another process: unlike a thread, a subprocess can be killed from outside, by a thread in another process. The catch is that you are using a concurrent.futures-compatible executor: simply killing the target process would kill one of the workers, and the ordinary stdlib ProcessPoolExecutor would just be crippled if you did that, probably taking the whole executor down.
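A small experiment along these lines shows what happens to the stdlib pool when a worker is killed. This is a sketch: kill_worker_breaks_pool is a name I made up, and time.sleep(30) stands in for the slow third-party call.

```python
import os
import signal
import time
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def kill_worker_breaks_pool():
    """Return True if killing the only worker leaves the pool broken."""
    with ProcessPoolExecutor(max_workers=1) as pool:
        worker_pid = pool.submit(os.getpid).result()  # PID of the lone worker
        future = pool.submit(time.sleep, 30)  # stands in for the slow call
        time.sleep(0.2)  # give the worker a moment to pick the task up
        os.kill(worker_pid, signal.SIGTERM)  # "time out" the task by force
        try:
            future.result()
        except BrokenProcessPool:
            pass  # the killed task fails, as expected
        else:
            return False
        try:
            pool.submit(os.getpid)  # the pool refuses any further work
        except BrokenProcessPool:
            return True
        return False


if __name__ == "__main__":
    print("pool broken after kill:", kill_worker_breaks_pool())
```

The killed task fails with BrokenProcessPool, and so does every later submit, so the executor has to be thrown away and recreated after each forced timeout.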
Pebble’s docs say that one of its specialties is exactly to allow timeouts when running things in an executor, so that is the way to go.
Without resorting to Pebble, though, here is what I’d do: have a controlling wrapper function as the target called by the executor. It would create a single worker process and handle timeouts by killing that worker process if needed. (It could even handle control events sent over a queue.)
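A rough sketch of such a wrapper, for simple cases only. The names run_with_timeout and _child are hypothetical, results travel over a multiprocessing pipe rather than a queue, and I call the wrapper from asyncio’s default thread executor (the thread only waits on the pipe) instead of a process pool; all of that is my assumption, not something the stdlib prescribes. The function and its arguments must be picklable on platforms that spawn new processes.

```python
import asyncio
import multiprocessing
import time


def _child(conn, func, args):
    """Runs in the throwaway process: call func and ship the outcome back."""
    try:
        conn.send((True, func(*args)))
    except Exception as exc:
        conn.send((False, exc))
    finally:
        conn.close()


def run_with_timeout(func, args=(), timeout=2.0):
    """Hypothetical helper: run func(*args) in its own process, kill it on timeout."""
    receiver, sender = multiprocessing.Pipe(duplex=False)
    proc = multiprocessing.Process(target=_child, args=(sender, func, args))
    proc.start()
    sender.close()  # keep only the child's copy so EOF is detectable
    try:
        if not receiver.poll(timeout):  # nothing arrived in time
            proc.kill()
            raise TimeoutError(f"no result within {timeout} s")
        try:
            ok, payload = receiver.recv()
        except EOFError:
            raise RuntimeError("worker died without returning a result") from None
    finally:
        proc.join()
        receiver.close()
    if ok:
        return payload
    raise payload  # re-raise the exception captured in the child


async def main():
    loop = asyncio.get_running_loop()
    try:
        # time.sleep(30) stands in for the third-party call.
        await loop.run_in_executor(None, run_with_timeout, time.sleep, (30,), 0.5)
    except TimeoutError:
        print("timed out; the worker process was killed")


if __name__ == "__main__":
    asyncio.run(main())
```

The price is one fresh process per call, and edge cases such as unpicklable exceptions or children spawned by the killed process are not handled.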
Or, to reduce the overhead of one extra process per worker, I’d maybe create a specialized Executor that handles the timeouts using the main process’s asyncio loop itself (that is likely what Pebble already does).
But yes, having a “timeout” for ProcessPoolExecutor tasks could be an enhancement Python could see in a future version – one can “vote” for it in the language’s official topic for ideas at https://discuss.python.org/c/ideas/6
(It would be possible to write such control code here, but taking into account all the edge cases relating to subprocesses would be reinventing well-oiled wheels that already exist in the stdlib and Pebble, and the result would likely not be suitable for all cases, although it would certainly work for simple ones.)