Let’s say I have a Lambda function that needs to do three things: 1/ import some “heavy” modules (e.g. pandas, NumPy), 2/ request some nontrivial volume of data, and 3/ perform some analysis on that data.
What I think might be a plausible solution is to define three async functions: heavy_import_handler, query, and analyze. (Moving imports out of global scope and into functions is a high-interest Q&A topic in its own right.)
So, I should be able to initiate query, free up the CPU while waiting for the response, begin heavy_import_handler, and block analyze until the two previous functions complete.
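Roughly, here is a sketch of what I have in mind (the bodies are placeholders: the real query would use an async client such as aiohttp or aioboto3, and asyncio.to_thread needs Python 3.9+):

```python
import asyncio
import json

def heavy_import_handler():
    # Blocking imports; run in a worker thread so the event loop stays free.
    global pd, np
    import pandas as pd
    import numpy as np

async def query():
    # Stand-in for the real I/O-bound request.
    await asyncio.sleep(1)
    return {"rows": [1, 2, 3]}

async def analyze(data):
    # Needs pandas/numpy, so it must not start before heavy_import_handler is done.
    return pd.DataFrame(data).describe()

async def main():
    # Kick off the query and the imports concurrently...
    data, _ = await asyncio.gather(
        query(),
        asyncio.to_thread(heavy_import_handler),
    )
    # ...and only run analyze once both have completed.
    return await analyze(data)

def lambda_handler(event, context):
    result = asyncio.run(main())
    return {"statusCode": 200, "body": json.dumps(str(result))}
```

(One caveat I’m aware of: the import thread still holds the GIL while running Python code, but it releases it during file I/O, and the main coroutine is just waiting on the network anyway, so the two should overlap reasonably well.)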
Is this an anti-pattern? Is there a simpler approach?
Or perhaps this is a standard solution for Lambda, where the heavy imports would be released from memory at the end of execution?
(Bonus points: Would provisioned concurrency keep these imports “hot” in memory, or would the execution environments simply be cached so the latency is lower?)
What exactly are you trying to optimize? Lambda is billed by the GB-second: the duration of the invocation (plus the first-time initialization of the container) multiplied by the RAM allocated to the function (regardless of how much of it is actually used). CPU consumption is not in the equation at all: if your Lambda just sits there waiting for an I/O response, you get billed the same as if it were crunching numbers all that time.
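For a concrete back-of-the-envelope example (the per-GB-second rate below is illustrative; it varies by region and architecture):

```python
memory_gb = 1024 / 1024        # allocated memory, MB -> GB
duration_s = 0.8               # billed duration in seconds
price_per_gb_s = 0.0000166667  # example x86 rate at the time of writing

print(memory_gb * duration_s * price_per_gb_s)
# ~0.0000133 dollars for this invocation, whether the CPU was busy or idle
```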
NumPy and pandas can take hundreds of milliseconds to import. I want that to run concurrently so I’m not wasting my users’ time. I could be doing the imports while the I/O-bound operation is happening, instead of executing two slow operations sequentially 🙂
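If anyone wants to check on their own runtime, a quick timing snippet:

```python
import time

t0 = time.perf_counter()
import numpy   # noqa: F401
t1 = time.perf_counter()
import pandas  # noqa: F401
t2 = time.perf_counter()

print(f"numpy:  {t1 - t0:.3f}s")
print(f"pandas: {t2 - t1:.3f}s")
```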
so you want to parallelize import and query, right?
Well, I believe parallelism in Python specifically means using multiple cores, since running multiple threads in parallel is not an option due to the global interpreter lock. I use the word concurrent, but yes, you understand the gist!