AWS Lambda: Async imports and data query to reduce latency

Let’s say I have a Lambda function that needs to do three things: 1/ import some “heavy” modules (e.g. pandas, NumPy), 2/ request some nontrivial volume of data, and 3/ perform some analysis on that data.

What I think might be a plausible solution is to define three async functions: heavy_import_handler, query, and analyze. (Whether to import modules in global scope or inside functions is itself a heavily discussed question.)

So, I should be able to initiate query, free up the CPU while waiting for the response, start heavy_import_handler, and block analyze until the two previous functions complete.
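
Roughly what I have in mind, as a minimal sketch (assuming Python 3.9+ for asyncio.to_thread; query() is just a stand-in for the real data request):

```python
import asyncio

async def heavy_import_handler():
    # Do the slow imports in a worker thread so the event loop stays free;
    # the module-file reads release the GIL, so this overlaps with the query wait.
    def _do_imports():
        import numpy   # noqa: F401
        import pandas  # noqa: F401

    await asyncio.to_thread(_do_imports)

async def query():
    # Stand-in for the real data request (an async HTTP or database call)
    await asyncio.sleep(0.5)  # hypothetical network wait
    return {"rows": [{"a": 1}, {"a": 2}]}

async def analyze(data):
    import pandas as pd  # already loaded by heavy_import_handler, so cheap here
    df = pd.DataFrame(data["rows"])
    return {"count": int(df["a"].count())}

async def main():
    # Start the query and the imports together; analyze blocks until both finish.
    data, _ = await asyncio.gather(query(), heavy_import_handler())
    return await analyze(data)

def handler(event, context):
    return asyncio.run(main())
```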

Is this an anti-pattern? Is there a simpler approach?

Or perhaps this is a standard solution for Lambda, where the heavy imports would be released from memory at the end of execution?

(Bonus points: Would provisioned concurrency keep these imports “hot” in memory, or would the execution environments simply be cached so the latency is lower?)

  • What exactly are you trying to optimize? Lambda is billed by the GB-second, calculated as the duration of the invocation (plus the first-time initialization of the container) multiplied by the RAM allocated to it (regardless of how much it actually uses). CPU consumption is not in the equation at all: if your Lambda just sits there waiting for an I/O response, you are billed the same as if it were crunching numbers the whole time.
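
    To make the billing math concrete, a back-of-the-envelope sketch (the rate below is only illustrative; check current AWS pricing):

```python
memory_gb = 1.0               # allocated memory: billed whether fully used or not
duration_s = 0.8              # invocation duration, including idle I/O waits
example_rate = 0.0000166667   # illustrative $/GB-second, not necessarily current

billed_gb_seconds = memory_gb * duration_s   # 0.8 GB-seconds
cost = billed_gb_seconds * example_rate      # identical whether CPU was busy or idle
print(billed_gb_seconds, cost)
```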

  • NumPy and pandas can take hundreds of milliseconds to import. I want that to happen concurrently so I’m not wasting my users’ time: I could be doing the imports while the I/O-bound operation is happening instead of executing two slow operations sequentially 🙂
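
    For reference, a rough way to time the cold import locally (numbers will vary with the environment and package versions):

```python
import time

start = time.perf_counter()
import numpy   # noqa: F401
import pandas  # noqa: F401
print(f"cold import took {time.perf_counter() - start:.3f}s")
```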


  • So you want to parallelize the import and the query, right?


  • Well, I believe parallelization in Python specifically means using multiple cores, because running multiple threads in parallel is not an option due to the global interpreter lock. I use the word concurrent, but yes, you understand the gist!
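
    To pin down the terminology, a small sketch: asyncio interleaves I/O waits on one thread (concurrency), while multiprocessing uses separate processes to sidestep the GIL (parallelism). The workloads here are purely hypothetical.

```python
import asyncio
import time
from multiprocessing import Pool

async def io_task(delay):
    await asyncio.sleep(delay)           # I/O wait: other tasks run meanwhile

async def concurrent_io():
    start = time.perf_counter()
    await asyncio.gather(io_task(0.2), io_task(0.2))
    return time.perf_counter() - start   # ~0.2s, not 0.4s: concurrency

def cpu_task(n):
    # CPU-bound work: the GIL blocks threads, so use processes for parallelism
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    print(asyncio.run(concurrent_io()))
    with Pool(2) as pool:
        print(pool.map(cpu_task, [10**6, 10**6]))
```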
