I am a beginner with AWS and am facing an architecture decision.
I want to expose a REST API in AWS, e.g. via AWS Lambda, that triggers a model inference on an HTTP request. The problem is that the model is computationally expensive, and so is its inference. I would prefer to run the inference on a GPU instance, but that seems difficult in AWS, and I read that Lambda does not support CUDA.
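To make my intent concrete, here is a minimal sketch of what I imagine, assuming the Lambda just forwards the request to a separately hosted inference endpoint (the endpoint name `my-model-endpoint` is only a placeholder, not something I have set up):

```python
import json
import boto3

# Client for invoking a hosted inference endpoint (e.g. SageMaker);
# "sagemaker-runtime" is the boto3 service name for endpoint invocation.
runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # The HTTP request body (via API Gateway) carries the model input.
    payload = event.get("body", "{}")

    # Forward the input to the endpoint; the name is a placeholder.
    response = runtime.invoke_endpoint(
        EndpointName="my-model-endpoint",
        ContentType="application/json",
        Body=payload,
    )

    # Return the model's prediction as the HTTP response.
    result = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": json.dumps({"prediction": result})}
```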
The alternative seems to be to run the model on a CPU instance and attach a GPU-based acceleration service called Amazon Elastic Inference.
However, this all seems quite difficult to me as an AWS beginner. Am I on the right path to finding the right service for my problem, or am I looking in the wrong direction entirely?
Moreover, if I configure such a service, would I pay only for the inferences / inference time, or would I have to pay for the CPU/GPU instances for every hour, even when no inference is actively running?
I would be really happy if you could help me.