Vertex AI – Autoscaling Behavior for Inference Tasks
Is there a way to tweak the scaling behavior for custom model deployments on Vertex AI? We can set the minimum and maximum instance counts on a deployment, but is there a way to control how many requests are routed to a given instance? We're performing GPU-intensive operations that typically only allow us to …
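For context, a minimal sketch of the deploy-time autoscaling knobs the Vertex AI Python SDK (`google-cloud-aiplatform`) exposes, assuming hypothetical machine types, replica counts, and resource names (none of these values come from the post). Besides min/max replicas, `Model.deploy` accepts a target utilization; to my knowledge scaling is driven by aggregate CPU utilization or accelerator duty cycle rather than a per-replica request cap:

```python
# Hypothetical deploy-time settings; values are placeholders for illustration.
DEPLOY_KWARGS = {
    "machine_type": "n1-standard-8",
    "accelerator_type": "NVIDIA_TESLA_T4",
    "accelerator_count": 1,
    "min_replica_count": 1,
    "max_replica_count": 5,
    # For GPU-bound work, lowering the duty-cycle target (default 60) makes
    # the endpoint scale out earlier, which indirectly limits how much load
    # each replica absorbs before new replicas come up.
    "autoscaling_target_accelerator_duty_cycle": 40,
}


def deploy(model_resource_name: str, endpoint_resource_name: str) -> None:
    """Deploy a custom model with the autoscaling settings above."""
    from google.cloud import aiplatform  # requires google-cloud-aiplatform

    model = aiplatform.Model(model_resource_name)
    endpoint = aiplatform.Endpoint(endpoint_resource_name)
    model.deploy(endpoint=endpoint, **DEPLOY_KWARGS)
```

This does not cap concurrent requests per instance outright; it only shifts when new replicas are added, so request-level throttling would have to happen inside the serving container.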