Introducing Amazon SageMaker Asynchronous Inference, a new inference option for workloads with large payload sizes and long inference processing times

We are introducing Amazon SageMaker Asynchronous Inference, a new inference option in Amazon SageMaker that queues incoming requests and processes them asynchronously. This option is ideal for inferences with large payloads (up to 1 GB) or long processing times (up to 15 minutes) that need to be processed as requests arrive. Asynchronous inference lets you save on costs by autoscaling the instance count to zero when there are no requests to process, so you pay only while your endpoint is processing requests.
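As a rough illustration of how the pieces described above might fit together, the following Python sketch builds the request parameters for an asynchronous endpoint configuration and for a scale-to-zero autoscaling target, and shows one way to submit a request with boto3. The announcement itself does not show any code, so everything here is an assumption: the helper names, the `AllTraffic` variant name, the default instance type, and the S3 URIs are hypothetical placeholders. It also assumes the model and endpoint already exist and that the request payload has already been uploaded to S3.

```python
from typing import Any, Dict


def build_async_endpoint_config(
    config_name: str,
    model_name: str,
    s3_output_path: str,
    instance_type: str = "ml.m5.xlarge",  # assumed instance type, pick your own
) -> Dict[str, Any]:
    """Parameters for the SageMaker create_endpoint_config call.

    The AsyncInferenceConfig block is what makes the endpoint asynchronous:
    results are written to the given S3 prefix instead of being returned
    in the HTTP response.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # hypothetical variant name
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
            }
        ],
        "AsyncInferenceConfig": {
            "OutputConfig": {"S3OutputPath": s3_output_path},
        },
    }


def build_scale_to_zero_target(
    endpoint_name: str,
    variant_name: str = "AllTraffic",
    max_instances: int = 2,
) -> Dict[str, Any]:
    """Parameters for Application Auto Scaling register_scalable_target.

    MinCapacity=0 allows the variant to scale in to zero instances
    when there are no requests to process.
    """
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,
        "MaxCapacity": max_instances,
    }


def submit_async_request(
    endpoint_name: str,
    input_s3_uri: str,
    content_type: str = "application/json",
) -> str:
    """Queue one request and return the S3 location where the result will land.

    Requires boto3 and AWS credentials; imported lazily so the builders
    above can be used without it.
    """
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_s3_uri,  # payload is read from S3, not sent inline
        ContentType=content_type,
    )
    return response["OutputLocation"]
```

The builders return plain dictionaries so they can be inspected before being passed as keyword arguments, for example `boto3.client("sagemaker").create_endpoint_config(**params)` and `boto3.client("application-autoscaling").register_scalable_target(**params)`. A scaling policy is still needed for the endpoint to actually scale in and out; that part is left out of this sketch.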

Post Updated on August 20, 2021 at 08:31PM
