Today, Amazon SageMaker introduces multi-adapter inference, a new capability for customers who customize pre-trained language models. You can deploy hundreds of fine-tuned LoRA (Low-Rank Adaptation) model adapters behind a single endpoint. SageMaker dynamically loads the adapter that each request needs, in milliseconds. Because every adapter builds on a common base model, you can host many specialized adapters efficiently, with high throughput and lower cost than deploying a separate model for each.
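To show why many adapters can share one base model, here is a minimal conceptual sketch. It is not SageMaker's implementation or API. It assumes the standard LoRA formulation, in which an adapter stores a weight update as the product of two small matrices B and A, so a layer computes W x + B (A x) while W stays shared. The usual alpha/r scaling factor is omitted for brevity. The class `AdapterRouter`, its `infer` method, the `capacity` limit, and the least-recently-used eviction policy are all hypothetical and chosen only to illustrate loading adapters on demand per request.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix m by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class AdapterRouter:
    """Hypothetical router: one shared base weight matrix plus on-demand LoRA adapters."""

    def __init__(self, base: Matrix, adapter_store: Dict[str, Tuple[Matrix, Matrix]], capacity: int):
        self.base = base                 # shared base weights W (d x k)
        self.store = adapter_store       # stands in for remote storage: name -> (A, B)
        self.capacity = capacity         # max adapters held in memory at once (assumed)
        self.loaded = OrderedDict()      # in-memory adapters, ordered by recent use
        self.load_count = 0              # how many times an adapter was loaded from the store

    def _get_adapter(self, name: str) -> Tuple[Matrix, Matrix]:
        if name in self.loaded:
            # Cache hit: mark this adapter as most recently used.
            self.loaded.move_to_end(name)
        else:
            # Cache miss: evict the least recently used adapter if full, then load.
            if len(self.loaded) >= self.capacity:
                self.loaded.popitem(last=False)
            self.loaded[name] = self.store[name]
            self.load_count += 1
        return self.loaded[name]

    def infer(self, adapter_name: str, x: List[float]) -> List[float]:
        """Compute W x + B (A x) using the adapter named in the request."""
        a, b = self._get_adapter(adapter_name)   # A is r x k, B is d x r
        base_out = matvec(self.base, x)
        delta = matvec(b, matvec(a, x))
        return [u + v for u, v in zip(base_out, delta)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    adapters = {
        "customer-a": ([[1.0, 1.0]], [[1.0], [0.0]]),   # rank-1 adapter
        "customer-b": ([[0.0, 1.0]], [[0.0], [2.0]]),   # rank-1 adapter
    }
    router = AdapterRouter(identity, adapters, capacity=1)
    print(router.infer("customer-a", [2.0, 3.0]))
    print(router.infer("customer-b", [2.0, 3.0]))
```

Each adapter here holds only d·r + r·k numbers instead of a full d·k weight matrix, which is the property that makes hosting many adapters next to one base model economical.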
With multi-adapter inference, you can quickly customize pre-trained models to meet diverse business needs. For example, marketing and SaaS companies can personalize AI/ML applications with each customer’s images, communication style, and documents, generating tailored content in seconds. Similarly, enterprises in industries such as healthcare and financial services can reuse a common base model for a variety of specialized tasks, from medical diagnosis to fraud detection, by swapping in the appropriate fine-tuned LoRA adapter. This flexibility and efficiency open new opportunities to deploy powerful, adaptable AI across your organization.
The multi-adapter inference feature is generally available in: Asia Pacific (Tokyo, Seoul, Mumbai, Singapore, Sydney, Jakarta), Canada (Central), Europe (Frankfurt, Stockholm, Ireland, London), Middle East (UAE), South America (São Paulo), US East (N. Virginia, Ohio), and US West (Oregon).
To get started, refer to the Amazon SageMaker developer guide for information on using LoRA and managing model adapters.