Today, AWS announces that Amazon Bedrock now supports prompt caching. Prompt caching is a new capability that can reduce costs by up to 90% and latency by up to 85% for supported models by caching frequently used prompts across multiple API calls. It lets you cache repetitive inputs and avoid reprocessing context, such as long system prompts and common examples that help guide the model's response. When the cache is hit, fewer computing resources are needed to generate output. As a result, not only can we process your request faster, but we can also pass along the cost savings from using fewer resources.
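To make this concrete, here is a minimal sketch of what a cached call could look like using the Amazon Bedrock Converse API with boto3. The `cachePoint` content block marks where the reusable prefix ends; treat the exact field names and the model ID as assumptions based on the preview, and check the documentation for the authoritative request shape.

```python
import boto3

# Bedrock Runtime client in a Region where the preview is available
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# A long, repetitive system prompt: instructions plus common examples
LONG_SYSTEM_PROMPT = "You are a support assistant. <long instructions and examples>"

def ask(question: str) -> str:
    response = bedrock.converse(
        # Cross-Region inference profile ID (assumed for illustration)
        modelId="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
        # Everything before the cachePoint marker is cached, so the long
        # system prompt is processed once and reused on subsequent calls.
        system=[
            {"text": LONG_SYSTEM_PROMPT},
            {"cachePoint": {"type": "default"}},  # assumed preview syntax
        ],
        messages=[
            {"role": "user", "content": [{"text": question}]},
        ],
    )
    return response["output"]["message"]["content"][0]["text"]

# Repeated calls reuse the cached prefix; only each new question is processed.
print(ask("How do I reset my password?"))
print(ask("How do I update my billing address?"))
```

In this pattern, the savings come from keeping the static context (instructions and examples) in front of the cache checkpoint and the per-request input after it.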
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API. Amazon Bedrock also provides a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI built in. These capabilities help you build tailored applications for multiple use cases across different industries, helping organizations unlock sustained growth from generative AI while providing the tools to earn customer trust and maintain data governance.
Prompt caching is now available for Anthropic's Claude 3.5 Haiku and Claude 3.5 Sonnet v2 in US West (Oregon) and US East (N. Virginia) via cross-Region inference, and for the Amazon Nova Micro, Nova Lite, and Nova Pro models in US East (N. Virginia). At launch, only a select number of customers will have access to this feature. To learn more about participating in the preview, see this page. To learn more about prompt caching, see our documentation and blog.