AWS customers can now access the Llama 3.3 70B model from Meta through Amazon SageMaker JumpStart. The model balances high performance with computational efficiency, delivering output quality comparable to larger Llama versions while requiring significantly fewer resources, which makes it an excellent choice for cost-effective AI deployments.
Llama 3.3 70B features an enhanced attention mechanism, grouped-query attention (GQA), that substantially reduces inference costs; according to Meta, this efficiency gain makes inference nearly five times more cost-effective, an attractive property for production deployments. The model was trained on approximately 15 trillion tokens, including web-sourced content and synthetic examples, and underwent extensive supervised fine-tuning and Reinforcement Learning from Human Feedback (RLHF), aligning its outputs more closely with human preferences while maintaining high performance standards.
Customers can deploy Llama 3.3 70B through the SageMaker JumpStart user interface or programmatically using the SageMaker Python SDK. SageMaker AI’s advanced inference capabilities help optimize both performance and cost efficiency for your deployments, allowing you to take full advantage of Llama 3.3 70B’s inherent efficiency while benefiting from a streamlined deployment process.
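To illustrate the programmatic path, here is a minimal sketch using the SageMaker Python SDK's JumpStartModel class. The model ID string and the sample inference parameters are assumptions based on JumpStart naming conventions; confirm the exact ID in the SageMaker JumpStart model catalog before use.

```python
# Minimal deployment sketch with the SageMaker Python SDK (JumpStart).
# The model_id below follows JumpStart naming conventions but is an
# assumption; verify it against the JumpStart model catalog.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-3-70b-instruct")

# Meta's models are gated, so deployment requires accepting the EULA.
predictor = model.deploy(accept_eula=True)

# Invoke the endpoint with a simple text-generation request.
# The inference parameters shown are illustrative defaults.
response = predictor.predict({
    "inputs": "Explain the difference between supervised fine-tuning and RLHF.",
    "parameters": {"max_new_tokens": 256, "temperature": 0.6},
})
print(response)

# Clean up the endpoint and model to stop incurring charges.
predictor.delete_model()
predictor.delete_endpoint()
```

After deployment, the same endpoint can be invoked from any application with AWS credentials; deleting the endpoint when finished avoids ongoing instance charges.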
The Llama 3.3 70B model is available in all AWS Regions where Amazon SageMaker AI is available. To learn more about deploying Llama 3.3 70B on Amazon SageMaker JumpStart, see the documentation or read the blog.