Today, AWS announces the release of Neuron 2.21, introducing support for AWS Trainium2 chips and Amazon EC2 Trn2 instances, including the trn2.48xlarge instance type and Trn2 UltraServer. This release also adds support for PyTorch 2.5 and introduces NxD Inference and Neuron Profiler 2.0 (beta). NxD Inference is a new PyTorch-based library integrated with vLLM that simplifies the deployment of large language and multi-modality models and enables PyTorch model onboarding with minimal code changes. Neuron Profiler 2.0 (beta) is a new profiler that enhances capabilities and usability, including support for distributed workloads.
Neuron 2.21 also introduces Llama 3.1 405B model inference support using NxD Inference on a single trn2.48xlarge instance. The release updates Deep Learning Containers (DLCs) and Deep Learning AMIs (DLAMIs), and adds support for various model architectures, including Llama 3.2, Llama 3.3, and Mixture-of-Experts (MoE) models. New inference features include FP8 weight quantization and flash decoding for speculative decoding in Transformers NeuronX (TNx). Additionally, new training examples and features have been added, such as support for HuggingFace Llama 3/3.1 70B on Trn2 instances and DPO support for post-training model alignment.
The AWS Neuron SDK supports training and deploying models on Trn1, Trn2, and Inf2 instances, which are available in AWS Regions as On-Demand Instances, Reserved Instances, Spot Instances, or as part of a Savings Plan.
For a full list of new features and enhancements in Neuron 2.21 and to get started with Neuron, see: