Amazon Elastic Compute Cloud (EC2) is AWS's on-demand virtual server service. It lets you rent compute capacity by the second without committing to physical hardware. You choose the instance type, operating system, and region; AWS handles the rest.
First launched in 2006, EC2 now offers hundreds of instance types. This guide covers the most popular GPU-accelerated instance families.
Takeaways
- P-Series vs. G-Series:
  - Use P-series (P5/P6) for heavy, multi-node AI training.
  - Choose G-series (G6/G7) for cost-effective AI inference.
- Cover steady baseline usage with Compute Savings Plans, and use Spot Instances (up to 90% off) for fault-tolerant training.
- Storage Performance:
  - Use ephemeral instance storage (local NVMe) for fast temporary caches and checkpoints.
  - Use EBS gp3 for persistent data that must survive instance stops and terminations.
For simple, affordable access to A100 and H100 GPUs, create a Thunder Compute instance.
How EC2 Fits Into the AWS Ecosystem
EC2 connects directly to services like S3 (object storage), VPC (networking), IAM (access control), and CloudWatch (monitoring). For GPU workloads, you'll also use EBS for persistent storage and EFA (Elastic Fabric Adapter) for low-latency multi-node networking.
This tight integration is both EC2's strength and its complexity. Every piece of infrastructure needs configuration, and costs accumulate across compute, storage, and data transfer.
EC2 Instance Types: An Overview
AWS groups EC2 instances into families based on their primary workload. The main categories are:
- Accelerated computing (P and G families): GPU-powered workloads
- General purpose (M and T families): balanced CPU, memory, and networking
- Compute optimized (C family): CPU-intensive workloads
- Memory optimized (R and X families): in-memory databases and analytics
- Storage optimized (I and D families): high-throughput local storage
Within a family, sizes scale from small options (down to nano in the T family) up to 48xlarge or metal in the largest families, with larger sizes getting proportionally more vCPUs, memory, and network bandwidth.
How to Read an EC2 Instance Name (Family, Generation, Size)
EC2 instance names follow the pattern [family][generation][attributes].[size]. For example:
- p5.48xlarge means P family (high-performance GPU), fifth generation, 48xlarge size.
- g6e.12xlarge means G family (graphics/inference GPU), sixth generation, enhanced memory (e), 12xlarge size.
Common GPU instance attributes include:
- d: local NVMe storage
- n: high-bandwidth networking
- e: enhanced GPU memory
Knowing the naming convention for AWS instances lets you compare options quickly.
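The convention is regular enough to parse mechanically. The sketch below splits a name into its parts with a regular expression and looks up the attribute letters listed above. The regex and function names are illustrative assumptions, and it only handles names shaped like the examples in this guide; UltraServer names such as u-p6e-gb200x36 have no size suffix and are rejected.

```python
import re
from typing import List, NamedTuple


class InstanceName(NamedTuple):
    family: str      # leading letters, e.g. "p" or "g"
    generation: int  # digits after the family, e.g. 5
    attributes: str  # text between generation and ".", e.g. "e", "dn", or "" if none
    size: str        # text after ".", e.g. "48xlarge"


# [family][generation][attributes].[size]; hyphenated variants such as
# "p6-b200.48xlarge" keep their suffix ("-b200") in the attributes field.
_NAME_RE = re.compile(r"^([a-z]+)(\d+)([a-z0-9-]*)\.([a-z0-9]+)$")

# Only the attribute letters discussed in this guide, not AWS's full list.
ATTRIBUTE_MEANINGS = {
    "d": "local NVMe storage",
    "n": "high-bandwidth networking",
    "e": "enhanced GPU memory",
}


def parse_instance_name(name: str) -> InstanceName:
    match = _NAME_RE.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized instance name: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceName(family, int(generation), attributes, size)


def describe_attributes(name: str) -> List[str]:
    """Meanings of the known attribute letters; other characters are ignored."""
    attributes = parse_instance_name(name).attributes
    return [ATTRIBUTE_MEANINGS[ch] for ch in attributes if ch in ATTRIBUTE_MEANINGS]


for example in ("p5.48xlarge", "g6e.12xlarge", "g4dn.xlarge"):
    print(example, parse_instance_name(example), describe_attributes(example))
```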
EC2 GPU Instances
AWS organizes its GPU instances into two main families, each covering a different cost-versus-performance trade-off:
- The P series: high-performance compute for ML training and HPC.
- The G series: graphics and inference workloads.
| Family | GPU Models | On-Demand Pricing (per instance) |
|---|---|---|
| P6 | NVIDIA B200 and B300 | $113.93–$142.42/hr |
| P5 | NVIDIA H100 and H200 | $6.88–$79.12/hr |
| P4 | NVIDIA A100 40GB and 80GB | $21.96–$27.45/hr |
| P3 | NVIDIA V100 16GB | $3.06–$24.48/hr |
| P2 | NVIDIA K80 | $0.90–$14.40/hr |
| G7 | NVIDIA RTX Pro 6000 | $3.36–$33.14/hr |
| G6 | NVIDIA L4 | $0.80–$13.35/hr |
| G5 | NVIDIA A10G | $1.00–$16.29/hr |
| G4 | NVIDIA T4 | $0.53–$7.82/hr |
P-Series: High-Performance Training
The P family is AWS's flagship line for compute-heavy AI and HPC workloads. These instances run NVIDIA's most powerful data center GPUs and include Elastic Fabric Adapter (EFA) networking for low-latency communication across large GPU clusters.
P instances are the right choice for training large foundation models, running dynamics simulations, or doing seismic analysis at scale. The tradeoff is cost: these instances are among the most expensive compute resources on AWS.
P6
The P6 family is AWS's newest generation of AI training infrastructure, built around NVIDIA Blackwell GPUs. Designed for frontier model training and large-scale HPC, P6 instances support tightly connected multi-GPU and multi-node deployments for the most demanding workloads.
| Instance | GPUs | VRAM (GB) | vCPUs | System RAM (GB) | Network bandwidth (Tbps) | On-demand price per hour |
|---|---|---|---|---|---|---|
| p6-b200.36xlarge | 4 x NVIDIA B200 | 740 HBM3e | 144 | 960 | 3.2 | Not available |
| p6-b200.48xlarge | 8 x NVIDIA B200 | 1,432 HBM3e | 192 | 2,048 | 3.2 | $113.93 |
| p6-b300.48xlarge | 8 x NVIDIA B300 | 2,144 HBM3e | 192 | 4,096 | 6.4 | $142.42 |
| u-p6e-gb200x36 | 36 x NVIDIA B200 | 6,660 HBM3e | 1,296 | 8,640 | 14.4 | Not available |
| u-p6e-gb200x72 | 72 x NVIDIA B200 | 13,320 HBM3e | 2,592 | 17,280 | 28.8 | Not available |
P5
The EC2 P5 family brings Hopper architecture to AWS with NVIDIA H100 and H200 GPUs. These instances target large language model training, fine-tuning, and distributed AI workloads, offering a major leap in performance over previous generations.
For a full overview of P5 instances, read AWS P5 vs Thunder Compute.

| Instance | GPUs | VRAM (GB) | vCPUs | System RAM (GB) | Network bandwidth (Tbps) | On-demand price per hour |
|---|---|---|---|---|---|---|
| p5.4xlarge | 1 x NVIDIA H100 | 80 HBM3 | 16 | 256 | 0.1 | $6.88 |
| p5.48xlarge | 8 x NVIDIA H100 | 640 HBM3 | 192 | 2,048 | 3.2 | $55.04–$68.80 |
| p5e.48xlarge | 8 x NVIDIA H200 | 1,128 HBM3e | 192 | 2,048 | 3.2 | Not available |
| p5en.48xlarge | 8 x NVIDIA H200 | 1,128 HBM3e | 192 | 2,048 | 3.2 | $63.30–$79.12 |
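Because the larger P5 sizes bundle eight GPUs, it helps to normalize each instance's hourly price by its GPU count before comparing. A minimal sketch using prices from the P5 table (taking the low end of each listed range); the helper name is hypothetical:

```python
from typing import Dict, Tuple

# instance -> (GPU count, on-demand USD per instance-hour), from the P5 table;
# p5.48xlarge and p5en.48xlarge use the low end of their listed ranges.
P5_PRICES: Dict[str, Tuple[int, float]] = {
    "p5.4xlarge": (1, 6.88),
    "p5.48xlarge": (8, 55.04),
    "p5en.48xlarge": (8, 63.30),
}


def price_per_gpu_hour(gpus: int, instance_price_per_hour: float) -> float:
    """Spread the whole-instance hourly price evenly across its GPUs."""
    if gpus <= 0:
        raise ValueError("gpus must be a positive integer")
    return instance_price_per_hour / gpus


for name, (gpus, price) in P5_PRICES.items():
    print(f"{name}: ${price_per_gpu_hour(gpus, price):.2f} per GPU-hour")
```

Dividing p5.48xlarge's $55.04 by its 8 GPUs gives $6.88, the same rate as the single-GPU p5.4xlarge, so the larger size buys eight GPUs in one machine and far more network bandwidth rather than a lower per-GPU rate.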
P4
The P4 family is based on NVIDIA A100 GPUs and remains widely used for training and inference workloads. Although newer generations have arrived, P4 instances continue to provide strong performance for large models and are common in existing AI clusters.
| Instance | GPUs | VRAM (GB) | vCPUs | System RAM (GB) | Network bandwidth (Tbps) | On-demand price per hour |
|---|---|---|---|---|---|---|
| p4d.24xlarge | 8 x NVIDIA A100 | 320 | 96 | 1,152 | 0.4 | $21.96 |
| p4de.24xlarge | 8 x NVIDIA A100 | 640 | 96 | 1,152 | 0.4 | $27.45 |
P3
The P3 family introduced NVIDIA V100 GPUs to AWS and played a major role in the early growth of deep learning in the cloud. While no longer cutting-edge, P3 instances remain suitable for many research and legacy training workloads.
| Instance | GPUs | VRAM (GB) | vCPUs | System RAM (GB) | Network bandwidth (Tbps) | On-demand price per hour |
|---|---|---|---|---|---|---|
| p3.2xlarge | 1 x NVIDIA V100 | 16 | 8 | 61 | Up to 0.01 | $3.06 |
| p3.8xlarge | 4 x NVIDIA V100 | 64 | 32 | 244 | 0.01 | $12.24 |
| p3.16xlarge | 8 x NVIDIA V100 | 128 | 64 | 488 | 0.025 | $24.48 |
P2
The P2 family is AWS's oldest GPU instance generation still available in some regions. Based on NVIDIA K80 GPUs, these instances are largely intended for legacy applications and older machine learning frameworks rather than modern AI training.
| Instance | GPUs | VRAM (GB) | vCPUs | System RAM (GB) | Network bandwidth (Tbps) | On-demand price per hour |
|---|---|---|---|---|---|---|
| p2.xlarge | 1 x NVIDIA K80 | 12 GDDR5 | 4 | 61 | 0.01 | $0.90 |
| p2.8xlarge | 8 x NVIDIA K80 | 96 GDDR5 | 32 | 488 | 0.01 | $7.20 |
| p2.16xlarge | 16 x NVIDIA K80 | 192 GDDR5 | 64 | 732 | 0.025 | $14.40 |
G-Series: Graphics and Inference (G4, G5, G6, G7)
The G family is AWS's more accessible GPU tier. These instances use power-efficient NVIDIA GPUs suited for real-time inference, media encoding, and virtual desktops. They cost significantly less than P instances and cover a wide range of configurations.
Unlike the P family, G instances target latency-sensitive single-node workloads rather than distributed training. The newer G6 and G7 generations have expanded AI inference capabilities considerably.
G7
Launched in January 2026, the G7e family is built around NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. AWS positions these instances for AI inference, agentic applications, multimodal models, and graphics workloads that need high memory capacity without moving to the more expensive P-series.
| Instance | GPUs | VRAM (GB) | vCPUs | System RAM (GB) | Network bandwidth (Tbps) | On-demand price per hour |
|---|---|---|---|---|---|---|
| g7e.2xlarge | 1 x NVIDIA RTX Pro 6000 | 96 GDDR7 | 8 | 64 | 0.05 | $3.36 |
| g7e.4xlarge | 1 x NVIDIA RTX Pro 6000 | 96 GDDR7 | 16 | 128 | 0.05 | $4.00 |
| g7e.8xlarge | 1 x NVIDIA RTX Pro 6000 | 96 GDDR7 | 32 | 256 | 0.10 | $5.27 |
| g7e.12xlarge | 2 x NVIDIA RTX Pro 6000 | 192 GDDR7 | 48 | 512 | 0.40 | $8.29 |
| g7e.24xlarge | 4 x NVIDIA RTX Pro 6000 | 384 GDDR7 | 96 | 1,024 | 0.80 | $16.57 |
| g7e.48xlarge | 8 x NVIDIA RTX Pro 6000 | 768 GDDR7 | 192 | 2,048 | 1.60 | $33.14 |
G6
The G6 family brings NVIDIA L4 GPUs to AWS and represents a significant upgrade for inference workloads. These instances strike a balance between performance and cost, making them popular for serving LLMs, computer vision models, and media processing pipelines.
| Instance | GPUs | VRAM (GB) | vCPUs | System RAM (GB) | Network bandwidth (Tbps) | On-demand price per hour |
|---|---|---|---|---|---|---|
| g6.xlarge | 1 x NVIDIA L4 | 24 | 4 | 16 | Up to 0.01 | $0.80 |
| g6.2xlarge | 1 x NVIDIA L4 | 24 | 8 | 32 | Up to 0.01 | $0.98 |
| g6.4xlarge | 1 x NVIDIA L4 | 24 | 16 | 64 | Up to 0.025 | $1.32 |
| g6.8xlarge | 1 x NVIDIA L4 | 24 | 32 | 128 | 0.025 | $2.01 |
| g6.12xlarge | 4 x NVIDIA L4 | 96 | 48 | 192 | 0.04 | $4.60 |
| g6.16xlarge | 1 x NVIDIA L4 | 24 | 64 | 256 | 0.025 | $3.40 |
| g6.24xlarge | 4 x NVIDIA L4 | 96 | 96 | 384 | 0.05 | $6.68 |
| g6.48xlarge | 8 x NVIDIA L4 | 192 | 192 | 768 | 0.10 | $13.35 |
G5
The G5 family uses NVIDIA A10G GPUs and serves as a versatile middle ground between graphics workloads and AI inference. It is widely used for model serving, virtual workstations, rendering, and smaller-scale training jobs.
| Instance | GPUs | VRAM (GB) | vCPUs | System RAM (GB) | Network bandwidth (Tbps) | On-demand price per hour |
|---|---|---|---|---|---|---|
| g5.xlarge | 1 x NVIDIA A10G | 24 | 4 | 16 | Up to 0.01 | $1.00 |
| g5.2xlarge | 1 x NVIDIA A10G | 24 | 8 | 32 | Up to 0.01 | $1.21 |
| g5.4xlarge | 1 x NVIDIA A10G | 24 | 16 | 64 | Up to 0.025 | $1.62 |
| g5.8xlarge | 1 x NVIDIA A10G | 24 | 32 | 128 | 0.025 | $2.45 |
| g5.12xlarge | 4 x NVIDIA A10G | 96 | 48 | 192 | 0.04 | $5.67 |
| g5.16xlarge | 1 x NVIDIA A10G | 24 | 64 | 256 | 0.025 | $4.10 |
| g5.24xlarge | 4 x NVIDIA A10G | 96 | 96 | 384 | 0.05 | $8.14 |
| g5.48xlarge | 8 x NVIDIA A10G | 192 | 192 | 768 | 0.10 | $16.29 |
G4
The G4dn family is one of AWS's longest-running and most affordable GPU offerings. Powered by NVIDIA T4 GPUs, it remains a popular choice for cost-sensitive inference workloads, video transcoding, and virtual desktop infrastructure.
| Instance | GPUs | VRAM (GB) | vCPUs | System RAM (GB) | Network bandwidth (Tbps) | On-demand price per hour |
|---|---|---|---|---|---|---|
| g4dn.xlarge | 1 x NVIDIA T4 | 16 | 4 | 16 | Up to 0.025 | $0.53 |
| g4dn.2xlarge | 1 x NVIDIA T4 | 16 | 8 | 32 | Up to 0.025 | $0.75 |
| g4dn.4xlarge | 1 x NVIDIA T4 | 16 | 16 | 64 | Up to 0.025 | $1.20 |
| g4dn.8xlarge | 1 x NVIDIA T4 | 16 | 32 | 128 | 0.05 | $2.18 |
| g4dn.12xlarge | 4 x NVIDIA T4 | 64 | 48 | 192 | 0.05 | $3.91 |
| g4dn.16xlarge | 1 x NVIDIA T4 | 16 | 64 | 256 | 0.05 | $4.35 |
| g4dn.metal | 8 x NVIDIA T4 | 128 | 96 | 384 | 0.10 | $7.82 |
EC2 Pricing Models: How to Pay Less for More
EC2 offers five ways to pay for compute, each suited to a different usage pattern. Choosing the right model (or combining them) can dramatically cut your GPU bill. For GPU instances, the differences can mean thousands of dollars per month.
On-Demand Instances: Flexible but Expensive
On-demand is the default: you pay a fixed hourly rate with no upfront cost and no commitment. Billing is per-second (60-second minimum), which makes it the best option for short or unpredictable workloads.
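Per-second billing with a minimum charge is easy to model. The sketch below assumes the 60-second minimum described above and rounds partial seconds up (an assumption); the rates are on-demand prices from the tables in this guide, and the function name is illustrative.

```python
import math


def on_demand_cost(runtime_seconds: float, hourly_rate: float, minimum_seconds: int = 60) -> float:
    """USD charged for one run under per-second billing with a minimum charge."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    billed_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    return billed_seconds * hourly_rate / 3600


# A 45-second smoke test on g6.xlarge ($0.80/hr) is billed as a full minute.
print(f"${on_demand_cost(45, 0.80):.4f}")
# A p5.48xlarge ($55.04/hr) left idle for 12 hours overnight.
print(f"${on_demand_cost(12 * 3600, 55.04):.2f}")
```

The second case works out to 12 × $55.04 = $660.48, which is why idle on-demand GPU instances are worth hunting down.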
The downside is cost. On-demand GPU rates are high, especially for the P family. For consistent usage (training jobs running 8 or more hours per day), on-demand is the most expensive option available.
EC2 Spot Instances: Up to 90% Off for Fault-Tolerant Workloads
Spot instances let you use unused EC2 capacity at discounts of up to 90% vs on-demand rates. The catch: AWS can reclaim them with two minutes of notice, so they only suit workloads that can tolerate interruption.
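On the instance itself, the two-minute warning appears in the instance metadata service (IMDS) at the spot/instance-action path, which returns 404 until an interruption is scheduled and then a small JSON document with an action and a UTC time. A minimal sketch using IMDSv2 and only the standard library; the function names are hypothetical, and the polling interval and what you do on notice (for example, writing a final checkpoint) are up to you.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

IMDS_BASE = "http://169.254.169.254/latest"


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the spot/instance-action JSON, or None while no interruption is scheduled."""
    # IMDSv2: obtain a session token first, then send it with the metadata request.
    token_request = urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as resp:
        token = resp.read().decode()
    action_request = urllib.request.Request(
        f"{IMDS_BASE}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_request, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # nothing scheduled yet
            return None
        raise


def seconds_until_interruption(document: str, now: Optional[datetime] = None) -> float:
    """Seconds left before the time in a notice like {"action": "terminate", "time": "...Z"}."""
    notice = json.loads(document)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)  # the notice time is in UTC
    if now is None:
        now = datetime.now(timezone.utc)
    return (deadline - now).total_seconds()


# Parsing a sample notice (fetch_instance_action() only works on an EC2 instance).
sample = '{"action": "terminate", "time": "2025-01-01T00:02:00Z"}'
print(seconds_until_interruption(sample, now=datetime(2025, 1, 1, tzinfo=timezone.utc)))
```

In a training job you might call fetch_instance_action() every few seconds from a background thread and, once it returns a document, write a checkpoint well before seconds_until_interruption() reaches zero.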
For GPU training jobs, spot instances work well when your code supports checkpointing. A job that saves progress every 15–30 minutes can survive an interruption and resume without much lost work. Spot discounts for GPU instances typically land in the 60–90% range, though availability for high-demand types like P5 can be limited.
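A rough way to reason about checkpoint frequency: if interruptions land at random points between checkpoints, each one throws away about half a checkpoint interval of work, plus some restart time. The model below is a simplifying assumption (it ignores the time spent writing checkpoints and treats the interruption count as known), and the 70% spot discount and four interruptions in the example are illustrative, not AWS figures.

```python
def spot_job_cost(work_hours: float, on_demand_rate: float, spot_discount: float,
                  checkpoint_interval_hours: float, expected_interruptions: float,
                  restart_overhead_hours: float = 0.1) -> float:
    """Expected USD cost of a checkpointed job on spot capacity under a simple model."""
    # On average an interruption discards half an interval of progress, plus restart time.
    lost_per_interruption = checkpoint_interval_hours / 2 + restart_overhead_hours
    billed_hours = work_hours + expected_interruptions * lost_per_interruption
    return billed_hours * on_demand_rate * (1 - spot_discount)


# 48 hours of training on p4d.24xlarge ($21.96/hr on-demand), checkpointing every 30 minutes.
on_demand = 48 * 21.96
spot = spot_job_cost(48, 21.96, spot_discount=0.70,
                     checkpoint_interval_hours=0.5, expected_interruptions=4)
print(f"on-demand ${on_demand:.2f} vs spot ${spot:.2f} ({1 - spot / on_demand:.0%} saved)")
```

Under these assumptions the four interruptions add only 1.4 billed hours, so most of the spot discount survives; shortening the checkpoint interval shrinks the rework further at the price of more checkpoint writes.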
EC2 Reserved Instances: Up to 72% Off for Predictable Workloads
Reserved instances (RIs) let you commit to a specific instance type and region for one or three years in exchange for discounts up to 72% off on-demand rates. Payment options range from all upfront (maximum discount) to no upfront (smaller discount).
The catch is inflexibility. Standard RIs lock you into a specific family, region, and OS. If your needs change or AWS releases a better GPU generation, you're stuck with your reservation or must sell it on the Reserved Instance Marketplace. Convertible RIs let you swap instance types but offer lower discounts.
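Because a reservation is billed for every hour of its term whether or not the instance runs, a quick sanity check is the break-even utilization: a reservation at discount d beats on-demand only when the instance runs more than (1 - d) of the time. The sketch ignores upfront-payment timing and assumes a single fixed discount; the function names are illustrative.

```python
HOURS_PER_DAY = 24


def breakeven_hours_per_day(reservation_discount: float) -> float:
    """Daily runtime above which a reservation (billed every hour) beats on-demand."""
    if not 0 <= reservation_discount < 1:
        raise ValueError("reservation_discount must be in [0, 1)")
    # Reserved cost per day: 24 * rate * (1 - d); on-demand: hours * rate. Solve for equality.
    return HOURS_PER_DAY * (1 - reservation_discount)


def cheaper_option(hours_per_day: float, reservation_discount: float) -> str:
    if hours_per_day > breakeven_hours_per_day(reservation_discount):
        return "reserved"
    return "on-demand"


for discount in (0.40, 0.72):
    print(f"{discount:.0%} discount: break-even at {breakeven_hours_per_day(discount):.1f} h/day")
print(cheaper_option(8, 0.72))
```

At the maximum 72% discount the break-even is about 6.7 hours per day, which is why steady daily training jobs tend to favor commitments; at a more modest 40% discount it rises to 14.4 hours per day.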
Savings Plans: A More Flexible Alternative to Reserved Instances
Compute Savings Plans commit you to a consistent hourly spend in dollars (e.g., $5/hr) rather than a specific instance type. They apply automatically across EC2, Fargate, and Lambda, covering any instance family, size, region, and OS.
For GPU workloads, Compute Savings Plans are generally preferable to RIs unless you need the capacity reservation guarantee that RIs provide. Discounts are slightly below the best RI rates, but the flexibility makes them easier to manage. As of June 2025, AWS extended Savings Plan coverage to P6-B200 Blackwell instances.
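Savings Plan billing works in discounted dollars: each hour, eligible usage is charged at Savings Plan rates until the commitment is used up, anything beyond that is charged at on-demand rates, and any unused commitment is still billed. The sketch below applies that rule with a single assumed discount rate; real Savings Plan rates vary by instance family, region, and term, so treat the 50% figure in the example as a placeholder.

```python
def savings_plan_hourly_bill(on_demand_usage: float, commitment: float, discount: float) -> float:
    """USD billed for one hour when a Savings Plan commitment is applied.

    on_demand_usage: what this hour's eligible usage would cost at on-demand rates
    commitment: hourly commitment, expressed in Savings Plan (discounted) dollars
    discount: single assumed Savings Plan discount versus on-demand (a simplification)
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    discounted_cost = on_demand_usage * (1 - discount)
    if discounted_cost <= commitment:
        return commitment  # usage fits inside the commitment; the unused part is still billed
    covered_on_demand = commitment / (1 - discount)  # on-demand value the commitment absorbs
    return commitment + (on_demand_usage - covered_on_demand)  # overflow at on-demand rates


for usage in (0.0, 8.0, 20.0):
    print(usage, savings_plan_hourly_bill(usage, commitment=5.0, discount=0.5))
```

The idle hour and the light hour both cost the full $5 commitment, while the busy hour pays $5 plus $10 of on-demand overflow, which is why commitments should track your floor rather than your average.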
EC2 Free Tier: What's Included and What's Not
The AWS Free Tier covers 750 hours per month of t2.micro or t3.micro instances for the first 12 months of a new account. It does not include any GPU instances: there is no free tier for the P or G series.
The free tier is useful for learning EC2 networking and storage. After 12 months, it expires entirely and all usage is billed at standard on-demand rates.
EC2 Storage: EBS vs Ephemeral Storage
Every EC2 instance needs storage, and your choice affects performance, cost, and data persistence. For GPU workloads, storage speeds can become a training bottleneck when loading large datasets.
EC2 Ephemeral Storage
Some EC2 GPU instances include local NVMe SSDs called instance store volumes. This storage is physically attached to the host, giving it very high throughput and low latency compared to network-attached alternatives.
The critical limitation: instance store data is lost when the instance stops, terminates, or fails. For ML training, it works well for temporary files, dataset caches, and intermediate checkpoints.
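A common pattern is to copy dataset shards from persistent storage onto the local NVMe volume at job start, so the training loop reads from fast disk. The sketch below assumes the instance store is already formatted and mounted; a path such as /opt/dlami/nvme is only an example (where it lands depends on your AMI and setup), and stage_to_local is a hypothetical helper. The temporary directory in the demo stands in for real EBS and NVMe paths.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def stage_to_local(files: Iterable[Path], scratch_dir: Path) -> List[Path]:
    """Copy shards to a local scratch directory, skipping ones already staged."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in files:
        dst = scratch_dir / src.name
        # Re-copy if missing or if the size differs from the source.
        if not dst.exists() or dst.stat().st_size != src.stat().st_size:
            partial = dst.with_name(dst.name + ".partial")
            shutil.copyfile(src, partial)
            partial.replace(dst)  # rename into place so readers never see a half-written file
        staged.append(dst)
    return staged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        persistent = Path(root) / "ebs" / "dataset"  # stand-in for an EBS-backed path
        persistent.mkdir(parents=True)
        for i in range(3):
            (persistent / f"shard-{i}.bin").write_bytes(bytes(1024))
        local = stage_to_local(sorted(persistent.iterdir()), Path(root) / "nvme" / "cache")
        print([p.name for p in local])
```

Because the local copy disappears when the instance stops, the staging step is designed to be cheap to rerun: it skips shards that are already present on the scratch volume.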
EBS Volumes for Persistent Storage
Amazon Elastic Block Store (EBS) provides persistent block storage that survives instance stops and terminations. EBS volumes are network-attached, so throughput is lower than local NVMe, but data persists independently of the instance lifecycle.
For most GPU workloads, gp3 volumes offer the best balance of throughput and cost. If you need faster dataset reads, you can provision additional IOPS and throughput on gp3 or switch to io2 volumes. Note that EBS is billed separately from compute, and a volume is charged for as long as it exists, even when no instance is attached.
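To see why idle volumes matter, here is a rough monthly gp3 cost model. gp3 includes a baseline of 3,000 IOPS and 125 MiB/s, and you pay extra only for performance provisioned above that. The per-unit prices below are approximate us-east-1 list prices used as assumptions; check the EBS pricing page for your region.

```python
# Assumed us-east-1 gp3 list prices (USD per month); verify against current AWS pricing.
GP3_PRICE_PER_GB = 0.08
GP3_PRICE_PER_EXTRA_IOPS = 0.005
GP3_PRICE_PER_EXTRA_MIBPS = 0.04
GP3_INCLUDED_IOPS = 3_000
GP3_INCLUDED_MIBPS = 125


def gp3_monthly_cost(size_gb: float, iops: int = GP3_INCLUDED_IOPS,
                     throughput_mibps: int = GP3_INCLUDED_MIBPS) -> float:
    """Monthly USD for one gp3 volume; billed whether or not it is attached."""
    storage = size_gb * GP3_PRICE_PER_GB
    extra_iops = max(0, iops - GP3_INCLUDED_IOPS) * GP3_PRICE_PER_EXTRA_IOPS
    extra_throughput = max(0, throughput_mibps - GP3_INCLUDED_MIBPS) * GP3_PRICE_PER_EXTRA_MIBPS
    return round(storage + extra_iops + extra_throughput, 2)


print(gp3_monthly_cost(100))              # small boot volume at baseline performance
print(gp3_monthly_cost(2000, 6000, 500))  # dataset volume with extra IOPS and throughput
```

Under these assumptions a 2,000 GB dataset volume provisioned at 6,000 IOPS and 500 MiB/s comes to about $190 a month, and that charge continues while the volume sits unattached after a training run.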
EC2 Cost Optimization: How to Cut Your AWS Bill
GPU instances are expensive, and small inefficiencies add up fast. A P5 left running overnight when idle can cost hundreds of dollars. These strategies help keep GPU spend in check without sacrificing productivity.
Right-Sizing Your Instances with AWS Compute Optimizer
AWS Compute Optimizer analyzes CloudWatch utilization metrics and recommends better-matched instance types. For GPU workloads, it often flags instances with consistently low GPU utilization.
Compute Optimizer is free to enable and covers EC2, EBS, Lambda, and Auto Scaling groups. Reviewing its recommendations regularly can surface easy savings, especially as workloads evolve and new GPU generations become available.
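Compute Optimizer works from utilization history collected over time; for a quick manual spot-check on a running instance, you can sample nvidia-smi directly. The sketch below uses nvidia-smi's CSV query mode and flags GPUs whose average utilization stays below a threshold. The 20% cutoff and the helper names are arbitrary assumptions, not Compute Optimizer's logic.

```python
import statistics
import subprocess
from typing import Dict, List


def sample_gpu_utilization() -> str:
    """One nvidia-smi reading as 'index, utilization' lines (needs the NVIDIA driver)."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_utilization(csv_text: str) -> Dict[int, int]:
    """Turn lines like '0, 87' into {0: 87}."""
    readings = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        readings[int(index)] = int(util)
    return readings


def underused_gpus(samples: List[Dict[int, int]], threshold_pct: float = 20.0) -> List[int]:
    """GPU indices whose mean utilization across all samples is below threshold_pct."""
    gpu_ids = sorted({gpu for sample in samples for gpu in sample})
    flagged = []
    for gpu in gpu_ids:
        values = [sample[gpu] for sample in samples if gpu in sample]
        if statistics.mean(values) < threshold_pct:
            flagged.append(gpu)
    return flagged


# Two example readings from a hypothetical 2-GPU instance where GPU 1 is mostly idle.
readings = [parse_utilization("0, 95\n1, 3\n"), parse_utilization("0, 88\n1, 10\n")]
print(underused_gpus(readings))
```

On a real instance you would collect many sample_gpu_utilization() readings over hours, not two, before concluding that a smaller or cheaper instance type would do.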
Using Spot Instances for GPU Training Workloads
Spot instances are the highest-impact cost lever for GPU training. Switching a P4d training run from on-demand to spot can cut costs by 60% or more, even accounting for occasional interruptions.
The key is checkpointing. Write model checkpoints to EBS or S3 at regular intervals so your job can resume after an interruption. PyTorch, TensorFlow, and JAX all have native checkpointing support. Pair spot instances with Auto Scaling groups using mixed instance types to improve capacity availability.
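The checkpoint-and-resume pattern looks the same in any framework: periodically write the training state somewhere durable, write it atomically so an interruption mid-write cannot corrupt the last good checkpoint, and on startup resume from whatever checkpoint exists. The sketch below is framework-agnostic and uses pickle for a plain state dictionary; in PyTorch you would typically save model and optimizer state_dict() objects with torch.save instead. The training step and the simulated interruption are placeholders.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class SimulatedInterruption(Exception):
    """Stands in for a spot reclaim in this demo."""


def save_checkpoint(state: Dict[str, Any], path: Path) -> None:
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the volume before the rename
    os.replace(tmp, path)  # atomic swap: the previous checkpoint stays valid until now


def load_checkpoint(path: Path) -> Dict[str, Any]:
    if not path.exists():
        return {"step": 0, "losses": []}
    with open(path, "rb") as f:
        return pickle.load(f)  # only unpickle checkpoints you wrote yourself


def train(path: Path, total_steps: int, checkpoint_every: int,
          interrupt_at: Optional[int] = None) -> Dict[str, Any]:
    state = load_checkpoint(path)  # resumes automatically if a checkpoint exists
    for step in range(state["step"], total_steps):
        if step == interrupt_at:
            raise SimulatedInterruption(f"reclaimed at step {step}")
        state["losses"].append(1.0 / (step + 1))  # placeholder for a real training step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state


if __name__ == "__main__":
    ckpt = Path(tempfile.mkdtemp()) / "checkpoint.pkl"  # stand-in for an EBS path
    try:
        train(ckpt, total_steps=40, checkpoint_every=10, interrupt_at=25)
    except SimulatedInterruption as exc:
        print("interrupted:", exc, "| resuming from step", load_checkpoint(ckpt)["step"])
    print("finished at step", train(ckpt, total_steps=40, checkpoint_every=10)["step"])
```

The interruption at step 25 only costs the five steps since the last checkpoint; the second call picks up at step 20 and finishes. Copying each saved checkpoint to S3 as well would protect against losing the EBS volume itself.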
Combining Savings Plans and Spot Instances
For continuously running GPU workloads, mixing Savings Plans with spot capacity is a practical strategy. Cover your steady-state usage with a Compute Savings Plan commitment, then use on-demand or spot for peaks and experiments.
A good starting point: review your last 90 days of EC2 spend in AWS Cost Explorer, find your minimum consistent GPU usage, and commit that baseline to a Savings Plan. Add spot on top for bursty training runs. Avoid over-committing to specific GPU families via RIs; AWS releases new generations regularly, and old commitments age quickly.
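The baseline step can be sketched as: take a low percentile of your hourly GPU spend (measured at on-demand rates) as the level you are confident you will always use, then convert it to a commitment. Because a Savings Plan commitment is stated in discounted Savings Plan dollars, it is smaller than the on-demand spend it covers. The nearest-rank percentile, the 10th-percentile default, and the 30% discount are illustrative assumptions, and the input series is hypothetical (for instance, built from Cost Explorer data).

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[float], pct: int) -> float:
    """pct-th percentile using the nearest-rank method (1 <= pct <= 100)."""
    if not values or not 1 <= pct <= 100:
        raise ValueError("need values and 1 <= pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def suggested_commitment(hourly_on_demand_spend: Sequence[float],
                         savings_plan_discount: float, pct: int = 10) -> float:
    """Hourly commitment (in Savings Plan dollars) that covers a conservative baseline."""
    baseline = nearest_rank_percentile(hourly_on_demand_spend, pct)
    return baseline * (1 - savings_plan_discount)


# Hypothetical 90 days of hourly spend: mostly $40/hr, with 300 quiet hours near $12/hr.
spend = [40.0] * (90 * 24 - 300) + [12.0] * 300
print(f"commit about ${suggested_commitment(spend, savings_plan_discount=0.30):.2f}/hr")
```

Here the quiet hours set the floor at $12/hr of on-demand usage, which at the assumed 30% discount becomes a commitment of about $8.40/hr; everything above that floor is left to spot and on-demand.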
Auto Scaling: Only Pay for What You Actually Use
EC2 Auto Scaling automatically adjusts the number of running instances based on demand signals such as CPU utilization, custom CloudWatch metrics, or a schedule. For GPU inference workloads, it can scale your fleet down to zero overnight and back up before business hours.
Setting up Auto Scaling for GPU instances requires an AMI with NVIDIA drivers pre-installed, and your application needs to handle cold-start latency (the time a new GPU instance takes to initialize and accept traffic). For batch training, scheduled scaling is simpler: set a scheduled action that scales the group back to zero after the job is expected to finish.
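Scheduled scaling for the overnight scale-to-zero case described above comes down to two recurring actions on the Auto Scaling group. The sketch builds keyword arguments for boto3's put_scheduled_update_group_action (Recurrence is a cron expression, interpreted in TimeZone); the group name, hours, and sizes are hypothetical, and the boto3 call itself is left as a comment so the example runs without AWS credentials.

```python
from typing import Any, Dict, List


def business_hours_actions(group_name: str, daytime_size: int, max_size: int,
                           time_zone: str = "UTC", up_hour: int = 7,
                           down_hour: int = 20) -> List[Dict[str, Any]]:
    """Scheduled actions: scale up on weekday mornings, scale to zero every evening."""
    common = {"AutoScalingGroupName": group_name, "TimeZone": time_zone}
    return [
        {**common,
         "ScheduledActionName": f"{group_name}-weekday-scale-up",
         "Recurrence": f"0 {up_hour} * * 1-5",  # minute hour day-of-month month day-of-week
         "MinSize": daytime_size, "MaxSize": max_size, "DesiredCapacity": daytime_size},
        {**common,
         "ScheduledActionName": f"{group_name}-nightly-scale-to-zero",
         "Recurrence": f"0 {down_hour} * * *",
         "MinSize": 0, "MaxSize": 0, "DesiredCapacity": 0},
    ]


def share_of_week_running(up_hour: int = 7, down_hour: int = 20) -> float:
    """Fraction of the 168-hour week the group is scaled up under this schedule."""
    return 5 * (down_hour - up_hour) / (7 * 24)


for action in business_hours_actions("gpu-inference-asg", daytime_size=2, max_size=6):
    print(action)
    # boto3.client("autoscaling").put_scheduled_update_group_action(**action)
print(f"scaled up {share_of_week_running():.0%} of the week")
```

With a 7:00 to 20:00 weekday window the fleet runs 65 of 168 hours, so instance-hours drop by roughly 60% compared with running around the clock, before any demand-based scaling inside the window.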
Is There a Simpler Way to Access GPUs?
EC2 is powerful, but it comes with real operational overhead: IAM policies, VPC configuration, EBS setup, driver installation, and pricing-model complexity all have to be handled before a GPU job runs.
For a full analysis of new players in the cloud GPU market, read Neoclouds: The New Cloud GPUs.
