How does EC2 fit into the AWS ecosystem?

EC2 connects directly with core AWS services like S3, VPC, IAM, and CloudWatch. For GPU workloads, it integrates with EBS for persistent storage and EFA for low-latency multi-node networking, though this adds configuration overhead.

How do you read an EC2 instance name?

Read EC2 instance names using the `[family][generation][attributes].[size]` pattern. For example, `p5.48xlarge` denotes a P-family, fifth-generation instance in the 48xlarge size. Optional attribute letters follow the generation number, such as the `e` in `g6e.12xlarge`, which indicates enhanced GPU memory.

What are P-Series EC2 instances used for?

P-series instances are used for high-performance AI training and complex HPC workloads. They run NVIDIA's most powerful data center GPUs and feature EFA networking for low-latency cluster communication, representing AWS's most expensive tier.

What are G-Series EC2 instances used for?

G-series instances are used for real-time AI inference, graphics rendering, media encoding, and virtual desktops. They utilize power-efficient NVIDIA GPUs, cost much less than the P-series, and target latency-sensitive, single-node workloads.

What is the difference between On-Demand and Spot instances?

On-Demand instances provide flexible compute with no commitment at fixed hourly rates, while Spot instances offer up to 90% discounts using spare capacity. However, AWS can reclaim Spot instances with a two-minute notice.

How do Savings Plans compare to Reserved Instances?

Compute Savings Plans offer flexible discounts based on a dollar-per-hour spend commitment, whereas Reserved Instances lock you into specific instance types, regions, and operating systems. Savings Plans are preferred for shifting GPU workloads.

What is the difference between EC2 Ephemeral Storage and EBS?

Ephemeral storage (instance store) provides physically attached, temporary local NVMe SSDs that lose data when the instance stops, terminates, or fails, whereas EBS provides network-attached block storage that persists independently of the instance lifecycle.

How can you optimize EC2 GPU costs?

Optimize costs by right-sizing via AWS Compute Optimizer, using Spot instances with checkpointing, scheduling Auto Scaling for idle instances, and covering baseline usage with Compute Savings Plans.


EC2 GPU Instances: A Full Guide to AWS GPUs (July 2026)

Carl Peterson | July 1, 2026 | 15 min read

Amazon Elastic Compute Cloud (EC2) is AWS's on-demand virtual server service. It lets you rent compute capacity by the second without committing to physical hardware. You choose the instance type, operating system, and region; AWS handles the rest.

First launched in 2006, EC2 now has a catalog of hundreds of instance types. This guide covers the most popular GPU-accelerated instance families.

Takeaways

P-Series vs. G-Series:
- Use P-series (P5/P6) for heavy, multi-node AI training.
- Choose G-series (G6/G7) for cost-effective AI inference.
Pricing:
- Cover baseline usage with Compute Savings Plans and run fault-tolerant training on Spot Instances for discounts of up to 90%.
Storage Performance:
- Use ephemeral storage for fast temporary caches and checkpoints.
- Use EBS gp3 for persistent data that must survive shutdowns.

For simple, affordable access to A100 and H100 GPUs, create a Thunder Compute instance.

How EC2 Fits Into the AWS Ecosystem

EC2 connects directly to services like S3 (object storage), VPC (networking), IAM (access control), and CloudWatch (monitoring). For GPU workloads, you'll also use EBS for persistent storage and EFA (Elastic Fabric Adapter) for low-latency multi-node networking.

This tight integration is both EC2's strength and its complexity. Every piece of infrastructure needs configuration, and costs accumulate across compute, storage, and data transfer.

EC2 Instance Types: An Overview

AWS groups EC2 instances into families based on their primary workload. The main categories are:

Accelerated computing (P and G families): GPU-powered workloads
General purpose (M and T families): balanced CPU, memory, and networking
Compute optimized (C family): CPU-intensive workloads
Memory optimized (R and X families): in-memory databases and analytics
Storage optimized (I and D families): high-throughput local storage

Within each family, sizes scale from small (as small as nano in some families) up to 48xlarge or metal, and larger sizes get roughly proportionally more vCPUs, memory, and bandwidth. Not every family offers every size.

How to Read an EC2 Instance Name (Family, Generation, Size)

EC2 instance names follow the pattern [family][generation][attributes].[size]. For example:

p5.48xlarge means P family (high-performance GPU), fifth generation, 48xlarge size.
g6e.12xlarge means G family (graphics/inference GPU), sixth generation, enhanced memory (e), 12xlarge size.

Common GPU instance attributes include:

d: local NVMe storage
n: high-bandwidth networking
e: enhanced GPU memory

Knowing the naming convention for AWS instances lets you compare options quickly.
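As a rough illustration of the naming pattern, the sketch below splits a name into its four parts with a regular expression. The function and type names are hypothetical, and the pattern only handles the simple form described here. Names that use hyphens, such as p6-b200.48xlarge or u-p6e-gb200x72, are deliberately left unparsed.

```python
import re
from typing import NamedTuple, Optional

# Matches the simple form [family][generation][attributes].[size],
# e.g. "g6e.12xlarge". Hyphenated names are intentionally not handled.
_NAME = re.compile(
    r"^(?P<family>[a-z]+)(?P<generation>\d+)(?P<attributes>[a-z]*)\.(?P<size>[a-z0-9]+)$"
)


class InstanceName(NamedTuple):
    family: str       # e.g. "p" or "g"
    generation: int   # e.g. 5
    attributes: str   # e.g. "e", "dn", or "" when there are none
    size: str         # e.g. "48xlarge" or "metal"


def parse_instance_name(name: str) -> Optional[InstanceName]:
    """Return the parsed parts, or None if the name does not fit the simple pattern."""
    match = _NAME.match(name.strip().lower())
    if match is None:
        return None
    return InstanceName(
        family=match["family"],
        generation=int(match["generation"]),
        attributes=match["attributes"],
        size=match["size"],
    )


if __name__ == "__main__":
    for example in ("p5.48xlarge", "g6e.12xlarge", "g4dn.metal", "p6-b200.48xlarge"):
        print(example, "->", parse_instance_name(example))
```

Returning None for unrecognized names keeps the helper honest about its limits instead of guessing at newer naming schemes.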

EC2 GPU Instances

AWS organizes its GPU instances into two main families, each targeting a different cost-versus-performance tradeoff:

The P series for high-performance ML training and HPC.
The G series for graphics and inference workloads.

Family	GPU Models	On-Demand Pricing
P6	NVIDIA B200 and B300	$113.93–$142.42/hr
P5	NVIDIA H100 and H200	$6.88–$79.12/hr
P4	NVIDIA A100 40GB and 80GB	$21.96–$27.45/hr
P3	NVIDIA V100 16GB	$3.06–$24.48/hr
P2	NVIDIA K80	$0.90–$14.40/hr
G7	NVIDIA RTX Pro 6000	$3.36-$33.14/hr
G6	NVIDIA L4	$0.80-$13.35/hr
G5	NVIDIA A10G	$1.00-$16.29/hr
G4	NVIDIA T4	$0.53-$7.82/hr

On-demand pricing as of June 2026. Prices vary by region and OS.
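Because multi-GPU instances bundle several GPUs into one hourly price, it can help to normalize to a per-GPU figure when comparing families. The sketch below assumes the listed prices are per instance-hour and uses three figures from the tables in this guide. The function name is hypothetical.

```python
def price_per_gpu_hour(instance_price_per_hour: float, gpu_count: int) -> float:
    """Divide an instance's hourly price by its GPU count."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return instance_price_per_hour / gpu_count


# (instance price per hour in USD, GPU count), taken from this guide's tables.
examples = {
    "p5.48xlarge": (55.04, 8),
    "p4d.24xlarge": (21.96, 8),
    "g6.48xlarge": (13.35, 8),
}

if __name__ == "__main__":
    for name, (price, gpus) in examples.items():
        print(f"{name}: ${price_per_gpu_hour(price, gpus):.2f} per GPU-hour")
```

A per-GPU price ignores differences in GPU memory, interconnect, and CPU allocation, so treat it as a first-pass comparison only.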

P-Series: High-Performance Training

The P family is AWS's flagship line for compute-heavy AI and HPC workloads. These instances run NVIDIA's most powerful data center GPUs and include Elastic Fabric Adapter (EFA) networking for low-latency communication across large GPU clusters.

P instances are the right choice for training large foundation models, running dynamics simulations, or doing seismic analysis at scale. The tradeoff is cost: these instances are among the most expensive compute resources on AWS.

P6

The P6 family is AWS's newest generation of AI training infrastructure, built around NVIDIA Blackwell GPUs. Designed for frontier model training and large-scale HPC, P6 instances support tightly connected multi-GPU and multi-node deployments for the most demanding workloads.

Instance	GPUs	VRAM (GB)	vCPUs	System RAM (GB)	Network bandwidth (Tbps)	On-demand price per instance-hour
p6-b200.36xlarge	4 x NVIDIA B200	740 HBM3e	144	960	3.2	Not available
p6-b200.48xlarge	8 x NVIDIA B200	1,432 HBM3e	192	2,048	3.2	$113.93
p6-b300.48xlarge	8 x NVIDIA B300	2,144 HBM3e	192	4,096	6.4	$142.42
u-p6e-gb200x36	36 x NVIDIA B200	6,660 HBM3e	1,296	8,640	14.4	Not available
u-p6e-gb200x72	72 x NVIDIA B200	13,320 HBM3e	2,592	17,280	28.8	Not available

P5

The EC2 P5 family brings Hopper architecture to AWS with NVIDIA H100 and H200 GPUs. These instances target large language model training, fine-tuning, and distributed AI workloads, offering a major leap in performance over previous generations.

For a full overview of P5 instances, read AWS P5 vs Thunder Compute.


Instance	GPUs	VRAM (GB)	vCPUs	System RAM (GB)	Network bandwidth (Tbps)	On-demand price per instance-hour
p5.4xlarge	1 x NVIDIA H100	80 HBM3	16	256	0.1	$6.88
p5.48xlarge	8 x NVIDIA H100	640 HBM3	192	2,048	3.2	$55.04 - $68.80
p5e.48xlarge	8 x NVIDIA H200	1,128 HBM3e	192	2,048	3.2	Not available
p5en.48xlarge	8 x NVIDIA H200	1,128 HBM3e	192	2,048	3.2	$63.30 - $79.12

P4

The P4 family is based on NVIDIA A100 GPUs and remains widely used for training and inference workloads. Although newer generations have arrived, P4 instances continue to provide strong performance for large models and are common in existing AI clusters.

Instance	GPUs	VRAM (GB)	vCPUs	System RAM (GB)	Network bandwidth (Tbps)	On-demand price per instance-hour
p4d.24xlarge	8 x NVIDIA A100	320	96	1,152	0.4	$21.96
p4de.24xlarge	8 x NVIDIA A100	640	96	1,152	0.4	$27.45

P3

The P3 family introduced NVIDIA V100 GPUs to AWS and played a major role in the early growth of deep learning in the cloud. While no longer cutting-edge, P3 instances remain suitable for many research and legacy training workloads.

Instance	GPUs	VRAM (GB)	vCPUs	System RAM (GB)	Network bandwidth (Tbps)	On-demand price per instance-hour
p3.2xlarge	1 x NVIDIA V100	16	8	61	0.01	$3.06
p3.8xlarge	4 x NVIDIA V100	64	32	244	0.025	$12.24
p3.16xlarge	8 x NVIDIA V100	128	64	488	0.10	$24.48

P2

The P2 family is AWS's oldest GPU instance generation still available in some regions. Based on NVIDIA K80 GPUs, these instances are largely intended for legacy applications and older machine learning frameworks rather than modern AI training.

Instance	GPUs	VRAM (GB)	vCPUs	System RAM (GB)	Network bandwidth (Tbps)	On-demand price per instance-hour
p2.xlarge	1 x NVIDIA K80	24 GDDR5	4	61	0.01	$0.90
p2.8xlarge	8 x NVIDIA K80	192 GDDR5	32	488	0.01	$7.20
p2.16xlarge	16 x NVIDIA K80	384 GDDR5	64	732	0.02	$14.40

G-Series: Graphics and Inference (G4, G5, G6, G7)

The G family is AWS's more accessible GPU tier. These instances use power-efficient NVIDIA GPUs suited for real-time inference, media encoding, and virtual desktops. They cost significantly less than P instances and cover a wide range of configurations.

Unlike the P family, G instances target latency-sensitive single-node workloads rather than distributed training. The newer G6 and G7 generations have expanded AI inference capabilities considerably.

G7

Launched in January 2026, the G7e family is built around NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. AWS positions these instances for AI inference, agentic applications, multimodal models, and graphics workloads that need high memory capacity without moving to the more expensive P-series.

Instance	GPUs	VRAM (GB)	vCPUs	System RAM (GB)	Network bandwidth (Tbps)	On-demand price per instance-hour
g7e.2xlarge	1 x NVIDIA RTX Pro 6000	96 GDDR7	8	64	0.05	$3.36
g7e.4xlarge	1 x NVIDIA RTX Pro 6000	96 GDDR7	16	128	0.05	$4.00
g7e.8xlarge	1 x NVIDIA RTX Pro 6000	96 GDDR7	32	256	0.10	$5.27
g7e.12xlarge	2 x NVIDIA RTX Pro 6000	192 GDDR7	48	512	0.40	$8.29
g7e.24xlarge	4 x NVIDIA RTX Pro 6000	384 GDDR7	96	1,024	0.80	$16.57
g7e.48xlarge	8 x NVIDIA RTX Pro 6000	768 GDDR7	192	2,048	1.60	$33.14

G6

The G6 family brings NVIDIA L4 GPUs to AWS and represents a significant upgrade for inference workloads. These instances strike a balance between performance and cost, making them popular for serving LLMs, computer vision models, and media processing pipelines.

Instance	GPUs	VRAM (GB)	vCPUs	System RAM (GB)	Network bandwidth (Tbps)	On-demand price per instance-hour
g6.xlarge	1 x NVIDIA L4	24	4	16	Up to 0.01	$0.80
g6.2xlarge	1 x NVIDIA L4	24	8	32	Up to 0.01	$0.98
g6.4xlarge	1 x NVIDIA L4	24	16	64	Up to 0.025	$1.32
g6.8xlarge	1 x NVIDIA L4	24	32	128	0.025	$2.01
g6.12xlarge	4 x NVIDIA L4	96	48	192	0.04	$4.60
g6.16xlarge	1 x NVIDIA L4	24	64	256	0.025	$3.40
g6.24xlarge	4 x NVIDIA L4	96	96	384	0.05	$6.68
g6.48xlarge	8 x NVIDIA L4	192	192	768	0.10	$13.35

G5

The G5 family uses NVIDIA A10G GPUs and serves as a versatile middle ground between graphics workloads and AI inference. It is widely used for model serving, virtual workstations, rendering, and smaller-scale training jobs.

Instance	GPUs	VRAM (GB)	vCPUs	System RAM (GB)	Network bandwidth (Tbps)	On-demand price per instance-hour
g5.xlarge	1 x NVIDIA A10G	24	4	16	Up to 0.01	$1.00
g5.2xlarge	1 x NVIDIA A10G	24	8	32	Up to 0.01	$1.21
g5.4xlarge	1 x NVIDIA A10G	24	16	64	Up to 0.025	$1.62
g5.8xlarge	1 x NVIDIA A10G	24	32	128	0.025	$2.45
g5.12xlarge	4 x NVIDIA A10G	96	48	192	0.04	$5.67
g5.16xlarge	1 x NVIDIA A10G	24	64	256	0.025	$4.10
g5.24xlarge	4 x NVIDIA A10G	96	96	384	0.05	$8.14
g5.48xlarge	8 x NVIDIA A10G	192	192	768	0.10	$16.29

G4

The G4dn family is one of AWS's longest-running and most affordable GPU offerings. Powered by NVIDIA T4 GPUs, it remains a popular choice for cost-sensitive inference workloads, video transcoding, and virtual desktop infrastructure.

Instance	GPUs	VRAM (GB)	vCPUs	System RAM (GB)	Network bandwidth (Tbps)	On-demand price per instance-hour
g4dn.xlarge	1 x NVIDIA T4	16	4	16	Up to 0.025	$0.53
g4dn.2xlarge	1 x NVIDIA T4	16	8	32	Up to 0.025	$0.75
g4dn.4xlarge	1 x NVIDIA T4	16	16	64	Up to 0.025	$1.20
g4dn.8xlarge	1 x NVIDIA T4	16	32	128	0.05	$2.18
g4dn.12xlarge	4 x NVIDIA T4	64	48	192	0.05	$3.91 ($0.98 per GPU)
g4dn.16xlarge	1 x NVIDIA T4	16	64	256	0.05	$4.35
g4dn.metal	8 x NVIDIA T4	128	96	384	0.10	$7.82 ($0.98 per GPU)

EC2 Pricing Models: How to Pay Less for More

EC2 offers five ways to pay for compute, each suited to a different usage pattern. Choosing the right model (or combining them) can dramatically cut your GPU bill. For GPU instances, the differences can mean thousands of dollars per month.

On-Demand Instances: Flexible but Expensive

On-demand is the default: you pay a fixed hourly rate with no upfront cost and no commitment. Billing is per-second (60-second minimum), which makes it the best option for short or unpredictable workloads.
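To make per-second billing with a 60-second minimum concrete, here is a minimal sketch of the arithmetic. The function name and the example rate are illustrative assumptions, and the sketch assumes the instance was launched at least once.

```python
def on_demand_charge(hourly_rate: float, runtime_seconds: float,
                     minimum_seconds: int = 60) -> float:
    """Charge for one launch under per-second billing with a minimum duration."""
    # Runs shorter than the minimum are billed as if they lasted the minimum.
    billable_seconds = max(runtime_seconds, minimum_seconds)
    return hourly_rate * billable_seconds / 3600


if __name__ == "__main__":
    rate = 3.60  # hypothetical hourly rate in USD
    for seconds in (30, 60, 600):
        print(f"{seconds:>4} s -> ${on_demand_charge(rate, seconds):.4f}")
```

A 30-second run is charged the same as a 60-second run, which matters mostly for workloads that launch many very short-lived instances.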

The downside is cost. On-demand GPU rates are high, especially for the P family. For consistent usage (training jobs running 8 or more hours per day), on-demand is the most expensive option available.

EC2 Spot Instances: Up to 90% Off for Fault-Tolerant Workloads

Spot instances let you use unused EC2 capacity at discounts of up to 90% vs on-demand rates. The catch: AWS can reclaim them with two minutes of notice, so they only suit workloads that can tolerate interruption.
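One common way to react to the two-minute notice is to poll the instance metadata service from inside the instance. The sketch below assumes IMDSv2 and the spot/instance-action metadata path. The helper names are hypothetical, and the fetch function only works when run on an EC2 instance.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: Optional[str]) -> Optional[dict]:
    """Return the parsed interruption notice, or None when there is no notice."""
    if not body:
        return None
    return json.loads(body)


def fetch_instance_action(timeout: float = 1.0) -> Optional[dict]:
    """Query the metadata service for a pending Spot interruption (EC2 only)."""
    # IMDSv2 requires a short-lived session token before reading metadata.
    token_request = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()

    action_request = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_request, timeout=timeout) as response:
            return parse_instance_action(response.read().decode())
    except urllib.error.HTTPError as error:
        if error.code == 404:  # no interruption is currently scheduled
            return None
        raise


if __name__ == "__main__":
    sample = '{"action": "terminate", "time": "2026-07-01T12:00:00Z"}'
    print(parse_instance_action(sample))
```

A training process could call fetch_instance_action every few seconds and trigger a final checkpoint as soon as it returns a notice.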

For GPU training jobs, spot instances work well when your code supports checkpointing. A job that saves progress every 15–30 minutes can survive an interruption and resume without much lost work. Spot discounts for GPU instances typically land in the 60–90% range, though availability for high-demand types like P5 can be limited.

EC2 Reserved Instances: Up to 72% Off for Predictable Workloads

Reserved instances (RIs) let you commit to a specific instance type and region for one or three years in exchange for discounts up to 72% off on-demand rates. Payment options range from all upfront (maximum discount) to no upfront (smaller discount).

The catch is inflexibility. Standard RIs lock you into a specific family, region, and OS. If your needs change or AWS releases a better GPU generation, you're stuck with your reservation or must sell it on the Reserved Instance Marketplace. Convertible RIs let you swap instance types but offer lower discounts.

Savings Plans: A More Flexible Alternative to Reserved Instances

Compute Savings Plans commit you to a consistent hourly spend in dollars (e.g., $5/hr) rather than a specific instance type. They apply automatically across EC2, Fargate, and Lambda, covering any instance family, size, region, and OS.
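The dollar-per-hour commitment can be sketched as simple arithmetic. This model is a simplification: it assumes one uniform discount rate for all usage (real rates vary by instance type, region, and term), and the function name is hypothetical. Usage is priced at the discounted rate until the commitment is used up, and anything beyond that is billed at on-demand rates.

```python
def hourly_bill(commitment: float, on_demand_usage: float, discount: float) -> float:
    """Estimate one hour's bill under a Savings Plan.

    commitment:       committed spend in USD per hour (always charged)
    on_demand_usage:  value of that hour's eligible usage at on-demand rates
    discount:         assumed uniform Savings Plan discount, 0 <= discount < 1
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    discounted_usage = on_demand_usage * (1 - discount)
    if discounted_usage <= commitment:
        # The commitment is charged in full even when usage falls short.
        return commitment
    # Usage beyond the commitment is converted back to on-demand pricing.
    uncovered = discounted_usage - commitment
    return commitment + uncovered / (1 - discount)


if __name__ == "__main__":
    for usage in (4.0, 10.0, 20.0):
        print(f"on-demand usage ${usage:.2f}/hr -> bill ${hourly_bill(5.0, usage, 0.5):.2f}/hr")
```

The first case shows the risk of over-committing: the full $5/hr is billed even though only $4/hr of on-demand-equivalent usage occurred.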

For GPU workloads, Compute Savings Plans are generally preferable to RIs unless you need the capacity reservation that zonal RIs provide. Discounts are slightly below the best RI rates, but the flexibility makes them easier to manage. As of June 2025, AWS extended Savings Plan coverage to P6-B200 Blackwell instances.

EC2 Free Tier: What's Included and What's Not

The AWS Free Tier covers 750 hours per month of t2.micro or t3.micro instances for the first 12 months of a new account. It does not include any GPU instances: no P-series or G-series type is eligible.

The free tier is useful for learning EC2 networking and storage. After 12 months, it expires entirely and all usage is billed at standard on-demand rates.

EC2 Storage: EBS vs Ephemeral Storage

Every EC2 instance needs storage, and your choice affects performance, cost, and data persistence. For GPU workloads, storage speeds can become a training bottleneck when loading large datasets.

EC2 Ephemeral Storage

Some EC2 GPU instances include local NVMe SSDs called instance store volumes. This storage is physically attached to the host, giving it very high throughput and low latency compared to network-attached alternatives.

The critical limitation: instance store data is lost when the instance stops, terminates, or fails. For ML training, it works well for temporary files, dataset caches, and intermediate checkpoints.

EBS Volumes for Persistent Storage

Amazon Elastic Block Store (EBS) provides persistent block storage that survives instance stops and can outlive the instance itself (root volumes are deleted on termination by default unless you change that setting). EBS volumes are network-attached, so throughput is lower than local NVMe, but data persists independently of the instance lifecycle.

For most GPU workloads, gp3 volumes offer the best balance of throughput and cost. If you need faster dataset reads, you can provision additional IOPS on gp3 or switch to io2 volumes. Note that EBS is billed separately and idle volumes still get charged even without an attached instance.

EC2 Cost Optimization: How to Cut Your AWS Bill

GPU instances are expensive, and small inefficiencies add up fast. A P5 left running overnight when idle can cost hundreds of dollars. These strategies help keep GPU spend in check without sacrificing productivity.
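A quick back-of-the-envelope calculation shows how idle time adds up. The sketch below uses the p5.48xlarge on-demand price listed in this guide. The idle hours and day count are assumed values chosen only for illustration.

```python
def idle_cost(hourly_rate: float, idle_hours_per_day: float, days: int = 1) -> float:
    """Cost of leaving an instance running while it does no useful work."""
    return hourly_rate * idle_hours_per_day * days


if __name__ == "__main__":
    p5_rate = 55.04  # p5.48xlarge on-demand price per hour from this guide
    print(f"One idle night (12 h): ${idle_cost(p5_rate, 12):,.2f}")
    print(f"Thirty idle nights:    ${idle_cost(p5_rate, 12, days=30):,.2f}")
```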

Right-Sizing Your Instances with AWS Compute Optimizer

AWS Compute Optimizer analyzes CloudWatch utilization metrics and recommends better-matched instance types. For GPU workloads, it often flags instances with consistently low GPU utilization.

Compute Optimizer is free to enable and covers EC2, EBS, Lambda, and Auto Scaling groups. Reviewing its recommendations regularly can surface easy savings, especially as workloads evolve and new GPU generations become available.

Using Spot Instances for GPU Training Workloads

Spot instances are the highest-impact cost lever for GPU training. Switching a P4d training run from on-demand to spot can cut costs by 60% or more, even accounting for occasional interruptions.

The key is checkpointing. Write model checkpoints to EBS or S3 at regular intervals so your job can resume after an interruption. PyTorch, TensorFlow, and JAX all have native checkpointing support. Pair spot instances with Auto Scaling groups using mixed instance types to improve capacity availability.
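The checkpoint-and-resume pattern can be sketched without any ML framework. In the example below, a JSON file on local disk stands in for a checkpoint on EBS or S3, and a placeholder calculation stands in for a real training step. All function names are hypothetical, and a real job would use its framework's own checkpoint format.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write the state atomically so an interruption never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename over the previous checkpoint


def load_checkpoint(path: str) -> dict:
    """Load the last checkpoint, or start from step 0 if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as handle:
        return json.load(handle)


def train(path: str, total_steps: int, checkpoint_every: int,
          stop_after: Optional[int] = None) -> dict:
    """Run (or resume) training; stop_after simulates a Spot interruption."""
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # simulated interruption: unsaved progress is lost
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # placeholder for a real training step
        steps_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    if state["step"] >= total_steps:
        save_checkpoint(path, state)  # always persist the finished state
    return state


if __name__ == "__main__":
    checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    train(checkpoint_path, total_steps=20, checkpoint_every=5, stop_after=7)
    print("after interruption:", load_checkpoint(checkpoint_path))
    print("after resume:      ", train(checkpoint_path, total_steps=20, checkpoint_every=5))
```

The checkpoint interval bounds how much work an interruption can cost: here the run stops at step 7 but resumes from step 5, the last saved state.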

Combining Savings Plans and Spot Capacity

For continuously running GPU workloads, mixing Savings Plans with spot capacity is a practical strategy. Cover your steady-state usage with a Compute Savings Plan commitment, then use on-demand or spot for peaks and experiments.

A good starting point: review your last 90 days of EC2 spend in AWS Cost Explorer, find your minimum consistent GPU usage, and commit that baseline to a Savings Plan. Add spot on top for bursty training runs. Avoid over-committing to specific GPU families via RIs; AWS releases new generations regularly, and old commitments age quickly.
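The baseline step can be sketched as picking the minimum, or a low percentile, of historical hourly spend. The sketch assumes you have already exported hourly GPU spend figures, for example from Cost Explorer. The function name and sample numbers are hypothetical, and the result would still need converting from on-demand dollars to Savings Plan rates before committing.

```python
from typing import Sequence


def savings_plan_baseline(hourly_spend: Sequence[float], percentile: float = 0.0) -> float:
    """Return a low-end spend level that usage rarely drops below.

    percentile=0 gives the strict minimum; a small value such as 5 tolerates
    a few unusually quiet hours in exchange for a higher baseline.
    """
    if not hourly_spend:
        raise ValueError("hourly_spend must not be empty")
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted(hourly_spend)
    index = int(percentile / 100 * (len(ordered) - 1))
    return ordered[index]


if __name__ == "__main__":
    sample = [4.0, 6.5, 5.0, 9.0]  # hypothetical hourly spend in USD
    print("strict minimum:", savings_plan_baseline(sample))
    print("median level:  ", savings_plan_baseline(sample, percentile=50))
```

Starting from the strict minimum is the conservative choice, since any commitment above actual usage is paid for whether or not it is used.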

Auto Scaling: Only Pay for What You Actually Use

EC2 Auto Scaling automatically adjusts the number of running instances based on demand signals such as CPU utilization, custom CloudWatch metrics, or a schedule. For GPU inference workloads, it can scale your fleet down to zero overnight and back up before business hours.

Setting up Auto Scaling for GPU instances requires an AMI with NVIDIA drivers pre-installed, and your application needs to handle cold start latency (the time a new GPU instance takes to initialize and accept traffic). For batch training, scheduled scaling is simpler: set a schedule that scales the group back to zero once the job's expected window has passed.
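A scale-to-zero schedule could look roughly like the sketch below, which builds the parameters for two recurring scheduled actions and applies them through the boto3 Auto Scaling client. The group name, action names, cron expressions, and capacity are assumptions for illustration. Applying the actions requires boto3, AWS credentials, and an existing Auto Scaling group.

```python
from typing import Dict, List


def scheduled_actions(group_name: str, scale_in_cron: str, scale_out_cron: str,
                      daytime_capacity: int, time_zone: str = "Etc/UTC") -> List[Dict]:
    """Build parameters for a nightly scale-in and a morning scale-out."""
    common = {
        "AutoScalingGroupName": group_name,
        "MinSize": 0,
        "MaxSize": daytime_capacity,
        "TimeZone": time_zone,
    }
    return [
        {**common, "ScheduledActionName": "scale-in-overnight",
         "Recurrence": scale_in_cron, "DesiredCapacity": 0},
        {**common, "ScheduledActionName": "scale-out-morning",
         "Recurrence": scale_out_cron, "DesiredCapacity": daytime_capacity},
    ]


def apply_actions(actions: List[Dict]) -> None:
    """Register the scheduled actions with EC2 Auto Scaling (needs boto3 and credentials)."""
    import boto3  # imported lazily so the planning code runs without it

    client = boto3.client("autoscaling")
    for action in actions:
        client.put_scheduled_update_group_action(**action)


if __name__ == "__main__":
    # Hypothetical group: scale to zero at 20:00 daily, back to 2 at 07:00 on weekdays.
    for action in scheduled_actions("gpu-inference", "0 20 * * *", "0 7 * * 1-5", 2):
        print(action)
```

Keeping MinSize at 0 in both actions lets the group sit empty overnight, while the morning action restores the desired capacity before traffic arrives.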

Is There a Simpler Way to Access GPUs?

EC2 is powerful, but it comes with real operational overhead: IAM policies, VPC configuration, EBS setup, driver installation, and pricing model complexity all need attention before a GPU job runs.

For a full analysis of new players in the cloud GPU market, read Neoclouds: The New Cloud GPUs.