Cloud GPU Spot Instances: Availability, Interruption Rates, and When to Use Them (2026)

Carl Peterson | July 3, 2026 | 13 min read

GPU spot instances offer discounts of 60-90% compared to on-demand rates, but in 2026 the math changed. AWS cut H100 on-demand prices 44% in June 2025, compressing the absolute savings advantage of spot. Meanwhile, neocloud on-demand pricing has dropped to the point where providers like Thunder Compute offer A100 access at rates below typical hyperscaler spot pricing, with no interruption risk.

What follows covers GPU spot instance interruption rates across AWS, GCP, Azure, and the neocloud providers most relevant to ML teams, when spot still makes sense, and when a lower-cost on-demand provider is the better call.

Key takeaways:

Spot interruption rates vary widely by GPU type: H100 spot on AWS interrupts at under 5%, while A100 spot runs 15-20%.
GCP gives only 30 seconds of preemption notice; AWS gives 2 minutes. Design shutdown scripts accordingly.
Neocloud on-demand now undercuts hyperscaler spot on A100-class hardware, removing a major reason to accept interruption risk.
Checkpointed batch jobs still benefit from spot. Real-time inference and non-checkpointed training jobs do not.

What Is a Spot Instance?

A spot instance is a virtual machine that uses a cloud provider's excess capacity, offered at a significant discount compared to standard on-demand pricing. Because these instances draw on spare resources, the provider can reclaim them at any time if that capacity is needed elsewhere.

Spot instances go by different names across providers: AWS calls them Spot Instances, Google Cloud calls them Spot VMs (replacing the older Preemptible VM name), and Azure calls them Spot Virtual Machines. The core mechanics are the same: lower price, no availability guarantee, short warning before termination. The length of that warning and the eviction rules differ by provider.

Why Interruption Rates Matter for GPU Workloads

Interruption rates are the primary risk variable when choosing spot over on-demand. A job interrupted five times in a day can cost more in engineering time than the spot discount saves, particularly when training runs lack checkpointing.

GPU workloads carry more exposure than general-purpose spot workloads. Model weights are large, checkpoint writes are expensive, and a preempted instance mid-epoch with no recent save loses all progress since the last checkpoint. Warning windows are short: GCP gives 30 seconds, AWS gives 2 minutes.

Spot Instance Interruption Rates by Provider and GPU (2026)

Interruption rates reflect how often spot capacity was reclaimed over the trailing 30 days. AWS publishes this data directly through the Spot Instance Advisor. GCP and Azure publish eviction rate guidance through their respective portals.

Provider	GPU	Instance	Interruption rate	Warning window	Typical discount vs on-demand
AWS	H100	p5.48xlarge	<5%	2 min	60–80%
AWS	A100	p4d.24xlarge	15–20%	2 min	60–70%
AWS	V100	p3.16xlarge	>20%	2 min	70–91%
GCP	A100	a2-highgpu	~2.3% per hour¹	30 sec	60–80%
GCP	H100	a3-highgpu	~4.1% per hour¹	30 sec	60–80%
Azure	H100	ND H100 v5	Variable (see Eviction Advisor)	30 sec	60–90%
Azure	A100	NC A100 v4	Variable (see Eviction Advisor)	30 sec	60–90%
¹ GCP interruption rates per Introl analysis of 10M spot instance hours. Individual availability zones vary. Rates fluctuate with supply and demand.
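A per-hour rate is easier to reason about once it is converted into the chance that a whole run gets interrupted at least once. The sketch below does that for the GCP figures in the table. It assumes every hour is an independent trial with the same probability, which is a simplification (real reclaim events tend to cluster), and the function name `p_interrupted` is made up for illustration. The AWS figures are trailing 30-day frequencies, not per-hour rates, so they should not be plugged into this formula.

```python
def p_interrupted(hourly_rate: float, hours: float) -> float:
    """Probability of at least one interruption during a run.

    Assumes each hour is an independent trial with the same interruption
    probability. Real reclaim events cluster, so treat this as a rough guide.
    """
    if not 0.0 <= hourly_rate <= 1.0:
        raise ValueError("hourly_rate must be a fraction between 0 and 1")
    return 1.0 - (1.0 - hourly_rate) ** hours


# Per-hour rates taken from the GCP rows of the table above.
for gpu, rate in [("GCP A100", 0.023), ("GCP H100", 0.041)]:
    for hours in (1, 8, 24):
        print(f"{gpu} {hours:>2}h run: {p_interrupted(rate, hours):.1%}")
```

Under these assumptions, a small hourly rate still adds up: a 24-hour A100 run at 2.3% per hour has a better than 40% chance of seeing at least one preemption.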

AWS GPU Spot Instance Availability

AWS provides the most transparency into spot interruption risk through its Spot Instance Advisor, which publishes trailing 30-day interruption frequency by instance type and region. H100 spot on p5.48xlarge instances runs at under 5% interruption, making it relatively stable for a spot tier. A100 spot on p4d.24xlarge is more volatile at 15-20%, so teams should plan for interruptions on longer runs. The Advisor figure is a trailing-month frequency, not a per-hour or per-day rate.

AWS gives a 2-minute warning before reclaiming an instance, which is enough time for a well-configured shutdown script to write a checkpoint to S3 and terminate cleanly. Persistent Spot Requests automatically relaunch the instance when capacity becomes available again, reducing the need for manual intervention.
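One way a shutdown script can learn about the 2-minute warning is by polling the EC2 instance metadata service for the `spot/instance-action` document. The sketch below shows that polling with an IMDSv2 session token, plus a small parser that turns the notice into seconds remaining. The helper names `poll_instance_action` and `seconds_until_action` are hypothetical, the sample notice is hand-written for the example, and the network call only works from inside an EC2 instance, so it is defined here but not invoked.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def seconds_until_action(body: str, now: datetime) -> float:
    """Parse an instance-action document and return seconds until it fires."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return (when.replace(tzinfo=timezone.utc) - now).total_seconds()


def poll_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the instance-action JSON if an interruption is scheduled.

    Only meaningful on an EC2 instance. A 404 means nothing is scheduled.
    """
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        req = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


# Hand-written sample notice, used so the parser can run off-instance.
sample = '{"action": "terminate", "time": "2026-07-03T12:02:00Z"}'
now = datetime(2026, 7, 3, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_action(sample, now))  # 120.0
```

In practice a small loop would call `poll_instance_action` every few seconds and trigger the checkpoint write as soon as it returns a document.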

GCP Spot VMs and Preemptible GPUs

GCP Spot VMs replaced Preemptible VMs as the recommended option for interruptible workloads. The key difference: Preemptible VMs are terminated after 24 hours regardless of capacity conditions, making them unsuitable for long training runs. Spot VMs have no fixed time limit and are only reclaimed when GCP needs the capacity back.

GCP gives 30 seconds of notice before preemption, significantly less than AWS. Shutdown scripts must be fast: write the checkpoint, flush to Cloud Storage, and exit within 30 seconds or the instance is terminated mid-write. GCP does not automatically restart Spot VMs after preemption; use a Managed Instance Group to recreate them when capacity returns.
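Whether a checkpoint can be written inside the notice window is simple arithmetic: size divided by sustained write throughput, compared against the window minus some reserve. The sketch below is a rough budget check. The function name, the 5-second reserve, and the example sizes and throughput are all assumptions for illustration, not measured values.

```python
def checkpoint_fits(size_gb: float, write_gb_per_s: float,
                    window_s: float = 30.0, reserve_s: float = 5.0) -> bool:
    """True if a checkpoint of size_gb can be written inside the warning window.

    reserve_s is time held back for process shutdown and upload finalization.
    """
    if write_gb_per_s <= 0:
        raise ValueError("write_gb_per_s must be positive")
    return size_gb / write_gb_per_s <= window_s - reserve_s


# Hypothetical numbers: 1 GB/s sustained write to object storage.
print(checkpoint_fits(14, 1.0))                 # small checkpoint, 30 s window
print(checkpoint_fits(140, 1.0))                # large checkpoint, 30 s window
print(checkpoint_fits(140, 1.0, window_s=120))  # same checkpoint, 2 min window
```

When the check fails, the shutdown script cannot be the only safeguard. Periodic checkpoints during training have to carry the load, with the shutdown hook limited to flushing whatever small state is left.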

Azure Spot GPU VMs

Azure Spot VMs offer 60-90% discounts on GPU instances and provide an Eviction Rate Advisor in the Azure portal for checking historical eviction frequency by instance type and region. In addition to capacity-driven evictions, Azure evicts an instance when the spot price exceeds the maximum price you set at launch. Set your maximum price high enough to avoid price-based eviction while still benefiting from the spot discount.

Azure's 30-second warning window is consistent with GCP. Some teams find Azure spot capacity more stable in under-subscribed regions like Canada Central for A100 workloads, though this varies by time of day and demand conditions.

The 2026 Spot Economics Shift

Hyperscaler spot's cost advantage over on-demand neoclouds has largely closed. AWS reduced H100 on-demand pricing by 44% in June 2025, shrinking the absolute dollar savings from spot. Neocloud on-demand pricing has fallen to where it directly competes with or beats hyperscaler spot, without the interruption risk.

For A100 workloads, the math now favors neocloud on-demand over hyperscaler spot for most teams. Thunder Compute A100 80GB on-demand starts at $1.09/hr with minute-level billing and no egress fees. AWS A100 40GB spot on p4d.24xlarge typically runs $1.20 to $1.65/hr, carries a 15-20% interruption rate, and adds egress charges when moving checkpoints out. For A100 80GB specifically (p4de.24xlarge), on-demand rates are higher and spot pricing follows proportionally.

Option	A100 rate (per GPU)	Interruption risk	Egress fees	Best for
AWS A100 40GB spot (p4d.24xlarge)¹	~$1.20–$1.65/hr	15–20%	$0.09/GB	Teams already in AWS ecosystem with checkpoint infrastructure
AWS A100 40GB on-demand (p4d.24xlarge)	~$4.10/hr (instance ÷ 8 GPUs)	None	$0.09/GB	Mission-critical or deadline-bound jobs
Thunder Compute A100 on-demand	$1.09/hr	None	$0	Cost-sensitive teams who cannot tolerate interruptions
RunPod Community Cloud (spot)	~$1.19/hr	Interruptible; 30 sec to 5 min notice²	$0	Budget-first batch workloads with checkpointing
Vast.ai (interruptible)	From ~$0.67/hr³	Variable by host	$0–$0.06/GB	Maximum cost reduction, fault-tolerant experimentation
Hyperbolic	N/A	None	Varies	H100 access at $1.49/hr on-demand
On-demand and spot rates as of July 2026. Spot prices fluctuate.
¹ p4d.24xlarge provides 8x A100 40GB GPUs; p4de.24xlarge provides 8x A100 80GB.
² RunPod Community Cloud gives approximately 30 seconds to 5 minutes of SIGTERM warning.
³ Vast.ai interruptible pricing varies by host; reliability and bandwidth also vary.
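The comparison above can be made concrete by adding egress and re-run work to the hourly rate. The sketch below uses the per-GPU rates and the $0.09/GB egress figure from the table. The 10% re-run overhead for spot, the 100 GPU-hour job, and the 200 GB of checkpoint egress are assumptions chosen only to illustrate the calculation, and the `Option` and `job_cost` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    rate_per_gpu_hr: float  # USD per GPU-hour
    egress_per_gb: float    # USD per GB moved out of the provider
    redo_fraction: float    # extra compute re-run after interruptions (assumed)


def job_cost(opt: Option, gpu_hours: float, egress_gb: float) -> float:
    """Total cost: billed compute (including re-run work) plus egress."""
    compute = opt.rate_per_gpu_hr * gpu_hours * (1.0 + opt.redo_fraction)
    return compute + opt.egress_per_gb * egress_gb


options = [
    Option("AWS A100 40GB spot (low end)", 1.20, 0.09, 0.10),
    Option("AWS A100 40GB spot (high end)", 1.65, 0.09, 0.10),
    Option("Thunder Compute A100 on-demand", 1.09, 0.00, 0.00),
]
for opt in options:
    print(f"{opt.name}: ${job_cost(opt, gpu_hours=100, egress_gb=200):.2f}")
```

Changing `redo_fraction` and `egress_gb` to match your own interruption history and checkpoint sizes shows how sensitive the gap is to those two inputs.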

Neocloud Spot Options for GPU Workloads

Two platforms offer genuine spot or interruptible GPU pricing worth considering for ML workloads, and a third offers on-demand H100 pricing low enough to compare against them.

RunPod Community Cloud functions as RunPod's spot tier. A100 80GB instances run around $1.19/hr, compared to $1.49/hr on Secure Cloud. The discount is smaller than hyperscaler spot percentages, but the absolute price is competitive and no egress fees apply. RunPod's job queue API can automatically requeue jobs on interruption. Community Cloud instances run on third-party hosts, so hardware consistency and network throughput vary more than on Secure Cloud.

Vast.ai operates a decentralized marketplace where individual hosts set interruptible pricing, often 50% or more below their on-demand rates. H100 interruptible instances have reached as low as $0.34/hr in some marketplace listings, though host reliability, bandwidth, and uptime vary significantly. Vast.ai suits teams optimizing for maximum cost reduction on fault-tolerant experimentation, not for production workloads requiring consistent performance.

Hyperbolic offers H100 access at $1.49/hr on-demand, which is competitive with many hyperscaler spot prices in absolute terms without the interruption risk. For teams who want H100 economics without spot complexity, Hyperbolic is worth comparing against AWS spot.

A Middle Path: On-Demand Neoclouds

Neocloud on-demand pricing can sit below hyperscaler spot, with no interruptions to manage. Thunder Compute GPU instances use per-minute billing, so you pay only for active use without managing the spot lifecycle.

Teams migrating from hyperscaler spot to Thunder Compute on-demand typically report 40-60% total cost savings, with no changes to training code and no checkpoint-on-interrupt infrastructure.

Learn how neoclouds consistently undercut hyperscaler spot pricing on the same hardware.

AWS Capacity Blocks: A Third Option

Between spot and on-demand, AWS offers Capacity Blocks for ML: reserved time windows for GPU cluster access, priced between on-demand and reserved instance rates. Capacity Blocks guarantee access to a specific GPU cluster for a defined window, typically ranging from a few hours to several weeks, which makes them useful for deadline-bound training runs where spot interruptions are unacceptable but full on-demand pricing is hard to justify.

Capacity Blocks are most relevant for multi-node H100 cluster jobs that cannot tolerate any interruption but also cannot commit to a 1-year reserved instance. For single-GPU or small-cluster workloads, neocloud on-demand typically delivers better economics without the reservation complexity.

Spot vs On-Demand: GPU Workload Fit

Spot makes sense when your workload can tolerate interruption and restarting from a checkpoint costs less than the discount saves. The table below maps eight common GPU workloads:

Workload	Spot-suitable?	Reason
LLM training with checkpointing (>2 hours)	Yes	Savings compound with job length; checkpoint resume limits lost work to minutes
LoRA / QLoRA fine-tuning	Yes	Short enough to checkpoint frequently; savings are material at A100 or H100 rates
Hyperparameter search	Yes	Parallel jobs; individual failures are cheap and frameworks like Ray Tune handle retries natively
Batch inference / embedding generation	Yes	Stateless; the next batch restarts automatically on a new instance
Interactive development / notebooks	No	Interruption loses session state and flow; use on-demand for iterative work
Real-time inference API	No	Availability requirement is incompatible with spot; downtime directly impacts users
Jobs under 30 minutes	No	Setup overhead and interruption risk erode savings; on-demand is simpler and comparable in cost
Training without checkpointing	No	Any interruption requires a full restart from zero; the engineering cost exceeds the spot discount
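The break-even condition stated before the table (restart cost versus discount) can be written as a single formula. If each interruption wastes a fixed amount of billed time, spot costs `spot_rate * (job_hours + n * lost_hours_each)` and on-demand costs `on_demand_rate * job_hours`; solving for `n` gives the number of interruptions the discount can absorb. The sketch below is a simplified model: it counts only compute, ignores engineering time and egress, and the 10-hour job and half hour lost per interruption are assumed values.

```python
def break_even_interruptions(on_demand_rate: float, spot_rate: float,
                             job_hours: float, lost_hours_each: float) -> float:
    """Number of interruptions at which spot cost equals on-demand cost.

    A negative result means spot is already more expensive with zero
    interruptions.
    """
    if spot_rate <= 0 or lost_hours_each <= 0:
        raise ValueError("spot_rate and lost_hours_each must be positive")
    saved = job_hours * (on_demand_rate - spot_rate)
    return saved / (spot_rate * lost_hours_each)


# AWS A100 spot (high end, $1.65) against AWS on-demand (~$4.10 per GPU).
print(round(break_even_interruptions(4.10, 1.65, 10, 0.5), 1))
# AWS A100 spot (low end, $1.20) against Thunder Compute on-demand ($1.09).
print(round(break_even_interruptions(1.09, 1.20, 10, 0.5), 1))
```

Against hyperscaler on-demand, spot can absorb many interruptions before losing its edge. Against an on-demand rate that is already below the spot rate, the result is negative, which is the arithmetic behind the article's A100 argument.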

Configuring Spot Instances to Manage Interruptions

Checkpoint frequency is the primary lever for managing spot risk. Writing every 15 to 30 minutes limits lost progress to at most half an hour of compute, and PyTorch Lightning and Ray Train both provide built-in checkpoint-resume that handles this without custom logic.
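For readers who want to see what such frameworks do underneath, here is a minimal, framework-free sketch of the pattern: save on a timer, save once more when SIGTERM arrives, and resume from the last file on restart. It is not how PyTorch Lightning or Ray Train implement it. The JSON state stands in for real model and optimizer state, and all function names are hypothetical. The write goes to a temporary file and is then renamed, so a kill in the middle of a write leaves the previous checkpoint intact.

```python
import json
import os
import signal
import tempfile
import time

stop_requested = False


def install_sigterm_handler() -> None:
    """Set a flag on SIGTERM; the loop saves at the next step boundary."""
    def _on_sigterm(signum, frame):
        global stop_requested
        stop_requested = True
    signal.signal(signal.SIGTERM, _on_sigterm)


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file in the same directory, then rename over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(path: str, total_steps: int, every_s: float = 15 * 60) -> dict:
    """Run from the last checkpoint, saving every `every_s` seconds."""
    state = load_checkpoint(path)
    last_save = time.monotonic()
    while state["step"] < total_steps and not stop_requested:
        state["step"] += 1  # stand-in for one real training step
        if time.monotonic() - last_save >= every_s:
            save_checkpoint(path, state)
            last_save = time.monotonic()
    save_checkpoint(path, state)  # final save on completion or SIGTERM
    return state


if __name__ == "__main__":
    install_sigterm_handler()
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    print(train(ckpt, total_steps=1000))
```

Calling `train` again with the same path picks up at the saved step, which is the behavior a relaunched spot instance relies on.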

On AWS, enable persistent Spot Requests so the instance relaunches automatically when capacity returns. On GCP, use a Managed Instance Group to recreate preempted Spot VMs. On Azure, configure VM Scale Sets with an on-demand fallback for critical workloads.

Aggressive checkpointing on multi-GPU workloads can introduce its own bottlenecks. Read the multi-GPU training guide before tuning checkpoint frequency.

Last Thoughts on Cloud GPU Spot Instances

Spot instances remain the right choice for fault-tolerant, checkpointed batch workloads where savings compound over long training runs.

The landscape is different though: hyperscaler on-demand prices have dropped, neocloud on-demand often undercuts hyperscaler spot in absolute terms, and interruption handling carries real engineering cost.

Treat interruption rates as a risk budget. If your workload tolerates interruption and checkpointing is solid, spot delivers genuine savings. If not, neocloud on-demand is the correct answer.

Learn how the GPU cloud pricing model works across all tiers in our GPU-as-a-Service guide.

What is spot instance availability?

Spot instance availability reflects how often spare capacity is actually usable for your GPU type and region. When availability is low, interruption rates rise and your VM can be reclaimed more frequently.

What is a cloud GPU spot instance?

A GPU spot instance is a virtual machine that uses a cloud provider's spare capacity, offered at 60-90% below on-demand rates. The tradeoff is that the provider can reclaim the instance at any time, typically with 30 seconds to 2 minutes of warning.

What is the interruption rate for GPU spot instances?

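Interruption rates vary by provider and GPU type. AWS H100 spot instances (p5.48xlarge) interrupt at under 5% over the trailing 30 days, while A100 instances (p4d.24xlarge) run 15-20% and V100 instances run above 20%. GCP A100 interrupts at roughly 2.3% per hour and H100 at 4.1% per hour. The pattern is not uniform across providers: on AWS the older GPU generations interrupt more often than H100, while in the GCP figures H100 interrupts more often than A100.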

When should I use GPU spot instances for AI training?

Spot is a strong fit for training runs with checkpointing, LoRA fine-tuning, hyperparameter search, and batch inference jobs. If your workload can save state every 15-30 minutes and tolerate restarts, the discount typically outweighs the risk.

What workloads should avoid GPU spot instances?

Real-time inference APIs, interactive development sessions, and any job without checkpointing implemented should use on-demand. An interruption that restarts production inference or loses hours of non-checkpointed training work erases the cost savings.

Are neocloud on-demand GPUs cheaper than hyperscaler spot instances?

Yes, in many cases. Thunder Compute A100 80GB on-demand at $1.09/hr undercuts AWS A100 spot pricing, which typically runs $1.20 to $1.65/hr per GPU. For A100-class workloads, neocloud on-demand now offers a better combination of price and reliability than hyperscaler spot.

What is the difference between spot and on-demand GPU instances?

Spot instances use spare capacity at steep discounts but can be reclaimed at any time with short notice. On-demand instances cost more but guarantee availability and uninterrupted operation for the life of the job.

How often should ML jobs checkpoint on spot instances?

Every 15 to 30 minutes is the standard guideline. PyTorch Lightning and Ray Train both support checkpoint-resume natively, making this straightforward to implement without custom logic.

What is a Capacity Block?

A Capacity Block is a reserved time window for GPU access, typically priced between on-demand and reserved instances. AWS offers Capacity Blocks for ML to guarantee cluster access for defined periods, useful for deadline-sensitive training runs that cannot tolerate spot interruptions.

What is the difference between GCP Spot VMs and Preemptible VMs?

Preemptible VMs are forcibly terminated after 24 hours regardless of demand. Spot VMs replaced them as the recommended option and have no fixed time limit, only being reclaimed when GCP needs capacity back. Use --provisioning-model=SPOT for new workloads; the --preemptible flag creates the older Preemptible VM type.

Which neoclouds offer spot or interruptible GPU pricing?

RunPod Community Cloud offers A100 instances at roughly $1.19/hr with no egress fees. Vast.ai's interruptible tier reaches as low as $0.34/hr for H100s through a marketplace model, with variable host reliability. Hyperbolic offers H100 on-demand at $1.49/hr without spot complexity.