
Cloud GPU Spot Instance Availability and Interruption Rates (2026)

Last update: April 1, 2026

Spot instance availability can determine whether your training job finishes today or gets restarted five times. In 2026, supply has grown, but demand for H200- and B200-class GPUs has made availability more volatile for many teams.

Key takeaways

<ul><li>Spot instance availability is more volatile than pricing for top-tier GPUs in 2026.</li><li>Interruption rates are a probability metric, not a guarantee, so design for restarts.</li><li>Checkpointed, batch-style jobs benefit most from spot discounts.</li><li>When deadlines matter, on-demand or reserved capacity is still the safer path.</li></ul>

What is a spot instance?

A spot instance is a type of cloud virtual machine that uses excess compute capacity, offered at a significantly discounted price compared to standard on-demand instances.

Because these instances rely on spare capacity, they can be interrupted or terminated by the cloud provider at any time if the infrastructure is needed elsewhere.

Spot instances are typically used for flexible, fault-tolerant workloads that can handle interruptions in exchange for much lower cost.

Why Interruption Rates Matter

Spot instances offer better prices because the provider can reclaim them at any time. Interruption rates estimate how likely your VM is to be reclaimed over a given window (AWS Spot Instance Advisor reports the trailing month), so they are the real risk variable to manage.

Most clouds offer only a short warning window. Google Cloud Spot VMs provide a best-effort preemption notice of up to 30 seconds, while AWS Spot instances typically issue a two-minute interruption notice before termination.
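On AWS, the two-minute notice surfaces through the instance metadata service at `spot/instance-action`. The sketch below shows one way to poll for it and work out how many seconds remain. It assumes IMDSv2 is enabled on the instance, and the helper names `fetch_instance_action` and `seconds_until` are illustrative, not part of any SDK.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service


def fetch_instance_action(timeout=1.0):
    """Return the spot/instance-action body, or None if no notice is pending."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        action_req = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption is scheduled
            return None
        raise
    except OSError:
        return None  # metadata service unreachable, e.g. not running on EC2


def seconds_until(body, now=None):
    """Parse a notice like {"action": "terminate", "time": "...Z"}."""
    notice = json.loads(body)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return notice["action"], (deadline - now).total_seconds()
```

A training loop could call `fetch_instance_action()` every few seconds and trigger a final checkpoint as soon as it returns a body instead of `None`.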

Latest GPU Spot Instance Availability (April 2026)

<table><thead><tr><th>GPU family</th><th>Instance Type</th><th>Typical interruption band*</th><th>Notes</th></tr></thead><tbody><tr><td>A100</td><td>p4d.24xlarge</td><td>5–10%</td><td>Stable in most regions</td></tr><tr><td>H100</td><td>p5.48xlarge</td><td>15–20%</td><td>Capacity scarce; queues common</td></tr><tr><td>L4</td><td>g6f.xlarge</td><td>5–10%</td><td>Good budget option</td></tr></tbody></table>

*Data pulled from AWS Spot Instance Advisor on February 24, 2026. Individual availability zones vary.

AWS also states that “95% of Spot instances run to completion” across all types, but high-end GPUs sit in the noisy 5% (source: nOps).
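To turn an interruption band into odds for a specific job, one rough approach is to treat the band as a per-instance monthly interruption probability and assume a constant hazard rate over the month. Both are simplifying assumptions, not AWS guidance, and the function names are illustrative.

```python
def job_interruption_probability(monthly_rate, job_hours, hours_per_month=720.0):
    """P(at least one interruption during the job) under a constant hazard rate."""
    survive_month = 1.0 - monthly_rate
    return 1.0 - survive_month ** (job_hours / hours_per_month)


def cluster_interruption_probability(per_node_probability, nodes):
    """P(any node is interrupted), assuming nodes are reclaimed independently."""
    return 1.0 - (1.0 - per_node_probability) ** nodes


if __name__ == "__main__":
    # Upper edge of a 15-20% band, applied to a 24-hour job.
    single = job_interruption_probability(0.20, 24)
    print(f"single node, 24 h: {single:.2%}")
    print(f"8 nodes, 24 h:     {cluster_interruption_probability(single, 8):.2%}")
```

The second function shows why multi-node jobs are more exposed: the chance that at least one node is reclaimed grows with every node added. In practice, nodes in the same capacity pool are often reclaimed together, so independence is an optimistic simplification.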

Spot Instances vs. On-Demand: When to Opt for the Discount

Spot works best when your workflow is interruption-tolerant. If you can checkpoint every 15 to 30 minutes, batch through a queue, and restart without human intervention, discounts can be worth the occasional restart.

<ul><li><strong>Long training runs with checkpoints</strong> – Restarting from the last save costs minutes, not hours.</li><li><strong>Stateless batch inference</strong> – Interruptions just re-queue the next batch.</li><li><strong>CI/CD test jobs</strong> – Failures are retried automatically.</li></ul>
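A timed checkpoint-and-resume loop could look like the sketch below. It is a minimal illustration, assuming the job state fits in a small JSON file; a real training run would save model and optimizer state through its framework instead. The names `save_checkpoint`, `load_checkpoint`, and `train` are hypothetical.

```python
import json
import os
import tempfile
import time


def save_checkpoint(path, state):
    """Write state atomically so an interruption mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0}


def train(path, total_steps, interval_seconds=15 * 60, step_fn=lambda step: None):
    """Resume from the last checkpoint and save every interval_seconds."""
    state = load_checkpoint(path)
    last_save = time.monotonic()
    while state["step"] < total_steps:
        step_fn(state["step"])  # stand-in for one real training step
        state["step"] += 1
        if time.monotonic() - last_save >= interval_seconds:
            save_checkpoint(path, state)
            last_save = time.monotonic()
    save_checkpoint(path, state)  # final save when the run completes
    return state
```

Because `train` always starts from `load_checkpoint`, rerunning the same command after an interruption picks up at the last saved step with no human intervention. The checkpoint path should live on storage that survives the instance, such as a network volume or object store mount.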

Configure Spot Instances to Manage Interruptions

Configure automatic checkpointing every 15 minutes and use persistent Spot requests so AWS relaunches your instance when capacity returns.
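As a rough sketch, a persistent Spot request can be expressed through the `InstanceMarketOptions` argument of boto3's `run_instances`. The helper names, the AMI ID, and the region below are placeholders, and the right interruption behavior depends on your instance family and storage setup, so treat this as a starting point rather than a recommended configuration.

```python
def persistent_spot_params(image_id, instance_type, interruption_behavior="stop"):
    """Build keyword arguments for the boto3 EC2 client's run_instances call."""
    # With run_instances, persistent requests need stop or hibernate behavior.
    if interruption_behavior not in ("stop", "hibernate"):
        raise ValueError("persistent requests need 'stop' or 'hibernate'")
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "persistent",
                "InstanceInterruptionBehavior": interruption_behavior,
            },
        },
    }


def launch(params, region="us-east-1"):
    """Submit the request; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("ec2", region_name=region).run_instances(**params)


if __name__ == "__main__":
    # Placeholder AMI ID; substitute one that exists in your account and region.
    print(persistent_spot_params("ami-00000000000000000", "p4d.24xlarge"))
```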

See the interruption guide for exact flags.

Spot vs On-Demand Instances

The key difference between spot instances and on-demand instances boils down to price vs reliability:

<ul><li><strong>Spot instances</strong> are heavily discounted because they use spare cloud capacity. However, they can be interrupted at any time, making them best suited for flexible, fault-tolerant workloads.</li><li><strong>On-demand instances</strong> provide guaranteed availability with no risk of interruption. You pay a higher, fixed rate, but in return get consistent performance and uptime.</li></ul>

<table><thead><tr><th>Feature</th><th>Spot Instances</th><th>On-Demand Instances</th></tr></thead><tbody><tr><td>Pricing</td><td>Up to 70–90% cheaper</td><td>Standard, fixed pricing</td></tr><tr><td>Availability</td><td>Not guaranteed</td><td>Guaranteed</td></tr><tr><td>Interruptions</td><td>Can be terminated at any time</td><td>No interruptions</td></tr><tr><td>Best For</td><td>Batch jobs, ML training, rendering</td><td>Production apps, real-time systems</td></tr><tr><td>Cost Predictability</td><td>Variable</td><td>Predictable</td></tr></tbody></table>

For spot instances, hidden costs show up when interruptions become frequent. For example, if an H200 spot instance is interrupted five times in a day, the engineering time spent babysitting the job can exceed the on-demand premium. This risk is amplified by rising guaranteed capacity prices.
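The comparison can be made concrete with a small cost model. Every number in the example is hypothetical and chosen only to illustrate the arithmetic; plug in your own hourly rates, rework time per interruption, and engineering cost.

```python
def spot_total_cost(spot_rate, job_hours, interruptions,
                    lost_hours_per_interruption, engineer_rate,
                    engineer_hours_per_interruption):
    """Compute spend plus human attention for a spot run with restarts."""
    compute_hours = job_hours + interruptions * lost_hours_per_interruption
    compute = spot_rate * compute_hours
    people = interruptions * engineer_hours_per_interruption * engineer_rate
    return compute + people


def on_demand_total_cost(on_demand_rate, job_hours):
    """Compute spend for an uninterrupted on-demand run."""
    return on_demand_rate * job_hours


if __name__ == "__main__":
    # Hypothetical inputs: 24-hour job, 5 interruptions, 15 minutes of
    # recomputation and 30 minutes of engineer attention per interruption.
    spot = spot_total_cost(3.0, 24, 5, 0.25, 100.0, 0.5)
    on_demand = on_demand_total_cost(10.0, 24)
    print(f"spot: ${spot:.2f}  on-demand: ${on_demand:.2f}")
```

With fully automated restarts, the engineer term drops to zero and the spot discount dominates again, which is why restart automation matters more than the headline discount.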

When Spot Instances Make Sense

Spot instances are ideal for workloads that can tolerate interruptions without significant impact. If your application can recover quickly or retry tasks automatically, you can take full advantage of their lower cost.

<ul><li><strong>Long Training Runs with Robust Checkpoints:</strong> If your model saves state every 15–30 minutes, restarting from the last save point costs minutes of compute rather than hours of lost progress.</li><li><strong>Stateless Batch Inference:</strong> Since each request is independent, an interruption simply causes the specific batch to be re-queued on the next available instance without impacting overall data integrity.</li><li><strong>CI/CD Automated Test Jobs:</strong> These workflows are designed to handle failures; spot interruptions are treated as a transient error and are retried automatically by the pipeline runner.</li></ul>
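The re-queue pattern behind stateless batch inference and retried CI jobs can be sketched in a few lines. This is an in-memory illustration with a hypothetical `Interrupted` exception and `drain` helper; a production setup would typically use a durable queue service so that work survives the loss of the coordinating process too.

```python
from collections import deque


class Interrupted(Exception):
    """Raised by a worker when its instance is reclaimed mid-batch."""


def drain(batches, process, max_attempts=3):
    """Process every batch, re-queuing any batch that is interrupted."""
    queue = deque((batch, 1) for batch in batches)
    results, failed = [], []
    while queue:
        batch, attempt = queue.popleft()
        try:
            results.append(process(batch))
        except Interrupted:
            if attempt < max_attempts:
                queue.append((batch, attempt + 1))  # retry after the others
            else:
                failed.append(batch)  # give up and report it
    return results, failed
```

Because each batch is independent, an interrupted batch only changes the order of results, not their content. The `max_attempts` cap keeps a persistently failing batch from looping forever.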

AWS On-Demand vs Spot Instances

AWS Spot instances are discounted EC2 instances that run on spare AWS capacity, which makes them significantly cheaper than on-demand. AWS can interrupt them, but it issues a two-minute interruption notice so workloads can checkpoint or shut down gracefully.

AWS also provides built-in tools to reduce the likelihood of interruptions and maintain availability when using spot instances:

<ul><li>Spot Fleet</li><li>EC2 Auto Scaling</li><li>Capacity-optimized allocation strategies</li></ul>

These features allow workloads to automatically shift across instance types and availability zones, improving resilience while still benefiting from lower costs.
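For illustration, here is how a capacity-optimized, multi-instance-type setup might be expressed as a `MixedInstancesPolicy` for an EC2 Auto Scaling group. The helper name, the launch template name, and the instance types in the example are placeholders, and this is a sketch of the request shape rather than a tuned configuration.

```python
def mixed_spot_policy(launch_template_name, instance_types):
    """Build a MixedInstancesPolicy for create_auto_scaling_group."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Each override adds another capacity pool the group may draw from.
            "Overrides": [{"InstanceType": name} for name in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,  # everything runs on Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


if __name__ == "__main__":
    # Hypothetical template name; the instance types are examples only.
    print(mixed_spot_policy("gpu-training", ["p4d.24xlarge", "p5.48xlarge"]))
```

The resulting dictionary would be passed as `MixedInstancesPolicy` to the boto3 `autoscaling` client's `create_auto_scaling_group`, together with a group name, size limits, and subnets in more than one availability zone.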

For full details, review AWS documentation.

A Middle Path: Maximizing Availability Through Idle-Time Reuse

Network-attached GPUs from Thunder Compute are loaned to your process, then returned to a pool in seconds when idle, without you having to think about it.

In practice, teams switching from Spot to Thunder report 40–60% total savings without code changes and with jobs that never go dark. Use it when you want Spot-level pricing but can't tolerate interruptions.


Choosing the Right Availability Model

Use this decision matrix to choose a model that matches your risk tolerance and deadlines.

<table><thead><tr><th>Factor</th><th>Best Fit</th><th>Why it matters</th></tr></thead><tbody><tr><td>Job is time-sensitive</td><td>On-demand or capacity blocks</td><td>Missed deadlines cost more than a premium</td></tr><tr><td>Supports checkpointing</td><td>Spot or marketplace</td><td>Restarts are manageable if state is saved</td></tr><tr><td>Multi-node cluster</td><td>On-demand or reserved</td><td>One spot interruption can kill the whole job</td></tr><tr><td>Budget is the top constraint</td><td>Spot</td><td>Highest discount if you can tolerate churn</td></tr></tbody></table>
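If you want to apply the matrix programmatically, for example in a job submission script, a small function can encode it. The order in which the factors are checked and the fallback for jobs that match no row are assumptions made for this sketch; the matrix itself does not rank the factors.

```python
def choose_model(time_sensitive, multi_node, supports_checkpointing, budget_first):
    """Map the decision matrix to a recommendation, checked in a fixed order."""
    if time_sensitive:
        return "on-demand or capacity blocks"
    if multi_node:
        return "on-demand or reserved"
    if budget_first:
        return "spot"
    if supports_checkpointing:
        return "spot or marketplace"
    return "on-demand"  # assumed default; the matrix does not cover this case


if __name__ == "__main__":
    print(choose_model(time_sensitive=False, multi_node=False,
                       supports_checkpointing=True, budget_first=False))
```

Deadline and cluster risk are checked first here because a missed deadline or a killed multi-node job is usually harder to recover from than a higher bill.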

Key Takeaways for Your 2026 GPU Strategy

Availability is now a risk you must hedge, not a nice-to-have. Treat interruption rates like a risk budget.

If you want a low-cost, easy-to-use GPU cloud without the hyperscaler complexity, Thunder Compute makes it simple to spin up GPUs and keep costs predictable. It is built for indie developers, researchers, and startups that need fast access to reliable GPU capacity.
