
Cloud GPU Spot Instance Availability and Interruption Rates (2026)

September 16, 2025

Spot instance availability can determine whether your training job finishes today or gets restarted five times. In 2026, supply has grown, but demand for H200- and B200-class GPUs has made availability more volatile for many teams.

Key takeaways

<ul>
<li>Spot instance availability is more volatile than pricing for top-tier GPUs in 2026.</li>
<li>Interruption rates are a probability metric, not a guarantee, so design for restarts.</li>
<li>Checkpointed, batch-style jobs benefit most from spot discounts.</li>
<li>When deadlines matter, on-demand or reserved capacity is still the safer path.</li>
</ul>

Why Interruption Rates Matter

Spot instances offer better prices because the provider can reclaim them at any time. Interruption rates describe the probability that your VM will be taken back, so they are the real risk variable to manage.

Most clouds offer only a short warning window. Google Cloud Spot VMs provide a best-effort preemption notice of up to 30 seconds, while AWS Spot instances typically issue a two-minute interruption notice before termination.
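As a minimal sketch of how a job might watch for the AWS two-minute notice, the snippet below polls the EC2 instance metadata service (IMDSv2) for `spot/instance-action`. The helper names `fetch_instance_action` and `seconds_until_interruption` are illustrative, not part of any AWS SDK, and the code assumes it runs on an EC2 Spot instance with the metadata service reachable.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service


def fetch_instance_action(timeout=1.0):
    """Return the raw spot/instance-action JSON, or None if nothing is scheduled."""
    # IMDSv2 requires a short-lived session token before any metadata read.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    action_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption is pending
            return None
        raise


def seconds_until_interruption(payload, now):
    """Parse an instance-action payload into (action, seconds remaining)."""
    notice = json.loads(payload)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return notice["action"], (when - now).total_seconds()
```

A training process could call `fetch_instance_action()` every few seconds from a background thread and trigger a final checkpoint as soon as it returns a payload. On Google Cloud the notice is shorter and delivered differently, so this sketch applies to AWS only.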

Latest GPU Spot Instance Availability (February 2026)

<table>
<thead>
<tr><th>GPU family</th><th>Instance type</th><th>Typical interruption band*</th><th>Notes</th></tr>
</thead>
<tbody>
<tr><td>A100</td><td>p4d.24xlarge</td><td>5–10%</td><td>Stable in most regions</td></tr>
<tr><td>H100</td><td>p5.48xlarge</td><td>15–20%</td><td>Capacity scarce; queues common</td></tr>
<tr><td>L4</td><td>g6f.xlarge</td><td>5–10%</td><td>Good budget option</td></tr>
</tbody>
</table>

*Data pulled from AWS Spot Instance Advisor on February 24, 2026. Individual availability zones vary.

AWS also states that “95% of Spot instances run to completion” across all instance types (as reported by nOps), but high-end GPUs tend to fall in the interrupted 5%.

Spot Instances vs. On-Demand: When to Opt for the Discount

Spot succeeds when your workflow is interruption tolerant. If you can checkpoint every 15 to 30 minutes, batch through a queue, and restart without human intervention, discounts can be worth the occasional restart.

<ul>
<li><strong>Long train with checkpoints</strong> – Restarting from the last save costs minutes, not hours.</li>
<li><strong>Stateless batch inference</strong> – Interruptions just re-queue the next batch.</li>
<li><strong>CI/CD test jobs</strong> – Failures are retried automatically.</li>
</ul>
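The checkpoint-and-resume pattern described above can be sketched in a few lines. This is an illustrative skeleton, not a framework-specific recipe: `save_checkpoint`, `load_checkpoint`, and `train` are hypothetical names, the state is a small JSON dict rather than real model weights, and `step_fn` stands in for one training step.

```python
import json
import os
import tempfile
import time


def save_checkpoint(path, state):
    """Write state atomically so an interruption mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # atomic rename within the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0, "loss": None}


def train(path, total_steps, checkpoint_every_s=15 * 60,
          step_fn=None, clock=time.monotonic):
    """Run from the last checkpoint to total_steps, saving on a time interval."""
    state = load_checkpoint(path)
    last_save = clock()
    while state["step"] < total_steps:
        state["loss"] = step_fn(state["step"]) if step_fn is not None else 0.0
        state["step"] += 1
        if clock() - last_save >= checkpoint_every_s:
            save_checkpoint(path, state)
            last_save = clock()
    save_checkpoint(path, state)  # always persist the final state
    return state
```

Because `train` always begins by loading the checkpoint, relaunching the same command after an interruption resumes from the last saved step with no human in the loop. The checkpoint path needs to live on storage that survives the instance, such as a network volume or an object store sync.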

Configure Spot Instances to Manage Interruptions

Configure automatic checkpointing every 15 minutes and use persistent Spot requests so AWS restarts your VM when capacity returns.

See the interruption guide for exact flags.
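For illustration, a persistent Spot request can be expressed through the `InstanceMarketOptions` argument of boto3's `ec2.run_instances`. The sketch below only builds the arguments; `persistent_spot_params` and `launch_persistent_spot` are hypothetical helper names, the AMI ID is a placeholder, and it assumes boto3 is installed and AWS credentials are configured. With `run_instances`, persistent requests require an interruption behavior of `stop` or `hibernate`, so the helper rejects anything else.

```python
def persistent_spot_params(image_id, instance_type, interruption_behavior="stop"):
    """Build keyword arguments for ec2.run_instances() with a persistent Spot request."""
    if interruption_behavior not in ("stop", "hibernate"):
        raise ValueError("persistent Spot via run_instances needs 'stop' or 'hibernate'")
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                # 'persistent' keeps the request open after an interruption
                "SpotInstanceType": "persistent",
                "InstanceInterruptionBehavior": interruption_behavior,
            },
        },
    }


def launch_persistent_spot(image_id, instance_type):
    """Submit the request; requires boto3 and valid AWS credentials."""
    import boto3  # third-party dependency, imported lazily

    ec2 = boto3.client("ec2")
    return ec2.run_instances(**persistent_spot_params(image_id, instance_type))
```

Calling `launch_persistent_spot("ami-EXAMPLE", "p4d.24xlarge")` would submit the request, where `ami-EXAMPLE` stands in for a real image ID. Stopping rather than terminating preserves the EBS root volume, which pairs naturally with checkpoints written to disk.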

On-Demand vs. Spot Instances: When "Saving" Money Backfires

Hidden costs show up when interruptions become frequent. If an H200 spot instance is interrupted five times in a day, the engineering time spent babysitting the job can exceed the on-demand premium. This risk is amplified by rising guaranteed capacity prices.
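A rough way to check whether spot still wins is to price in the rework and the human time. The sketch below is a simplified model with hypothetical hourly rates, not real pricing: it assumes each interruption lands, on average, halfway between two checkpoints and adds a fixed restart overhead.

```python
def spot_total_cost(job_hours, spot_rate, interruptions, checkpoint_interval_h,
                    restart_overhead_h, engineer_rate=0.0,
                    engineer_h_per_interruption=0.0):
    """Estimate total spot cost including recomputed work and engineer time."""
    # Assumption: an interruption loses half a checkpoint interval on average.
    lost_hours = interruptions * (checkpoint_interval_h / 2 + restart_overhead_h)
    compute = (job_hours + lost_hours) * spot_rate
    people = interruptions * engineer_h_per_interruption * engineer_rate
    return compute + people


def on_demand_total_cost(job_hours, on_demand_rate):
    """On-demand runs straight through, so cost is simply hours times rate."""
    return job_hours * on_demand_rate


# Hypothetical numbers: 24 h job, $10/h spot vs $25/h on-demand, 5 interruptions,
# 30-minute checkpoints, 15 minutes to restart, 1 engineer-hour at $100 per restart.
babysat = spot_total_cost(24, 10.0, 5, 0.5, 0.25, 100.0, 1.0)   # 765.0
automated = spot_total_cost(24, 10.0, 5, 0.5, 0.25)             # 265.0
on_demand = on_demand_total_cost(24, 25.0)                      # 600.0
```

Under these made-up figures, the babysat spot run costs more than on-demand, while the same run with fully automated restarts costs less than half. The compute overhead of five interruptions is small; the engineer time is what flips the result.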

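Spot still pays off for workloads that absorb interruptions without manual intervention:

<ul>
<li><strong>Long Training Runs with Robust Checkpoints:</strong> If your model saves state every 15–30 minutes, restarting from the last save point costs minutes of compute rather than hours of lost progress.</li>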
<li><strong>Stateless Batch Inference:</strong> Since each request is independent, an interruption simply causes the specific batch to be re-queued on the next available instance without impacting overall data integrity.</li>
<li><strong>CI/CD Automated Test Jobs:</strong> These workflows are designed to handle failures; spot interruptions are treated as a transient error and are retried automatically by the pipeline runner.</li>
</ul>

In January 2026, AWS raised EC2 Capacity Blocks for ML prices by about 15%, making guaranteed GPU windows more expensive but often still necessary for deadline-driven work.

A Middle Path: Maximizing Availability Through Idle-Time Reuse

Network-attached GPUs from Thunder Compute are loaned to your process, then returned to a pool in seconds when idle, without you having to think about it.

In practice, teams converting from Spot to Thunder report 40–60% total savings without code changes and without interrupted jobs. Use it when you want Spot-level pricing but can’t tolerate interruptions.


Choosing the Right Availability Model

Use this decision matrix to choose a model that matches your risk tolerance and deadlines.

<table>
<thead>
<tr><th>Factor</th><th>Best fit</th><th>Why it matters</th></tr>
</thead>
<tbody>
<tr><td>Job is time-sensitive</td><td>On-demand or capacity blocks</td><td>Missed deadlines cost more than a premium</td></tr>
<tr><td>Supports checkpointing</td><td>Spot or marketplace</td><td>Restarts are manageable if state is saved</td></tr>
<tr><td>Multi-node cluster</td><td>On-demand or reserved</td><td>One spot interruption can kill the whole job</td></tr>
<tr><td>Budget is the top constraint</td><td>Spot</td><td>Highest discount if you can tolerate churn</td></tr>
</tbody>
</table>
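The decision matrix can also be written as a small function for use in job-submission tooling. This is one possible encoding: the precedence order (deadlines first, then cluster topology, then checkpointing, then budget) and the final fallback are assumptions, since the matrix itself does not rank the factors, and `choose_availability_model` is a hypothetical name.

```python
def choose_availability_model(time_sensitive, multi_node,
                              supports_checkpointing, budget_first):
    """Map job traits to a capacity model; precedence order is an assumption."""
    if time_sensitive:
        # Missed deadlines cost more than the premium.
        return "on-demand or capacity blocks"
    if multi_node:
        # One spot interruption can kill the whole job.
        return "on-demand or reserved"
    if supports_checkpointing:
        # Restarts are manageable if state is saved.
        return "spot or marketplace"
    if budget_first:
        # Highest discount if you can tolerate churn.
        return "spot"
    # Assumed default when no factor applies: pay for predictability.
    return "on-demand"
```

For example, a single-node fine-tuning job with checkpoints and no deadline maps to `"spot or marketplace"`, while the same job with a hard deadline maps to `"on-demand or capacity blocks"`.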

Key Takeaways for Your 2026 GPU Strategy

Availability is now a risk you must hedge, not a nice-to-have. Treat interruption rates like a risk budget: decide how many restarts a job can absorb before the spot discount stops paying for itself.

If you want a low-cost, easy-to-use GPU cloud without the hyperscaler complexity, Thunder Compute makes it simple to spin up GPUs and keep costs predictable. It is built for indie developers, researchers, and startups that need fast access to reliable GPU capacity.
