What is Thunder Compute?

Thunder Compute is a GPU-over-TCP virtualization system that lets GPUs detach from idle servers and transparently reconnect to workloads when needed, usually in double-digit milliseconds.

How does GPU-over-TCP work?

GPU-over-TCP places the GPU across a high-speed network instead of a PCIe slot. Thunder Compute's CUDA-layer virtualization translates CUDA calls into network messages while the workload continues to run unchanged.

Does Thunder Compute affect latency?

The initial GPU connection takes about 10-20 milliseconds. The larger possible impact is runtime overhead from network delays across many CUDA calls, which Thunder Compute optimizes heavily at the systems level.

Do Thunder Compute workloads share GPUs?

No. When a process uses a GPU, that GPU is dedicated to that instance with access to the full VRAM and compute of the card until the process exits or sits idle.

How much efficiency does Thunder Compute add?

Within Thunder Compute's cloud, GPU virtualization lets the same fleet serve about 1.8x more users than it could without virtualization, with higher gains possible for I/O-bound or long-running reserved workloads.


How Thunder Compute works (GPU-over-TCP)

Is Thunder Compute secure?

Yes. When a job ends, Thunder Compute wipes GPU memory and resets the card before passing it to another workload, and sole tenancy provides strong isolation guarantees.

Carl Peterson | June 24, 2026 | 3 min read

1. Why make GPUs more efficient?

GPUs are expensive, and they often sit idle while developers read logs or tweak hyperparameters. With Thunder Compute, an idle GPU detaches from your server instead of sitting there doing nothing. When you need a GPU again, your workload transparently claims one within tens of milliseconds. This is different from a scheduler like Slurm: everything happens behind the scenes, in real time, without waiting in a queue.

2. How does Thunder Compute work?

Network-attached: The GPU sits across a high-speed network instead of a PCIe slot. Each virtual machine communicates with its GPU over TCP via the data center's network fabric.
Invisible to the workload: The virtualization sits at the CUDA layer, below the inference, training, or other GPU workload. Each program runs unchanged while, behind the scenes, our virtualization layer translates its CUDA calls into network messages.
Sole tenancy: When a process on an instance uses a GPU, that GPU is entirely dedicated to that instance. The instance has access to the full VRAM and compute of the card for the duration of the process. When the process exits (or sits idle), we can pass that GPU to another workload.
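
To make the translation step above concrete, here is a minimal sketch of how a single API call could be framed as a network message. The wire format (a 4-byte length prefix followed by JSON), the function names `encode_call` and `decode_call`, and the `size_bytes` argument name are all assumptions made for illustration; they are not Thunder Compute's actual protocol.

```python
import json
import struct

# Hypothetical wire format (an assumption, not Thunder Compute's protocol):
# a 4-byte big-endian length prefix followed by a UTF-8 JSON payload.


def encode_call(name, args):
    """Serialize one API call into a length-prefixed frame."""
    payload = json.dumps({"call": name, "args": args}).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_call(frame):
    """Parse a frame back into (name, args), as a GPU-side server might."""
    (length,) = struct.unpack(">I", frame[:4])
    message = json.loads(frame[4:4 + length].decode("utf-8"))
    return message["call"], message["args"]


# The workload issues what looks like an ordinary call; a layer beneath it
# frames the call so it can be sent over a TCP connection.
frame = encode_call("cudaMalloc", {"size_bytes": 1024})
name, args = decode_call(frame)
```

A real system would also need to carry return values, device pointers, and bulk data transfers, which this sketch leaves out.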

3. Does this affect latency?

Thunder Compute has a negligible effect on latency in the conventional sense. Establishing the initial connection to a GPU takes ~10-20 milliseconds (for comparison, a blink takes ~200 milliseconds), and that cost is incurred only once, at program startup. The main potential impact is on total runtime rather than latency: slight network delays add up across thousands of CUDA calls and make the overall program take longer to run. Fortunately, this is something we can optimize heavily at the systems level. By strategically tuning the way your program runs behind the scenes, we can largely keep network latency from affecting your GPU computation. For common workloads the impact is negligible; in less common edge cases you can see a slowdown of ~2x relative to native. Even in the worst cases, we find that the slowdown is dramatically outweighed by efficiency improvements across the cluster.
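
The way small delays add up across many calls can be illustrated with a back-of-the-envelope model. This is a simplified sketch, not a description of Thunder Compute's optimizations: the function `estimated_runtime_s`, the idea of a `blocking_fraction` (the share of calls that actually wait on a network round trip), and every number below are assumptions chosen for illustration.

```python
def estimated_runtime_s(gpu_time_s, n_calls, round_trip_s, blocking_fraction):
    """GPU time plus network waits for the calls that block on a round trip."""
    return gpu_time_s + n_calls * blocking_fraction * round_trip_s


# Hypothetical numbers, chosen for illustration only.
gpu_time_s = 10.0       # time the job takes on a locally attached GPU
n_calls = 100_000       # CUDA calls issued by the job
round_trip_s = 100e-6   # assumed 100 microsecond network round trip

# Worst case in this model: every call waits for a full round trip.
every_call_blocks = estimated_runtime_s(gpu_time_s, n_calls, round_trip_s, 1.0)

# Optimized case in this model: only 1% of calls wait on the network.
few_calls_block = estimated_runtime_s(gpu_time_s, n_calls, round_trip_s, 0.01)
```

With these made-up inputs, the fully blocking case roughly doubles the runtime, while the case where few calls block adds about 1%. The numbers were picked so the first case lands near the ~2x edge-case slowdown mentioned above; real workloads will differ.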

4. How much efficiency does this add?

Within our cloud, we can serve ~1.8x more users on our GPU fleet than we could without virtualization. This means we get nearly double the revenue from the same GPUs. In many cases the benefit is much larger, particularly for I/O-bound agentic workloads such as GPU sandboxes, or for fleets with long-term reservations that leave instances running for months. Because our instances are entirely on demand, fewer sit idle, so 1.8x oversubscription is on the lower end of what we see in other GPU fleets.
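
As a rough illustration of the arithmetic, the sketch below computes how many users a fleet serves at a given oversubscription ratio, plus an idealized ceiling on that ratio. The ceiling model (each user keeps a GPU busy for a fixed fraction of the time and demand never collides) is a simplifying assumption of this example, as are the function names and the 100-GPU fleet; only the 1.8x figure comes from the text above.

```python
def users_served(num_gpus, oversubscription):
    """Users a fleet can serve at a given oversubscription ratio."""
    return num_gpus * oversubscription


def ideal_oversubscription(active_fraction):
    """Idealized ceiling on oversubscription.

    Assumes each user keeps a GPU busy for `active_fraction` of the time
    and that demand from different users never overlaps.
    """
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")
    return 1.0 / active_fraction


fleet_users = users_served(100, 1.8)        # hypothetical 100-GPU fleet
interactive = ideal_oversubscription(0.5)   # GPU busy half the time
sandbox = ideal_oversubscription(0.1)       # mostly idle, I/O-bound workload
```

Under this idealized model, the less time each user actually spends on the GPU, the higher the ceiling, which is consistent with the larger gains described for I/O-bound workloads. Real fleets fall below the ceiling because demand does overlap.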

5. Is Thunder Compute secure?

In short, yes, and as a group of systems nerds we care a lot about security. When a job ends, we wipe GPU memory and reset the card so no data leaks to the next user. Sole tenancy provides strong isolation guarantees here.

6. Learn more

If you'd like to learn more, or to deploy Thunder Compute's virtualization to increase revenue from your GPU fleet, contact us!