Virtualization is a concept in computer science for creating virtual representations of physical hardware. While virtualization is commonly associated with Virtual Machines (VMs), it extends to other domains, including GPUs. GPU virtualization is essential for efficient resource sharing in high-performance computing, AI, and machine learning. However, the term is often misunderstood when applied to GPUs, because there it can refer to several different techniques.
GPU virtualization currently exists in three main forms: virtual GPUs (vGPUs), GPU passthrough, and network-based GPU pooling.
The first two operate within a single physical server and are widely used today. Thunder Compute is pioneering the third approach, which operates across multiple servers, or 'nodes'.
A virtual GPU (vGPU) setup divides one physical GPU into several virtual GPUs. This lets multiple virtual machines (VMs) use portions of the same GPU at the same time, improving resource utilization when individual VMs don't need a full GPU's power.
GPU passthrough assigns an entire physical GPU to a single VM. Although it doesn't split the GPU, it still counts as virtualization because the VM accesses the GPU directly, giving near-native performance to applications that need a full GPU.
The third approach, network-based GPU pooling, is a newer concept that requires deeper explanation.
At its core, Thunder Compute is a network-based GPU virtualization solution. This works by extending physical PCIe connections with virtual connections over a network.
In practice, this means that any computer can access any GPU across a network. Traditionally, adding a GPU to a server requires physically connecting it to the motherboard. With Thunder Compute, a virtual GPU can be "plugged in" via software, behaving just like a physically connected GPU.
Thunder Compute's solution acts as a bridge between the application and the GPU. It replaces the standard GPU software interface (like NVIDIA CUDA) with a network-aware version. This allows applications to interact with GPUs on remote servers as if they were locally attached.
The end result is that a computer without a physical GPU can run GPU applications as if it had one attached, with no hardware changes. This creates a flexible, distributed GPU resource pool that can be dynamically allocated and shared across the network.
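The article does not describe Thunder Compute's internals, so the following is only a minimal sketch of the general idea behind a network-aware GPU interface (often called API remoting), assuming a toy JSON-over-socket protocol. The names `RemoteGPU`, `serve`, `SERVER_OPS`, `vector_add`, and `scale` are hypothetical, and the "GPU" operations are plain Python stand-ins; a real system would intercept CUDA calls and move device memory, not Python lists.

```python
import json
import socket
import threading


# Hypothetical stand-ins for GPU kernels. A real remoting layer would call
# the GPU driver here; these run on the CPU so the sketch works anywhere.
def vector_add(a, b):
    return [x + y for x, y in zip(a, b)]


def scale(v, k):
    return [x * k for x in v]


SERVER_OPS = {"vector_add": vector_add, "scale": scale}


def send_msg(sock, obj):
    # Length-prefixed JSON, so each message can be read back whole from a byte stream.
    data = json.dumps(obj).encode()
    sock.sendall(len(data).to_bytes(4, "big") + data)


def recv_exact(sock, n):
    # Keep reading until exactly n bytes arrive (recv may return fewer).
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    length = int.from_bytes(recv_exact(sock, 4), "big")
    return json.loads(recv_exact(sock, length))


def serve(sock):
    # "GPU server" side: run each forwarded call and send the result back.
    while True:
        try:
            request = recv_msg(sock)
        except ConnectionError:
            break
        result = SERVER_OPS[request["op"]](*request["args"])
        send_msg(sock, {"result": result})


class RemoteGPU:
    """Client-side stub: looks like a local GPU API but forwards every call."""

    def __init__(self, sock):
        self.sock = sock

    def __getattr__(self, op):
        # Any attribute not defined on the stub becomes a function that
        # ships the call over the connection and waits for the result.
        def call(*args):
            send_msg(self.sock, {"op": op, "args": list(args)})
            return recv_msg(self.sock)["result"]

        return call


if __name__ == "__main__":
    # socketpair() stands in for a real network link between two machines.
    client_sock, server_sock = socket.socketpair()
    threading.Thread(target=serve, args=(server_sock,), daemon=True).start()

    gpu = RemoteGPU(client_sock)
    print(gpu.vector_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
    print(gpu.scale([1, 2], 3))  # [3, 6]
    client_sock.close()
```

The application code only sees `gpu.vector_add(...)`; whether the work runs locally or on another machine is hidden behind the stub. Every call here costs a network round trip, which hints at why remote GPUs start out slower than attached ones.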
Traditional GPU virtualization is limited by physical hardware: a server typically holds at most 8 GPUs, so expanding GPU capacity requires vertical scaling, that is, upgrading individual servers. This model also tends to waste capacity, because VMs often reserve entire GPUs even when they don't use them fully.
Thunder Compute's network-distributed approach overcomes these limitations by enabling GPUs to be accessed across multiple servers (also called 'nodes') in a data center. This creates a data center-wide pool of GPU resources, rather than limiting each server to its own physically attached GPUs.
This ability to expand GPU resources by adding more servers (known as horizontal scalability) allows flexible, on-demand allocation of GPU power. It can dramatically increase efficiency by keeping GPUs busy across the entire data center instead of leaving them idle on individual servers.
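To make the pooling argument concrete, here is a minimal sketch comparing per-server allocation with allocation from a data-center-wide pool. The `Server` class, the allocation functions, and the cluster numbers are all hypothetical illustrations, not Thunder Compute's scheduler.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    free_gpus: int


def allocate_per_server(servers, requested):
    """Traditional model: a job may only use GPUs attached to a single server."""
    for s in servers:
        if s.free_gpus >= requested:
            s.free_gpus -= requested
            return {s.name: requested}
    return None  # no single server has enough free GPUs


def allocate_from_pool(servers, requested):
    """Pooled model: a job may draw GPUs from any server in the data center."""
    if sum(s.free_gpus for s in servers) < requested:
        return None  # not enough free GPUs anywhere
    grant, remaining = {}, requested
    for s in servers:
        take = min(s.free_gpus, remaining)
        if take:
            s.free_gpus -= take
            grant[s.name] = take
            remaining -= take
        if remaining == 0:
            break
    return grant


def make_cluster():
    # Hypothetical cluster: three 8-GPU servers, each with 5 GPUs already busy.
    return [Server("node-a", 3), Server("node-b", 3), Server("node-c", 3)]


if __name__ == "__main__":
    print(allocate_per_server(make_cluster(), 6))  # None
    print(allocate_from_pool(make_cluster(), 6))  # {'node-a': 3, 'node-b': 3}
```

With 9 GPUs idle in total, the per-server model still rejects a 6-GPU job because no single server has 6 free, while the pooled model serves it. The sketch deliberately ignores the network overhead of using remote GPUs.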
While Thunder Compute's approach is unique, it's helpful to compare it to existing solutions:
As with other virtualization technologies, network-based GPU virtualization faces performance challenges but continues to improve. Thunder Compute's early tests showed AI inference tasks running 100 times slower than on directly attached hardware. Within a month, the slowdown had dropped to about 2 times for most AI workloads.
This rapid progress points to a future where network-virtualized GPUs will match the performance of physically attached GPUs. As the technology matures, applications will extend beyond data centers to slower networks, including connections between data centers and even home networks. We envision a future where developers can access vast GPU resources from their laptops over standard Wi-Fi connections.
The advantages of network-based GPU virtualization—flexibility, efficiency, and scalability—position it as the likely future standard for GPU management in data centers and clouds. Try Thunder Compute to experience this technology firsthand.