The concept of virtualization originated with CPUs but has also been applied to storage, memory, and, most recently, GPUs. Generally, virtualization refers to software that creates an abstraction of computer hardware, so that programs can run without depending directly on a specific physical machine. The goal is for virtualized hardware to behave exactly like physical hardware, with the added benefit of increased flexibility; in practice, the drawback of virtualization is reduced performance. This performance penalty is most severe early in the lifecycle of a given hardware category. For instance, the earliest CPU virtualization systems were several orders of magnitude slower than their underlying hardware, but today their performance is nearly identical. As virtualization systems continue to improve, they increasingly replace direct reliance on physical hardware.
Despite performance limitations, virtualized hardware is significantly more flexible than the underlying hardware on which it runs. In a virtualized system, programs are not tied to a particular physical machine, so they can run on any available capacity within a data center. This flexibility allows greater utilization of limited hardware, often by 5-10x. For a cloud platform, this means that a software change alone can make it possible to serve 5-10x more customers without buying more costly hardware. In a CapEx-heavy data center, this can translate to tens of millions of dollars in added profit. Scaled across every cloud platform, which includes some of the biggest businesses in the world, the potential impact is enormous.
IBM created the first virtualization technology, CP-40, which reached production use in 1967. CP-40 allowed multiple users to share a single mainframe computer. At the time, a whole mainframe computer was prohibitively expensive for most users. Virtualization allowed up to 14 users to share each computer, dramatically improving accessibility.
Over the next 30 years, the decline in the cost of consumer x86 hardware reduced the need for virtualization. However, in the 1990s, VMware revived the concept. VMware noticed that data center hardware was utilized less than 15% of the time, and that virtualization technology could raise utilization to 80% or more, making the same hardware available to more users. Additionally, to reduce costs at the time, data centers typically used Intel’s low-end chips. VMware’s virtualization technology instead allowed them to share premium hardware across multiple users, making the experience for developers faster and more cost-effective. Although this advancement was exciting for data centers and users, virtualization initially increased program execution time. It took several years for the impact of CPU virtualization on performance to become negligible, and today, nearly every cloud platform uses virtualized CPUs (vCPUs) rather than physical CPUs.
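As a rough check on these figures, the capacity gain from higher utilization is simply the ratio of the two utilization levels. The sketch below assumes that the workload served scales linearly with average utilization, which is a simplification; the function name `capacity_multiplier` is illustrative and not part of any VMware tooling.

```python
def capacity_multiplier(before: float, after: float) -> float:
    """Estimate how many times more work the same hardware can serve
    when average utilization rises from `before` to `after`.

    Both arguments are fractions in (0, 1]. Assumes served workload is
    proportional to utilization (an illustrative simplification).
    """
    if not (0 < before <= 1 and 0 < after <= 1):
        raise ValueError("utilization must be a fraction in (0, 1]")
    return after / before


if __name__ == "__main__":
    # Figures from the text: under 15% utilization before, 80% or more after.
    gain = capacity_multiplier(0.15, 0.80)
    print(f"Capacity multiplier: {gain:.1f}x")
```

Raising utilization from 15% to 80% gives a multiplier of roughly 5.3, which sits at the low end of the 5-10x range mentioned earlier; a lower starting utilization pushes the multiplier toward the higher end.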
After recognizing the benefits of CPU virtualization, companies began to explore virtualization beyond the CPU, which led to the concept of a fully virtualized data center. Storage was a critical next step in this evolution. Amazon significantly advanced virtual storage with their Elastic Block Store (EBS) offering on AWS. EBS provided users with scalable, on-demand block storage that could be shared across a pool of physical storage resources. Notably, this technology offers performance that closely rivals directly attached storage, making it a key component of virtualized environments.
RAM virtualization came next. The most challenging parts of a computer to virtualize have been those with the strictest latency requirements, specifically RAM and GPUs. Several companies have experimented with RAM virtualization, achieving results in small-scale implementations such as VMware's vNUMA. However, these implementations often come with performance drawbacks, such as increased latency and reduced memory bandwidth, making them less suitable for high-performance applications.
Early research into GPU virtualization began with rCUDA in 2013. More recently, Thunder Compute has developed one of the first practical, publicly available GPU virtualization technologies. GPU virtualization faces many of the same performance challenges as early CPU virtualization. For example, Thunder Compute’s initial performance was nearly 100 times slower than that of a physical GPU. However, as with other virtualization technologies, performance has steadily improved. As of today, Thunder Compute's performance is approximately 2 times slower than a physical GPU, and it continues to improve.