NVIDIA's Rubin architecture is the most ambitious GPU platform the company has ever shipped. It is not just a new chip: it is an AI factory combining GPUs, CPUs, networking, storage, and security silicon.
First announced at Computex 2024, Rubin entered full production at CES 2026. This guide covers every confirmed product, spec, and technology behind it.
Everything We Know So Far
NVIDIA confirmed Rubin is in full production at CES 2026, ahead of the original H2 2026 target. Jensen Huang used the keynote to detail all six co-designed chips and announce H2 2026 partner availability.
In March, GTC 2026 finalized the system architecture details and cloud partner list.
Rubin succeeds Blackwell and targets the core demands of agentic AI:
<ul><li>Long-context inference and low latency at scale</li><li>Trillion-parameter mixture-of-experts (MoE) models</li></ul>
Compared to Blackwell, NVIDIA claims this new generation features:
<ul><li>5x rack-level inference performance, </li><li>10x lower inference token cost </li><li>4x fewer GPUs for MoE training</li></ul>
Cloud availability is expected in H2 2026 through Hyperscalers (AWS, Google Cloud, Microsoft Azure) and select Neoclouds (CoreWeave, Lambda, Nebius).
Read AI GPU rental market trends to understand how supply is shifting heading into H2 2026.
The NVIDIA Rubin Announced Product Line
The Rubin platform ships in several form factors, from single-GPU server nodes to full supercomputer configurations. Here is a breakdown of the three headline products.
| Specification | Vera Rubin NVL72 | Vera Rubin Superchip | Rubin GPU |
|---|---|---|---|
| Configuration | 72 Rubin GPUs + 36 Vera CPUs | 2 Rubin GPUs + 1 Vera CPU | 1 Rubin GPU |
| Inference (NVFP4) | 3,600 PFLOPS | 100 PFLOPS | 50 PFLOPS |
| Training (NVFP4) | 2,520 PFLOPS | 70 PFLOPS | 35 PFLOPS |
| GPU Memory | 20.7 TB HBM4 | 576 GB HBM4 | 288 GB HBM4 |
| Memory Bandwidth | 1,580 TB/s | 44 TB/s | 22 TB/s |
| NVLink Bandwidth | 260 TB/s | 7.2 TB/s | 3.6 TB/s |
| NVLink-C2C Bandwidth | 65 TB/s | 1.8 TB/s | - |
| CPU Cores | 3,168 Olympus cores | 88 Olympus cores | - |
| CPU Memory | 54 TB LPDDR5X | 1.5 TB LPDDR5X | - |
| Cooling | Full liquid | ||
NVIDIA Vera Rubin NVL72: The Rack-Scale AI Powerhouse
The Vera Rubin NVL72 is NVIDIA's flagship rack-scale system and the direct successor to the GB200 NVL72. It houses 72 Rubin GPUs and 36 Vera CPUs in a single liquid-cooled rack, connected by NVLink 6.

Vera Rubin Superchip: One Package to Run the AI Era
The Vera Rubin Superchip is the base compute unit of the platform. It pairs one Vera CPU and two Rubin GPUs in a single package. Applications can treat LPDDR5X and HBM4 as a unified memory pool, reducing data movement overhead.
The Rubin GPU uses TSMC's 3nm process with a dual-die design and 336 billion transistors (a 1.6x increase over Blackwell).
The Vera CPU adds 227 billion transistors and 88 Arm Olympus cores, with spatial multi-threading bringing the effective count to 176 threads.
NVIDIA R100: The GPU at the Heart of Rubin
The NVIDIA R100 is the official product name for the Rubin GPU. It is the inference and training accelerator at the center of every Vera Rubin system. Each R100 carries 288 GB of HBM4 with up to 22 TB/s of bandwidth, nearly triple Blackwell's 8 TB/s.
The R100 is optimized for FP4 and FP6 workloads, long-context inference, and multi-modal generative tasks. For teams running models above 70 billion parameters, it is a meaningful step change from Blackwell.
The NVIDIA Vera Rubin Platform: A Full-Stack AI Revolution
The Vera Rubin platform is a complete AI factory blueprint, not a single chip. NVIDIA describes it as a seven-chip, five-rack architecture where every component is co-designed to avoid the bottlenecks of off-the-shelf parts.
The seven chips cover the full lifecycle of agentic AI workloads:
<ul><li><strong>Rubin GPU</strong> and <strong>Vera CPU</strong> for compute</li><li><strong>NVLink 6 Switch ASIC</strong> for rack-scale scale-up networking</li><li><strong>ConnectX-9 SuperNIC</strong> and <strong>Spectrum-6 Ethernet</strong> for scale-out connectivity</li><li><strong>BlueField-4 DPU</strong> for storage and infrastructure security</li><li><strong>Groq 3 LPU</strong> for high-throughput decode inference</li></ul>
A full Vera Rubin POD reaches 40 racks, 1,152 GPUs, and 60 exaflops.

The Engineering Breakthroughs Inside the Rubin Architecture
NVIDIA Vera CPU: The Brain of the Rubin Platform
The Vera CPU replaces Grace from the Blackwell generation. It is built on TSMC's 3nm process with 227 billion transistors and 88 custom Arm Olympus cores. Spatial Multi-Threading doubles the effective thread count to 176, with each thread maintaining full single-core throughput.
According to NVIDIA claims, each Vera CPU has 2x gains in data processing and compression versus Grace. It is purpose-built for agentic AI: managing context memory, routing tokens, and keeping inference pipelines saturated.
NVIDIA Groq 3 LPU: Redefining Inference Speed
The NVIDIA Groq 3 LPU is the inference accelerator co-designed for the Vera Rubin NVL72. Following NVIDIA's licensing agreement with Groq, the LPU was integrated into the Rubin platform as the decode-optimized engine for trillion-parameter and high-interactivity agentic models.
The LPX rack features 256 LPUs with 128 GB of SRAM each, 40 PB/s of memory bandwidth, and 640 TB/s of scale-up bandwidth per rack.
Paired with the Vera Rubin NVL72, NVIDIA claims 35x inference performance per watt and 10x more revenue per trillion-parameter model versus Blackwell. The LPX rack ships liquid-cooled in H2 2026 and requires no CUDA code changes. LPUs handle trillion-parameter decode; Rubin GPUs handle prefill and training.
HBM4: The Memory Leap That Makes Rubin Possible
HBM4 is the most significant DRAM leap in the Rubin platform. Each R100 carries 288 GB of HBM4 at up to 22 TB/s, a 2.75x improvement over Blackwell's HBM3e at the same capacity. The gain comes from doubling the bus width per stack, running at 10.8 GT/s per pin.
At the NVL72 rack level, total HBM4 capacity reaches 20.7 TB with 1.6 PB/s of aggregate bandwidth. The NVL144 CPX variant delivers up to 100 TB and 1.7 PB/s per rack. This headroom lets Rubin serve large-scale models from single-GPU memory, with the KV-cache tiering that agentic applications require. HBM4 stacks are supplied primarily by SK Hynix and Samsung.
NVLink 6: The Fastest GPU Interconnect
NVLink 6 is the networking backbone of the Vera Rubin platform. It delivers 3.6 TB/s of scale-up bandwidth per GPU, double the 1.8 TB/s of NVLink 5 on Blackwell. At the NVL72 rack level, total fabric bandwidth reaches 260 TB/s.
The doubled bandwidth cuts gradient sync time for communication-bound training and reduces latency for tensor-parallel inference. NVLink 6 enables large GPU clusters to act as a single coherent compute unit, a key advantage for rack-scale agentic AI workloads. For frontier-scale multi-node training, it is the key differentiator for Rubin.
Final Thoughts on the NVIDIA Rubin Architecture
Rubin is not an iteration on Blackwell: the R100, Vera CPU, Groq 3 LPU, HBM4, and NVLink 6 form a fully designed system that changes the architecture entirely. Broad cloud availability is still ramping through H2 2026 and into 2027.
Thunder Compute offers on-demand H100 GPU instances you can start using today. Get your workloads running now and be ready to migrate the moment Rubin capacity opens up.
