What are the key specifications of the Vera Rubin NVL72 rack-scale system?

The Vera Rubin NVL72 houses 72 Rubin GPUs and 36 Vera CPUs in a single liquid-cooled, fanless, tubeless, and cableless rack connected by NVLink 6. It delivers 3,600 PFLOPS of NVFP4 inference performance and contains 20.7 TB of HBM4 memory.

How does HBM4 memory improve performance in the Rubin GPU architecture?

Each Rubin GPU (R100) carries 288 GB of HBM4 memory with up to 22 TB/s of bandwidth, which is a 2.75x improvement over Blackwell's HBM3e memory bandwidth. This leap is achieved by doubling the bus width per stack.

Go back

Nvidia Rubin Architecture: Everything You Must Know (July 2026)

Q: What is the NVIDIA Rubin architecture?

NVIDIA's Rubin architecture is an AI factory platform combining GPUs, CPUs, networking, storage, and security silicon. It succeeds Blackwell and targets the core demands of agentic AI, including long-context inference, low latency at scale, and trillion-parameter mixture-of-experts (MoE) models.

Q: When will the NVIDIA Rubin architecture be available?

NVIDIA confirmed Rubin entered full production at CES 2026. Cloud availability is expected in H2 2026 through Hyperscalers (AWS, Google Cloud, Microsoft Azure) and select Neoclouds (CoreWeave, Lambda, Nebius), with broad availability ramping into 2027.

Q: What role does the NVIDIA Groq 3 LPU play in the Rubin platform?

Following a licensing agreement, the NVIDIA Groq 3 LPU is integrated into the Rubin platform as a decode-optimized engine for trillion-parameter and high-interactivity agentic models. LPUs handle trillion-parameter decode, while Rubin GPUs handle prefill and training.

Carl PetersonJuly 1, 20267 min read

NVIDIA's Rubin architecture is the most ambitious GPU platform the company has ever shipped. It is not just a new chip: it is an AI factory combining GPUs, CPUs, networking, storage, and security silicon.

First announced at Computex 2024, Rubin entered full production at CES 2026. This guide covers every confirmed product, spec, and technology behind it.

Everything We Know So Far

NVIDIA confirmed Rubin is in full production at CES 2026, ahead of the original H2 2026 target. Jensen Huang used the keynote to detail all six co-designed chips and announce H2 2026 partner availability.

In March, GTC 2026 finalized the system architecture details and cloud partner list.

Rubin succeeds Blackwell and targets the core demands of agentic AI:

Long-context inference and low latency at scale
Trillion-parameter mixture-of-experts (MoE) models

Compared to Blackwell, NVIDIA claims this new generation features:

5x rack-level inference performance,
10x lower inference token cost
4x fewer GPUs for MoE training

Cloud availability is expected in H2 2026 through Hyperscalers (AWS, Google Cloud, Microsoft Azure) and select Neoclouds (CoreWeave, Lambda, Nebius).

Read AI GPU rental market trends to understand how supply is shifting heading into H2 2026.

The NVIDIA Rubin Announced Product Line

The Rubin platform ships in several form factors, from single-GPU server nodes to full supercomputer configurations. Here is a breakdown of the three headline products.

Specification	Vera Rubin NVL72	Vera Rubin Superchip	Rubin GPU
Configuration	72 Rubin GPUs + 36 Vera CPUs	2 Rubin GPUs + 1 Vera CPU	1 Rubin GPU
Inference (NVFP4)	3,600 PFLOPS	100 PFLOPS	50 PFLOPS
Training (NVFP4)	2,520 PFLOPS	70 PFLOPS	35 PFLOPS
GPU Memory	20.7 TB HBM4	576 GB HBM4	288 GB HBM4
Memory Bandwidth	1,580 TB/s	44 TB/s	22 TB/s
NVLink Bandwidth	260 TB/s	7.2 TB/s	3.6 TB/s
NVLink-C2C Bandwidth	65 TB/s	1.8 TB/s	-
CPU Cores	3,168 Olympus cores	88 Olympus cores	-
CPU Memory	54 TB LPDDR5X	1.5 TB LPDDR5X	-
Cooling	Full liquid

Source: NVIDIA Vera Rubin NVL72 product page. All values preliminary and subject to change. Official pricing not announced.

NVIDIA Vera Rubin NVL72: The Rack-Scale AI Powerhouse

The Vera Rubin NVL72 is NVIDIA's flagship rack-scale system and the direct successor to the GB200 NVL72. It houses 72 Rubin GPUs and 36 Vera CPUs in a single liquid-cooled rack, connected by NVLink 6.

The chassis is fanless, tubeless, and cableless, with assembly time down from over 1.5 hours to around 5 minutes.

NVIDIA Vera Rubin NVL72 rack-scale system in a fanless tubeless cableless liquid-cooled server rack with dense GPU and CPU modules and visible cooling plumbing in a dim data center aisle with cool blue accent lighting, conveying futuristic high-performance infrastructure.

Vera Rubin Superchip: One Package to Run the AI Era

The Vera Rubin Superchip is the base compute unit of the platform. It pairs one Vera CPU and two Rubin GPUs in a single package. Applications can treat LPDDR5X and HBM4 as a unified memory pool, reducing data movement overhead.

The Rubin GPU uses TSMC's 3nm process with a dual-die design and 336 billion transistors (a 1.6x increase over Blackwell).

The Vera CPU adds 227 billion transistors and 88 Arm Olympus cores, with spatial multi-threading bringing the effective count to 176 threads.

NVIDIA R100: The GPU at the Heart of Rubin

The NVIDIA R100 is the official product name for the Rubin GPU. It is the inference and training accelerator at the center of every Vera Rubin system. Each R100 carries 288 GB of HBM4 with up to 22 TB/s of bandwidth, nearly triple Blackwell's 8 TB/s.

The R100 is optimized for FP4 and FP6 workloads, long-context inference, and multi-modal generative tasks. For teams running models above 70 billion parameters, it is a meaningful step change from Blackwell.

The NVIDIA Vera Rubin Platform: A Full-Stack AI Revolution

The Vera Rubin platform is a complete AI factory blueprint, not a single chip. NVIDIA describes it as a seven-chip, five-rack architecture where every component is co-designed to avoid the bottlenecks of off-the-shelf parts.

The seven chips cover the full lifecycle of agentic AI workloads:

Rubin GPU and Vera CPU for compute
NVLink 6 Switch ASIC for rack-scale scale-up networking
ConnectX-9 SuperNIC and Spectrum-6 Ethernet for scale-out connectivity
BlueField-4 DPU for storage and infrastructure security
Groq 3 LPU for high-throughput decode inference

A full Vera Rubin POD reaches 40 racks, 1,152 GPUs, and 60 exaflops.

NVIDIA Vera Rubin NVL72 rack-scale system featuring rows of fanless, tubeless, and cableless liquid-cooled server racks densely populated with GPU and CPU modules and visible blue liquid cooling tubing. The racks are arranged in a data center aisle with cool blue accent lighting, conveying a futuristic high-performance technical atmosphere that emphasizes cutting-edge AI infrastructure.

The Engineering Breakthroughs Inside the Rubin Architecture

NVIDIA Vera CPU: The Brain of the Rubin Platform

The Vera CPU replaces Grace from the Blackwell generation. It is built on TSMC's 3nm process with 227 billion transistors and 88 custom Arm Olympus cores. Spatial Multi-Threading doubles the effective thread count to 176, with each thread maintaining full single-core throughput.

According to NVIDIA claims, each Vera CPU has 2x gains in data processing and compression versus Grace. It is purpose-built for agentic AI: managing context memory, routing tokens, and keeping inference pipelines saturated.

NVIDIA Groq 3 LPU: Redefining Inference Speed

The NVIDIA Groq 3 LPU is the inference accelerator co-designed for the Vera Rubin NVL72. Following NVIDIA's licensing agreement with Groq, the LPU was integrated into the Rubin platform as the decode-optimized engine for trillion-parameter and high-interactivity agentic models.

The LPX rack features 256 LPUs with 128 GB of SRAM each, 40 PB/s of memory bandwidth, and 640 TB/s of scale-up bandwidth per rack.

Paired with the Vera Rubin NVL72, NVIDIA claims 35x inference performance per watt and 10x more revenue per trillion-parameter model versus Blackwell. The LPX rack ships liquid-cooled in H2 2026 and requires no CUDA code changes. LPUs handle trillion-parameter decode; Rubin GPUs handle prefill and training.

HBM4: The Memory Leap That Makes Rubin Possible

HBM4 is the most significant DRAM leap in the Rubin platform. Each R100 carries 288 GB of HBM4 at up to 22 TB/s, a 2.75x improvement over Blackwell's HBM3e at the same capacity. The gain comes from doubling the bus width per stack, running at 10.8 GT/s per pin.

At the NVL72 rack level, total HBM4 capacity reaches 20.7 TB with 1.6 PB/s of aggregate bandwidth. The NVL144 CPX variant delivers up to 100 TB and 1.7 PB/s per rack. This headroom lets Rubin serve large-scale models from single-GPU memory, with the KV-cache tiering that agentic applications require. HBM4 stacks are supplied primarily by SK Hynix and Samsung.

NVLink 6: The Fastest GPU Interconnect

NVLink 6 is the networking backbone of the Vera Rubin platform. It delivers 3.6 TB/s of scale-up bandwidth per GPU, double the 1.8 TB/s of NVLink 5 on Blackwell. At the NVL72 rack level, total fabric bandwidth reaches 260 TB/s.

The doubled bandwidth cuts gradient sync time for communication-bound training and reduces latency for tensor-parallel inference. NVLink 6 enables large GPU clusters to act as a single coherent compute unit, a key advantage for rack-scale agentic AI workloads. For frontier-scale multi-node training, it is the key differentiator for Rubin.

Final Thoughts on the NVIDIA Rubin Architecture

Rubin is not an iteration on Blackwell: the R100, Vera CPU, Groq 3 LPU, HBM4, and NVLink 6 form a fully designed system that changes the architecture entirely. Broad cloud availability is still ramping through H2 2026 and into 2027.

Thunder Compute offers on-demand H100 GPU instances you can start using today. Get your workloads running now and be ready to migrate the moment Rubin capacity opens up.