
What are Tensor Cores?

Specialized GPU cores for matrix multiply-accumulate operations

Tensor Cores are specialized processing units on NVIDIA GPUs (Volta and later) that accelerate matrix multiply-accumulate operations — the core computation in deep learning.

Example

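import torch

# Under autocast, eligible ops such as matmul run in FP16, which lets them use Tensor Cores
with torch.autocast(device_type="cuda", dtype=torch.float16):
    x = torch.randn(512, 512, device="cuda")  # created as FP32
    w = torch.randn(512, 512, device="cuda")  # created as FP32
    y = x @ w  # autocast casts both inputs to FP16, so the matmul is eligible for Tensor Cores

print(y.dtype)  # torch.float16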

An Overview of Tensor Cores

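  • Perform a small matrix multiply-accumulate (D = A × B + C) per clock cycle: 4x4 matrices on Volta, larger tiles on later generations
  • Operate on reduced-precision data types such as FP16, BF16, TF32, and INT8; newer generations add FP8 and FP4
  • Speed up training and inference most when the model runs in mixed precision, so that its matrix multiplies use one of these data types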
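
As a rough illustration of the multiply-accumulate described above, the following NumPy sketch computes D = A × B + C for a single 4x4 tile. It assumes FP16 inputs with FP32 accumulation, which is one common Tensor Core mode; the function name `tile_mma` is hypothetical, and the code runs on the CPU purely to show the arithmetic, not how the hardware is programmed.

```python
import numpy as np


def tile_mma(a, b, c):
    """Return D = A @ B + C for one 4x4 tile (FP16 inputs, FP32 accumulation)."""
    # Round the inputs to FP16, as a Tensor Core would receive them
    a16 = np.asarray(a).astype(np.float16)
    b16 = np.asarray(b).astype(np.float16)
    # Widen to FP32 before multiplying so products and sums accumulate in FP32
    return a16.astype(np.float32) @ b16.astype(np.float32) + np.asarray(c, dtype=np.float32)


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
b = rng.standard_normal((4, 4))
c = np.zeros((4, 4), dtype=np.float32)

d = tile_mma(a, b, c)
print(d.shape, d.dtype)  # (4, 4) float32
```

A full-size matrix multiply can be thought of as many such tile operations whose partial results are summed, which is why the larger tiles of later generations translate into higher throughput.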

Tensor Core Generations

NVIDIA has revised Tensor Cores with each architectural generation, raising throughput and adding support for lower-precision data types.

  • Blackwell (5th Gen): Featured in the RTX PRO 6000, delivering up to 4,000 AI TOPS and introducing support for FP4 precision to maximize throughput for massive LLMs.
  • Hopper (4th Gen): Introduced the Transformer Engine in the H100, specifically designed to dynamically scale precision for Transformer-based models using FP8.
  • Ada Lovelace (4th Gen): Found in the RTX 6000 Ada Generation and RTX 4090, these cores add an 8-bit floating point (FP8) engine that doubles throughput over the previous generation.
  • Ampere (3rd Gen): Found in the A100, RTX A6000, and RTX 3090, this generation introduced TF32 (Tensor Float 32), providing speedups on FP32 workloads without requiring code changes.
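
Whether FP32 matrix multiplies actually use TF32 can depend on the framework and its version. In PyTorch, for example, recent releases keep FP32 matmuls at full precision unless TF32 is allowed explicitly. The sketch below shows one way to opt in, assuming PyTorch 1.12 or later; the helper name `enable_tf32_matmul` is hypothetical, and the setting only has an effect on GPUs that support TF32.

```python
import torch


def enable_tf32_matmul() -> str:
    """Allow FP32 matmuls to use TF32 where supported; return the active setting."""
    # "highest" keeps full FP32 precision for float32 matmuls.
    # "high" permits TF32 (or a comparably fast reduced-precision path) instead.
    torch.set_float32_matmul_precision("high")
    return torch.get_float32_matmul_precision()


print(enable_tf32_matmul())  # high
```

Model code that multiplies FP32 tensors stays unchanged; only this one global setting is added.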

NVIDIA GPU Tensor Core Comparison

Graphics Card | Architecture | Tensor Cores | AI TOPS | CUDA Cores | FP32 TFLOPS
RTX PRO 6000 | NVIDIA Blackwell | 5th Gen | 4,000 | 24,064 | 125.0
RTX 6000 | NVIDIA Ada Lovelace | 4th Gen | 1,457 | 18,176 | 91.1
RTX A6000 | NVIDIA Ampere | 3rd Gen | 309.7 | 10,752 | 38.7
A100 80GB | NVIDIA Ampere | 3rd Gen | 624 | 6,912 | 19.5
H100 PCIe | NVIDIA Hopper | 4th Gen | 1,513 | 14,592 | 51.2
H200 NVL | NVIDIA Hopper | 4th Gen | 3,341 | 16,896 | 60.3
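
To see which architecture in the comparison applies to a GPU you have access to, one option is to read its CUDA compute capability and map it to a generation. The sketch below is illustrative only: the function `tensor_core_generation` is hypothetical, and the capability-to-architecture mapping (8.x for Ampere, 8.9 for Ada Lovelace, 9.x for Hopper, 10.x and 12.x for Blackwell) is an assumption that comes from NVIDIA's published compute capabilities, not from this page.

```python
def tensor_core_generation(major: int, minor: int) -> str:
    """Map a CUDA compute capability to an architecture and Tensor Core generation."""
    # Ada Lovelace shares major version 8 with Ampere, so check it first
    if (major, minor) == (8, 9):
        return "Ada Lovelace (4th Gen)"
    by_major = {
        8: "Ampere (3rd Gen)",
        9: "Hopper (4th Gen)",
        10: "Blackwell (5th Gen)",
        12: "Blackwell (5th Gen)",
    }
    return by_major.get(major, "not covered by this comparison")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # PyTorch is optional; the mapping itself needs no GPU

    if torch is not None and torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        print(torch.cuda.get_device_name(), "->", tensor_core_generation(major, minor))
    else:
        print("No CUDA device visible")
```

Older architectures such as Volta and Turing fall through to the default label because they do not appear in the comparison.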

See Also