AI Workflows

How to Run Llama 4 with Ollama (June 2026)

How to run Llama 4 with Ollama (June 2026)
Last update:
June 17, 2026
11 mins read

Llama 4 is the most capable open-weight model family shipped by Meta. Released on April 5, 2025, it introduced native multimodal understanding, a Mixture-of-Experts (MoE) architecture, and a 10 million token context window.

This guide covers the full model lineup, benchmarks, and how to run Llama 4.

Official Llama 4 download on Meta website

How does Llama 4 improve over Llama 3?

Llama 4 is Meta's fourth-generation family of open-weight LLMs. Unlike the text-only Llama 3, Llama 4 is natively multimodal: processing text, images, and video. The family was trained on over 30 trillion tokens across 200 languages, doubling Llama 3's pre-training mix.

Feature Llama 3 Llama 4
Architecture Dense Transformer Mixture-of-Experts (MoE)
Modality Text-only Natively Multimodal (Text, Image, Video)
Context Window 128K tokens 1M-10M tokens
Training Tokens ~15T (estimated) >30T
Languages Multilingual support >100 languages
Knowledge Cutoff December 2023 March 2025
Refusal Rate Standard <2%
Training Hardware ~16,000 H100s 32,000 H100s
Sources: Official release post, What’s New in Llama 4

The architectural shift matters as much as the scale. Llama 3 was a dense text-only transformer with a 128K context window. Llama 4 replaces that with Mixture-of-Experts (MoE) layers, iRoPE positional embeddings, and FP8 precision training across 32,000 H100 GPUs, double the cluster used for Llama 3.

Llama 4 uses a Mixture-of-Experts design where only a subset of parameters activates per token during inference. Models can carry far more total knowledge than a dense architecture would allow.

Llama 4 also received 10x more multilingual training tokens than Llama 3, covering over 100 languages. The knowledge cutoff advanced to March 2025, and refusal rates fell to under 2%.

Llama 4 Release Date and Context Window Size

Llama 4 Scout and Llama 4 Maverick launched publicly on April 5, 2025. They are available via the official Llama website and Hugging Face under the Llama 4 Community License, allowing free use for products with less than 700M monthly active users.

The Llama 4 Model Lineup

Llama 4 consists of three models: Scout, Maverick, and Behemoth. Scout and Maverick are publicly available. Behemoth is unreleased but served as the teacher model for the other two through a process called codistillation.

Feature Llama 4 Scout Llama 4 Maverick Llama 4 Behemoth
Total Parameters 109 Billion 400 Billion ~2 Trillion
Active Parameters 17 Billion 17 Billion 288 Billion
Expert Count 16 Experts 128 Experts 16 Experts
Max Context Window 10 Million tokens 1 Million tokens Not Publicly Specified
Primary Use Case Long-context retrieval & document analysis General reasoning, coding & assistant tasks Teacher/Distillation model & Advanced STEM
Deployment Status Generally Available Generally Available Research Preview (Not Publicly Released)

Llama 4 Scout: Parameters and Hardware Requirements

Llama 4 Scout is the accessible member of the family. It has 17 billion active parameters across 16 experts, with 109 billion total parameters.

The 10 million token context window suits repository-level code analysis, multi-document summarization, and other long-context tasks. However, the usable context size depends on available VRAM; 32K to 128K is a realistic working window on consumer hardware.

Scout is the most practical model for self-hosting. At Q4_K_M quantization, it needs roughly 20–24 GB of VRAM, within reach of a single RTX 4090 or an Apple Silicon Mac with 32 GB.

Llama 4 Maverick: Parameter Size, Benchmarks, and Release Date

Llama 4 Maverick launched on April 5, 2025 alongside Scout. It shares Scout's 17 billion active parameters per token but routes through 128 experts instead of 16, for 400 billion total parameters. That larger knowledge base explains why Maverick outperforms Scout on reasoning, coding, and multimodal tasks at the same inference cost.

Maverick reached an ELO of 1417 on LMArena, beating GPT-4o and Gemini 2.0 Flash on multimodal and reasoning benchmarks. It also matches DeepSeek V3 on coding tasks while activating fewer parameters.

Maverick fits on a single H100 host without multi-GPU coordination, making it a viable production choice for cloud GPU users.

Llama 4 Behemoth: What We Know So Far

Llama 4 Behemoth was previewed alongside Scout and Maverick in April 2025 but was still in training at the time of the public release.

It was designed with approximately 2 trillion total parameters and 288 billion active parameters across 16 experts. This scale puts it in direct competition with closed frontier models on STEM benchmarks.

As of June 2026, public weights are unlikely to become available. Behemoth served its primary purpose as a teacher model for Scout and Maverick, and Meta's subsequent launch of its closed-weight Muse Spark model in April 2026 has reduced the urgency around a public Behemoth release.

Llama 4 Scout vs Maverick: Which Should You Run?

Choosing between Scout and Maverick is a matter of hardware and use case:

  • Scout is right if:
    • You need a very long context window; its 10M token limit is unique among locally runnable models
    • Your GPU has under 80GB of VRAM.
  • Maverick is right if:
    • You need output quality on reasoning, coding, and complex multimodal tasks
    • You Have access to hardware; a multi-GPU setup to run locally at full precision.

For most individual developers, Scout on a single GPU or a Thunder Compute A100 instance is the practical starting point.

Teams building production-grade assistants or inference APIs will find Maverick worth the extra compute, especially given its benchmark parity with GPT-4o at a fraction of the API cost.

Llama 4 Benchmarks: How Does It Compare?

Benchmark comparisons for Llama 4 have to be read carefully. Meta's internal benchmarks used model variants that differed from the ones publicly released, and the AI landscape has shifted considerably since April 2025.

The comparisons below reflect the public models against the competitors they were benchmarked against at launch.

Model Active Params Context Window Multimodal Open Weight LMArena ELO
Llama 4 Scout 17B (109B total) 10M tokens Yes Yes N/A
Llama 4 Maverick 17B (400B total) 1M tokens Yes Yes 1417
GPT-4o ~200B (est.) 128K tokens Yes No ~1380
Gemini 2.0 Flash Unknown 1M tokens Yes No ~1350
Llama 3.1 405B 405B (dense) 128K tokens No Yes ~1260

Llama 4 vs ChatGPT

When comparing Llama 4 against ChatGPT (which runs on GPT-4o and newer variants), the most honest framing is that Maverick benchmarks comparably to GPT-4o across multimodal, reasoning, and coding tasks, while costing significantly less to run via API. Maverick's DocVQA score of 91.6 and its MATH performance suggest near-parity on the tasks GPT-4o has historically led. The key difference is deployment freedom: Llama 4 weights are downloadable and self-hostable, while ChatGPT is a closed API with no ability to fine-tune the base weights or run inference on your own infrastructure.

Llama 4 vs Gemini

The comparison against Gemini is similarly nuanced. Maverick outperforms Gemini 2.0 Flash on Meta's benchmarks in multimodal and reasoning tasks, which is a meaningful result given that Gemini 2.0 Flash was one of Google's most competitive efficiency-oriented models at the time. Gemini 2.5 Pro narrows or reverses that gap on several tasks, but it comes with a substantially higher per-token cost and zero ability to self-host. For teams that want to run inference privately or tune the model on proprietary data, Llama 4 offers something neither Gemini model can match.

Llama 4 vs Llama 3

The improvement from Llama 3 to Llama 4 is large enough that it represents a different class of model rather than a straightforward upgrade. Llama 3's best open-weight option, the 405B dense model, had a 128K context window, no native multimodal capability, and a knowledge cutoff of December 2023. Llama 4 Scout beats it on multimodal benchmarks despite having far fewer total parameters, and does so while fitting on a single GPU with quantization. On LiveCodeBench, Maverick scores 43.4 versus Llama 3.1 405B's 27.7 (a 57% relative improvement on real-world coding tasks).

Running Llama 4 locally is feasible for Scout on high-end consumer hardware, but Maverick requires server-grade GPUs that most developers don't have at home. Thunder Compute solves this through on-demand access to A100 and H100 instances at rates up to 80% lower than AWS, Azure, and GCP.

How Much VRAM Do You Need for Llama 4?

VRAM requirements for Llama 4 depend on which model you're running and what quantization level you choose. The table below covers the most practical configurations.

Model Quantization VRAM Required Recommended Hardware
Llama 4 Scout Q4_K_M (4-bit) ~20–24 GB RTX 4090 / A100 40GB
Llama 4 Scout Q8_0 (8-bit) ~55 GB A100 80GB
Llama 4 Maverick Q4_K_M (4-bit) ~200 GB Multi-GPU / H100 host
Llama 4 Maverick 1.78-bit (Unsloth quant) ~100 GB H100 80GB x2
Llama 4 Scout (CPU only) Q4_K_M 32 GB system RAM CPU offload (very slow)

How to Install and Run Llama 4 on a Thunder Compute GPU

Thunder Compute provides pre-configured Ollama instance templates that handle GPU driver setup, Ollama installation, and model configuration.

Rather than spending an hour on environment setup, you can have a Llama 4 model answering prompts in a few minutes.

To go deeper into local LLM tooling options, see our guides on running LM Studio and Unsloth.

Step 1: Install the Thunder Compute CLI

Download and install tnr for Windows, or macOS.

Run this command for Linux:

curl -fsSL https://raw.githubusercontent.com/Thunder-Compute/thunder-cli/main/scripts/install.sh | bash

Step 2: Login

tnr login

Step 3: Launch and connect to an Ollama instance

tnr create --template ollama

Pick the hardware configuration for your instance.

Step 4: Connect to your instance

Establish a connection once the instance is created.

tnr connect 0

Start the Ollama UI. This will take around a minute. Once it's done loading, click the link provided by the terminal to open Ollama in a browser.

You'll be prompted to create an account.

start-ollama

Step 5: Load the desired model

  1. In the Ollama UI, click "Select a model".
  2. Add the URL of the model from the Ollama page
  3. Click "Pull [MODEL_URL]" in the dropdown.

Ollama Interface showing model selection dropdown

Your download will start.

A few good variants ordered from lightest to heaviest include:

Step 6: Start chatting

Once the model is downloaded you can start interacting with it. Keep in mind that the first response will take significantly longer because the GPU is loading the model onto memory.

Thunder Compute: The Fastest Way to Get Started with Llama 4

Thunder Compute's A100 80GB instances give you enough VRAM to run Llama 4 Scout at Q8_0 quality or at Q4_K_M with generous context headroom. Multi-GPU H100 instances are available for Maverick workloads.

The platform is built specifically for ML engineers who want to iterate quickly without infrastructure overhead. Instances spin up in seconds, VS Code integration turns your cloud GPU into a local development environment, and you only pay for what you use.

Get started with Thunder Compute and have Llama 4 running in minutes.

We make GPUs cheaper

Low prices, developer-first features, simple UX. Start building today.


                                                           `..`                                          `                                  
                                                                                                                                            
                                                                                                                                            
               ``        `                                                                                                                  
                        .;.                                                                                                                 
                                                                                                                                            
      .                                                                                                                                     
                                                                                                                                            
                                                                                                                                            
                                                                                                                                            
                                                                                                                                            
                                                                                                                                           .
                                                         `....                            ``                                                
                                                                                                                                            
                                            .`                                                                                              
                                             `  `.                                                                                          
                                             `                                                                                              
                                   `.                                                                                                       
                                     `                                                                                                      
                                                                               ;`                     .                                     
                                                                                                                                            
                                                 ````                                  .```                                                 
                                                                                                                                            
                                                                                                                                            
                                                                       .                                                                    
                                                 `+`                  `.                                                                    
                                                    .`                                  ;`                                                  
                                         ``       `;                               `;;`.;;`    `                                            
                                         .`                                                                                                 
                                                                                               ` `                                          
                                    `     `                     `   ;       ;`                 `;`                                          
                                    .  .    `` `                                   ```                                                      
                                                                                                                                            
                                                                +*******.    ``     `+++++`         `.`                                     
                                                 `.......       +```````     `.                                                             
                                `                               *             ;                                                             
                                `                              `+            `*                                                             
                                  `              ````...`                     *                         `;                                  
                                                                              *                                                             
                                                                              .       ``   .`           .;                                  
                         .;```      `                   `                                    `;                 `.``.;                      
                       ;.           ;                                               .``.`      `        `             ...                   
                    `;;          `.`                       .   `    .        .           ;`         `.`                 `;.                 
                  .+`        `*```                         *   ;;  `*`  +;   *             +`                    .+        ;+               
                ;.         ;;`          ;``....+          .;   +;   *`  +;   *               ;...  `;`             `;;       `;`            
                        `;;`          `.                       *;   *`  +;   ;.                                      `;.`                   
                      `+;                                      *.   *`  +;                 `..+.                        ;+`                 
                    ;+.        .+`           `` `              `        ++            .```         .  `+;                 `+.               
                             ;+.                ......;.                                    ;.`         `+;                                 
                          `.+.               `.;;                                                         `;;`                              
                         ++`                                          `*                                     +*`                            
                      `++                                             .*`            +`                        ;+.                          
                     .;`                                .;            .*              .                         `;;                         
                                                        .             .*              .                                                     
                                                        .             .*              .                                                     
                                       ````````     `.  .`            `+              ;+`    ``````.                                        
                                     `+*.   ```     ;.   ;          ;  ;+             +*;    ````` ;*;                                      
                                   `**;             `   ;+         **   +;            +*;            +*+                                    
                                 `+*;                   `          ;*   +.             `              `+*;                                  
                               `;*;                                     `                               `+*;                                
                              ++;                  ``                                                     `+*;                              
                           `++;                    ..               `              +*                       `++;        `                   
                   `       `.                      ;`               ;              +*                         `.                            
                      `                            ;`               +              ;*                                                       
                     .+                            ;`               +              ;*`                                                      
                                                   .;;             `+`             .**`                                                     
                                                    `+*.           `*+`             .**;`                                                   
                                                      +*+           ;**+              ;**;                                                  
                            ``                         .**;           +*+.              +**.                                                
                                                        `+*+`          .+*;              `+*+.                                              
                               .;  ``.`                   ;**;           ;**;              ;**;`                                            
                             `.;;   `..`                   `**+           `***.              +**;                                           
                              `       `..                    **             ;**+              `+*+                                          
         ``                             ..`                  ++              `**+               **                                          
         ;;                              ...                 ..  ..            *+               ;+                 `                        
        `;.                               ...                ++` ;++           ++                    ..          ```                        
        ;;`                                `..`             `**`  +*`           .  .+;           ..  ;++        ```                         
       `.;                                  `.;.```````     .*+   +*`          ;+. .**.         `+;   +*.      ```