How to Fine-tune Llama 4
Fine‑tune Llama 4 on a single A100 GPU, with exact commands, runtimes, and cost math.
Published: Apr 19, 2025
Last updated: Aug 13, 2025

Why this guide
Meta’s Llama 4 Scout uses a Mixture-of-Experts design (17B active params, 16 experts, 109B total) and supports long context. With QLoRA and Unsloth, you can fine-tune it on a single A100 80 GB. This walkthrough gives commands, runtimes, and cost math—no infra expertise required.
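The single-GPU claim can be checked with simple weight-size arithmetic. The sketch below uses only the parameter counts quoted above; it counts raw weight storage and deliberately ignores quantization metadata, LoRA adapters, activations, and optimizer state, so real usage will be higher than these numbers.

```python
# Back-of-the-envelope weight memory for Llama 4 Scout.
# Parameter counts come from the text above; the rest is plain arithmetic.

TOTAL_PARAMS = 109e9    # all 16 experts plus shared layers
ACTIVE_PARAMS = 17e9    # parameters actually used for each token
A100_VRAM_GB = 80


def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for bits in (16, 8, 4):
    gb = weight_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if gb < A100_VRAM_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:6.1f} GB -> {verdict} in {A100_VRAM_GB} GB")

print(f"Active per token: {active_fraction:.1%} of all parameters")
```

Only the 4-bit row fits on an 80 GB card, which is why this guide relies on QLoRA. All experts have to be resident in memory even though only about 16% of the parameters are active for any given token.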
Prerequisites
What you need | Why |
---|---|
Thunder Compute account | Fast access to an A100 80 GB at $0.78/hr |
VS Code with the Thunder Compute extension | One-click instance + integrated terminal |
 | Clean, reproducible env |
Hugging Face account | Model & dataset hub |
Tip: Follow the Thunder Compute Quick Start to install the VS Code extension. Most prerequisites come pre-installed in Thunder Compute instances.
1) Launch an A100 80 GB instance
Console: New Instance → A100 80 GB
VS Code: Thunder tab → + → A100 80 GB
Disk: 300 GB (room for model, checkpoints, dataset)
2) Connect from VS Code
Open Command Palette → Thunder Compute: Connect (or click ⇄). The integrated terminal now runs on the GPU box—no Remote-SSH add-on needed.
3) Request model access
Request Llama 4 access via llama.com or the official Meta Hugging Face org. Approvals are usually quick.
4) Minimal QLoRA project (Unsloth)
Why Unsloth? It’s currently the most stable stack for Llama 4 QLoRA: Scout needs about 71 GB of VRAM at micro-batch size 1 and 2k context, which fits on an A100 80 GB.
Shell:
train_llama4_qlora.py:
Run:
5) VRAM & runtime
Llama 4 Scout (QLoRA, 4-bit, Unsloth): ~70–75 GB VRAM on A100 80 GB
Llama 3-8B (QLoRA, 4-bit): < 20 GB VRAM
Cost example: 2 hours × $0.78/hr = $1.56
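The figures in this section can be sanity-checked with a few lines of arithmetic. This is a minimal sketch using the VRAM range and hourly rate quoted in this guide; the helper names `vram_headroom_gb` and `run_cost_usd` are illustrative, not part of any tool mentioned here.

```python
# Sanity-check the VRAM and cost figures quoted in this section.

A100_VRAM_GB = 80
HOURLY_RATE_USD = 0.78   # A100 80 GB price quoted in this guide


def vram_headroom_gb(used_gb: float, total_gb: float = A100_VRAM_GB) -> float:
    """Free VRAM left over after the training job's footprint."""
    return total_gb - used_gb


def run_cost_usd(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """GPU cost of a run, rounded to cents."""
    return round(hours * rate, 2)


# Scout is reported at roughly 70-75 GB, leaving only 5-10 GB free.
print(vram_headroom_gb(75), vram_headroom_gb(70))
# The guide's cost example: a 2-hour run.
print(run_cost_usd(2))
```

With that little headroom, raising the micro-batch size or context length for Scout is likely to run out of memory, while the smaller Llama 3-8B leaves more than 60 GB free.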
6) Track spend & shut down
Use the Thunder console to monitor costs. Stopping the instance halts GPU billing; disk persists at storage rates.
7) Next steps
Swap in your dataset
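Increase num_train_epochs until validation loss plateaus
If VRAM allows, set load_in_4bit=False to load the weights at higher precision. This is realistic only for smaller models: at 8 bits, Scout's 109B parameters need roughly 109 GB for weights alone, more than an A100 80 GB provides.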
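One way to make "until validation loss plateaus" concrete is a small stopping rule applied to the validation loss you log after each epoch. The sketch below is illustrative only: the function name `best_epoch`, the `min_delta` and `patience` thresholds, and the example loss values are all hypothetical and are not taken from this guide or from Unsloth.

```python
from typing import Sequence


def best_epoch(val_losses: Sequence[float],
               min_delta: float = 0.01,
               patience: int = 1) -> int:
    """Return the 1-based epoch after which validation loss stopped improving.

    An epoch counts as an improvement only if it lowers the best loss so far
    by more than `min_delta`. Scanning stops after `patience` consecutive
    epochs without improvement.
    """
    if not val_losses:
        raise ValueError("need at least one validation loss")

    best_loss = val_losses[0]
    best_idx = 0
    stale = 0
    for i, loss in enumerate(val_losses[1:], start=1):
        if best_loss - loss > min_delta:
            best_loss, best_idx, stale = loss, i, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_idx + 1


# Hypothetical per-epoch validation losses from successive runs.
losses = [1.90, 1.42, 1.31, 1.305, 1.33]
print(best_epoch(losses))  # 3
```

Here the fourth epoch improves the loss by only 0.005, below the 0.01 threshold, so three epochs would be a reasonable value for num_train_epochs on this hypothetical data.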

Carl Peterson