Run GPT‑OSS 120B on Thunder Compute
Looking for the cheapest way to self‑host GPT‑OSS 120B, or just want to try it out without buying hardware? Thunder Compute lets you spin up pay‑per‑minute NVIDIA A100 GPUs, so you only pay for what you use. Follow the steps below to get the model running in minutes.

Prerequisite: Ensure your Thunder Compute account is ready. If not, start with our Quickstart Guide.
Step 1 — Create a Cost‑Effective Prototyping‑Mode GPU Instance
Launch an 80 GB A100 instance (large enough to host the full 120B model):
- GPU: A100 80 GB
- vCPUs: 4
- Storage: 200 GB (from the Ollama template)
The GPU, vCPU count, and mode (Prototyping / Production) can be changed later if your requirements change, and storage can be increased if needed.

For details on templates, see the Instance Templates guide.
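If you prefer the command line, the instance can also be created with Thunder Compute's `tnr` CLI. The flag names below (`--gpu`, `--vcpus`, `--template`) are assumptions sketching a typical invocation; check `tnr create --help` for the exact options your CLI version supports.

```shell
# Hypothetical sketch: create an A100 80 GB prototyping instance
# from the Ollama template. Flag names are assumptions; verify
# them with `tnr create --help` before running.
tnr create --gpu a100-80gb --vcpus 4 --template ollama
```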
Step 2 — Check Status and Connect
Verify that the instance is running; it can take a minute to spin up:
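From the CLI, checking status and connecting typically looks like the following, assuming the standard `tnr status` and `tnr connect` commands; the instance ID `0` is just an example placeholder.

```shell
# List your instances and their states; wait until the new one shows as running
tnr status

# SSH into the instance (replace 0 with your instance's ID from tnr status)
tnr connect 0
```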
Step 3 — Start Ollama and Download the Model
Inside the instance, start Ollama (this also launches OpenWebUI and a Cloudflare tunnel). Give the UI about 60 seconds to finish loading.

Tip: If you encounter issues, consult the troubleshooting guide.
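If the template's startup script doesn't launch things for you, the core steps can be done by hand with standard Ollama commands (the OpenWebUI and Cloudflare tunnel pieces are template-specific and not shown here):

```shell
# Start the Ollama server in the background (listens on port 11434 by default)
ollama serve &

# Download GPT-OSS 120B; this is a large download and can take a while
ollama pull gpt-oss:120b
```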

Step 4 — Access the Web UI and Select the Model
- Open http://localhost:8080 in your browser.
- Choose gpt-oss:120b from the model dropdown.
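The model can also be queried without the web UI through Ollama's HTTP API, which listens on port 11434 inside the instance (this is Ollama's standard endpoint; run the command on the instance itself or forward the port first):

```shell
# Send a one-shot prompt to the Ollama REST API and print the full response
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:120b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```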

Step 5 — Run GPT‑OSS 120B
Enter a prompt in the web interface, for example:

“Tell a tale of a seaman who found the treasure of the clouds by following the sound of thunder.”
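The same prompt can be run from the instance's terminal with Ollama's standard CLI, no UI required:

```shell
# Start an interactive chat session with the model
ollama run gpt-oss:120b

# Or pass a one-shot prompt directly on the command line
ollama run gpt-oss:120b "Tell a tale of a seaman who found the treasure of the clouds by following the sound of thunder."
```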