Use these guides to launch open models on Thunder Compute, pick a model format that fits the GPU you selected, and expose the model through a local or public API endpoint.

## Documentation Index
Fetch the complete documentation index at: https://www.thundercompute.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
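If you want to pull the index programmatically, here is a minimal sketch using the `requests` package; the URL is the one above, and the file is treated as plain text rather than parsed.

```python
import requests

# Download the documentation index and print it.
# llms.txt is a flat text listing, so decoding is the only step needed.
resp = requests.get("https://www.thundercompute.com/docs/llms.txt", timeout=10)
resp.raise_for_status()
print(resp.text)
```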
GPU availability, supported GPU counts, vCPU choices, and pricing can change. Use `tnr create` or the pricing page to confirm the current options before launching a longer run.

## Available Guides
| Model guide | Best first GPU | Runtime | Use it when |
|---|---|---|---|
| Qwen3.6 27B | RTX A6000 | llama.cpp Docker | You want a tested dense-model path with a 32K-context, OpenAI-compatible endpoint. |
| GPT-OSS 120B | A100 80 GB | Ollama | You want to follow the existing GPT-OSS guide, starting from the Ollama template. |
| DeepSeek R1 | A100 80 GB | Ollama | You want to follow the existing DeepSeek R1 guide, starting from the Ollama template. |
## How To Choose A Runtime
| Runtime | Best fit | Notes |
|---|---|---|
| llama.cpp | GGUF quantized models on A6000, A100, or H100 | Good for fast setup, lower VRAM use, and single-user endpoints. |
| vLLM | FP8 or BF16 serving on H100-class GPUs | Good for higher throughput, OpenAI-compatible APIs, long-context serving, and multi-user workloads. |
| Ollama | Models supported by the installed Ollama loader | Convenient when the model architecture is already supported by the template. |
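Whichever runtime you pick, all three can serve an OpenAI-compatible /v1/chat/completions endpoint, so the same client code works across them. Below is a minimal sketch using the `openai` Python package; the base URL, port, and model id are assumptions for a llama.cpp server on localhost, so substitute the values from the guide you follow.

```python
from openai import OpenAI

# Assumed values: a llama.cpp server listening on localhost:8080.
# Swap the base_url and model id for your runtime and deployment.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",  # local servers usually ignore the key, but the client requires one
)

response = client.chat.completions.create(
    model="qwen3.6-27b",  # hypothetical model id; list real ids via client.models.list()
    messages=[{"role": "user", "content": "Summarize what a GGUF file is."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because only `base_url` and `model` change between runtimes (vLLM's OpenAI server typically listens on port 8000, Ollama on 11434), you can switch backends without touching the rest of the client code.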
## Run Qwen3.6 27B

Launch Qwen3.6 27B dense with the tested A6000 setup: UD-Q4_K_XL quantization, full GPU offload, and a 32K context. The guide then exposes the model through a public, OpenAI-compatible /v1/chat/completions endpoint, as in the connectivity check below.
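Before pointing a client at the endpoint, it helps to confirm the server is reachable. A minimal sketch, again assuming the same hypothetical localhost:8080 address: /v1/models is part of the OpenAI-compatible surface and lists the ids the server will accept in chat requests.

```python
import requests

# Hypothetical address; replace with your instance's public or tunneled URL.
BASE_URL = "http://localhost:8080/v1"

# GET /v1/models lists the model ids the server accepts.
resp = requests.get(f"{BASE_URL}/models", timeout=10)
resp.raise_for_status()
print("Available model ids:", [m["id"] for m in resp.json()["data"]])
```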
## Run GPT-OSS 120B

Use the existing GPT-OSS guide from the Models section.
## Run DeepSeek R1
Use the existing DeepSeek R1 guide from the Models section.