GPU availability, supported GPU counts, vCPU choices, and pricing can change. Use
tnr create or the pricing page to confirm the current options before launching a longer run.Available Guides
| Model guide | Best first GPU | Runtime | Use it when |
|---|---|---|---|
| Qwen3.6 27B | RTX A6000 | llama.cpp Docker | You want a tested dense-model path with a 32K context OpenAI-compatible endpoint. |
| GPT-OSS 120B | A100 80 GB | Ollama | You want to run the existing GPT-OSS guide from the Ollama template. |
| DeepSeek R1 | A100 80 GB | Ollama | You want to run the existing DeepSeek R1 guide from the Ollama template. |
How To Choose A Runtime
| Runtime | Best fit | Notes |
|---|---|---|
| llama.cpp | GGUF quantized models on A6000, A100, or H100 | Good for fast setup, lower VRAM use, and single-user endpoints. |
| vLLM | FP8 or BF16 serving on H100-class GPUs | Good for higher throughput, OpenAI-compatible APIs, long-context serving, and multi-user workloads. |
| Ollama | Models supported by the installed Ollama loader | Convenient when the model architecture is already supported by the template. |
UD-Q4_K_XL, full GPU offload, 32K context, and a public OpenAI-compatible endpoint.
Run Qwen3.6 27B
Launch Qwen3.6 27B dense with the tested A6000 setup, then expose it through the
/v1/chat/completions API.Run GPT-OSS 120B
Use the existing GPT-OSS guide from the Models section.
Run DeepSeek R1
Use the existing DeepSeek R1 guide from the Models section.