Use these guides to launch open models on Thunder Compute, pick a model format that fits the GPU you selected, and expose the model through a local or public API endpoint.
GPU availability, supported GPU counts, vCPU choices, and pricing can change. Run `tnr create` or check the pricing page to confirm the current options before launching a longer run.
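For example, you can confirm what is currently offered straight from the CLI before committing to an instance. A minimal sketch; the flag names below are illustrative assumptions, so check `tnr create --help` for the options your CLI version actually supports:

```bash
# Sketch: confirm current GPU/vCPU options before a long run.
# Flag names are hypothetical; see `tnr create --help` and the
# pricing page for what is actually available.
tnr create --help        # list supported GPU types, counts, and vCPU sizes
tnr create --gpu a6000   # hypothetical flag: request a single RTX A6000
tnr status               # verify the instance is up before starting work
```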

Available Guides

| Model guide | Best first GPU | Runtime | Use it when |
| --- | --- | --- | --- |
| Qwen3.6 27B | RTX A6000 | llama.cpp (Docker) | You want a tested dense-model path with a 32K-context OpenAI-compatible endpoint. |
| GPT-OSS 120B | A100 80 GB | Ollama | You want to run the existing GPT-OSS guide from the Ollama template. |
| DeepSeek R1 | A100 80 GB | Ollama | You want to run the existing DeepSeek R1 guide from the Ollama template. |

How To Choose A Runtime

| Runtime | Best fit | Notes |
| --- | --- | --- |
| llama.cpp | GGUF quantized models on A6000, A100, or H100 | Good for fast setup, lower VRAM use, and single-user endpoints. |
| vLLM | FP8 or BF16 serving on H100-class GPUs | Good for higher throughput, OpenAI-compatible APIs, long-context serving, and multi-user workloads. |
| Ollama | Models supported by the installed Ollama loader | Convenient when the model architecture is already supported by the template. |
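To make these trade-offs concrete, the sketch below shows the general shape of a server launch in each runtime. Model names, file paths, and ports are placeholders rather than values from the guides; consult each runtime's documentation for current flags.

```bash
# llama.cpp: serve a local GGUF file; -ngl 99 offloads all layers to the GPU
llama-server -m ./model-UD-Q4_K_XL.gguf -ngl 99 -c 32768 --host 0.0.0.0 --port 8080

# vLLM: serve a Hugging Face model behind an OpenAI-compatible API
# (model ID is a placeholder)
vllm serve my-org/my-model --max-model-len 32768 --port 8000

# Ollama: pull and run a model the installed loader already supports
# (tag is a placeholder)
ollama run my-model:latest
```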
For Qwen3.6 27B, start with the dedicated guide below. It uses a configuration that was tested on a Thunder Compute RTX A6000 base instance: UD-Q4_K_XL, full GPU offload, 32K context, and a public OpenAI-compatible endpoint.
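As a rough picture of that setup, the sketch below runs llama.cpp's server image under Docker with full GPU offload and a 32K context. The image tag, mount path, and GGUF filename are assumptions for illustration; the dedicated guide has the exact tested commands.

```bash
# Sketch only: llama.cpp server in Docker, full offload, 32K context.
# Image tag, mount path, and model filename are illustrative.
docker run --gpus all -p 8080:8080 \
  -v /path/to/models:/models \
  ghcr.io/ggml-org/llama.cpp:server-cuda \
  -m /models/qwen3.6-27b-UD-Q4_K_XL.gguf \
  -ngl 99 -c 32768 --host 0.0.0.0 --port 8080
```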

Run Qwen3.6 27B

Launch the dense Qwen3.6 27B model with the tested A6000 setup, then expose it through the /v1/chat/completions API.
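Once the endpoint is up, any OpenAI-compatible client can reach it. A minimal check with curl, assuming the server listens on port 8080; the model name is a placeholder:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.6-27b",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```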

Run GPT-OSS 120B

Use the existing GPT-OSS guide from the Models section.

Run DeepSeek R1

Use the existing DeepSeek R1 guide from the Models section.