Use these guides to launch open models on Thunder Compute, pick a model format that fits the GPU you selected, and expose the model through a local or public API endpoint.
GPU availability, supported GPU counts, vCPU choices, and pricing can change. Run `tnr create` or check the pricing page to confirm the current options before launching a longer run.
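For example, you can confirm what is currently offered straight from the CLI before committing to an instance. A minimal sketch; the flag names below are illustrative assumptions, so check `tnr create --help` for the options your CLI version actually supports:

```bash
# Sketch: confirm current GPU/vCPU options before a long run.
# Flag names are hypothetical; see `tnr create --help` and the
# pricing page for what is actually available.
tnr create --help        # list supported GPU types, counts, and vCPU sizes
tnr create --gpu a6000   # hypothetical flag: request a single RTX A6000
tnr status               # verify the instance is up before starting work
```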

Available Guides

| Model guide | Best first GPU | Runtime | Use it when |
| --- | --- | --- | --- |
| Qwen3.6 27B | RTX A6000 | llama.cpp (Docker) | You want a tested dense-model path with a 32K-context OpenAI-compatible endpoint. |
| GPT-OSS 120B | A100 80 GB | Ollama | You want to run the existing GPT-OSS guide from the Ollama template. |
| DeepSeek R1 | A100 80 GB | Ollama | You want to run the existing DeepSeek R1 guide from the Ollama template. |

How To Choose A Runtime

| Runtime | Best fit | Notes |
| --- | --- | --- |
| llama.cpp | GGUF quantized models on A6000, A100, or H100 | Good for fast setup, lower VRAM use, and single-user endpoints. |
| vLLM | FP8 or BF16 serving on H100-class GPUs | Good for higher throughput, OpenAI-compatible APIs, long-context serving, and multi-user workloads. |
| Ollama | Models supported by the installed Ollama loader | Convenient when the model architecture is already supported by the template. |
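To make these trade-offs concrete, the sketch below shows the general shape of a server launch in each runtime. Model names, file paths, and ports are placeholders rather than values from the guides; consult each runtime's documentation for current flags.

```bash
# llama.cpp: serve a local GGUF file; -ngl 99 offloads all layers to the GPU
llama-server -m ./model-UD-Q4_K_XL.gguf -ngl 99 -c 32768 --host 0.0.0.0 --port 8080

# vLLM: serve a Hugging Face model behind an OpenAI-compatible API
# (model ID is a placeholder)
vllm serve my-org/my-model --max-model-len 32768 --port 8000

# Ollama: pull and run a model the installed loader already supports
# (tag is a placeholder)
ollama run my-model:latest
```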
For Qwen3.6 27B, start with the dedicated guide below. It uses a configuration that was tested on a Thunder Compute RTX A6000 base instance: UD-Q4_K_XL, full GPU offload, 32K context, and a public OpenAI-compatible endpoint.
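As a rough picture of that setup, the sketch below runs llama.cpp's server image under Docker with full GPU offload and a 32K context. The image tag, mount path, and GGUF filename are assumptions for illustration; the dedicated guide has the exact tested commands.

```bash
# Sketch only: llama.cpp server in Docker, full offload, 32K context.
# Image tag, mount path, and model filename are illustrative.
docker run --gpus all -p 8080:8080 \
  -v /path/to/models:/models \
  ghcr.io/ggml-org/llama.cpp:server-cuda \
  -m /models/qwen3.6-27b-UD-Q4_K_XL.gguf \
  -ngl 99 -c 32768 --host 0.0.0.0 --port 8080
```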

Run Qwen3.6 27B

Launch the dense Qwen3.6 27B model with the tested A6000 setup, then expose it through the /v1/chat/completions API.
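Once the endpoint is up, any OpenAI-compatible client can reach it. A minimal check with curl, assuming the server listens on port 8080; the model name is a placeholder:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.6-27b",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```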

Run GPT-OSS 120B

Use the existing GPT-OSS guide from the Models section.

Run DeepSeek R1

Use the existing DeepSeek R1 guide from the Models section.