# Authentication
Source: https://www.thundercompute.com/docs/agents/operations/authentication
Use your AI agent to authenticate with Thunder Compute, or switch to another interface for direct control.
Thunder Compute uses your agent's standard MCP authentication flow. For example, in Claude Code you can run `/mcp` to connect and complete the browser-based sign-in flow for the Thunder Compute server.
# Connecting to Instances
Source: https://www.thundercompute.com/docs/agents/operations/connecting-to-instances
Ask your AI agent to connect to instances and run commands, or switch to another interface for direct control.
# Creating Instances
Source: https://www.thundercompute.com/docs/agents/operations/creating-instances
Ask your AI agent to create Thunder Compute instances, or switch to another interface for direct control.
# Deleting Instances
Source: https://www.thundercompute.com/docs/agents/operations/deleting-instances
Ask your AI agent to delete Thunder Compute instances, or switch to another interface for direct control.
# File Transfers
Source: https://www.thundercompute.com/docs/agents/operations/file-transfers
Ask your AI agent to move data to and from Thunder Compute instances, or switch to another interface for direct control.
# Modifying Instances
Source: https://www.thundercompute.com/docs/agents/operations/modifying-instances
Ask your AI agent to update instance configuration, or switch to another interface for direct control.
# Monitoring Instances
Source: https://www.thundercompute.com/docs/agents/operations/monitoring-instances
Ask your AI agent for instance status and usage, or switch to another interface for direct control.
# Port Forwarding
Source: https://www.thundercompute.com/docs/agents/operations/port-forwarding
Ask your AI agent to set up or inspect port forwarding, or switch to another interface for direct control.
# Snapshots
Source: https://www.thundercompute.com/docs/agents/operations/snapshots
Ask your AI agent to create, restore, or delete snapshots, or switch to another interface for direct control.
# SSH on Thunder Compute
Source: https://www.thundercompute.com/docs/agents/operations/ssh
Ask your AI agent to work through Thunder Compute over MCP, or switch to another interface for direct SSH control.
# Quickstart
Source: https://www.thundercompute.com/docs/agents/quickstart
Connect AI tools to GPUs to develop faster
In under 60 seconds, you can start developing on GPUs by connecting Cursor, Claude Code, or another AI agent directly to our API. Your agent can then set up and manage servers on your behalf.
## Create an Account
Sign up for a Thunder Compute account [here](https://console.thundercompute.com/signup).
## Add a Payment Method
Add a [payment method](https://console.thundercompute.com/settings/billing) to your account.
## Connect Your Agent
Use an AI agent that supports remote MCP servers (Claude Code, Cursor, Codex, etc.).
Run this in your terminal:
```bash
claude mcp add --transport http thunder-compute https://www.thundercompute.com/mcp
```
Then start Claude Code and run `/mcp` to authenticate. A browser window will open for you to log in and authorize access.
Alternatively, add to `~/.claude.json` (global) or `.claude.json` in your project root:
```json
{
  "mcpServers": {
    "thunder-compute": {
      "url": "https://www.thundercompute.com/mcp"
    }
  }
}
```
Run this in your terminal:
```bash
codex mcp add thunder-compute --url https://www.thundercompute.com/mcp
```
Codex will prompt you to authenticate via OAuth when you first use a Thunder Compute tool.
Add to `.cursor/mcp.json` in your project root (or `~/.cursor/mcp.json` for global access):
```json
{
  "mcpServers": {
    "thunder-compute": {
      "type": "http",
      "url": "https://www.thundercompute.com/mcp"
    }
  }
}
```
Add to your MCP configuration:
```json
{
  "mcpServers": {
    "thunder-compute": {
      "serverUrl": "https://www.thundercompute.com/mcp",
      "headers": {
        "Content-Type": "application/json"
      }
    }
  }
}
```
Run the interactive setup:
```bash
opencode mcp add
```
When prompted:
* **Server name:** `thunder-compute`
* **Server type:** `Remote`
* **URL:** `https://www.thundercompute.com/mcp`
* **Requires OAuth:** `Yes`
* **Pre-registered client ID:** `No`
Then authenticate:
```bash
opencode mcp auth thunder-compute
```
A browser window will open for you to log in and authorize access. Then start opencode:
```bash
opencode
```
The Thunder Compute tools are now available in your session.
Alternatively, add to `~/.config/opencode/opencode.json`:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "thunder-compute": {
      "type": "remote",
      "url": "https://www.thundercompute.com/mcp",
      "oauth": {}
    }
  }
}
```
Then run `opencode mcp auth thunder-compute` to authenticate.
If you use an MCP client that supports [Smithery](https://smithery.ai), you can install directly:
```bash
npx @smithery/cli install @thunder-compute/thunder-compute
```
Or browse the [Thunder Compute listing on Smithery](https://smithery.ai/server/@thunder-compute/thunder-compute) and click **Install** for your client.
For custom integrations, the MCP server uses [Streamable HTTP transport](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http) at a single endpoint. Authentication is via OAuth 2.0 with standard MCP discovery.
**Endpoint:** `https://www.thundercompute.com/mcp`
```bash
# Set ACCESS_TOKEN to the OAuth access token obtained via the MCP sign-in flow
curl -X POST https://www.thundercompute.com/mcp \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-03-26",
      "capabilities": {},
      "clientInfo": { "name": "my-agent", "version": "1.0.0" }
    },
    "id": 1
  }'
```
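After a successful `initialize`, an MCP client typically calls `tools/list` to discover what the server exposes. The sketch below builds that JSON-RPC request with only the Python standard library; `YOUR_ACCESS_TOKEN` is a placeholder for the token you obtain via the OAuth flow, and the helper name is ours, not part of any Thunder Compute SDK.

```python
# Minimal sketch of a follow-up JSON-RPC call to the MCP endpoint.
import json
import urllib.request

MCP_URL = "https://www.thundercompute.com/mcp"

def jsonrpc_request(method: str, params: dict, req_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST for the MCP endpoint."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": req_id,
    }).encode()
    return urllib.request.Request(
        MCP_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # placeholder token
            "Content-Type": "application/json",
        },
        method="POST",
    )

# List the tools the server exposes (request id 2, after initialize used id 1).
req = jsonrpc_request("tools/list", {}, req_id=2, token="YOUR_ACCESS_TOKEN")
# urllib.request.urlopen(req)  # uncomment to actually send
```

In practice you would also capture any session header the server returns from `initialize` and echo it on subsequent requests, per the Streamable HTTP transport spec.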
## Try It Out
Once configured, ask your AI agent to:
* "Spin up an A100 instance with PyTorch"
* "List my running instances"
* "Run `nvidia-smi` on my instance"
* "How much have I spent this month?"
The MCP server gives your agent access to everything you would want to do with a GPU.
## Next Steps
* Read the [full MCP Server guide](/guides/mcp-server) for the complete list of available tools, prompts, and troubleshooting
* Learn about [Prototyping vs Production](/prototyping-vs-production) to choose the right mode for your workload
* Explore [Technical Specifications](/technical-specs) for hardware, networking, and storage details
# Add SSH key to instance
Source: https://www.thundercompute.com/docs/api-reference/instances/add-ssh-key-to-instance
**POST** `/instances/{id}/add_key` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Add an SSH key to an existing instance. If public_key is provided in the request body, it will be added to authorized_keys. If no public_key is provided, a new key pair will be generated and the private key returned.
# Create instance
Source: https://www.thundercompute.com/docs/api-reference/instances/create-instance
**POST** `/instances/create` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Create a new compute instance
# Delete instance
Source: https://www.thundercompute.com/docs/api-reference/instances/delete-instance
**POST** `/instances/{id}/delete` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Delete a compute instance by ID
# List instances
Source: https://www.thundercompute.com/docs/api-reference/instances/list-instances
**GET** `/instances/list` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Get a list of user's compute instances
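A minimal sketch of the list call, again assuming Bearer authentication (check the OpenAPI spec for the actual scheme and response shape):

```python
# Hypothetical sketch of GET /instances/list.
import urllib.request

API_BASE = "https://api.thundercompute.com:8443"

req = urllib.request.Request(
    f"{API_BASE}/instances/list",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},  # placeholder token
    method="GET",
)
# instances = json.load(urllib.request.urlopen(req))  # uncomment to send
```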
# Modify instance
Source: https://www.thundercompute.com/docs/api-reference/instances/modify-instance
**POST** `/instances/{id}/modify` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Modify a running compute instance's resources
# Create a snapshot
Source: https://www.thundercompute.com/docs/api-reference/snapshots/create-a-snapshot
**POST** `/snapshots/create` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Create a new snapshot from a running instance
# Delete a snapshot
Source: https://www.thundercompute.com/docs/api-reference/snapshots/delete-a-snapshot
**DELETE** `/snapshots/{id}` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Delete a snapshot by ID
# List snapshots
Source: https://www.thundercompute.com/docs/api-reference/snapshots/list-snapshots
**GET** `/snapshots/list` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Get a list of all snapshots for the authenticated user's organization
# Add an SSH key
Source: https://www.thundercompute.com/docs/api-reference/ssh-keys/add-an-ssh-key
**POST** `/keys/add` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Add a new SSH public key to the authenticated user's organization
# Delete an SSH key
Source: https://www.thundercompute.com/docs/api-reference/ssh-keys/delete-an-ssh-key
**DELETE** `/keys/{id}` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Delete an SSH key by ID
# List SSH keys
Source: https://www.thundercompute.com/docs/api-reference/ssh-keys/list-ssh-keys
**GET** `/keys/list` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Get a list of all SSH keys for the authenticated user's organization
# Get current pricing
Source: https://www.thundercompute.com/docs/api-reference/utilities/get-current-pricing
**GET** `/pricing` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Retrieve current hourly pricing information for compute resources
# Get GPU specifications
Source: https://www.thundercompute.com/docs/api-reference/utilities/get-gpu-specifications
**GET** `/specs` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Retrieve GPU spec configurations for all supported GPU types, counts, and modes
# Get thunder templates
Source: https://www.thundercompute.com/docs/api-reference/utilities/get-thunder-templates
**GET** `/thunder-templates` (OpenAPI spec: `https://api.thundercompute.com:8443/openapi.json`)
Get available thunder templates for instance creation
# Billing
Source: https://www.thundercompute.com/docs/billing
Understand Thunder Compute's usage-based billing, payment methods, billing alerts, current rates, and tips for saving on GPU cloud costs.
## Payment Options
There are **two ways to pay** for Thunder Compute:
### Option 1: Auto-Pay
Set up auto-pay by saving a credit card. Go to [console.thundercompute.com/settings/billing](https://console.thundercompute.com/settings/billing) and click "Manage saved payment method" (or "Add card to enable auto-pay" if no card is on file).
### Option 2: Preload Credit
Add credit directly to your account as an alternative to auto-pay. This credit never expires and will be used before any saved payment method.
**Order of payment**
1. Any preloaded credit you've added
2. Charges to your saved payment method
You can switch between options or use both—set up auto-pay anytime, even if you started with preloaded credit.
## Billing Alerts
* **Instance reminders:** We'll email you about any running instances so you're never caught off guard.
* **Threshold charges:** As your usage grows, we'll bill your card at preset checkpoints (which rise over time) to prevent runaway bills.
## Our rates
All compute resources are billed per minute only while your instances run. Rates and promotions are subject to change without notice. For current rates, see our [pricing page](https://www.thundercompute.com/pricing).
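Per-minute billing makes cost estimates simple. A quick sketch (the $1.20/hr rate below is a made-up placeholder; check the pricing page for current rates):

```python
# Estimate the cost of a session under per-minute billing.
HOURLY_RATE = 1.20            # USD per hour (hypothetical rate)
PER_MINUTE = HOURLY_RATE / 60

def session_cost(minutes_running: int) -> float:
    """Cost of keeping an instance running for the given number of minutes."""
    return round(minutes_running * PER_MINUTE, 4)

print(session_cost(90))  # a 90-minute session at $1.20/hr -> 1.8
```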
## Credit Terms
* **Preloaded credit** does not expire and will be used before charging your saved card.
* **Promotional credit** can be revoked at our discretion.
* **Refunds:** Credit is non-refundable.
## Money-Saving Tips
While Thunder Compute is already the cheapest GPU cloud platform, there are a few strategies we recommend to reduce your bill:
* Delete instances when you're done with them to stop billing.
* Right‑size new workloads with `tnr create --gpu`, `--vcpus`, and related flags so you only pay for what you use.
If you have feedback or questions about billing, please hop into our [Discord](https://discord.com/invite/nwuETS9jJK). We're always happy to improve!
# Data Processing Addendum
Source: https://www.thundercompute.com/docs/guides/data-processing-addendum
Review the Data Processing Addendum for Thunder Compute. Audit legal terms, data handling protocols, and privacy compliance for your organization.
## Sample Agreement
Data Processing Agreement
## Using this DPA
This DPA has 2 parts: (1) the Key Terms on this Cover Page and (2) the Common Paper DPA Standard Terms Version 1 posted at commonpaper.com/standards/data-processing-agreement/1.1 (“DPA Standard Terms”), which is incorporated by reference. If there is any inconsistency between the parts of the DPA, the Cover Page will control over the DPA Standard Terms. Capitalized and highlighted words have the meanings given on the Cover Page. However, if the Cover Page omits or does not define a highlighted word, the default meaning will be “none” or “not applicable” and the correlating clause, sentence, or section does not apply to this Agreement. All other capitalized words have the meanings given in the DPA Standard Terms or the Agreement. A copy of the DPA Standard Terms is attached for convenience only.
## Key Terms
The key legal terms of the DPA are as follows:
| Term | Details |
| ------------------------- | ---------------------------------------------------------------------------------------------- |
| Agreement | Reference to sales contract will be set when sending agreement |
| Approved Subprocessors | [https://www.thundercompute.com/sub-processors](https://www.thundercompute.com/sub-processors) |
| Provider Security Contact | `support@thundercompute.com` |
| Security Policy | As defined in the Agreement. |
### Service Provider Relationship
To the extent California Consumer Privacy Act, Cal. Civ. Code § 1798.100 et seq (“CCPA”) applies, the parties acknowledge and agree that Provider is a service provider and is receiving Personal Data from Customer to provide the Service as agreed in the Agreement and detailed below (see Nature and Purpose of Processing), which constitutes a limited and specified business purpose. Provider will not sell or share any Personal Data provided by Customer under the Agreement. In addition, Provider will not retain, use, or disclose any Personal Data provided by Customer under the Agreement except as necessary for providing the Service for Customer, as stated in the Agreement, or as permitted by Applicable Data Protection Laws. Provider certifies that it understands the restrictions of this paragraph and will comply with all Applicable Data Protection Laws. Provider will notify Customer if it can no longer meet its obligations under the CCPA.
## Restricted Transfers
### Governing Member State
* EEA Transfers: Ireland
* UK Transfers: England and Wales
## Annex I(A) List of Parties
### Data Exporter
* Name: the Customer signing this DPA
* Activities relevant to transfer: See Annex I(B)
* Role: Controller
### Data Importer
* Name: the Provider signing this DPA
* Contact person: Carl Peterson, CEO
* Address: 887 W Marietta St NW, Suite N105, Georgia 30318, USA
* Activities relevant to transfer: See Annex I(B)
* Role: Processor
## Annex I(B) Description of Transfer and Processing Activities
### Service
The Service is: GPU cloud computing with on-demand cloud instances, backed by physical servers, in addition to data storage.
### Categories of Data Subjects
* Customer's employees
### Categories of Personal Data
* Name
* Contact information such as email, phone number, or address
* Financial information such as bank account numbers
* Transactional information such as account information or purchases
* User activity and analysis such as device information or IP address
* Location information
### Special Category Data
Is special category data (as defined in Article 9 of the GDPR) Processed? No
### Frequency of Transfer
Continuous
### Nature and Purpose of Processing
* Receiving data, including collection, accessing, retrieval, recording, and data entry
* Holding data, including storage, organization, and structuring
* Using data, including analysis, consultation, testing, automated decision making, and profiling
* Updating data, including correcting, adaptation, alteration, alignment, and combination
* Protecting data, including restricting, encrypting, and security testing
* Sharing data, including disclosure, dissemination, allowing access, or otherwise making available
* Returning data to the data exporter or data subject
* Erasing data, including destruction and deletion
### Duration of Processing
Provider will process Customer Personal Data as long as required (i) to conduct the Processing activities instructed in Section 2.2(a)-(d) of the Standard Terms; or (ii) by Applicable Laws.
## Annex I(C)
### Competent Supervisory Authority
The supervisory authority will be the supervisory authority of the data exporter, as determined in accordance with Clause 13 of the EEA SCCs or the relevant provision of the UK Addendum.
## Annex II
### Technical and Organizational Security Measures
See Security Policy
Provider and Customer have not changed the DPA Standard Terms except for the details on the Cover Page above. By signing this Cover Page, each party agrees to enter into this DPA as of the last date of signature below.
## Signatures
| Field | Provider (Thunder Compute) | Customer |
| -------------------- | -------------------------- | -------- |
| Signature | | |
| Print Name | | |
| Title | | |
| Legal Notice Address | `carl@thundercompute.com` | |
| Date | | |
## 1. Processor and Subprocessor Relationships
### 1.1 Provider as Processor
In situations where Customer is a Controller of the Customer Personal Data, Provider will be deemed a Processor that is Processing Personal Data on behalf of Customer.
### 1.2 Provider as Subprocessor
In situations where Customer is a Processor of the Customer Personal Data, Provider will be deemed a Subprocessor of the Customer Personal Data.
## 2. Processing
### 2.1 Processing Details
Annex I(B) on the Cover Page describes the subject matter, nature, purpose, and duration of this Processing, as well as the Categories of Personal Data collected and Categories of Data Subjects.
### 2.2 Processing Instructions
Customer instructs Provider to Process Customer Personal Data: (a) to provide and maintain the Service; (b) as may be further specified through Customer’s use of the Service; (c) as documented in the Agreement; and (d) as documented in any other written instructions given by Customer and acknowledged by Provider about Processing Customer Personal Data under this DPA. Provider will abide by these instructions unless prohibited from doing so by Applicable Laws. Provider will immediately inform Customer if it is unable to follow the Processing instructions. Customer has given and will only give instructions that comply with Applicable Laws.
### 2.3 Processing by Provider
Provider will only Process Customer Personal Data in accordance with this DPA, including the details in the Cover Page. If Provider updates the Service to update existing or include new products, features, or functionality, Provider may change the Categories of Data Subjects, Categories of Personal Data, Special Category Data, Special Category Data Restrictions or Safeguards, Frequency of Transfer, Nature and Purpose of Processing, and Duration of Processing as needed to reflect the updates by notifying Customer of the updates and changes.
### 2.4 Customer Processing
Where Customer is a Processor and Provider is a Subprocessor, Customer will comply with all Applicable Laws that apply to Customer’s Processing of Customer Personal Data. Customer’s agreement with its Controller will similarly require Customer to comply with all Applicable Laws that apply to Customer as a Processor. In addition, Customer will comply with the Subprocessor requirements in Customer’s agreement with its Controller.
### 2.5 Consent to Processing
Customer has complied with and will continue to comply with all Applicable Data Protection Laws concerning its provision of Customer Personal Data to Provider and/or the Service, including making all disclosures, obtaining all consents, providing adequate choice, and implementing relevant safeguards required under Applicable Data Protection Laws.
### 2.6 Subprocessors
1. Provider will not provide, transfer, or hand over any Customer Personal Data to a Subprocessor unless Customer has approved the Subprocessor. The current list of Approved Subprocessors includes the identities of the Subprocessors, their country of location, and their anticipated Processing tasks. Provider will inform Customer at least 10 business days in advance and in writing of any intended changes to the Approved Subprocessors whether by addition or replacement of a Subprocessor, which allows Customer to have enough time to object to the changes before the Provider begins using the new Subprocessor(s). Provider will give Customer the information necessary to allow Customer to exercise its right to object to the change to Approved Subprocessors. Customer has 30 days after notice of a change to the Approved Subprocessors to object, otherwise Customer will be deemed to accept the changes. If Customer objects to the change within 30 days of notice, Customer and Provider will cooperate in good faith to resolve Customer’s objection or concern.
2. When engaging a Subprocessor, Provider will have a written agreement with the Subprocessor that ensures the Subprocessor only accesses and uses Customer Personal Data (i) to the extent required to perform the obligations subcontracted to it, and (ii) consistent with the terms of Agreement.
3. If the GDPR applies to the Processing of Customer Personal Data, (i) the data protection obligations described in this DPA (as referred to in Article 28(3) of the GDPR, if applicable) are also imposed on the Subprocessor, and (ii) Provider’s agreement with the Subprocessor will incorporate these obligations, including details about how Provider and its Subprocessor will coordinate to respond to inquiries or requests about the Processing of Customer Personal Data. In addition, Provider will share, at Customer’s request, a copy of its agreements (including any amendments) with its Subprocessors. To the extent necessary to protect business secrets or other confidential information, including personal data, Provider may redact the text of its agreement with its Subprocessor prior to sharing a copy.
4. Provider remains fully liable for all obligations subcontracted to its Subprocessors, including the acts and omissions of its Subprocessors in Processing Customer Personal Data. Provider will notify Customer of any failure by its Subprocessors to fulfill a material obligation about Customer Personal Data under the agreement between Provider and the Subprocessor.
## 3. Restricted Transfers
### 3.1 Authorization
Customer agrees that Provider may transfer Customer Personal Data outside the EEA, the United Kingdom, or other relevant geographic territory as necessary to provide the Service. If Provider transfers Customer Personal Data to a territory for which the European Commission or other relevant supervisory authority has not issued an adequacy decision, Provider will implement appropriate safeguards for the transfer of Customer Personal Data to that territory consistent with Applicable Data Protection Laws.
### 3.2 Ex-EEA Transfers
Customer and Provider agree that if the GDPR protects the transfer of Customer Personal Data, the transfer is from Customer from within the EEA to Provider outside of the EEA, and the transfer is not governed by an adequacy decision made by the European Commission, then by entering into this DPA, Customer and Provider are deemed to have signed the EEA SCCs and their Annexes, which are incorporated by reference. Any such transfer is made pursuant to the EEA SCCs, which are completed as follows:
1. Module Two (Controller to Processor) of the EEA SCCs apply when Customer is a Controller and Provider is Processing Customer Personal Data for Customer as a Processor.
2. Module Three (Processor to Sub-Processor) of the EEA SCCs apply when Customer is a Processor and Provider is Processing Customer Personal Data on behalf of Customer as a Subprocessor.
3. For each module, the following applies (when applicable):
* The optional docking clause in Clause 7 does not apply;
* In Clause 9, Option 2 (general written authorization) applies, and the minimum time period for prior notice of Subprocessor changes is 10 business days;
* In Clause 11, the optional language does not apply;
* All square brackets in Clause 13 are removed;
* In Clause 17 (Option 1), the EEA SCCs will be governed by the laws of Governing Member State;
* In Clause 18(b), disputes will be resolved in the courts of the Governing Member State; and
* The Cover Page to this DPA contains the information required in Annex I, Annex II, and Annex III of the EEA SCCs.
### 3.3 Ex-UK Transfers
Customer and Provider agree that if the UK GDPR protects the transfer of Customer Personal Data, the transfer is from Customer from within the United Kingdom to Provider outside of the United Kingdom, and the transfer is not governed by an adequacy decision made by the United Kingdom Secretary of State, then by entering into this DPA, Customer and Provider are deemed to have signed the UK Addendum and their Annexes, which are incorporated by reference. Any such transfer is made pursuant to the UK Addendum, which is completed as follows:
1. Section 3.2 of this DPA contains the information required in Table 2 of the UK Addendum.
2. Table 4 of the UK Addendum is modified as follows: Neither party may end the UK Addendum as set out in Section 19 of the UK Addendum; to the extent ICO issues a revised Approved Addendum under Section 18 of the UK Addendum, the parties will work in good faith to revise this DPA accordingly.
3. The Cover Page contains the information required by Annex 1A, Annex 1B, Annex II, and Annex III of the UK Addendum.
### 3.4 Other International Transfers
For Personal Data transfers where Swiss law (and not the law in any EEA member state or the United Kingdom) applies to the international nature of the transfer, references to the GDPR in Clause 4 of the EEA SCCs are, to the extent legally required, amended to refer to the Swiss Federal Data Protection Act or its successor instead, and the concept of supervisory authority will include the Swiss Federal Data Protection and Information Commissioner.
## 4. Security Incident Response
Upon becoming aware of any Security Incident, Provider will: (a) notify Customer without undue delay when feasible, but no later than 72 hours after becoming aware of the Security Incident; (b) provide timely information about the Security Incident as it becomes known or as is reasonably requested by Customer; and (c) promptly take reasonable steps to contain and investigate the Security Incident. Provider’s notification of or response to a Security Incident as required by this DPA will not be construed as an acknowledgment by Provider of any fault or liability for the Security Incident.
## 5. Audit & Reports
### 5.1 Audit Rights
Provider will give Customer all information reasonably necessary to demonstrate its compliance with this DPA and Provider will allow for and contribute to audits, including inspections by Customer, to assess Provider’s compliance with this DPA. However, Provider may restrict access to data or information if Customer’s access to the information would negatively impact Provider’s intellectual property rights, confidentiality obligations, or other obligations under Applicable Laws. Customer acknowledges and agrees that it will only exercise its audit rights under this DPA and any audit rights granted by Applicable Data Protection Laws by instructing Provider to comply with the reporting and due diligence requirements below. Provider will maintain records of its compliance with this DPA for 3 years after the DPA ends.
### 5.2 Security Reports
Customer acknowledges that Provider is regularly audited against the standards defined in the Security Policy by independent third-party auditors. Upon written request, Provider will give Customer, on a confidential basis, a summary copy of its then-current Report so that Customer can verify Provider’s compliance with the standards defined in the Security Policy.
### 5.3 Security Due Diligence
In addition to the Report, Provider will respond to reasonable requests for information made by Customer to confirm Provider’s compliance with this DPA, including responses to information security, due diligence, and audit questionnaires, or by giving additional information about its information security program. All such requests must be in writing and made to the Provider Security Contact and may only be made once a year.
## 6. Coordination & Cooperation
### 6.1 Response to Inquiries
If Provider receives any inquiry or request from anyone else about the Processing of Customer Personal Data, Provider will notify Customer about the request and Provider will not respond to the request without Customer’s prior consent. Examples of these kinds of inquiries and requests include a judicial or administrative or regulatory agency order about Customer Personal Data where notifying Customer is not prohibited by Applicable Law, or a request from a data subject. If allowed by Applicable Law, Provider will follow Customer’s reasonable instructions about these requests, including providing status updates and other information reasonably requested by Customer. If a data subject makes a valid request under Applicable Data Protection Laws to delete or opt out of Customer’s giving of Customer Personal Data to Provider, Provider will assist Customer in fulfilling the request according to the Applicable Data Protection Law. Provider will cooperate with and provide reasonable assistance to Customer, at Customer’s expense, in any legal response or other procedural action taken by Customer in response to a third-party request about Provider’s Processing of Customer Personal Data under this DPA.
### 6.2 DPIAs and DTIAs
If required by Applicable Data Protection Laws, Provider will reasonably assist Customer in conducting any mandated data protection impact assessments or data transfer impact assessments and consultations with relevant data protection authorities, taking into consideration the nature of the Processing and Customer Personal Data.
## 7. Deletion of Customer Personal Data
### 7.1 Deletion by Customer
Provider will enable Customer to delete Customer Personal Data in a manner consistent with the functionality of the Services. Provider will comply with this instruction as soon as reasonably practicable except where further storage of Customer Personal Data is required by Applicable Law.
### 7.2 Deletion at DPA Expiration
1. After the DPA expires, Provider will return or delete Customer Personal Data at Customer’s instruction unless further storage of Customer Personal Data is required or authorized by Applicable Law. If return or destruction is impracticable or prohibited by Applicable Laws, Provider will make reasonable efforts to prevent additional Processing of Customer Personal Data and will continue to protect the Customer Personal Data remaining in its possession, custody, or control. For example, Applicable Laws may require Provider to continue hosting or Processing Customer Personal Data.
2. If Customer and Provider have entered the EEA SCCs or the UK Addendum as part of this DPA, Provider will only give Customer the certification of deletion of Personal Data described in Clause 8.1(d) and Clause 8.5 of the EEA SCCs if Customer asks for one.
## 8. Limitation of Liability
### 8.1 Liability Caps and Damages Waiver
To the maximum extent permitted under Applicable Data Protection Laws, each party’s total cumulative liability to the other party arising out of or related to this DPA will be subject to the waivers, exclusions, and limitations of liability stated in the Agreement.
### 8.2 Related-Party Claims
Any claims made against Provider or its Affiliates arising out of or related to this DPA may only be brought by the Customer entity that is a party to the Agreement.
### 8.3 Exceptions
This DPA does not limit any liability to an individual about the individual’s data protection rights under Applicable Data Protection Laws. In addition, this DPA does not limit any liability between the parties for violations of the EEA SCCs or UK Addendum.
## 9. Conflicts Between Documents
This DPA forms part of and supplements the Agreement. If there is any inconsistency between this DPA, the Agreement, or any of their parts, the part listed earlier will control over the part listed later for that inconsistency: (1) the EEA SCCs or the UK Addendum, (2) this DPA, and then (3) the Agreement.
## 10. Term of Agreement
This DPA will start when Provider and Customer agree to a Cover Page for the DPA and sign or electronically accept the Agreement and will continue until the Agreement expires or is terminated. However, Provider and Customer will each remain subject to the obligations in this DPA and Applicable Data Protection Laws until Customer stops transferring Customer Personal Data to Provider and Provider stops Processing Customer Personal Data.
## 11. Definitions
### 11.1 Applicable Laws
“Applicable Laws” means the laws, rules, regulations, court orders, and other binding requirements of a relevant government authority that apply to or govern a party.
### 11.2 Applicable Data Protection Laws
“Applicable Data Protection Laws” means the Applicable Laws that govern how the Service may process or use an individual’s personal information, personal data, personally identifiable information, or other similar term.
### 11.3 Controller
“Controller” will have the meaning(s) given in the Applicable Data Protection Laws for the company that determines the purpose and extent of Processing Personal Data.
### 11.4 Cover Page
“Cover Page” means a document that is signed or electronically accepted by the parties that incorporates these DPA Standard Terms and identifies Provider, Customer, and the subject matter and details of the data processing.
### 11.5 Customer Personal Data
“Customer Personal Data” means Personal Data that Customer uploads or provides to Provider as part of the Service and that is governed by this DPA.
### 11.6 DPA
“DPA” means these DPA Standard Terms, the Cover Page between Provider and Customer, and the policies and documents referenced in or attached to the Cover Page.
### 11.7 EEA SCCs
“EEA SCCs” means the standard contractual clauses annexed to the European Commission's Implementing Decision 2021/914 of 4 June 2021 on standard contractual clauses for the transfer of personal data to third countries pursuant to Regulation (EU) 2016/679 of the European Parliament and of the Council.
### 11.8 European Economic Area (EEA)
“European Economic Area” or “EEA” means the member states of the European Union, Norway, Iceland, and Liechtenstein.
### 11.9 GDPR
“GDPR” means European Union Regulation 2016/679 as implemented by local law in the relevant EEA member nation.
### 11.10 Personal Data
“Personal Data” will have the meaning(s) given in the Applicable Data Protection Laws for personal information, personal data, or other similar term.
### 11.11 Processing
“Processing” or “Process” will have the meaning(s) given in the Applicable Data Protection Laws for any use of, or performance of a computer operation on, Personal Data, including by automatic methods.
### 11.12 Processor
“Processor” will have the meaning(s) given in the Applicable Data Protection Laws for the company that Processes Personal Data on behalf of the Controller.
### 11.13 Report
“Report” means audit reports prepared by another company according to the standards defined in the Security Policy on behalf of Provider.
### 11.14 Restricted Transfer
“Restricted Transfer” means (a) where the GDPR applies, a transfer of personal data from the EEA to a country outside of the EEA which is not subject to an adequacy determination by the European Commission; and (b) where the UK GDPR applies, a transfer of personal data from the United Kingdom to any other country which is not subject to adequacy regulations adopted pursuant to Section 17A of the United Kingdom Data Protection Act 2018.
### 11.15 Security Incident
“Security Incident” means a Personal Data Breach as defined in Article 4 of the GDPR.
### 11.16 Service
“Service” means the product and/or services described in the Agreement.
### 11.17 Special Category Data
“Special Category Data” will have the meaning given in Article 9 of the GDPR.
### 11.18 Subprocessor
“Subprocessor” will have the meaning(s) given in the Applicable Data Protection Laws for a company that, with the approval and acceptance of Controller, assists the Processor in Processing Personal Data on behalf of the Controller.
### 11.19 UK GDPR
“UK GDPR” means European Union Regulation 2016/679 as implemented by section 3 of the United Kingdom’s European Union (Withdrawal) Act of 2018 in the United Kingdom.
### 11.20 UK Addendum
“UK Addendum” means the international data transfer addendum to the EEA SCCs issued by the Information Commissioner for Parties making Restricted Transfers under S119A(1) Data Protection Act 2018.
# Self-host DeepSeek R1
Source: https://www.thundercompute.com/docs/guides/deepseek-r1-running-locally-on-thunder-compute
Self-host DeepSeek R1 on Thunder Compute cloud GPUs. Deploy the model locally and configure hardware for optimized inference performance.
# Easily Run DeepSeek R1 on Thunder Compute
Looking for the **cheapest way to run DeepSeek R1** or just want to **try DeepSeek R1** without buying hardware? Thunder Compute lets you spin up pay‑per‑minute A100 GPUs so you only pay for the time you use. Follow the steps below to get the model running in minutes.
> **Quick reminder:** Make sure your Thunder Compute account is set up. If not, start with our [Quickstart Guide](/vscode/quickstart).
## Step 1: Create a Cost‑Effective GPU Instance
Open your CLI and launch an 80 GB A100 GPU (perfect for the 70B variant):
```bash theme={null}
tnr create --gpu "a100xl" --template "ollama"
```
For details on instance templates, see our [templates guide](/guides/using-instance-templates).
## Step 2: Check Status and Connect
Verify the instance is running:
```bash theme={null}
tnr status
```
Connect to the instance:
```bash theme={null}
tnr connect
```
## Step 3: Start the Ollama Server
Inside the instance, start Ollama:
```bash theme={null}
start-ollama
```
If you run into issues, check our [troubleshooting guide](/troubleshooting).
Wait about 30 seconds for the web UI to load.
## Step 4: Access the Web UI and Load DeepSeek R1
1. Visit `http://localhost:8080` in your browser.
2. Choose **DeepSeek R1** from the dropdown. On an 80 GB A100, pick the **70B** variant for peak performance.
## Step 5: Run DeepSeek R1
Type a prompt in the web interface. For example:
> *"If the concepts of rCUDA were applied at scale, overcoming latency, what would it mean for the cost of GPUs on cloud providers?"*
The model will think through the answer and respond. A full reply can take up to 200 seconds.
## Conclusion
That's the **cheapest way to run DeepSeek R1** and a quick way to **try DeepSeek R1** on Thunder Compute. Explore more guides:
* [Using Docker on Thunder Compute](/guides/using-docker-on-thundercompute)
* [Using Instance Templates](/guides/using-instance-templates)
* [Running Jupyter notebooks](/guides/running-jupyter-notebooks-on-thunder-compute)
Happy building!
# Ephemeral Storage
Source: https://www.thundercompute.com/docs/guides/ephemeral-storage
Fast, temporary local storage for model weights, caches, and scratch files. Mounted at /ephemeral.
## What is Ephemeral Storage?
Ephemeral storage is fast, local disk space mounted at `/ephemeral` on your instance. It uses high-performance NVMe drives directly attached to the host machine, making it significantly faster than the persistent disk for I/O-heavy workloads.
Ephemeral storage is temporary. Data on `/ephemeral` is lost when you modify, delete, or migrate your instance. It is also not included in snapshots.
## When to Use Ephemeral Storage
Ephemeral storage is ideal for data that is large, frequently accessed, and easy to re-download:
* **Model weights** downloaded from Hugging Face or other registries
* **Pip/conda caches** to speed up environment rebuilds
* **Training checkpoints** (back up important ones to persistent disk or cloud storage)
* **Large datasets** that can be re-fetched
* **Scratch files** from preprocessing or intermediate computation
For data you need to keep, use the persistent disk (your home directory) or [snapshots](/cli/operations/snapshots).
## Configuring Ephemeral Storage
Ephemeral storage defaults to **0 GB** (disabled). You can add it when creating or modifying an instance.
```bash theme={null}
# Create with ephemeral storage
tnr create --ephemeral-disk 200
# Add to an existing instance
tnr modify --ephemeral-disk 200
# Disable ephemeral storage
tnr modify --ephemeral-disk 0
```
Set the **Ephemeral Storage** field in the create or modify instance dialog.
### Size Limits
See [thundercompute.com/pricing](https://www.thundercompute.com/pricing) for current ephemeral storage limits by instance mode.
## Using Ephemeral Storage
Once configured, the storage is available at `/ephemeral` inside your instance:
```bash theme={null}
# Check available space
df -h /ephemeral
# Download model weights to ephemeral storage
huggingface-cli download meta-llama/Llama-3-8B --local-dir /ephemeral/llama-3-8b
# Use as pip cache
pip install --cache-dir /ephemeral/pip-cache transformers
```
## What Happens to Ephemeral Data
| Event | Ephemeral data | Persistent disk |
| --------------------- | -------------- | --------------- |
| Instance running | Preserved | Preserved |
| Modify instance | Lost | Preserved |
| Delete instance | Lost | Lost |
| Create snapshot | Not included | Included |
| Restore from snapshot | Empty | Restored |
## Best Practices
1. **Store only re-downloadable data** on `/ephemeral`. Anything important should live on your persistent disk or be backed up to cloud storage.
2. **Use symlinks** to redirect cache directories to ephemeral storage:
```bash theme={null}
mkdir -p /ephemeral/huggingface
ln -s /ephemeral/huggingface ~/.cache/huggingface
```
3. **Set environment variables** to point tools at ephemeral storage:
```bash theme={null}
export HF_HOME=/ephemeral/huggingface
export PIP_CACHE_DIR=/ephemeral/pip-cache
export TRANSFORMERS_CACHE=/ephemeral/huggingface/transformers
```
4. **Back up training checkpoints** periodically from `/ephemeral` to your home directory or cloud storage if you need to keep them.
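The backup step above can be sketched with a plain copy from `/ephemeral` to the persistent disk. The paths below use `/tmp` stand-ins so the snippet is self-contained; on a real instance they would be `/ephemeral/checkpoints` and a directory in your home:

```bash theme={null}
# Stand-ins for /ephemeral/checkpoints and ~/checkpoints
SRC=/tmp/demo-ephemeral/checkpoints
DST=/tmp/demo-home/checkpoints

mkdir -p "$SRC" "$DST"
touch "$SRC/step-1000.pt" "$SRC/step-2000.pt"   # pretend checkpoints

# Archive mode preserves metadata; -n skips files already backed up
cp -an "$SRC/." "$DST/"
ls "$DST"
```

Run this after each checkpoint (or on a timer) so that losing `/ephemeral` only costs you re-downloadable data.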
# Run GPT‑OSS 120B on Thunder Compute
Source: https://www.thundercompute.com/docs/guides/gpt-oss-running-locally-on-thunder-compute
Deploy GPT-OSS 120B on Thunder Compute hardware. Initialize the large language model and configure the local environment for high-performance use.
# Run GPT‑OSS 120B on Thunder Compute
Looking for the **cheapest way to self‑host GPT‑OSS 120B** or just want to **try it out** without buying hardware? Thunder Compute lets you spin up pay‑per‑minute NVIDIA A100 GPUs, so you only pay for what you use. Follow the steps below to get the model running in minutes.
> **Prerequisite:** Ensure your Thunder Compute account is ready. If not, start with our [Quickstart Guide](/vscode/quickstart).
## Step 1 — Create a Cost‑Effective Prototyping‑Mode GPU Instance
Launch an 80 GB A100 instance (large enough to host the full 120B model):
```bash theme={null}
tnr create \
--gpu a100xl \
--vcpus 4 \
--mode prototyping \
--persistent-disk 200 \
--template "ollama"
```
This command starts a lower‑cost [prototyping‑mode](/prototyping-vs-production#prototyping-mode) instance with:
* **GPU:** A100 80 GB
* **vCPUs:** 4
* **Storage:** 200 GB persistent disk
> The GPU, vCPU Count, and Mode ([Prototyping](/prototyping-vs-production#prototyping-mode) / [Production](/prototyping-vs-production#production-mode)) can be changed later if your requirements change, and the amount of storage can be increased if needed.
For details on templates, see the [Instance Templates guide](/guides/using-instance-templates).
## Step 2 — Check Status and Connect
Verify that the instance is running (it can take a minute to spin up):
```bash theme={null}
tnr status
```
Connect to the instance:
```bash theme={null}
tnr connect
```
## Step 3 — Start Ollama and Download the Model
Inside the instance, start Ollama (this also launches OpenWebUI and a Cloudflare tunnel):
```bash theme={null}
start-ollama
```
While the UI is initializing, download the model. Here we pull the 120B variant of GPT‑OSS, but any model from the [Ollama Model Library](https://ollama.com/library) can be downloaded:
```bash theme={null}
ollama pull gpt-oss:120b
```
> **Tip:** If you encounter issues, consult the [troubleshooting guide](/troubleshooting).
Give the UI about 60 seconds to finish loading.
## Step 4 — Access the Web UI and Select the Model
1. Open `http://localhost:8080` in your browser.
2. Choose **gpt-oss:120b** from the model dropdown.
## Step 5 — Run GPT‑OSS 120B
Enter a prompt in the web interface, for example:
> *“Tell a tale of a seaman who found the treasure of the clouds by following the sound of thunder.”*
## Conclusion
That's it—the **cheapest way to run GPT‑OSS 120B** on Thunder Compute. For more, check out:
* [Using Docker on Thunder Compute](/guides/using-docker-on-thundercompute)
* [Using Instance Templates](/guides/using-instance-templates)
* [Running Jupyter Notebooks](/guides/running-jupyter-notebooks-on-thunder-compute)
Happy building!
# MCP Server
Source: https://www.thundercompute.com/docs/guides/mcp-server
Use Thunder Compute with AI coding agents like Claude Code, Cursor, Windsurf, and Codex via the Model Context Protocol (MCP).
Thunder Compute provides an MCP (Model Context Protocol) server that lets AI coding agents manage GPU instances on your behalf. Create, monitor, modify, and tear down instances without leaving your agent workflow.
## Prerequisites
1. A Thunder Compute account
2. An AI agent that supports remote MCP servers (Claude Code, Cursor, Codex, etc.)
No local installation or API tokens required — authentication is handled via OAuth in your browser.
## Setup
Run this in your terminal:
```bash theme={null}
claude mcp add --transport http thunder-compute https://www.thundercompute.com/mcp
```
Then start Claude Code and run `/mcp` to authenticate. A browser window will open for you to log in and authorize access.
Alternatively, add to `~/.claude.json` (global) or `.claude.json` in your project root:
```json theme={null}
{
"mcpServers": {
"thunder-compute": {
"url": "https://www.thundercompute.com/mcp"
}
}
}
```
Run this in your terminal:
```bash theme={null}
codex mcp add thunder-compute --url https://www.thundercompute.com/mcp
```
Codex will prompt you to authenticate via OAuth when you first use a Thunder Compute tool.
Add to `.cursor/mcp.json` in your project root (or `~/.cursor/mcp.json` for global access):
```json theme={null}
{
"mcpServers": {
"thunder-compute": {
"type": "http",
"url": "https://www.thundercompute.com/mcp"
}
}
}
```
Add to your MCP configuration:
```json theme={null}
{
"mcpServers": {
"thunder-compute": {
"serverUrl": "https://www.thundercompute.com/mcp",
"headers": {
"Content-Type": "application/json"
}
}
}
}
```
Run the interactive setup:
```bash theme={null}
opencode mcp add
```
When prompted:
* **Server name:** `thunder-compute`
* **Server type:** `Remote`
* **URL:** `https://www.thundercompute.com/mcp`
* **Requires OAuth:** `Yes`
* **Pre-registered client ID:** `No`
```bash theme={null}
opencode mcp auth thunder-compute
```
A browser window will open for you to log in and authorize access.
```bash theme={null}
opencode
```
The Thunder Compute tools are now available in your session.
Alternatively, add to `~/.config/opencode/opencode.json`:
```json theme={null}
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"thunder-compute": {
"type": "remote",
"url": "https://www.thundercompute.com/mcp",
"oauth": {}
}
}
}
```
Then run `opencode mcp auth thunder-compute` to authenticate.
If you use an MCP client that supports [Smithery](https://smithery.ai), you can install directly:
```bash theme={null}
npx @smithery/cli install @thunder-compute/thunder-compute
```
Or browse the [Thunder Compute listing on Smithery](https://smithery.ai/server/@thunder-compute/thunder-compute) and click **Install** for your client.
For custom integrations, the MCP server uses [Streamable HTTP transport](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http) at a single endpoint. Authentication is via OAuth 2.0 with standard MCP discovery.
**Endpoint:** `https://www.thundercompute.com/mcp`
```bash theme={null}
curl -X POST https://www.thundercompute.com/mcp \
  -H "Authorization: Bearer <access-token>" \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "initialize",
"params": {
"protocolVersion": "2025-03-26",
"capabilities": {},
"clientInfo": { "name": "my-agent", "version": "1.0.0" }
},
"id": 1
}'
```
## Authentication
No API tokens or environment variables needed. When you first connect, a browser window opens for you to log in with your Thunder Compute account and authorize access. Tokens refresh automatically, so you only authenticate once per session.
## Available Tools
### Instance Management
| Tool | Description |
| ----------------- | -------------------------------------------------------------------------------------- |
| `list_instances` | List all GPU instances with status, IP, and configuration |
| `create_instance` | Create a new GPU instance (specify GPU type, template, mode, etc.) |
| `delete_instance` | Delete an instance (irreversible) |
| `modify_instance` | Change instance config (GPU type, vCPUs, disk, mode) |
| `run_command` | Execute a shell command on a running instance and return stdout, stderr, and exit code |
### Information
| Tool | Description |
| ------------------ | ------------------------------------------------------------ |
| `get_specs` | Get available GPU specs (VRAM, vCPU options, storage ranges) |
| `get_availability` | Get current GPU availability status for each spec |
| `get_pricing` | Get current per-hour GPU pricing |
| `list_templates` | List available OS templates (Ubuntu, PyTorch, etc.) |
### Snapshots
| Tool | Description |
| ----------------- | -------------------------------- |
| `list_snapshots` | List all instance snapshots |
| `create_snapshot` | Create a snapshot of an instance |
| `delete_snapshot` | Delete a snapshot (irreversible) |
### SSH Keys
| Tool | Description |
| ------------------------- | -------------------------------------------------------------- |
| `list_ssh_keys` | List SSH keys in your organization |
| `create_ssh_key` | Add an SSH public key to your organization |
| `delete_ssh_key` | Delete an SSH key |
| `add_ssh_key_to_instance` | Add an SSH public key to a running instance's authorized\_keys |
### Port Forwarding
| Tool | Description |
| -------------- | --------------------------------------------- |
| `list_ports` | List all instances with their forwarded ports |
| `forward_port` | Forward HTTP ports on an instance |
| `delete_port` | Remove forwarded ports from an instance |
### Connectivity
| Tool | Description |
| ----------------- | ----------------------------------------------------- |
| `get_ssh_command` | Get the SSH command to connect to an instance |
| `get_scp_command` | Get the SCP command to copy files to/from an instance |
### Billing & Usage
| Tool | Description |
| ---------------------- | --------------------------------------------------------------------------- |
| `get_meter_data` | Get GPU usage metrics for a time period (hourly, daily, weekly, or monthly) |
| `get_upcoming_invoice` | Get estimated charges for the current billing period |
| `get_invoice_history` | Get historical invoices for your organization |
| `get_subscription` | Get subscription details including plan, status, and payment info |
### API Tokens
| Tool | Description |
| -------------- | ----------------------------------------------- |
| `list_tokens` | List all named API tokens for your organization |
| `create_token` | Create a new named API token |
| `delete_token` | Delete a named API token |
## Prompts
The MCP server includes built-in prompts that guide your agent through common multi-step workflows:
| Prompt | Description |
| ----------------------- | ------------------------------------------------------------------- |
| `create-dev-instance` | Set up a GPU development instance with sensible defaults |
| `deploy-model` | Deploy an ML model (supports Ollama, vLLM, and Transformers) |
| `check-costs` | Review current GPU usage and costs |
| `snapshot-and-teardown` | Save instance state and clean up |
| `setup-comfyui` | Spin up a GPU instance with ComfyUI for AI image generation |
| `setup-jupyter` | Launch a Jupyter Lab environment on a GPU instance |
| `fine-tune-model` | Set up a GPU instance for fine-tuning with LoRA or full fine-tuning |
| `benchmark-gpu` | Run a quick GPU benchmark on an instance to verify performance |
## Example Usage
Once configured, you can ask your AI agent things like:
* "Spin up an A100 instance with PyTorch"
* "What GPU types are available and how much do they cost?"
* "Which GPUs are available right now?"
* "List my running instances"
* "Run `nvidia-smi` on my instance"
* "Delete instance inst-abc123"
* "Forward port 8080 on my instance"
* "Create a snapshot of my instance before I make changes"
* "Deploy Llama 3 on a GPU"
* "How much have I spent this month?"
* "Show my invoice history"
* "Create an API token for my CI pipeline"
## Troubleshooting
**Authentication fails or browser doesn't open**: Run `/mcp` in Claude Code to manually trigger authentication. Make sure you're logged in to your Thunder Compute account in the browser.
**"Protected resource does not match" error**: The URL in your MCP config must match the server's configured resource URL exactly. Ensure you're using `https://www.thundercompute.com/mcp`.
**"token has invalid issuer" error**: This is a server-side configuration issue. The MCP authentication client must be configured with the correct Stytch Connected Apps domain.
**Tools not appearing**: Restart your AI agent after changing MCP configuration. Most agents only read MCP config on startup.
## MCP Directories
Thunder Compute is listed on major MCP directories for easy discovery:
* [**Smithery**](https://smithery.ai/server/@thunder-compute/thunder-compute) — One-click install for supported clients
* [**MCP Registry**](https://registry.modelcontextprotocol.io) — The official Model Context Protocol server registry
* [**Glama**](https://glama.ai) — Auto-indexed from the MCP Registry
* [**PulseMCP**](https://pulsemcp.com) — Auto-indexed from the MCP Registry
If your MCP client supports browsing directories, search for "Thunder Compute" to find and install the server directly.
# Thunder Compute Referral Program
Source: https://www.thundercompute.com/docs/guides/referral-program
Earn credits by referring friends to Thunder Compute. Get 3% of every dollar your referrals spend on GPU instances with our lifetime rewards program.
**Refer a friend, earn credit.** Share your unique referral link and receive credits every time someone you refer spends on Thunder Compute GPUs.
This program is currently in beta. Terms may evolve as we improve the program based on user feedback.
## How It Works
Our referral program rewards you with **3% of every dollar** your referrals spend on GPU instances. Here's what you need to know:
* **Reward Rate:** 3% of all spending by referred users
* **Duration:** Lifetime rewards for each referred customer
* **Credits:** Paid out in Thunder Compute credits (non-transferable)
* **Tracking:** Credits apply to paid, consumed compute resources and typically post within minutes of a finalized invoice.
We created this program as a way to give back to our community. Rather than paying advertisers, we want to reward you for your contribution to Thunder Compute.
By referring even a medium-size startup, you can often receive thousands of dollars of free compute.
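As a rough illustration of the 3% rate (all numbers hypothetical):

```bash theme={null}
# A referred startup spends $2,500/month on GPU instances.
SPEND_CENTS=250000                       # $2,500.00, tracked in cents
CREDIT_CENTS=$((SPEND_CENTS * 3 / 100))  # 3% reward rate
printf 'Monthly credit earned: $%d.%02d\n' \
  $((CREDIT_CENTS / 100)) $((CREDIT_CENTS % 100))
# → Monthly credit earned: $75.00
```

Because rewards are lifetime, that hypothetical referral would keep earning you $75 in credits every month it stays at that spend level.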
## Getting Started
### 1. Find Your Referral Link
1. Sign in to the [Thunder Compute Console](https://console.thundercompute.com/)
2. Navigate to **Referrals** in the sidebar
3. Copy your unique referral link
4. Share it anywhere—social media, tutorials, blog posts, or direct messages
### 2. Share and Earn
Once someone creates a new account using your link and starts using GPU instances, you'll automatically earn 3% of their payments as credits.
## Eligibility Requirements
### For Referrers
* Active Thunder Compute account in good standing
* No restrictions on sharing methods or platforms
### For Referrals
* Must create a **new account** via your referral link
* Existing accounts that sign up through referral links are not eligible
* Self-referrals and duplicate accounts are prohibited
Credits are non-transferable and cannot be converted to cash. They can only be used for Thunder Compute services.
## Program Rules
### Fair Use Policy
We maintain strict anti-fraud measures to ensure program integrity:
* Creating fake accounts is prohibited
* Self-referrals will result in credit removal
* Violating Thunder Compute's Terms & Conditions may lead to account suspension
* All referral activity is monitored for suspicious patterns
### Program Changes
Thunder Compute reserves the right to:
* Modify reward rates or eligibility requirements
* Update program terms with advance notice
* Discontinue the program if necessary
We'll announce any changes through email notifications and documentation updates.
## Frequently Asked Questions
**Q: When do I receive my referral credits?**
A: Credits are typically added to your account within minutes of your referral's successful invoice.
**Q: Is there a limit to how much I can earn?**
A: No, there's no cap on referral earnings. The more successful referrals you make, the more you earn.
**Q: Can I refer existing Thunder Compute users?**
A: No, only new users who create accounts through your referral link are eligible.
**Q: What counts as a qualifying payment?**
A: Only direct card payments for GPU instances qualify for referral rewards. Usage paid with free or referral credits does not qualify.
## Need Help?
Have questions about referral eligibility, credit posting, or the program in general? Contact our support team:
* **Discord:** Join our [community server](https://discord.com/invite/nwuETS9jJK)
Thank you for giving back to the Thunder Compute community!
# Jupyter Notebooks
Source: https://www.thundercompute.com/docs/guides/running-jupyter-notebooks-on-thunder-compute
Execute Jupyter Notebooks on Thunder Compute cloud GPUs. Configure remote kernels and process intensive data workloads in a notebook environment.
## Prerequisites for a Jupyter Notebook with Cloud GPU
* A supported editor installed: VSCode, Cursor, or Windsurf
* The Thunder Compute extension installed in that editor
* The Jupyter extension installed in that editor
## Steps to Launch Your Notebook
### 1. Connect to a Thunder Compute cloud GPU in VSCode
Follow the instructions in our [quickstart](/vscode/quickstart) guide to set up and connect to a remote instance in VSCode.
### 2. Install the Jupyter extension in your cloud workspace
Open the Extensions panel and install the Jupyter extension inside your Thunder Compute instance.
### 3. Verify GPU availability inside the notebook
Create a Jupyter Notebook. Because your editor is connected to the instance, the notebook kernel runs there with GPU access. To confirm that the GPU is accessible, run the following in a notebook cell:
```python theme={null}
import torch
print(torch.cuda.is_available())
```
If everything is set up correctly, the output should be:
```
True
```
You now have a Jupyter Notebook running on a Thunder Compute cloud GPU, a fast and low-cost alternative to Colab for indie developers, researchers, and data scientists.
# Speeding Up Snapshots
Source: https://www.thundercompute.com/docs/guides/speeding-up-snapshots
Accelerate snapshot creation and restoration on Thunder Compute with technical optimizations that reduce backup latency and improve transfer speed.
The size of your instance's disk directly affects how long snapshots take to create and restore.
This guide focuses on simple, high-impact steps to reduce snapshot size and speed up restores. We’ll expand this guide as more snapshot features ship.
## Quick Wins
1. **Keep your instance disk lean**: Remove large, transient files before snapshotting.
2. **Exclude non-essential data**: Use `.thunderignore` to skip caches, build outputs, and generated assets.
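To find candidates for removal before snapshotting, list the largest files on disk. The sketch below builds a scratch directory so it is self-contained; on an instance, point `DIR` at `~` or `/` and raise the size threshold:

```bash theme={null}
# Scratch directory standing in for the instance disk
DIR=/tmp/demo-disk
mkdir -p "$DIR"
dd if=/dev/zero of="$DIR/big.bin" bs=1M count=5 2>/dev/null    # 5 MB file
dd if=/dev/zero of="$DIR/small.bin" bs=1K count=5 2>/dev/null  # 5 KB file

# Files larger than 1 MB, biggest first -- prime candidates for cleanup
find "$DIR" -type f -size +1M -exec du -h {} + | sort -rh
```

Only `big.bin` is reported; anything transient in the output can be deleted or added to `.thunderignore` before the snapshot.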
## .thunderignore Files for Exclusion
Often, you may want to exclude certain heavy files, cache directories, or generated files from a snapshot. You can do this using a `.thunderignore` file. This will help speed up snapshot creation and restoration.
1. Create a `.thunderignore` file in the `/` directory of your instance.
2. Add all paths you would like to ignore (absolute paths or relative to `/`). Patterns are supported - the syntax for these is the same as [`filepath.Match`](https://pkg.go.dev/path/filepath#Match) in Go. Patterns are matched against paths, not just basenames, so use `/` to anchor from the root (for example, `/data/*.parquet`). `*` and `?` are supported; `**` is not special and is treated literally. Blank lines are ignored, and lines starting with `#` are treated as comments.
3. Create your snapshot. The `.thunderignore` file is included in the snapshot so your exclusions persist on restore.
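The steps above can be sketched as follows, using a scratch directory in place of the instance's `/` (a fuller example file appears below):

```bash theme={null}
# Stand-in for the instance root; on a real instance, write /.thunderignore
ROOT=/tmp/demo-root
mkdir -p "$ROOT"

cat > "$ROOT/.thunderignore" <<'EOF'
# Caches and temp files
.cache/*
*.tmp

# Large re-downloadable data
/data/*.parquet
EOF

cat "$ROOT/.thunderignore"
```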
Start by excluding caches, build outputs, and temporary files. You’ll usually see the biggest size reductions there.
Make sure you don’t exclude anything required to run your workloads after restore, such as model weights or datasets you actually need.
Example `.thunderignore`:
```
# Caches and build artifacts
.cache/*
*.tmp
# Large data
/data/*.parquet
/models/*.pt
# Common language build outputs
/node_modules/*
/dist/*
/target/*
```
# Stopping Instances
Source: https://www.thundercompute.com/docs/guides/stopping-instances
Manage instance states to optimize billing on Thunder Compute. Learn how to pause and resume compute resources using snapshots.
## The Workflow
Thunder Compute does not have a native "Stop" feature for instances. Fortunately, you can achieve the same result by using snapshots.
To "stop" an instance, follow these three steps:
1. **Create a snapshot:** This saves the current state of the running instance.
2. **Delete the instance:** Once snapshot creation is underway, you can safely delete the running instance.
3. **Restore the snapshot:** Create a new instance using your saved snapshot as the template.
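The same workflow can be scripted against the REST API using the endpoints listed in the API reference. The sketch below only builds the call sequence; authentication headers and request payloads are omitted because they depend on the OpenAPI schema, so check `https://api.thundercompute.com:8443/openapi.json` before relying on this.

```python theme={null}
# Map the three-step "stop" workflow onto the documented API endpoints.
API = "https://api.thundercompute.com:8443"

def stop_calls(instance_id: str) -> list[tuple[str, str]]:
    """The (method, url) calls that emulate stopping an instance."""
    return [
        ("POST", f"{API}/snapshots/create"),                # 1. snapshot current state
        ("POST", f"{API}/instances/{instance_id}/delete"),  # 2. delete the instance
    ]

def resume_call() -> tuple[str, str]:
    """Later, restore by creating a new instance from the snapshot."""
    return ("POST", f"{API}/instances/create")              # 3. restore

for method, url in stop_calls("0"):
    print(method, url)
```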
### 1. Create a Snapshot
First, capture the current state of your running instance. You can trigger this through any of our supported interfaces:
**Guides:** [VS Code](https://www.thundercompute.com/docs/vscode/operations/snapshots#create-a-snapshot) | [CLI](https://www.thundercompute.com/docs/cli/operations/snapshots#create-a-snapshot) | [Console](https://www.thundercompute.com/docs/console/operations/snapshots#create-a-snapshot)
### 2. Delete the Running Instance
Once the snapshot is initiated, delete the instance.
**Guides:** [VS Code](/vscode/operations/deleting-instances#delete-an-instance) | [CLI](/cli/operations/deleting-instances#delete-an-instance) | [Console](/console/operations/deleting-instances#delete-an-instance)
You can delete your instance immediately after triggering the snapshot.
### 3. Restore from Snapshot
When you are ready to resume, create a new instance using your snapshot as the template.
**Guides:** [VS Code](/vscode/operations/snapshots#restore-from-a-snapshot) | [CLI](/cli/operations/snapshots#restore-from-a-snapshot) | [Console](/console/operations/snapshots#restore-from-a-snapshot)
***
## Important Notes
* **Timing:** The time required to create and restore snapshots varies with snapshot size.
* **Cost Efficiency:** While the instance is deleted, you pay only for snapshot storage, which is significantly cheaper than keeping the instance running.
# Using Docker
Source: https://www.thundercompute.com/docs/guides/using-docker-on-thundercompute
Containerize applications using Docker on Thunder Compute. Manage images, deploy containers, and optimize Docker environments on cloud GPU instances.
## Disclaimer: Docker support is experimental
Docker has experimental support inside Thunder Compute instances. Because Thunder Compute instances are themselves containers, running Docker on Thunder Compute is like running Docker inside Docker. To make this work, our instances ship with a modified version of `dockerd`, and in certain situations it might not work exactly like official Docker (e.g., advanced networking features).
## Running Docker
Start your container with the `--device nvidia.com/gpu=all` flag in order to expose GPUs. For example: `docker run -it --rm --device nvidia.com/gpu=all ubuntu:latest`.
Some tutorials will tell you to use `--runtime=nvidia` or `--gpus=all`. These options are outdated and not supported on Thunder Compute; `--device nvidia.com/gpu=all` is the only supported way to expose a GPU to a Docker container.
## Known issues
* Docker Compose does not work.
* The container network is not isolated. Even ports you don't publish with `-p` will be reachable, and they can conflict with other processes or containers.
* Sometimes when a container is destroyed, its processes are not properly killed. This can cause issues such as port conflicts if you then try to start the same container again. Use standard tools like `ps aux` and `kill` to find and stop any leftover container processes.
If you run into issues, please [contact us](https://www.thundercompute.com/contact).
# Use Instance Templates for AI
Source: https://www.thundercompute.com/docs/guides/using-instance-templates
Quickly deploy LLMs (Ollama) and AI image generators (ComfyUI) on Thunder Compute using pre-configured instance templates. Get started fast.
Thunder Compute gives indie developers, researchers and data scientists instant access to **affordable cloud GPUs**. Our pre-configured **instance templates** set up popular AI stacks automatically, so you can **run LLMs** or **generate AI images** in minutes.
## AI Templates on Cheap Cloud GPUs
We currently offer:
* **Ollama** – launches an Ollama server for open-source large language models
* **ComfyUI** – installs ComfyUI for fast AI-image generation workflows
## Deploy a Template
1. **Create an instance**
```bash theme={null}
# Launch an Ollama instance
tnr create --template ollama
# Launch ComfyUI
tnr create --template comfy-ui
```
2. **Check the instance is ready**
```bash theme={null}
tnr status
```
Wait until the status shows the instance is running before connecting.
3. **Connect to the instance**
```bash theme={null}
tnr connect 0 # replace 0 with your instance ID
```
Port forwarding is handled automatically when you connect. The `-t` flag is unnecessary.
4. **Start the service**
```bash theme={null}
# Ollama
start-ollama
# ComfyUI
start-comfyui
```
Required ports forward to your local machine automatically.
## Template Details
### Ollama Template
* Forwards port **11434**
* Access the API at `http://localhost:11434`
* Ready for popular Ollama models
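With the template running and port 11434 forwarded, you can call the server from your local machine using Ollama's standard `/api/generate` route. The model name below is an example and must already be pulled on the instance (e.g., with `ollama pull llama3`), so treat this as a sketch rather than a copy-paste recipe.

```python theme={null}
import json
import urllib.request

# Build a request against the forwarded Ollama API.
payload = {"model": "llama3", "prompt": "Say hello in one sentence.", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.get_full_url())

# With the server running, uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```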
### ComfyUI Template
* Forwards port **8188**
* Mounts the `ComfyUI` directory to your Mac or Linux host
* UI at `http://localhost:8188`
* Includes common nodes and extensions
## Need Help?
Encounter problems or have questions? Reach out to our support team any time.
# Weights & Biases
Source: https://www.thundercompute.com/docs/guides/weights-and-biases
Track, debug, and optimize GPU-heavy workloads on Thunder Compute instances using Weights & Biases (wandb).
Weights & Biases (wandb) is an experiment tracking and model management platform that’s particularly useful when training large models on Cloud GPUs. It helps you:
* Track training runs, hyperparameters, and metrics
* Monitor GPU/CPU utilization in real time
* Version datasets and model checkpoints
* Run large-scale hyperparameter sweeps across many GPU instances
On Thunder Compute, wandb helps you monitor GPU utilization, identify bottlenecks, and track training metrics.
***
## Prerequisites
* A Thunder Compute GPU instance created and connected
* Python environment set up on your instance
* A Weights & Biases account ([https://wandb.ai/site](https://wandb.ai/site))
***
## Installation
Install wandb on your Thunder Compute instance:
```bash theme={null}
pip install wandb
```
Or add to a `requirements.txt`:
```bash theme={null}
echo "wandb" >> requirements.txt
pip install -r requirements.txt
```
***
## Authentication
Authenticate with:
```bash theme={null}
wandb login
```
Or via environment variable:
```bash theme={null}
export WANDB_API_KEY="your_api_key"
wandb login --relogin
```
You will see the following prompt:
```
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
```
Enter your API key, which you can find at [wandb.ai/authorize](https://wandb.ai/authorize) after creating an account. Once entered, you will see:
```
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /home/ubuntu/.netrc
wandb: Currently logged in as: username (entity-name) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
```
For shared or production Thunder instances, environment variables or secret
managers are preferred over pasting API keys directly.
***
## Getting Started
Follow these steps to run your first wandb experiment on your Thunder Compute instance.
### Step 1 — Create a Training File
Create a new Python file on your instance:
```bash theme={null}
nano train.py
```
Or create a new file within your IDE connected over SSH.
### Step 2 — Paste Minimal Working Example
Copy this minimal example into your `train.py` file:
```python theme={null}
import wandb
import time

# Initialize wandb
wandb.init(
    project="thunder-resnet",
    name="quick-test",
    config={
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 5,
    },
)

# Simple training loop simulation
for epoch in range(5):
    # Simulate training metrics
    train_loss = 1.0 / (epoch + 1)
    train_acc = 0.5 + epoch * 0.1

    # Log metrics to wandb
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "train/accuracy": train_acc,
    })
    time.sleep(0.5)  # Simulate work

wandb.finish()
```
### Step 3 — Run the Script
Execute your training script:
```bash theme={null}
python train.py
```
### Step 4 — Expected Output
You should see output similar to:
```
wandb: Currently logged in as: your-username (entity-name) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.23.0
wandb: Run data is saved locally in /home/ubuntu/wandb/run-20251120_135726-abcd
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run quick-test
wandb: ⭐️ View project at https://wandb.ai/entity-name/thunder-resnet
wandb: 🚀 View run at https://wandb.ai/entity-name/thunder-resnet/runs/abcd
wandb:
wandb: Run history:
wandb: epoch ▁▃▅▆█
wandb: train/accuracy ▁▃▅▆█
wandb: train/loss █▄▂▁▁
wandb:
wandb: Run summary:
wandb: epoch 4
wandb: train/accuracy 0.9
wandb: train/loss 0.2
wandb:
wandb: 🚀 View run quick-test at: https://wandb.ai/entity-name/thunder-resnet/runs/abcd
wandb: ⭐️ View project at: https://wandb.ai/entity-name/thunder-resnet
wandb: Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20251120_135726-abcd/logs
```
### Step 5 — View Your Results
1. **View your dashboard**: Click the link in the output or visit [https://wandb.ai](https://wandb.ai) and navigate to your project
2. **View in Table view**: Go to **Projects** > **thunder-resnet** > **Table** to see all your runs in a tabular format
3. **Compare runs**: Run the script multiple times with different configurations to compare results
4. **Add artifacts**: See the [Model Checkpointing with Weights & Biases Artifacts](#model-checkpointing-with-weights--biases-artifacts) section to version checkpoints and datasets
5. **Scale to multi-GPU**: Check out [Distributed Training](#distributed-training-ddp-lightning-deepspeed) for multi-GPU setups
6. **Run sweeps**: Use [Hyperparameter Sweeps](#hyperparameter-sweeps-multi‑gpu-multi‑instance) for automated hyperparameter search
***
## Viewing Results
1. Visit [https://wandb.ai/site](https://wandb.ai/site)
2. Select your project
3. Explore:
* Metrics charts
* GPU utilization
* Model checkpoints
* Dataset artifacts
* Sweep dashboards
***
## Core Concepts for Cloud GPU Workloads
When using remote GPUs, these wandb features matter most:
1. **Run tracking** — metrics, hyperparameters, logs
2. **GPU/system monitoring** — GPU utilization, power, memory, CPU load
3. **Artifacts** — versioned checkpoints and datasets
4. **Sweeps** — distributed hyperparameter search
5. **Groups & jobs** — organize multi-GPU/distributed training
***
## Basic Usage
### Initialize a Run
```python theme={null}
import wandb

wandb.init(
    project="my-thunder-project",
    name="baseline-resnet50",
    config={
        "learning_rate": 3e-4,
        "batch_size": 64,
        "epochs": 20,
        "optimizer": "adamw",
        "precision": "fp16",
    },
)
```
### Log Metrics
```python theme={null}
wandb.log({
    "train/loss": loss,
    "train/accuracy": acc,
    "step": step,
})
```
### Best Logging Practices
* Log every **N steps** (e.g., 10–50) to minimize overhead
* Avoid logging huge tensors every step
* Use artifacts for large files
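The "log every N steps" advice looks like this in practice. The `record` helper below is a stand-in for `wandb.log` so the sketch runs without wandb installed; in a real script you would call `wandb.log` directly at the same point.

```python theme={null}
LOG_EVERY = 50  # log every N steps to keep overhead low
logged = []

def record(metrics: dict) -> None:
    """Stand-in for wandb.log so this sketch runs anywhere."""
    logged.append(metrics)

for step in range(200):
    loss = 1.0 / (step + 1)  # placeholder metric
    if step % LOG_EVERY == 0:
        record({"step": step, "train/loss": loss})

print(len(logged))  # 4 logged points instead of 200
```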
***
## GPU & System Monitoring
Wandb automatically collects:
* GPU utilization
* GPU memory usage
* GPU temperature and power
* CPU usage
* RAM usage
* Disk and network I/O
Use these graphs to diagnose:
* **GPU-bound** workloads
* **Data-bound** workloads
* **Bottlenecks** due to I/O or preprocessing
* **Too-small batch sizes**
### Improving GPU Utilization
* Increase batch size until GPU memory is near capacity
* Use **mixed precision** (`torch.cuda.amp`)
* Increase dataloader workers
* Preload/augment data on the GPU
* Reduce unnecessary synchronizations
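For the dataloader-worker suggestion, a common starting heuristic is one worker per CPU core, minus one for the training process itself; the actual sweet spot is workload-dependent, so tune from there using the wandb system graphs.

```python theme={null}
import os

# Heuristic starting point for DataLoader workers: leave one core for
# the main training process. Tune using wandb's CPU/GPU utilization graphs.
num_workers = max(2, (os.cpu_count() or 2) - 1)
print(num_workers)

# e.g. DataLoader(dataset, batch_size=64, num_workers=num_workers, pin_memory=True)
```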
***
## Model Checkpointing with Weights & Biases Artifacts
When you train on Thunder Compute GPU instances, it's important that your model checkpoints are **not** tied to a single machine. Weights & Biases Artifacts provide a simple way to:
* Persist checkpoints even if the instance is deleted
* Move checkpoints between different Thunder instances (or GPU types)
* Share models with your team
* Reproduce and resume long-running training jobs
This section provides a walkthrough of how to do checkpointing with wandb.
***
### Why use Artifacts for checkpoints?
Saving checkpoints only to the local filesystem is risky:
* Thunder instances may be stopped or recreated
* You may want to resume training on a *different* GPU (A100 → H100)
* Your team may need to reuse your model
* You may want versioned, reproducible training history
Artifacts solve this by storing checkpoints in W\&B's managed, versioned storage.
***
### Step 1 — Save a checkpoint locally during training
Inside your real training loop, periodically save a checkpoint. For real projects (PyTorch):
```python theme={null}
import torch

# ... inside your training loop ...
if (epoch + 1) % 5 == 0:
    ckpt_path = f"checkpoints/model_epoch_{epoch+1}.pt"
    torch.save(model.state_dict(), ckpt_path)
```
> It is best practice to save checkpoints inside a dedicated `checkpoints/` folder.
***
### Step 2 — Log the checkpoint as a W\&B Artifact
Right after saving your file:
```python theme={null}
import wandb

artifact = wandb.Artifact(
    name=f"resnet50-epoch-{epoch+1}",
    type="model",
    metadata={
        "epoch": epoch + 1,
        "val_loss": float(val_loss),
        "val_accuracy": float(val_acc),
    },
)
artifact.add_file(ckpt_path)
wandb.log_artifact(artifact)
```
This uploads your checkpoint to W\&B and keeps a permanent copy.
***
### Step 3 — View & manage checkpoints in the W\&B UI
1. Go to your wandb project
2. Open the **Artifacts** tab
3. Click your model artifact
4. You can now:
* View version history (v0, v1, v2…)
* Open the metrics/metadata
* Download the checkpoint
* Use it as an input for new runs
***
### Step 4 — Restore a checkpoint on another Thunder instance
On a fresh machine:
```python theme={null}
import wandb
import torch

run = wandb.init(project="my-thunder-project", job_type="restore")

artifact = run.use_artifact(
    "entity-name/my-thunder-project/resnet50-epoch-10:latest",
    type="model",
)
artifact_dir = artifact.download()

checkpoint = torch.load(f"{artifact_dir}/model_epoch_10.pt", map_location="cuda")
model.load_state_dict(checkpoint)
model.to("cuda")
```
You now have the exact model weights from your previous run — even if the original instance is gone.
***
### Step 5 — Resume training
```python theme={null}
model.load_state_dict(checkpoint)
model.to("cuda")

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

start_epoch = 10
for epoch in range(start_epoch, config.epochs):
    train_one_epoch(...)
    validate(...)
    wandb.log({"epoch": epoch})
```
***
### Example: Adding Checkpointing to a Minimal `train.py`
Here is a working example using the simple training script from the Getting Started section.
This example simulates a checkpoint file (JSON), but the workflow is identical for real model weights.
```python theme={null}
import wandb
import time
import json
import os

# Initialize wandb
wandb.init(
    project="thunder-resnet",
    name="quick-test",
    config={
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 5,
    },
)

os.makedirs("checkpoints", exist_ok=True)

for epoch in range(5):
    # Simulate training metrics
    train_loss = 1.0 / (epoch + 1)
    train_acc = 0.5 + epoch * 0.1

    # Log metrics to wandb
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "train/accuracy": train_acc,
    })

    # ---- Checkpointing Example ----
    # In a real project this would be torch.save(model.state_dict(), ...)
    checkpoint_path = f"checkpoints/epoch_{epoch}.json"
    with open(checkpoint_path, "w") as f:
        json.dump({
            "epoch": epoch,
            "train_loss": train_loss,
            "train_accuracy": train_acc,
        }, f)

    # Log checkpoint as an artifact
    artifact = wandb.Artifact(
        name=f"quick-test-epoch-{epoch}",
        type="model",
        metadata={
            "epoch": epoch,
            "train_loss": train_loss,
            "train_accuracy": train_acc,
        },
    )
    artifact.add_file(checkpoint_path)
    wandb.log_artifact(artifact)
    # --------------------------------

    time.sleep(0.5)

wandb.finish()
```
This example demonstrates:
* how checkpoint files are created
* how they are logged as Artifacts
* how each epoch becomes a tracked, versioned checkpoint
These appear in the **Artifacts** tab of your project.
***
### Quick Reference: Other Artifact Types
Artifacts aren't just for model checkpoints. You can also version datasets:
```python theme={null}
# Logging a Dataset
dataset = wandb.Artifact("imagenet-subset", type="dataset")
dataset.add_dir("data/imagenet_subset")
wandb.log_artifact(dataset)
```
***
## Hyperparameter Sweeps (Multi‑GPU, Multi‑Instance)
Sweeps allow large-scale hyperparameter search across many Thunder Compute instances.
### Step 1 — Create `sweep.yaml`
```yaml theme={null}
program: train.py
project: thunder-resnet
method: bayes
metric:
  name: val/accuracy
  goal: maximize
parameters:
  learning_rate:
    min: 0.00001
    max: 0.001
  batch_size:
    values: [32, 64, 128]
  weight_decay:
    min: 0.0
    max: 0.1
  augment:
    values: ["none", "light", "heavy"]
```
### Step 2 — Initialize the sweep:
```bash theme={null}
wandb sweep sweep.yaml
```
Output:
```
wandb: Creating sweep from: sweep.yaml
wandb: Creating sweep with ID: fgbkmk3q
wandb: View sweep at: https://wandb.ai/entity-name/thunder-resnet/sweeps/fgbkmk3q
wandb: Run sweep agent with: wandb agent entity-name/thunder-resnet/fgbkmk3q
```
### Step 3 — Run agents on Thunder GPU instances:
```bash theme={null}
wandb agent <entity>/<project>/<sweep_id>
```
Each agent pulls new hyperparameters and launches a run automatically.
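For a sweep to work, `train.py` must read its hyperparameters from `wandb.config` instead of hard-coding them, so each agent run picks up the values the sweep controller chose. The sketch below uses a plain dict as a stand-in for `wandb.config` so it runs without wandb; in the real script you would call `wandb.init()` and read `wandb.config` directly, and the placeholder metric stands in for your real validation loop.

```python theme={null}
# Stand-in for wandb.config: in a sweep run, wandb.init() populates this
# with the values picked from sweep.yaml.
config = {"learning_rate": 3e-4, "batch_size": 64,
          "weight_decay": 0.01, "augment": "light"}

def train(cfg: dict) -> float:
    # ... build the model and dataloaders from cfg and train ...
    val_acc = 0.5 + (0.1 if cfg["batch_size"] == 64 else 0.0)  # placeholder
    return val_acc

acc = train(config)
# wandb.log({"val/accuracy": acc})  # must match `metric.name` in sweep.yaml
print(acc)
```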
***
## Distributed Training (DDP, Lightning, DeepSpeed)
### PyTorch DDP Example
```python theme={null}
wandb.init(
    project="thunder-ddp",
    group="llama7b-a100x4",
    job_type="training",
)
```
Set run names per rank:
```python theme={null}
wandb.run.name = f"gpu-{rank}"
```
### PyTorch Lightning Example
```python theme={null}
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import WandbLogger

wandb_logger = WandbLogger(project="thunder-lightning-demo")

trainer = Trainer(
    logger=wandb_logger,
    accelerator="gpu",
    devices=4,
    strategy="ddp",
    max_epochs=50,
)
trainer.fit(model)
```
Lightning automatically:
* Logs metrics and gradients
* Tracks checkpoints
* Handles multi-GPU logging
***
## Offline Mode (Air‑Gapped or Firewalled Environments)
Thunder instances may have intermittent or restricted internet access.
### Run in offline mode:
```bash theme={null}
export WANDB_MODE=offline
python train.py
```
### Sync later:
```bash theme={null}
wandb sync /path/to/wandb/run-folder
```
### Fully disable wandb:
```bash theme={null}
export WANDB_MODE=disabled
```
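If you prefer to control this from Python rather than the shell, `wandb.init` also accepts a `mode` argument (`"online"`, `"offline"`, or `"disabled"`). The helper below resolves the mode from the same `WANDB_MODE` convention used above, with validation and a safe default:

```python theme={null}
import os

def resolve_wandb_mode(default: str = "online") -> str:
    """Resolve wandb's mode from the environment, validating the value."""
    mode = os.environ.get("WANDB_MODE", default)
    if mode not in {"online", "offline", "disabled"}:
        raise ValueError(f"unexpected WANDB_MODE: {mode!r}")
    return mode

os.environ["WANDB_MODE"] = "offline"
print(resolve_wandb_mode())
# wandb.init(project="my-thunder-project", mode=resolve_wandb_mode())
```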
***
## Best Practices for Thunder Compute GPU Instances
### Run Management
* Use meaningful run names that include dataset + model + GPU type
* Log all hyperparameters in `wandb.config`
* Track system metrics to diagnose bottlenecks
* Organize multi-GPU runs using `group`
* Reduce logging overhead by batching logs
### Artifacts & Checkpointing
* Use meaningful artifact names (e.g. `llama7b-a100-epoch20`)
* Attach useful metadata (epoch, val metrics, dataset version)
* Log fewer but higher-quality checkpoints
* Always use artifacts for long or expensive runs
* Use `use_artifact(...).download()` to restore weights anywhere
* Use artifacts for datasets and checkpoints
### Experimentation
* Use sweeps for expensive experiments
* Compare runs systematically using the dashboard
* Monitor GPU utilization to optimize batch sizes
***
## Troubleshooting
### Authentication Issues
```bash theme={null}
wandb login --relogin
```
### GPU Metrics Not Showing
* Ensure `nvidia-smi` works inside the environment
* If running in Docker, start containers with `--device nvidia.com/gpu=all` (see [Using Docker](/guides/using-docker-on-thundercompute))
* Call `wandb.init()` early
### Connection Issues
* Verify outbound internet access
* Firewalls must allow connections to `*.wandb.ai`
* Use offline mode if required
### Large File Uploads
* Always use artifacts for multi-GB files
* Compress large checkpoints
* Prune old versions
***
## Need Help?
* W\&B Docs: [https://docs.wandb.ai](https://docs.wandb.ai)
* Thunder Compute Discord: [https://discord.com/invite/nwuETS9jJK](https://discord.com/invite/nwuETS9jJK)
* Email support: `support@thundercompute.com`
# Prototyping vs Production
Source: https://www.thundercompute.com/docs/prototyping-vs-production
Differentiate prototyping and production environments on Thunder Compute. Select hardware and configurations optimized for your project scale.
Thunder Compute offers two modes for running instances.
| Feature | Prototyping | Production |
| ------------------------- | ------------------------ | ----------------------- |
| Cost | Lower | Higher |
| Compatibility | Most ML workloads | Full CUDA compatibility |
| GPUs | A6000, A100, H100 | A100, H100 |
| Multi-GPU | A100, H100: up to 2 GPUs | Up to 8 GPUs |
| Graphics (OpenGL, Vulkan) | No | Yes |
## Prototyping Mode
Prototyping mode is currently in beta and exclusively available on Thunder Compute.
Prototyping mode applies CUDA-level optimizations to maximize GPU utilization, significantly reducing costs for AI/ML development workflows.
### Supported Software
* **PyTorch**: Fully supported (downgrading from the pre-installed version may cause issues)
* **TensorFlow**
* **JAX**
* **Jupyter Notebooks**
* **Model Serving**: ComfyUI, Ollama, vLLM, and others
* **Fine Tuning**: Unsloth and others
### Unsupported Workloads
* Graphics workloads (OpenGL, Vulkan, FFmpeg)
* Custom CUDA kernels (may exhibit unpredictable behavior)
* Hardware-specific profiling tools
If you encounter issues with an unsupported workload, switch to production mode with [modify](/vscode/operations/modifying-instances) for full compatibility.
## Production Mode
Production mode provisions a standard virtual machine with full CUDA compatibility and predictable performance.
### When to Choose Production
* Long-running training jobs
* Multi-GPU workloads (up to 8 GPUs)
* Graphics workloads (OpenGL, Vulkan, FFmpeg)
* Custom CUDA kernels
* Workloads requiring accurate hardware metrics
## Switching Between Modes
[Modify existing instances](/vscode/operations/modifying-instances) to switch between prototyping and production mode. This also lets you change GPU type, vCPUs, and RAM. Storage can be expanded but not reduced.
## Learn More
* [Technical Specifications](/technical-specs) - Hardware, networking, and storage details
# Restrictions
Source: https://www.thundercompute.com/docs/restrictions
Analyze platform restrictions for Thunder Compute services. Review technical limitations, usage policies, and resource constraints for accounts.
## Prohibited Activities
### Cryptocurrency Mining
Mining, staking, or otherwise interacting with cryptocurrency is strictly prohibited on Thunder Compute. If cryptocurrency-related activity is detected:
* The associated account is immediately banned
* Any billing credit is revoked
* The account is billed for the full amount of usage
## Geographic Availability
### B2B Requirements
Thunder Compute is only available for B2B customers and requires a VAT ID (or similar) in the following countries:
* United Arab Emirates
* Angola
* Bahrain
* Brazil
* Switzerland
* Côte d’Ivoire (Ivory Coast)
* Colombia
* Algeria
* Georgia
* Iraq
* Jordan
* Kazakhstan
* South Korea (Republic of Korea)
* Kuwait
* Morocco
* North Macedonia
* Oman
* Paraguay
* Qatar
* Saudi Arabia
* Tunisia
* Turkey (Türkiye)
* Tanzania
* Ukraine
* Uganda
* Uzbekistan
* Yemen
* India
* Moldova (Republic of Moldova)
### Restricted Countries
Thunder Compute is not currently available in the following countries:
* Belarus
* China
* Cuba
* Indonesia
* Iran
* Kenya
* North Korea
* Malaysia
* Mexico
* Nigeria
* Russia
* Sudan
* Syria
* Uruguay
If you're located in one of these countries and need access to Thunder Compute, please contact us to discuss potential alternatives.
## Usage Guidelines
### Acceptable Use
Thunder Compute instances are intended for legitimate computational workloads, particularly:
* AI/ML development and training
* Scientific computing
* Data processing and analysis
* Software development and testing
We have a strict one-account-per-user policy.
### Resource Usage
Users must comply with fair use policies and avoid activities that:
* Violate terms of service
* Engage in illegal or unethical activities
## Support
If you have questions about restrictions or need clarification on acceptable use, contact our support team.
# Technical Specifications
Source: https://www.thundercompute.com/docs/technical-specs
Hardware specifications, networking details, and pre-installed software for Thunder Compute instances
## Instance Infrastructure
### Hardware Specifications
* **GPU and CPU configs**: Check [pricing page](https://www.thundercompute.com/pricing) for latest availability
* **Location**: North America
### Pre-installed Software
* **CUDA**: Version 13.0
* **CUDA Driver**: Version 580
* **PyTorch**: Version 2.9.0+cu128
* **JupyterLab**: Pre-installed
* Additional scientific Python libraries (NumPy, Pandas, etc.)
Do not attempt to reinstall CUDA. If compatibility issues arise, use a venv and change the versions of your other dependencies (e.g., PyTorch) rather than modifying the CUDA libraries.
## Storage
* **Persistent Disk**: Your home directory and OS. Preserved across modifications and included in snapshots. Can be expanded but not shrunk.
* **Ephemeral Storage**: Optional fast local NVMe disk mounted at `/ephemeral`. Not included in snapshots and lost when the instance is modified or deleted. Ideal for model weights, caches, and scratch files. See [Ephemeral Storage](/guides/ephemeral-storage).
| Storage Type | Prototyping Range | Production Range |
| ----------------- | ----------------- | ---------------- |
| Persistent Disk | 100 - 400 GB | 100 - 1000 GB |
| Ephemeral Storage | 0 - 300 GB | 0 - 500 GB |
## Networking
* **Egress/Ingress**: 7 Gbps
* **IP Address**: Dynamic
### Port Access
* **Public URLs (CLI)**: Use `tnr ports forward` to expose HTTP services at `https://<instance>-<port>.thundercompute.net` with automatic HTTPS and DDoS protection. See [Port Forwarding](/cli/operations/port-forwarding) for details.
* **Local tunneling (CLI)**: Use `tnr connect <instance_id> -t <port>` to tunnel ports to your local machine
* **VS Code**: Use the built-in [port forwarding](https://code.visualstudio.com/docs/debugtest/port-forwarding) feature
# Troubleshooting
Source: https://www.thundercompute.com/docs/troubleshooting
Troubleshoot common Thunder Compute errors. Find solutions for connection issues, function errors, SSH problems, and access logs. Get support via Discord.
## Common solutions
1. Reconnect to the instance with `ctrl + d` and `tnr connect <instance_id>`
2. Upgrade tnr. Depending on your install method, you may have to use `pip install tnr --upgrade` or re-download the binary from the website
3. Back up any important data, then delete and recreate the instance.
## Common errors
### Function not implemented
A common error you may encounter is some variant of "This function is not implemented." This means your program touches a portion of the CUDA API that we do not currently support. Check our [Prototyping vs Production](/prototyping-vs-production) guide for supported features, and if you hit this error, please contact us.
### SSH errors
If you encounter SSH-related errors (like `Error reading SSH protocol banner` or permission issues), first retry the command.
For quick fixes, back up critical data and recreate the instance. Instances cannot be stopped or restarted.
For persistent SSH issues, see our [SSH on Thunder Compute guide](/cli/operations/ssh) for alternative connection methods.
## Recommended Guides
To help prevent common issues and get the most out of Thunder Compute, we recommend these guides:
* [Using Docker](/guides/using-docker-on-thundercompute) - Learn about GPU-enabled containers and troubleshooting Docker issues
* [Using Instance Templates](/guides/using-instance-templates) - Use pre-configured environments to minimize setup issues
## Production mode as a last resort
If you continue to experience compatibility issues or errors that cannot be resolved through the above methods, consider switching to production mode by modifying your instance ([VS Code](/vscode/operations/modifying-instances), [CLI](/cli/operations/modifying-instances), or [Console](/console/operations/modifying-instances)). Production mode provides maximum stability and reliability with all low-level optimizations disabled, ensuring complete compatibility for workloads that encounter persistent issues in the prototyping tier.
## Support
The fastest way to get support is to join [our discord](https://discord.com/invite/nwuETS9jJK). Our founding team will personally respond to help you as quickly as possible.
# Add SSH key to instance
Source: https://www.thundercompute.com/docs/api-reference/instances/add-ssh-key-to-instance
https://api.thundercompute.com:8443/openapi.json post /instances/{id}/add_key
Add an SSH key to an existing instance. If public_key is provided in the request body, it will be added to authorized_keys. If no public_key is provided, a new key pair will be generated and the private key returned.
# Create instance
Source: https://www.thundercompute.com/docs/api-reference/instances/create-instance
https://api.thundercompute.com:8443/openapi.json post /instances/create
Create a new compute instance
# Delete instance
Source: https://www.thundercompute.com/docs/api-reference/instances/delete-instance
https://api.thundercompute.com:8443/openapi.json post /instances/{id}/delete
Delete a compute instance by ID
# List instances
Source: https://www.thundercompute.com/docs/api-reference/instances/list-instances
https://api.thundercompute.com:8443/openapi.json get /instances/list
Get a list of user's compute instances
# Modify instance
Source: https://www.thundercompute.com/docs/api-reference/instances/modify-instance
https://api.thundercompute.com:8443/openapi.json post /instances/{id}/modify
Modify a running compute instance's resources
# Create a snapshot
Source: https://www.thundercompute.com/docs/api-reference/snapshots/create-a-snapshot
https://api.thundercompute.com:8443/openapi.json post /snapshots/create
Create a new snapshot from a running instance
# Delete a snapshot
Source: https://www.thundercompute.com/docs/api-reference/snapshots/delete-a-snapshot
https://api.thundercompute.com:8443/openapi.json delete /snapshots/{id}
Delete a snapshot by ID
# List snapshots
Source: https://www.thundercompute.com/docs/api-reference/snapshots/list-snapshots
https://api.thundercompute.com:8443/openapi.json get /snapshots/list
Get a list of all snapshots for the authenticated user's organization
# Add an SSH key
Source: https://www.thundercompute.com/docs/api-reference/ssh-keys/add-an-ssh-key
https://api.thundercompute.com:8443/openapi.json post /keys/add
Add a new SSH public key to the authenticated user's organization
# Delete an SSH key
Source: https://www.thundercompute.com/docs/api-reference/ssh-keys/delete-an-ssh-key
https://api.thundercompute.com:8443/openapi.json delete /keys/{id}
Delete an SSH key by ID
# List SSH keys
Source: https://www.thundercompute.com/docs/api-reference/ssh-keys/list-ssh-keys
https://api.thundercompute.com:8443/openapi.json get /keys/list
Get a list of all SSH keys for the authenticated user's organization
# Get current pricing
Source: https://www.thundercompute.com/docs/api-reference/utilities/get-current-pricing
https://api.thundercompute.com:8443/openapi.json get /pricing
Retrieve current hourly pricing information for compute resources
# Get GPU specifications
Source: https://www.thundercompute.com/docs/api-reference/utilities/get-gpu-specifications
https://api.thundercompute.com:8443/openapi.json get /specs
Retrieve GPU spec configurations for all supported GPU types, counts, and modes
# Get thunder templates
Source: https://www.thundercompute.com/docs/api-reference/utilities/get-thunder-templates
https://api.thundercompute.com:8443/openapi.json get /thunder-templates
Get available thunder templates for instance creation
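All of the endpoints above live under the same base URL and use standard REST verbs per the OpenAPI spec. Here is a minimal sketch of calling one directly; the bearer-token `Authorization` header format is an assumption (not confirmed on this page), so check your account settings for how to authenticate:

```bash theme={null}
# Build the request URL for the list-snapshots endpoint (path from the listing above).
API="https://api.thundercompute.com:8443"
URL="$API/snapshots/list"
echo "GET $URL"
# With an API token exported as TNR_API_TOKEN, the call would look like:
# curl -s -H "Authorization: Bearer $TNR_API_TOKEN" "$URL"
```

The same pattern applies to the other endpoints: swap the path and HTTP method (for example, `DELETE /snapshots/{id}` with `curl -X DELETE`).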
# Billing
Source: https://www.thundercompute.com/docs/billing
Understand Thunder Compute's usage-based billing, payment methods, billing alerts, current rates, and tips for saving on GPU cloud costs.
## Payment Options
There are **two ways to pay** for Thunder Compute:
### Option 1: Auto-Pay
Set up auto-pay by saving a credit card. Go to [console.thundercompute.com/settings/billing](https://console.thundercompute.com/settings/billing) and click "Manage saved payment method" (or "Add card to enable auto-pay" if no card is on file).
### Option 2: Preload Credit
Add credit directly to your account as an alternative to auto-pay. This credit never expires and will be used before any saved payment method.
**Order of payment**
1. Any preloaded credit you've added
2. Charges to your saved payment method
You can switch between options or use both—set up auto-pay anytime, even if you started with preloaded credit.
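The order of payment boils down to simple deduction logic. Here is an illustrative sketch with made-up amounts (not actual billing code):

```bash theme={null}
# Hypothetical $25 charge against $10 of preloaded credit:
# credit drains first, the saved card covers the remainder.
CHARGE=25
CREDIT=10
FROM_CREDIT=$(( CHARGE < CREDIT ? CHARGE : CREDIT ))
FROM_CARD=$(( CHARGE - FROM_CREDIT ))
echo "from credit: \$$FROM_CREDIT, from card: \$$FROM_CARD"
```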
## Billing Alerts
* **Instance reminders:** We'll email you about any running instances so you're never caught off guard.
* **Threshold charges:** As your usage grows, we'll bill your card at preset checkpoints (which rise over time) to prevent runaway bills.
## Our rates
All compute resources are billed per minute only while your instances run. Rates and promotions are subject to change without notice. For current rates, see our [pricing page](https://www.thundercompute.com/pricing).
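Per-minute billing makes cost estimates straightforward: multiply the hourly rate by the minutes used and divide by 60. For example, with a hypothetical rate of $0.66/hour (a made-up figure; see the pricing page for real rates), a 90-minute session works out as:

```bash theme={null}
# Estimated cost = hourly rate * minutes / 60 (the rate here is a made-up example).
COST=$(awk -v rate=0.66 -v mins=90 'BEGIN { printf "%.2f", rate * mins / 60 }')
echo "Estimated cost: \$$COST"
```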
## Credit Terms
* **Preloaded credit** does not expire and will be used before charging your saved card.
* **Promotional credit** can be revoked at our discretion.
* **Refunds:** Credit is non-refundable.
## Money-Saving Tips
While Thunder Compute is already the cheapest GPU cloud platform, there are a few strategies we recommend to reduce your bill:
* Delete instances when you're done with them to stop billing.
* Right‑size new workloads with `tnr create --gpu`, `--vcpus`, and related flags so you only pay for what you use.
We think this approach balances a smooth experience with strong verification. If you have feedback or questions, please hop into our [Discord](https://discord.com/invite/nwuETS9jJK). We're always happy to improve!
# Data Processing Addendum
Source: https://www.thundercompute.com/docs/guides/data-processing-addendum
Review the Data Processing Addendum for Thunder Compute. Audit legal terms, data handling protocols, and privacy compliance for your organization.
## Sample Agreement
Data Processing Agreement
## Using this DPA
This DPA has 2 parts: (1) the Key Terms on this Cover Page and (2) the Common Paper DPA Standard Terms Version 1 posted at commonpaper.com/standards/data-processing-agreement/1.1 (“DPA Standard Terms”), which is incorporated by reference. If there is any inconsistency between the parts of the DPA, the Cover Page will control over the DPA Standard Terms. Capitalized and highlighted words have the meanings given on the Cover Page. However, if the Cover Page omits or does not define a highlighted word, the default meaning will be “none” or “not applicable” and the correlating clause, sentence, or section does not apply to this Agreement. All other capitalized words have the meanings given in the DPA Standard Terms or the Agreement. A copy of the DPA Standard Terms is attached for convenience only.
## Key Terms
The key legal terms of the DPA are as follows:
| Term | Details |
| ------------------------- | ---------------------------------------------------------------------------------------------- |
| Agreement | Reference to sales contract will be set when sending agreement |
| Approved Subprocessors | [https://www.thundercompute.com/sub-processors](https://www.thundercompute.com/sub-processors) |
| Provider Security Contact | `support@thundercompute.com` |
| Security Policy | As defined in the Agreement. |
### Service Provider Relationship
To the extent California Consumer Privacy Act, Cal. Civ. Code § 1798.100 et seq (“CCPA”) applies, the parties acknowledge and agree that Provider is a service provider and is receiving Personal Data from Customer to provide the Service as agreed in the Agreement and detailed below (see Nature and Purpose of Processing), which constitutes a limited and specified business purpose. Provider will not sell or share any Personal Data provided by Customer under the Agreement. In addition, Provider will not retain, use, or disclose any Personal Data provided by Customer under the Agreement except as necessary for providing the Service for Customer, as stated in the Agreement, or as permitted by Applicable Data Protection Laws. Provider certifies that it understands the restrictions of this paragraph and will comply with all Applicable Data Protection Laws. Provider will notify Customer if it can no longer meet its obligations under the CCPA.
## Restricted Transfers
### Governing Member State
* EEA Transfers: Ireland
* UK Transfers: England and Wales
## Annex I(A) List of Parties
### Data Exporter
* Name: the Customer signing this DPA
* Activities relevant to transfer: See Annex I(B)
* Role: Controller
### Data Importer
* Name: the Provider signing this DPA
* Contact person: Carl Peterson, CEO
* Address: 887 W Marietta St NW, Suite N105, Georgia 30318, USA
* Activities relevant to transfer: See Annex I(B)
* Role: Processor
## Annex I(B) Description of Transfer and Processing Activities
### Service
The Service is: GPU cloud computing with on-demand cloud instances, backed by physical servers, in addition to data storage.
### Categories of Data Subjects
* Customer's employees
### Categories of Personal Data
* Name
* Contact information such as email, phone number, or address
* Financial information such as bank account numbers
* Transactional information such as account information or purchases
* User activity and analysis such as device information or IP address
* Location information
### Special Category Data
Is special category data (as defined in Article 9 of the GDPR) Processed? No
### Frequency of Transfer
Continuous
### Nature and Purpose of Processing
* Receiving data, including collection, accessing, retrieval, recording, and data entry
* Holding data, including storage, organization, and structuring
* Using data, including analysis, consultation, testing, automated decision making, and profiling
* Updating data, including correcting, adaptation, alteration, alignment, and combination
* Protecting data, including restricting, encrypting, and security testing
* Sharing data, including disclosure, dissemination, allowing access, or otherwise making available
* Returning data to the data exporter or data subject
* Erasing data, including destruction and deletion
### Duration of Processing
Provider will process Customer Personal Data as long as required (i) to conduct the Processing activities instructed in Section 2.2(a)-(d) of the Standard Terms; or (ii) by Applicable Laws.
## Annex I(C)
### Competent Supervisory Authority
The supervisory authority will be the supervisory authority of the data exporter, as determined in accordance with Clause 13 of the EEA SCCs or the relevant provision of the UK Addendum.
## Annex II
### Technical and Organizational Security Measures
See Security Policy
Provider and Customer have not changed the DPA Standard Terms except for the details on the Cover Page above. By signing this Cover Page, each party agrees to enter into this DPA as of the last date of signature below.
## Signatures
| Field | Provider (Thunder Compute) | Customer |
| -------------------- | -------------------------- | -------- |
| Signature | | |
| Print Name | | |
| Title | | |
| Legal Notice Address | `carl@thundercompute.com` | |
| Date | | |
## 1. Processor and Subprocessor Relationships
### 1.1 Provider as Processor
In situations where Customer is a Controller of the Customer Personal Data, Provider will be deemed a Processor that is Processing Personal Data on behalf of Customer.
### 1.2 Provider as Subprocessor
In situations where Customer is a Processor of the Customer Personal Data, Provider will be deemed a Subprocessor of the Customer Personal Data.
## 2. Processing
### 2.1 Processing Details
Annex I(B) on the Cover Page describes the subject matter, nature, purpose, and duration of this Processing, as well as the Categories of Personal Data collected and Categories of Data Subjects.
### 2.2 Processing Instructions
Customer instructs Provider to Process Customer Personal Data: (a) to provide and maintain the Service; (b) as may be further specified through Customer’s use of the Service; (c) as documented in the Agreement; and (d) as documented in any other written instructions given by Customer and acknowledged by Provider about Processing Customer Personal Data under this DPA. Provider will abide by these instructions unless prohibited from doing so by Applicable Laws. Provider will immediately inform Customer if it is unable to follow the Processing instructions. Customer has given and will only give instructions that comply with Applicable Laws.
### 2.3 Processing by Provider
Provider will only Process Customer Personal Data in accordance with this DPA, including the details in the Cover Page. If Provider updates the Service to update existing or include new products, features, or functionality, Provider may change the Categories of Data Subjects, Categories of Personal Data, Special Category Data, Special Category Data Restrictions or Safeguards, Frequency of Transfer, Nature and Purpose of Processing, and Duration of Processing as needed to reflect the updates by notifying Customer of the updates and changes.
### 2.4 Customer Processing
Where Customer is a Processor and Provider is a Subprocessor, Customer will comply with all Applicable Laws that apply to Customer’s Processing of Customer Personal Data. Customer’s agreement with its Controller will similarly require Customer to comply with all Applicable Laws that apply to Customer as a Processor. In addition, Customer will comply with the Subprocessor requirements in Customer’s agreement with its Controller.
### 2.5 Consent to Processing
Customer has complied with and will continue to comply with all Applicable Data Protection Laws concerning its provision of Customer Personal Data to Provider and/or the Service, including making all disclosures, obtaining all consents, providing adequate choice, and implementing relevant safeguards required under Applicable Data Protection Laws.
### 2.6 Subprocessors
1. Provider will not provide, transfer, or hand over any Customer Personal Data to a Subprocessor unless Customer has approved the Subprocessor. The current list of Approved Subprocessors includes the identities of the Subprocessors, their country of location, and their anticipated Processing tasks. Provider will inform Customer at least 10 business days in advance and in writing of any intended changes to the Approved Subprocessors whether by addition or replacement of a Subprocessor, which allows Customer to have enough time to object to the changes before the Provider begins using the new Subprocessor(s). Provider will give Customer the information necessary to allow Customer to exercise its right to object to the change to Approved Subprocessors. Customer has 30 days after notice of a change to the Approved Subprocessors to object, otherwise Customer will be deemed to accept the changes. If Customer objects to the change within 30 days of notice, Customer and Provider will cooperate in good faith to resolve Customer’s objection or concern.
2. When engaging a Subprocessor, Provider will have a written agreement with the Subprocessor that ensures the Subprocessor only accesses and uses Customer Personal Data (i) to the extent required to perform the obligations subcontracted to it, and (ii) consistent with the terms of Agreement.
3. If the GDPR applies to the Processing of Customer Personal Data, (i) the data protection obligations described in this DPA (as referred to in Article 28(3) of the GDPR, if applicable) are also imposed on the Subprocessor, and (ii) Provider’s agreement with the Subprocessor will incorporate these obligations, including details about how Provider and its Subprocessor will coordinate to respond to inquiries or requests about the Processing of Customer Personal Data. In addition, Provider will share, at Customer’s request, a copy of its agreements (including any amendments) with its Subprocessors. To the extent necessary to protect business secrets or other confidential information, including personal data, Provider may redact the text of its agreement with its Subprocessor prior to sharing a copy.
4. Provider remains fully liable for all obligations subcontracted to its Subprocessors, including the acts and omissions of its Subprocessors in Processing Customer Personal Data. Provider will notify Customer of any failure by its Subprocessors to fulfill a material obligation about Customer Personal Data under the agreement between Provider and the Subprocessor.
## 3. Restricted Transfers
### 3.1 Authorization
Customer agrees that Provider may transfer Customer Personal Data outside the EEA, the United Kingdom, or other relevant geographic territory as necessary to provide the Service. If Provider transfers Customer Personal Data to a territory for which the European Commission or other relevant supervisory authority has not issued an adequacy decision, Provider will implement appropriate safeguards for the transfer of Customer Personal Data to that territory consistent with Applicable Data Protection Laws.
### 3.2 Ex-EEA Transfers
Customer and Provider agree that if the GDPR protects the transfer of Customer Personal Data, the transfer is from Customer from within the EEA to Provider outside of the EEA, and the transfer is not governed by an adequacy decision made by the European Commission, then by entering into this DPA, Customer and Provider are deemed to have signed the EEA SCCs and their Annexes, which are incorporated by reference. Any such transfer is made pursuant to the EEA SCCs, which are completed as follows:
1. Module Two (Controller to Processor) of the EEA SCCs apply when Customer is a Controller and Provider is Processing Customer Personal Data for Customer as a Processor.
2. Module Three (Processor to Sub-Processor) of the EEA SCCs apply when Customer is a Processor and Provider is Processing Customer Personal Data on behalf of Customer as a Subprocessor.
3. For each module, the following applies (when applicable):
* The optional docking clause in Clause 7 does not apply;
* In Clause 9, Option 2 (general written authorization) applies, and the minimum time period for prior notice of Subprocessor changes is 10 business days;
* In Clause 11, the optional language does not apply;
* All square brackets in Clause 13 are removed;
* In Clause 17 (Option 1), the EEA SCCs will be governed by the laws of Governing Member State;
* In Clause 18(b), disputes will be resolved in the courts of the Governing Member State; and
* The Cover Page to this DPA contains the information required in Annex I, Annex II, and Annex III of the EEA SCCs.
### 3.3 Ex-UK Transfers
Customer and Provider agree that if the UK GDPR protects the transfer of Customer Personal Data, the transfer is from Customer from within the United Kingdom to Provider outside of the United Kingdom, and the transfer is not governed by an adequacy decision made by the United Kingdom Secretary of State, then by entering into this DPA, Customer and Provider are deemed to have signed the UK Addendum and their Annexes, which are incorporated by reference. Any such transfer is made pursuant to the UK Addendum, which is completed as follows:
1. Section 3.2 of this DPA contains the information required in Table 2 of the UK Addendum.
2. Table 4 of the UK Addendum is modified as follows: Neither party may end the UK Addendum as set out in Section 19 of the UK Addendum; to the extent ICO issues a revised Approved Addendum under Section 18 of the UK Addendum, the parties will work in good faith to revise this DPA accordingly.
3. The Cover Page contains the information required by Annex 1A, Annex 1B, Annex II, and Annex III of the UK Addendum.
### 3.4 Other International Transfers
For Personal Data transfers where Swiss law (and not the law in any EEA member state or the United Kingdom) applies to the international nature of the transfer, references to the GDPR in Clause 4 of the EEA SCCs are, to the extent legally required, amended to refer to the Swiss Federal Data Protection Act or its successor instead, and the concept of supervisory authority will include the Swiss Federal Data Protection and Information Commissioner.
## 4. Security Incident Response
Upon becoming aware of any Security Incident, Provider will: (a) notify Customer without undue delay when feasible, but no later than 72 hours after becoming aware of the Security Incident; (b) provide timely information about the Security Incident as it becomes known or as is reasonably requested by Customer; and (c) promptly take reasonable steps to contain and investigate the Security Incident. Provider’s notification of or response to a Security Incident as required by this DPA will not be construed as an acknowledgment by Provider of any fault or liability for the Security Incident.
## 5. Audit & Reports
### 5.1 Audit Rights
Provider will give Customer all information reasonably necessary to demonstrate its compliance with this DPA and Provider will allow for and contribute to audits, including inspections by Customer, to assess Provider’s compliance with this DPA. However, Provider may restrict access to data or information if Customer’s access to the information would negatively impact Provider’s intellectual property rights, confidentiality obligations, or other obligations under Applicable Laws. Customer acknowledges and agrees that it will only exercise its audit rights under this DPA and any audit rights granted by Applicable Data Protection Laws by instructing Provider to comply with the reporting and due diligence requirements below. Provider will maintain records of its compliance with this DPA for 3 years after the DPA ends.
### 5.2 Security Reports
Customer acknowledges that Provider is regularly audited against the standards defined in the Security Policy by independent third-party auditors. Upon written request, Provider will give Customer, on a confidential basis, a summary copy of its then-current Report so that Customer can verify Provider’s compliance with the standards defined in the Security Policy.
### 5.3 Security Due Diligence
In addition to the Report, Provider will respond to reasonable requests for information made by Customer to confirm Provider’s compliance with this DPA, including responses to information security, due diligence, and audit questionnaires, or by giving additional information about its information security program. All such requests must be in writing and made to the Provider Security Contact and may only be made once a year.
## 6. Coordination & Cooperation
### 6.1 Response to Inquiries
If Provider receives any inquiry or request from anyone else about the Processing of Customer Personal Data, Provider will notify Customer about the request and Provider will not respond to the request without Customer’s prior consent. Examples of these kinds of inquiries and requests include a judicial or administrative or regulatory agency order about Customer Personal Data where notifying Customer is not prohibited by Applicable Law, or a request from a data subject. If allowed by Applicable Law, Provider will follow Customer’s reasonable instructions about these requests, including providing status updates and other information reasonably requested by Customer. If a data subject makes a valid request under Applicable Data Protection Laws to delete or opt out of Customer’s giving of Customer Personal Data to Provider, Provider will assist Customer in fulfilling the request according to the Applicable Data Protection Law. Provider will cooperate with and provide reasonable assistance to Customer, at Customer’s expense, in any legal response or other procedural action taken by Customer in response to a third-party request about Provider’s Processing of Customer Personal Data under this DPA.
### 6.2 DPIAs and DTIAs
If required by Applicable Data Protection Laws, Provider will reasonably assist Customer in conducting any mandated data protection impact assessments or data transfer impact assessments and consultations with relevant data protection authorities, taking into consideration the nature of the Processing and Customer Personal Data.
## 7. Deletion of Customer Personal Data
### 7.1 Deletion by Customer
Provider will enable Customer to delete Customer Personal Data in a manner consistent with the functionality of the Services. Provider will comply with this instruction as soon as reasonably practicable except where further storage of Customer Personal Data is required by Applicable Law.
### 7.2 Deletion at DPA Expiration
1. After the DPA expires, Provider will return or delete Customer Personal Data at Customer’s instruction unless further storage of Customer Personal Data is required or authorized by Applicable Law. If return or destruction is impracticable or prohibited by Applicable Laws, Provider will make reasonable efforts to prevent additional Processing of Customer Personal Data and will continue to protect the Customer Personal Data remaining in its possession, custody, or control. For example, Applicable Laws may require Provider to continue hosting or Processing Customer Personal Data.
2. If Customer and Provider have entered the EEA SCCs or the UK Addendum as part of this DPA, Provider will only give Customer the certification of deletion of Personal Data described in Clause 8.1(d) and Clause 8.5 of the EEA SCCs if Customer asks for one.
## 8. Limitation of Liability
### 8.1 Liability Caps and Damages Waiver
To the maximum extent permitted under Applicable Data Protection Laws, each party’s total cumulative liability to the other party arising out of or related to this DPA will be subject to the waivers, exclusions, and limitations of liability stated in the Agreement.
### 8.2 Related-Party Claims
Any claims made against Provider or its Affiliates arising out of or related to this DPA may only be brought by the Customer entity that is a party to the Agreement.
### 8.3 Exceptions
This DPA does not limit any liability to an individual about the individual’s data protection rights under Applicable Data Protection Laws. In addition, this DPA does not limit any liability between the parties for violations of the EEA SCCs or UK Addendum.
## 9. Conflicts Between Documents
This DPA forms part of and supplements the Agreement. If there is any inconsistency between this DPA, the Agreement, or any of their parts, the part listed earlier will control over the part listed later for that inconsistency: (1) the EEA SCCs or the UK Addendum, (2) this DPA, and then (3) the Agreement.
## 10. Term of Agreement
This DPA will start when Provider and Customer agree to a Cover Page for the DPA and sign or electronically accept the Agreement and will continue until the Agreement expires or is terminated. However, Provider and Customer will each remain subject to the obligations in this DPA and Applicable Data Protection Laws until Customer stops transferring Customer Personal Data to Provider and Provider stops Processing Customer Personal Data.
## 11. Definitions
### 11.1 Applicable Laws
“Applicable Laws” means the laws, rules, regulations, court orders, and other binding requirements of a relevant government authority that apply to or govern a party.
### 11.2 Applicable Data Protection Laws
“Applicable Data Protection Laws” means the Applicable Laws that govern how the Service may process or use an individual’s personal information, personal data, personally identifiable information, or other similar term.
### 11.3 Controller
“Controller” will have the meaning(s) given in the Applicable Data Protection Laws for the company that determines the purpose and extent of Processing Personal Data.
### 11.4 Cover Page
“Cover Page” means a document that is signed or electronically accepted by the parties that incorporates these DPA Standard Terms and identifies Provider, Customer, and the subject matter and details of the data processing.
### 11.5 Customer Personal Data
“Customer Personal Data” means Personal Data that Customer uploads or provides to Provider as part of the Service and that is governed by this DPA.
### 11.6 DPA
“DPA” means these DPA Standard Terms, the Cover Page between Provider and Customer, and the policies and documents referenced in or attached to the Cover Page.
### 11.7 EEA SCCs
“EEA SCCs” means the standard contractual clauses annexed to the European Commission's Implementing Decision 2021/914 of 4 June 2021 on standard contractual clauses for the transfer of personal data to third countries pursuant to Regulation (EU) 2016/679 of the European Parliament and of the Council.
### 11.8 European Economic Area (EEA)
“European Economic Area” or “EEA” means the member states of the European Union, Norway, Iceland, and Liechtenstein.
### 11.9 GDPR
“GDPR” means European Union Regulation 2016/679 as implemented by local law in the relevant EEA member nation.
### 11.10 Personal Data
“Personal Data” will have the meaning(s) given in the Applicable Data Protection Laws for personal information, personal data, or other similar term.
### 11.11 Processing
“Processing” or “Process” will have the meaning(s) given in the Applicable Data Protection Laws for any use of, or performance of a computer operation on, Personal Data, including by automatic methods.
### 11.12 Processor
“Processor” will have the meaning(s) given in the Applicable Data Protection Laws for the company that Processes Personal Data on behalf of the Controller.
### 11.13 Report
“Report” means audit reports prepared by another company according to the standards defined in the Security Policy on behalf of Provider.
### 11.14 Restricted Transfer
“Restricted Transfer” means (a) where the GDPR applies, a transfer of personal data from the EEA to a country outside of the EEA which is not subject to an adequacy determination by the European Commission; and (b) where the UK GDPR applies, a transfer of personal data from the United Kingdom to any other country which is not subject to adequacy regulations adopted pursuant to Section 17A of the United Kingdom Data Protection Act 2018.
### 11.15 Security Incident
“Security Incident” means a Personal Data Breach as defined in Article 4 of the GDPR.
### 11.16 Service
“Service” means the product and/or services described in the Agreement.
### 11.17 Special Category Data
"Special Category Data” will have the meaning given in Article 9 of the GDPR.
### 11.18 Subprocessor
“Subprocessor” will have the meaning(s) given in the Applicable Data Protection Laws for a company that, with the approval and acceptance of Controller, assists the Processor in Processing Personal Data on behalf of the Controller.
### 11.19 UK GDPR
“UK GDPR” means European Union Regulation 2016/679 as implemented by section 3 of the United Kingdom’s European Union (Withdrawal) Act of 2018 in the United Kingdom.
### 11.20 UK Addendum
“UK Addendum” means the international data transfer addendum to the EEA SCCs issued by the Information Commissioner for Parties making Restricted Transfers under S119A(1) Data Protection Act 2018.
# Self-host Deepseek R1
Source: https://www.thundercompute.com/docs/guides/deepseek-r1-running-locally-on-thunder-compute
Self-host Deepseek R1 on Thunder Compute cloud GPUs. Local model deployment and configure hardware for optimized inference performance.
# Easily Run DeepSeek R1 on Thunder Compute
Looking for the **cheapest way to run DeepSeek R1** or just want to **try DeepSeek R1** without buying hardware? Thunder Compute lets you spin up pay‑per‑minute A100 GPUs so you only pay for the time you use. Follow the steps below to get the model running in minutes.
> **Quick reminder:** Make sure your Thunder Compute account is set up. If not, start with our [Quickstart Guide](/vscode/quickstart).
If you prefer video instructions, watch this overview:
## Step 1: Create a Cost‑Effective GPU Instance
Open your CLI and launch an 80 GB A100 GPU (perfect for the 70B variant):
```bash theme={null}
tnr create --gpu "a100xl" --template "ollama"
```
For details on instance templates, see our [templates guide](/guides/using-instance-templates).
## Step 2: Check Status and Connect
Verify the instance is running:
```bash theme={null}
tnr status
```
Connect using the instance ID shown in the status output:
```bash theme={null}
tnr connect <instance-ID>
```
## Step 3: Start the Ollama Server
Inside the instance, start Ollama:
```bash theme={null}
start-ollama
```
If you run into issues, check our [troubleshooting guide](/troubleshooting).
Wait about 30 seconds for the web UI to load.
## Step 4: Access the Web UI and Load DeepSeek R1
1. Visit `http://localhost:8080` in your browser.
2. Choose **DeepSeek R1** from the dropdown. On an 80 GB A100, pick the **70B** variant for peak performance.
## Step 5: Run DeepSeek R1
Type a prompt in the web interface. For example:
> *"If the concepts of rCUDA were applied at scale, overcoming latency, what would it mean for the cost of GPUs on cloud providers?"*
The model will think through the answer and respond. A full reply can take up to 200 seconds.
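The web UI is a front end for Ollama's HTTP API, which listens on port 11434 by default, so you can also query the model directly from a terminal on the instance. A sketch (the `deepseek-r1:70b` tag is an assumption; confirm with `ollama list`):

```bash theme={null}
# Build a request body for Ollama's /api/generate endpoint.
MODEL="deepseek-r1:70b"   # assumed tag; verify with `ollama list`
PROMPT="Why is the sky blue?"
BODY=$(printf '{"model":"%s","prompt":"%s","stream":false}' "$MODEL" "$PROMPT")
echo "$BODY"
# On the instance, send it with:
# curl -s http://localhost:11434/api/generate -d "$BODY"
```

Setting `"stream": false` returns the full reply in one JSON object instead of token-by-token chunks, which is simpler for scripting.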
## Conclusion
That's the **cheapest way to run DeepSeek R1** and a quick way to **try DeepSeek R1** on Thunder Compute. Explore more guides:
* [Using Docker on Thunder Compute](/guides/using-docker-on-thundercompute)
* [Using Instance Templates](/guides/using-instance-templates)
* [Running Jupyter notebooks](/guides/running-jupyter-notebooks-on-thunder-compute)
Happy building!
# Ephemeral Storage
Source: https://www.thundercompute.com/docs/guides/ephemeral-storage
Fast, temporary local storage for model weights, caches, and scratch files. Mounted at /ephemeral.
## What is Ephemeral Storage?
Ephemeral storage is fast, local disk space mounted at `/ephemeral` on your instance. It uses high-performance NVMe drives directly attached to the host machine, making it significantly faster than the persistent disk for I/O-heavy workloads.
Ephemeral storage is temporary. Data on `/ephemeral` is lost when you modify, delete, or migrate your instance. It is also not included in snapshots.
## When to Use Ephemeral Storage
Ephemeral storage is ideal for data that is large, frequently accessed, and easy to re-download:
* **Model weights** downloaded from Hugging Face or other registries
* **Pip/conda caches** to speed up environment rebuilds
* **Training checkpoints** (back up important ones to persistent disk or cloud storage)
* **Large datasets** that can be re-fetched
* **Scratch files** from preprocessing or intermediate computation
For data you need to keep, use the persistent disk (your home directory) or [snapshots](/cli/operations/snapshots).
## Configuring Ephemeral Storage
Ephemeral storage defaults to **0 GB** (disabled). You can add it when creating or modifying an instance.
```bash theme={null}
# Create with ephemeral storage
tnr create --ephemeral-disk 200
# Add to an existing instance
tnr modify --ephemeral-disk 200
# Disable ephemeral storage
tnr modify --ephemeral-disk 0
```
In the console, set the **Ephemeral Storage** field in the create or modify instance dialog.
### Size Limits
See [thundercompute.com/pricing](https://www.thundercompute.com/pricing) for current ephemeral storage limits by instance mode.
## Using Ephemeral Storage
Once configured, the storage is available at `/ephemeral` inside your instance:
```bash theme={null}
# Check available space
df -h /ephemeral
# Download model weights to ephemeral storage
huggingface-cli download meta-llama/Llama-3-8B --local-dir /ephemeral/llama-3-8b
# Use as pip cache
pip install --cache-dir /ephemeral/pip-cache transformers
```
## What Happens to Ephemeral Data
| Event | Ephemeral data | Persistent disk |
| --------------------- | -------------- | --------------- |
| Instance running | Preserved | Preserved |
| Modify instance | Lost | Preserved |
| Delete instance | Lost | Lost |
| Create snapshot | Not included | Included |
| Restore from snapshot | Empty | Restored |
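Because `/ephemeral` only exists when ephemeral storage is configured, a training script can pick its scratch directory at runtime. A minimal sketch (the fallback path is illustrative, not a Thunder Compute convention):

```python theme={null}
import os

# Prefer /ephemeral when it is mounted; otherwise fall back to a scratch
# directory on the persistent disk (the fallback path is illustrative)
if os.path.ismount("/ephemeral"):
    scratch_dir = "/ephemeral"
else:
    scratch_dir = os.path.expanduser("~/scratch")
    os.makedirs(scratch_dir, exist_ok=True)

print("Using scratch dir:", scratch_dir)
```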
## Best Practices
1. **Store only re-downloadable data** on `/ephemeral`. Anything important should live on your persistent disk or be backed up to cloud storage.
2. **Use symlinks** to redirect cache directories to ephemeral storage:
```bash theme={null}
mkdir -p /ephemeral/huggingface
ln -s /ephemeral/huggingface ~/.cache/huggingface
```
3. **Set environment variables** to point tools at ephemeral storage:
```bash theme={null}
export HF_HOME=/ephemeral/huggingface
export PIP_CACHE_DIR=/ephemeral/pip-cache
export TRANSFORMERS_CACHE=/ephemeral/huggingface/transformers
```
4. **Back up training checkpoints** periodically from `/ephemeral` to your home directory or cloud storage if you need to keep them.
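Point 4 can be as simple as mirroring the checkpoint directory to the persistent disk after each save. A sketch using only the standard library (the paths are illustrative; `/tmp` stands in for `/ephemeral` so the snippet runs anywhere):

```python theme={null}
import pathlib
import shutil

# Mirror checkpoints from ephemeral scratch to the persistent home directory
src = pathlib.Path("/tmp/demo-ephemeral/checkpoints")
dst = pathlib.Path.home() / "checkpoints-backup"

src.mkdir(parents=True, exist_ok=True)
(src / "epoch-0001.pt").write_bytes(b"fake checkpoint")  # stand-in for a real file

# dirs_exist_ok lets repeated backups overwrite older copies in place
shutil.copytree(src, dst, dirs_exist_ok=True)
print(sorted(p.name for p in dst.iterdir()))
```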
# Run GPT‑OSS 120B on Thunder Compute
Source: https://www.thundercompute.com/docs/guides/gpt-oss-running-locally-on-thunder-compute
Deploy GPT-OSS 120B on Thunder Compute hardware. Initialize the large language model and configure the local environment for high-performance use.
Looking for the **cheapest way to self‑host GPT‑OSS 120B** or just want to **try it out** without buying hardware? Thunder Compute lets you spin up pay‑per‑minute NVIDIA A100 GPUs, so you only pay for what you use. Follow the steps below to get the model running in minutes.
> **Prerequisite:** Ensure your Thunder Compute account is ready. If not, start with our [Quickstart Guide](/vscode/quickstart).
## Step 1 — Create a Cost‑Effective Prototyping‑Mode GPU Instance
Launch an 80 GB A100 instance (large enough to host the full 120B model):
```bash theme={null}
tnr create \
--gpu a100xl \
--vcpus 4 \
--mode prototyping \
--persistent-disk 200 \
--template "ollama"
```
This command starts a lower‑cost [prototyping‑mode](/prototyping-vs-production#prototyping-mode) instance with:
* **GPU:** A100 80 GB
* **vCPUs:** 4
* **Storage:** 200 GB persistent disk
* **Template:** Ollama (preinstalls Ollama and OpenWebUI)
> The GPU, vCPU count, and mode ([Prototyping](/prototyping-vs-production#prototyping-mode) / [Production](/prototyping-vs-production#production-mode)) can be changed later if your requirements change, and the amount of storage can be increased if needed.
For details on templates, see the [Instance Templates guide](/guides/using-instance-templates).
## Step 2 — Check Status and Connect
Verify that the instance is running (it can take a minute to spin up):
```bash theme={null}
tnr status
```
Connect to the instance:
```bash theme={null}
tnr connect
```
## Step 3 — Start Ollama and Download the Model
Inside the instance, start Ollama (this also launches OpenWebUI and a Cloudflare tunnel):
```bash theme={null}
start-ollama
```
While the UI is initializing, download the model. Here we pull the 120B variant of GPT‑OSS, but any model from the [Ollama Model Library](https://ollama.com/library) can be downloaded:
```bash theme={null}
ollama pull gpt-oss:120b
```
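Besides the web UI, the Ollama server exposes an HTTP API on port 11434, which is forwarded to your local machine. As a sketch, you can build a generation request like this (the final call is commented out so the snippet runs without a live server):

```python theme={null}
import json
import urllib.request

# Build a request against the Ollama HTTP API (forwarded to localhost:11434)
payload = {"model": "gpt-oss:120b", "prompt": "Why is the sky blue?", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)

# With the instance connected, send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```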
> **Tip:** If you encounter issues, consult the [troubleshooting guide](/troubleshooting).
Give the UI about 60 seconds to finish loading.
## Step 4 — Access the Web UI and Select the Model
1. Open `http://localhost:8080` in your browser.
2. Choose **gpt-oss:120b** from the model dropdown.
## Step 5 — Run GPT‑OSS 120B
Enter a prompt in the web interface, for example:
> *“Tell a tale of a seaman who found the treasure of the clouds by following the sound of thunder.”*
## Conclusion
That's it—the **cheapest way to run GPT‑OSS 120B** on Thunder Compute. For more, check out:
* [Using Docker on Thunder Compute](/guides/using-docker-on-thundercompute)
* [Using Instance Templates](/guides/using-instance-templates)
* [Running Jupyter Notebooks](/guides/running-jupyter-notebooks-on-thunder-compute)
Happy building!
# MCP Server
Source: https://www.thundercompute.com/docs/guides/mcp-server
Use Thunder Compute with AI coding agents like Claude Code, Cursor, Windsurf, and Codex via the Model Context Protocol (MCP).
Thunder Compute provides an MCP (Model Context Protocol) server that lets AI coding agents manage GPU instances on your behalf. Create, monitor, modify, and tear down instances without leaving your agent workflow.
## Prerequisites
1. A Thunder Compute account
2. An AI agent that supports remote MCP servers (Claude Code, Cursor, Codex, etc.)
No local installation or API tokens required — authentication is handled via OAuth in your browser.
## Setup
**Claude Code:** Run this in your terminal:
```bash theme={null}
claude mcp add --transport http thunder-compute https://www.thundercompute.com/mcp
```
Then start Claude Code and run `/mcp` to authenticate. A browser window will open for you to log in and authorize access.
Alternatively, add to `~/.claude.json` (global) or `.claude.json` in your project root:
```json theme={null}
{
"mcpServers": {
"thunder-compute": {
"url": "https://www.thundercompute.com/mcp"
}
}
}
```
**Codex:** Run this in your terminal:
```bash theme={null}
codex mcp add thunder-compute --url https://www.thundercompute.com/mcp
```
Codex will prompt you to authenticate via OAuth when you first use a Thunder Compute tool.
**Cursor:** Add to `.cursor/mcp.json` in your project root (or `~/.cursor/mcp.json` for global access):
```json theme={null}
{
"mcpServers": {
"thunder-compute": {
"type": "http",
"url": "https://www.thundercompute.com/mcp"
}
}
}
```
**Windsurf:** Add to your MCP configuration:
```json theme={null}
{
"mcpServers": {
"thunder-compute": {
"serverUrl": "https://www.thundercompute.com/mcp",
"headers": {
"Content-Type": "application/json"
}
}
}
}
```
**OpenCode:** Run the interactive setup:
```bash theme={null}
opencode mcp add
```
When prompted:
* **Server name:** `thunder-compute`
* **Server type:** `Remote`
* **URL:** `https://www.thundercompute.com/mcp`
* **Requires OAuth:** `Yes`
* **Pre-registered client ID:** `No`
```bash theme={null}
opencode mcp auth thunder-compute
```
A browser window will open for you to log in and authorize access.
```bash theme={null}
opencode
```
The Thunder Compute tools are now available in your session.
Alternatively, add to `~/.config/opencode/opencode.json`:
```json theme={null}
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"thunder-compute": {
"type": "remote",
"url": "https://www.thundercompute.com/mcp",
"oauth": {}
}
}
}
```
Then run `opencode mcp auth thunder-compute` to authenticate.
If you use an MCP client that supports [Smithery](https://smithery.ai), you can install directly:
```bash theme={null}
npx @smithery/cli install @thunder-compute/thunder-compute
```
Or browse the [Thunder Compute listing on Smithery](https://smithery.ai/server/@thunder-compute/thunder-compute) and click **Install** for your client.
For custom integrations, the MCP server uses [Streamable HTTP transport](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http) at a single endpoint. Authentication is via OAuth 2.0 with standard MCP discovery.
**Endpoint:** `https://www.thundercompute.com/mcp`
```bash theme={null}
curl -X POST https://www.thundercompute.com/mcp \
-H "Authorization: Bearer <access-token>" \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "initialize",
"params": {
"protocolVersion": "2025-03-26",
"capabilities": {},
"clientInfo": { "name": "my-agent", "version": "1.0.0" }
},
"id": 1
}'
```
## Authentication
No API tokens or environment variables needed. When you first connect, a browser window opens for you to log in with your Thunder Compute account and authorize access. Tokens refresh automatically, so you only authenticate once per session.
## Available Tools
### Instance Management
| Tool | Description |
| ----------------- | -------------------------------------------------------------------------------------- |
| `list_instances` | List all GPU instances with status, IP, and configuration |
| `create_instance` | Create a new GPU instance (specify GPU type, template, mode, etc.) |
| `delete_instance` | Delete an instance (irreversible) |
| `modify_instance` | Change instance config (GPU type, vCPUs, disk, mode) |
| `run_command` | Execute a shell command on a running instance and return stdout, stderr, and exit code |
### Information
| Tool | Description |
| ------------------ | ------------------------------------------------------------ |
| `get_specs` | Get available GPU specs (VRAM, vCPU options, storage ranges) |
| `get_availability` | Get current GPU availability status for each spec |
| `get_pricing` | Get current per-hour GPU pricing |
| `list_templates` | List available OS templates (Ubuntu, PyTorch, etc.) |
### Snapshots
| Tool | Description |
| ----------------- | -------------------------------- |
| `list_snapshots` | List all instance snapshots |
| `create_snapshot` | Create a snapshot of an instance |
| `delete_snapshot` | Delete a snapshot (irreversible) |
### SSH Keys
| Tool | Description |
| ------------------------- | -------------------------------------------------------------- |
| `list_ssh_keys` | List SSH keys in your organization |
| `create_ssh_key` | Add an SSH public key to your organization |
| `delete_ssh_key` | Delete an SSH key |
| `add_ssh_key_to_instance` | Add an SSH public key to a running instance's authorized\_keys |
### Port Forwarding
| Tool | Description |
| -------------- | --------------------------------------------- |
| `list_ports` | List all instances with their forwarded ports |
| `forward_port` | Forward HTTP ports on an instance |
| `delete_port` | Remove forwarded ports from an instance |
### Connectivity
| Tool | Description |
| ----------------- | ----------------------------------------------------- |
| `get_ssh_command` | Get the SSH command to connect to an instance |
| `get_scp_command` | Get the SCP command to copy files to/from an instance |
### Billing & Usage
| Tool | Description |
| ---------------------- | --------------------------------------------------------------------------- |
| `get_meter_data` | Get GPU usage metrics for a time period (hourly, daily, weekly, or monthly) |
| `get_upcoming_invoice` | Get estimated charges for the current billing period |
| `get_invoice_history` | Get historical invoices for your organization |
| `get_subscription` | Get subscription details including plan, status, and payment info |
### API Tokens
| Tool | Description |
| -------------- | ----------------------------------------------- |
| `list_tokens` | List all named API tokens for your organization |
| `create_token` | Create a new named API token |
| `delete_token` | Delete a named API token |
## Prompts
The MCP server includes built-in prompts that guide your agent through common multi-step workflows:
| Prompt | Description |
| ----------------------- | ------------------------------------------------------------------- |
| `create-dev-instance` | Set up a GPU development instance with sensible defaults |
| `deploy-model` | Deploy an ML model (supports Ollama, vLLM, and Transformers) |
| `check-costs` | Review current GPU usage and costs |
| `snapshot-and-teardown` | Save instance state and clean up |
| `setup-comfyui` | Spin up a GPU instance with ComfyUI for AI image generation |
| `setup-jupyter` | Launch a Jupyter Lab environment on a GPU instance |
| `fine-tune-model` | Set up a GPU instance for fine-tuning with LoRA or full fine-tuning |
| `benchmark-gpu` | Run a quick GPU benchmark on an instance to verify performance |
## Example Usage
Once configured, you can ask your AI agent things like:
* "Spin up an A100 instance with PyTorch"
* "What GPU types are available and how much do they cost?"
* "Which GPUs are available right now?"
* "List my running instances"
* "Run `nvidia-smi` on my instance"
* "Delete instance inst-abc123"
* "Forward port 8080 on my instance"
* "Create a snapshot of my instance before I make changes"
* "Deploy Llama 3 on a GPU"
* "How much have I spent this month?"
* "Show my invoice history"
* "Create an API token for my CI pipeline"
## Troubleshooting
**Authentication fails or browser doesn't open**: Run `/mcp` in Claude Code to manually trigger authentication. Make sure you're logged in to your Thunder Compute account in the browser.
**"Protected resource does not match" error**: The URL in your MCP config must match the server's configured resource URL exactly. Ensure you're using `https://www.thundercompute.com/mcp`.
**"token has invalid issuer" error**: This is a server-side configuration issue. The MCP authentication client must be configured with the correct Stytch Connected Apps domain.
**Tools not appearing**: Restart your AI agent after changing MCP configuration. Most agents only read MCP config on startup.
## MCP Directories
Thunder Compute is listed on major MCP directories for easy discovery:
* [**Smithery**](https://smithery.ai/server/@thunder-compute/thunder-compute) — One-click install for supported clients
* [**MCP Registry**](https://registry.modelcontextprotocol.io) — The official Model Context Protocol server registry
* [**Glama**](https://glama.ai) — Auto-indexed from the MCP Registry
* [**PulseMCP**](https://pulsemcp.com) — Auto-indexed from the MCP Registry
If your MCP client supports browsing directories, search for "Thunder Compute" to find and install the server directly.
# Thunder Compute Referral Program
Source: https://www.thundercompute.com/docs/guides/referral-program
Earn credits by referring friends to Thunder Compute. Get 3% of every dollar your referrals spend on GPU instances with our lifetime rewards program.
**Refer a friend, earn credit.** Share your unique referral link and receive credits every time someone you refer spends on Thunder Compute GPUs.
This program is currently in beta. Terms may evolve as we improve the program based on user feedback.
## How It Works
Our referral program rewards you with **3% of every dollar** your referrals spend on GPU instances. Here's what you need to know:
* **Reward Rate:** 3% of all spending by referred users
* **Duration:** Lifetime rewards for each referred customer
* **Credits:** Paid out in Thunder Compute credits (non-transferable)
* **Tracking:** Credits apply to paid, consumed compute resources and typically post within minutes of a finalized invoice
We created this program as a way to give back to our community. Rather than paying advertisers, we want to reward you for your contribution to Thunder Compute.
By referring even a medium-size startup, you can often receive thousands of dollars of free compute.
## Getting Started
### 1. Find Your Referral Link
1. Sign in to the [Thunder Compute Console](https://console.thundercompute.com/)
2. Navigate to **Referrals** in the sidebar
3. Copy your unique referral link
4. Share it anywhere—social media, tutorials, blog posts, or direct messages
### 2. Share and Earn
Once someone creates a new account using your link and starts using GPU instances, you'll automatically earn 3% of their payments as credits.
## Eligibility Requirements
### For Referrers
* Active Thunder Compute account in good standing
* No restrictions on sharing methods or platforms
### For Referrals
* Must create a **new account** via your referral link
* Existing accounts that sign up through referral links are not eligible
* Self-referrals and duplicate accounts are prohibited
Credits are non-transferable and cannot be converted to cash. They can only be used for Thunder Compute services.
## Program Rules
### Fair Use Policy
We maintain strict anti-fraud measures to ensure program integrity:
* Creating fake accounts is prohibited
* Self-referrals will result in credit removal
* Violating Thunder Compute's Terms & Conditions may lead to account suspension
* All referral activity is monitored for suspicious patterns
### Program Changes
Thunder Compute reserves the right to:
* Modify reward rates or eligibility requirements
* Update program terms with advance notice
* Discontinue the program if necessary
We'll announce any changes through email notifications and documentation updates.
## Frequently Asked Questions
**Q: When do I receive my referral credits?**
A: Credits are typically added to your account within minutes of your referral's successful invoice.
**Q: Is there a limit to how much I can earn?**
A: No, there's no cap on referral earnings. The more successful referrals you make, the more you earn.
**Q: Can I refer existing Thunder Compute users?**
A: No, only new users who create accounts through your referral link are eligible.
**Q: What counts as a qualifying payment?**
A: Only direct card payments for GPU instances qualify for referral rewards. Usage covered by free or referral credits does not qualify.
## Need Help?
Have questions about referral eligibility, credit posting, or the program in general? Contact our support team:
* **Discord:** Join our [community server](https://discord.com/invite/nwuETS9jJK)
Thank you for giving back to the Thunder Compute community!
# Jupyter Notebooks
Source: https://www.thundercompute.com/docs/guides/running-jupyter-notebooks-on-thunder-compute
Execute Jupyter Notebooks on Thunder Compute cloud GPUs. Configure remote kernels and process intensive data workloads in a notebook environment.
## Prerequisites for a Jupyter Notebook with Cloud GPU
* A supported editor installed: VSCode, Cursor, or Windsurf
* The Thunder Compute extension installed in that editor
* The Jupyter extension installed in that editor
## Steps to Launch Your Notebook
### 1. Connect to a Thunder Compute cloud GPU in VSCode
Follow the instructions in our [quickstart](/vscode/quickstart) guide to set up and connect to a remote instance in VSCode.
### 2. Install the Jupyter extension in your cloud workspace
Open the Extensions panel and install the Jupyter extension inside your Thunder Compute instance.
### 3. Verify GPU availability inside the notebook
Create a Jupyter Notebook; it now runs on a Thunder Compute instance with GPU capabilities. To confirm that the GPU is accessible, run the following in a notebook cell:
```python theme={null}
import torch
print(torch.cuda.is_available())
```
If everything is set up correctly, the output should be:
```
True
```
You now have a Jupyter Notebook running on a Thunder Compute cloud GPU, a fast and low-cost alternative to Colab for indie developers, researchers, and data scientists.
# Speeding Up Snapshots
Source: https://www.thundercompute.com/docs/guides/speeding-up-snapshots
Accelerate snapshot creation and restoration on Thunder Compute with technical optimizations that reduce backup latency and improve data transfer speed.
The size of your instance's disk directly affects how long snapshots take to create and restore.
This guide focuses on simple, high-impact steps to reduce snapshot size and speed up restores. We’ll expand this guide as more snapshot features ship.
## Quick Wins
1. **Keep your instance disk lean**: Remove large, transient files before snapshotting.
2. **Exclude non-essential data**: Use `.thunderignore` to skip caches, build outputs, and generated assets.
## .thunderignore Files for Exclusion
Often, you may want to exclude certain heavy files, cache directories, or generated files from a snapshot. You can do this using a `.thunderignore` file. This will help speed up snapshot creation and restoration.
1. Create a `.thunderignore` file in the `/` directory of your instance.
2. Add all paths you would like to ignore (absolute paths or relative to `/`). Patterns are supported - the syntax for these is the same as [`filepath.Match`](https://pkg.go.dev/path/filepath#Match) in Go. Patterns are matched against paths, not just basenames, so use `/` to anchor from the root (for example, `/data/*.parquet`). `*` and `?` are supported; `**` is not special and is treated literally. Blank lines are ignored, and lines starting with `#` are treated as comments.
3. Create your snapshot. The `.thunderignore` file is included in the snapshot so your exclusions persist on restore.
Start by excluding caches, build outputs, and temporary files. You’ll usually see the biggest size reductions there.
Make sure you don’t exclude anything required to run your workloads after restore, such as model weights or datasets you actually need.
Example `.thunderignore`:
```
# Caches and build artifacts
.cache/*
*.tmp
# Large data
/data/*.parquet
/models/*.pt
# Common language build outputs
/node_modules/*
/dist/*
/target/*
```
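If you want to sanity-check a pattern before snapshotting, the matching rule described above (`*` and `?` never cross a `/`) can be previewed with a small helper. This is an approximation for illustration, not the exact matcher Thunder Compute uses:

```python theme={null}
import re

def preview_match(pattern: str, path: str) -> bool:
    """Rough preview of filepath.Match-style globs: '*' and '?' stop at '/'.
    Only '*' and '?' are handled here; character classes are not."""
    regex = re.escape(pattern).replace(r"\*", "[^/]*").replace(r"\?", "[^/]")
    return re.fullmatch(regex, path) is not None

print(preview_match("/data/*.parquet", "/data/train.parquet"))  # True
print(preview_match("/data/*.parquet", "/data/sub/x.parquet"))  # False: '*' stops at '/'
```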
# Stopping Instances
Source: https://www.thundercompute.com/docs/guides/stopping-instances
Manage instance states to optimize billing on Thunder Compute. Learn how to pause and resume compute resources using snapshots.
## The Workflow
Thunder Compute does not have a native "Stop" feature for instances. Fortunately, you can achieve the same result by using snapshots.
To "stop" an instance, follow these three steps:
1. **Create a snapshot:** This saves the current state of the running instance.
2. **Delete the instance:** Once snapshot creation is underway, you can safely delete the running instance.
3. **Restore the snapshot:** Create a new instance using your saved snapshot as the template.
### 1. Create a Snapshot
First, capture the current state of your running instance. You can trigger this through any of our supported interfaces:
**Guides:** [VS Code](https://www.thundercompute.com/docs/vscode/operations/snapshots#create-a-snapshot) | [CLI](https://www.thundercompute.com/docs/cli/operations/snapshots#create-a-snapshot) | [Console](https://www.thundercompute.com/docs/console/operations/snapshots#create-a-snapshot)
### 2. Delete the Running Instance
Once the snapshot is initiated, delete the instance.
**Guides:** [VS Code](/vscode/operations/deleting-instances#delete-an-instance) | [CLI](/cli/operations/deleting-instances#delete-an-instance) | [Console](/console/operations/deleting-instances#delete-an-instance)
You can delete your instance immediately after triggering the snapshot.
### 3. Restore from Snapshot
When you are ready to resume, create a new instance using your snapshot as the template.
**Guides:** [VS Code](/vscode/operations/snapshots#restore-from-a-snapshot) | [CLI](/cli/operations/snapshots#restore-from-a-snapshot) | [Console](/console/operations/snapshots#restore-from-a-snapshot)
***
## Important Notes
The time required to create and restore snapshots varies based on the size of the snapshot.
* **Cost Efficiency:** You only pay for snapshot storage while the instance is deleted, which is significantly cheaper than keeping an instance running.
# Using Docker
Source: https://www.thundercompute.com/docs/guides/using-docker-on-thundercompute
Containerize applications using Docker on Thunder Compute. Manage images, deploy containers, and optimize Docker environments on cloud GPU instances.
## Disclaimer: Docker support is experimental
Docker has experimental support inside Thunder Compute instances. Because Thunder Compute instances are themselves containers, running Docker on Thunder Compute is like running Docker inside Docker. To make this work, our instances ship with a modified version of `dockerd`, and in certain situations it might not behave exactly like official Docker (e.g., advanced networking features).
## Running Docker
Start your container with the `--device nvidia.com/gpu=all` flag in order to expose GPUs. For example: `docker run -it --rm --device nvidia.com/gpu=all ubuntu:latest`.
Some tutorials will tell you to use `--runtime=nvidia` or `--gpus=all`. These options are outdated and not supported on Thunder Compute. `--device nvidia.com/gpu=all` is the only supported way to add a GPU to a Docker container.
## Known issues
* Docker Compose does not work.
* The container network is not isolated. This means that even ports you don't list with `-p` will be available, and could potentially conflict with other processes or containers.
* Sometimes when the container is destroyed, the processes in it will not be properly killed. This can cause e.g. port conflicts if you then try to start the same container again. You can use standard tools like `ps aux` and `kill` to find and stop any remaining container processes.
If you run into issues, please [contact us](https://www.thundercompute.com/contact).
# Use Instance Templates for AI
Source: https://www.thundercompute.com/docs/guides/using-instance-templates
Quickly deploy LLMs (Ollama) and AI image generators (ComfyUI) on Thunder Compute using pre-configured instance templates. Get started fast.
Thunder Compute gives indie developers, researchers and data scientists instant access to **affordable cloud GPUs**. Our pre-configured **instance templates** set up popular AI stacks automatically, so you can **run LLMs** or **generate AI images** in minutes.
## AI Templates on Cheap Cloud GPUs
We currently offer:
* **Ollama** – launches an Ollama server for open-source large language models
* **ComfyUI** – installs ComfyUI for fast AI-image generation workflows
## Deploy a Template
1. **Create an instance**
```bash theme={null}
# Launch an Ollama instance
tnr create --template ollama
# Launch ComfyUI
tnr create --template comfy-ui
```
2. **Check the instance is ready**
```bash theme={null}
tnr status
```
Wait until the status shows the instance is running before connecting.
3. **Connect to the instance**
```bash theme={null}
tnr connect 0 # replace 0 with your instance ID
```
Port forwarding is handled automatically when you connect. The `-t` flag is unnecessary.
4. **Start the service**
```bash theme={null}
# Ollama
start-ollama
# ComfyUI
start-comfyui
```
Required ports forward to your local machine automatically.
## Template Details
### Ollama Template
* Forwards port **11434**
* Access the API at `http://localhost:11434`
* Ready for popular Ollama models
### ComfyUI Template
* Forwards port **8188**
* Mounts the `ComfyUI` directory to your Mac or Linux host
* UI at `http://localhost:8188`
* Includes common nodes and extensions
## Need Help?
Encounter problems or have questions? Reach out to our support team any time.
# Weights & Biases
Source: https://www.thundercompute.com/docs/guides/weights-and-biases
Track, debug, and optimize GPU-heavy workloads on Thunder Compute instances using Weights & Biases (wandb).
Weights & Biases (wandb) is an experiment tracking and model management platform that’s particularly useful when training large models on Cloud GPUs. It helps you:
* Track training runs, hyperparameters, and metrics
* Monitor GPU/CPU utilization in real time
* Version datasets and model checkpoints
* Run large-scale hyperparameter sweeps across many GPU instances
On Thunder Compute, wandb helps you monitor GPU utilization, identify bottlenecks, and track training metrics.
***
## Prerequisites
* A Thunder Compute GPU instance created and connected
* Python environment set up on your instance
* A Weights & Biases account ([https://wandb.ai/site](https://wandb.ai/site))
***
## Installation
Install wandb on your Thunder Compute instance:
```bash theme={null}
pip install wandb
```
Or add to a `requirements.txt`:
```bash theme={null}
echo "wandb" >> requirements.txt
pip install -r requirements.txt
```
***
## Authentication
Authenticate with:
```bash theme={null}
wandb login
```
Or via environment variable:
```bash theme={null}
export WANDB_API_KEY="your_api_key"
wandb login --relogin
```
You will see output like the following:
```
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
```
Enter your API key, which you can find on the wandb.ai homepage after creating an account. Once entered, you will see:
```
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /home/ubuntu/.netrc
wandb: Currently logged in as: username (entity-name) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
```
For shared or production Thunder instances, environment variables or secret
managers are preferred over pasting API keys directly.
***
## Getting Started
Follow these steps to run your first wandb experiment on your Thunder Compute instance.
### Step 1 — Create a Training File
Create a new Python file on your instance:
```bash theme={null}
nano train.py
```
Or create a new file within your IDE connected over SSH.
### Step 2 — Paste Minimal Working Example
Copy this minimal example into your `train.py` file:
```python theme={null}
import wandb
import time
# Initialize wandb
wandb.init(
project="thunder-resnet",
name="quick-test",
config={
"learning_rate": 0.001,
"batch_size": 32,
"epochs": 5,
},
)
# Simple training loop simulation
for epoch in range(5):
# Simulate training metrics
train_loss = 1.0 / (epoch + 1)
train_acc = 0.5 + epoch * 0.1
# Log metrics to wandb
wandb.log({
"epoch": epoch,
"train/loss": train_loss,
"train/accuracy": train_acc,
})
time.sleep(0.5) # Simulate work
wandb.finish()
```
### Step 3 — Run the Script
Execute your training script:
```bash theme={null}
python train.py
```
### Step 4 — Expected Output
You should see output similar to:
```
wandb: Currently logged in as: your-username (entity-name) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.23.0
wandb: Run data is saved locally in /home/ubuntu/wandb/run-20251120_135726-abcd
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run quick-test
wandb: ⭐️ View project at https://wandb.ai/entity-name/thunder-resnet
wandb: 🚀 View run at https://wandb.ai/entity-name/thunder-resnet/runs/abcd
wandb:
wandb: Run history:
wandb: epoch ▁▃▅▆█
wandb: train/accuracy ▁▃▅▆█
wandb: train/loss █▄▂▁▁
wandb:
wandb: Run summary:
wandb: epoch 4
wandb: train/accuracy 0.9
wandb: train/loss 0.2
wandb:
wandb: 🚀 View run quick-test at: https://wandb.ai/entity-name/thunder-resnet/runs/abcd
wandb: ⭐️ View project at: https://wandb.ai/entity-name/thunder-resnet
wandb: Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20251120_135726-abcd/logs
```
### Step 5 — View Your Results
1. **View your dashboard**: Click the link in the output or visit [https://wandb.ai](https://wandb.ai) and navigate to your project
2. **View in Table view**: Go to **(Project Name)** > **Projects** > **thunder-resnet** > **Table** to see all your runs in a tabular format
3. **Compare runs**: Run the script multiple times with different configurations to compare results
4. **Add artifacts**: See the [Model Checkpointing with Weights & Biases Artifacts](#model-checkpointing-with-weights--biases-artifacts) section to version checkpoints and datasets
5. **Scale to multi-GPU**: Check out [Distributed Training](#distributed-training-ddp-lightning-deepspeed) for multi-GPU setups
6. **Run sweeps**: Use [Hyperparameter Sweeps](#hyperparameter-sweeps-multi‑gpu-multi‑instance) for automated hyperparameter search
***
## Viewing Results
1. Visit [https://wandb.ai/site](https://wandb.ai/site)
2. Select your project
3. Explore:
* Metrics charts
* GPU utilization
* Model checkpoints
* Dataset artifacts
* Sweep dashboards
***
## Core Concepts for Cloud GPU Workloads
When using remote GPUs, these wandb features matter most:
1. **Run tracking** — metrics, hyperparameters, logs
2. **GPU/system monitoring** — GPU utilization, power, memory, CPU load
3. **Artifacts** — versioned checkpoints and datasets
4. **Sweeps** — distributed hyperparameter search
5. **Groups & jobs** — organize multi-GPU/distributed training
***
## Basic Usage
### Initialize a Run
```python theme={null}
import wandb
wandb.init(
    project="my-thunder-project",
    name="baseline-resnet50",
    config={
        "learning_rate": 3e-4,
        "batch_size": 64,
        "epochs": 20,
        "optimizer": "adamw",
        "precision": "fp16",
    },
)
```
### Log Metrics
```python theme={null}
wandb.log({
    "train/loss": loss,
    "train/accuracy": acc,
    "step": step,
})
```
### Best Logging Practices
* Log every **N steps** (e.g., 10–50) to minimize overhead
* Avoid logging huge tensors every step
* Use artifacts for large files
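The first tip, logging every N steps, can be sketched as a simple gate. This is plain Python for illustration; `should_log` is a hypothetical helper, not a wandb API, and in a real loop the body would call `wandb.log`:

```python
def should_log(step, every=25):
    """Return True only every `every` steps, so wandb.log is not hit on each batch."""
    return step % every == 0

logged_steps = [s for s in range(100) if should_log(s)]
print(logged_steps)  # [0, 25, 50, 75]
```

With `every=25`, a 100-step loop sends only 4 log calls instead of 100, keeping overhead off the training hot path.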
***
## GPU & System Monitoring
Wandb automatically collects:
* GPU utilization
* GPU memory usage
* GPU temperature and power
* CPU usage
* RAM usage
* Disk and network I/O
Use these graphs to diagnose:
* **GPU-bound** workloads
* **Data-bound** workloads
* **Bottlenecks** due to I/O or preprocessing
* **Too-small batch sizes**
### Improving GPU Utilization
* Increase batch size until GPU memory is near capacity
* Use **mixed precision** (`torch.cuda.amp`)
* Increase dataloader workers
* Preload/augment data on the GPU
* Reduce unnecessary synchronizations
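One way to act on the first bullet is a doubling search for the largest batch size that fits. This is a sketch under stated assumptions: `fits` is a hypothetical callback that would attempt a forward/backward pass at the given batch size and return `False` on a CUDA out-of-memory error.

```python
def max_batch_size(fits, start=1, limit=4096):
    """Double the batch size while fits(bs) succeeds; return the last size that fit.
    `fits` stands in for "run one training step and return False on CUDA OOM"."""
    bs = start
    while bs * 2 <= limit and fits(bs * 2):
        bs *= 2
    return bs

# With a fake memory budget of 100 samples, the search settles on 64:
print(max_batch_size(lambda bs: bs <= 100))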
***
## Model Checkpointing with Weights & Biases Artifacts
When you train on Thunder Compute GPU instances, it's important that your model checkpoints are **not** tied to a single machine. Weights & Biases Artifacts provide a simple way to:
* Persist checkpoints even if the instance is deleted
* Move checkpoints between different Thunder instances (or GPU types)
* Share models with your team
* Reproduce and resume long-running training jobs
This section provides a walkthrough of how to do checkpointing with wandb.
***
### Why use Artifacts for checkpoints?
Saving checkpoints only to the local filesystem is risky:
* Thunder instances may be stopped or recreated
* You may want to resume training on a *different* GPU (A100 → H100)
* Your team may need to reuse your model
* You may want versioned, reproducible training history
Artifacts solve this by storing checkpoints in W\&B's managed, versioned storage.
***
### Step 1 — Save a checkpoint locally during training
Inside your training loop, periodically save a checkpoint. For a PyTorch project:
```python theme={null}
import os
import torch
# ... inside your training loop ...
if (epoch + 1) % 5 == 0:
    os.makedirs("checkpoints", exist_ok=True)  # torch.save does not create missing folders
    ckpt_path = f"checkpoints/model_epoch_{epoch+1}.pt"
    torch.save(model.state_dict(), ckpt_path)
```
> It is best practice to save checkpoints inside a dedicated `checkpoints/` folder.
***
### Step 2 — Log the checkpoint as a W\&B Artifact
Right after saving your file:
```python theme={null}
import wandb
artifact = wandb.Artifact(
    name=f"resnet50-epoch-{epoch+1}",
    type="model",
    metadata={
        "epoch": epoch + 1,
        "val_loss": float(val_loss),
        "val_accuracy": float(val_acc),
    },
)
artifact.add_file(ckpt_path)
wandb.log_artifact(artifact)
```
This uploads your checkpoint to W\&B and keeps a permanent copy.
***
### Step 3 — View & manage checkpoints in the W\&B UI
1. Go to your wandb project
2. Open the **Artifacts** tab
3. Click your model artifact
4. You can now:
* View version history (v0, v1, v2…)
* Open the metrics/metadata
* Download the checkpoint
* Use it as an input for new runs
***
### Step 4 — Restore a checkpoint on another Thunder instance
On a fresh machine:
```python theme={null}
import wandb
import torch
run = wandb.init(project="my-thunder-project", job_type="restore")
artifact = run.use_artifact(
    "entity-name/my-thunder-project/resnet50-epoch-10:latest",
    type="model",
)
artifact_dir = artifact.download()
checkpoint = torch.load(f"{artifact_dir}/model_epoch_10.pt", map_location="cuda")
model.load_state_dict(checkpoint)
model.to("cuda")
```
You now have the exact model weights from your previous run — even if the original instance is gone.
***
### Step 5 — Resume training
```python theme={null}
model.load_state_dict(checkpoint)
model.to("cuda")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
start_epoch = 10
for epoch in range(start_epoch, config.epochs):
    train_one_epoch(...)
    validate(...)
    wandb.log({"epoch": epoch})
```
***
### Example: Adding Checkpointing to a Minimal `train.py`
Here is a working example using the simple training script from the Getting Started section.
This example simulates a checkpoint file (JSON), but the workflow is identical for real model weights.
```python theme={null}
import wandb
import time
import json
import os
# Initialize wandb
wandb.init(
    project="thunder-resnet",
    name="quick-test",
    config={
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 5,
    },
)

os.makedirs("checkpoints", exist_ok=True)

for epoch in range(5):
    # Simulate training metrics
    train_loss = 1.0 / (epoch + 1)
    train_acc = 0.5 + epoch * 0.1

    # Log metrics to wandb
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "train/accuracy": train_acc,
    })

    # ---- Checkpointing Example ----
    # In a real project this would be torch.save(model.state_dict(), ...)
    checkpoint_path = f"checkpoints/epoch_{epoch}.json"
    with open(checkpoint_path, "w") as f:
        json.dump({
            "epoch": epoch,
            "train_loss": train_loss,
            "train_accuracy": train_acc,
        }, f)

    # Log checkpoint as an artifact
    artifact = wandb.Artifact(
        name=f"quick-test-epoch-{epoch}",
        type="model",
        metadata={
            "epoch": epoch,
            "train_loss": train_loss,
            "train_accuracy": train_acc,
        },
    )
    artifact.add_file(checkpoint_path)
    wandb.log_artifact(artifact)
    # --------------------------------

    time.sleep(0.5)
wandb.finish()
```
This example demonstrates:
* how checkpoint files are created
* how they are logged as Artifacts
* how each epoch becomes a tracked, versioned checkpoint
These appear in the **Artifacts** tab of your project.
***
### Quick Reference: Other Artifact Types
Artifacts aren't just for model checkpoints. You can also version datasets:
```python theme={null}
# Logging a Dataset
dataset = wandb.Artifact("imagenet-subset", type="dataset")
dataset.add_dir("data/imagenet_subset")
wandb.log_artifact(dataset)
```
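W&B deduplicates artifact versions by content checksum, so re-logging an unchanged directory does not create a new version. A rough standard-library illustration of that idea (`digest_dir` is illustrative, not the wandb implementation):

```python
import hashlib
import os
import tempfile

def digest_dir(path):
    """Hash file names and contents in a stable order, roughly how
    content-addressed versioning decides whether anything changed."""
    h = hashlib.sha256()
    for root, _, files in sorted(os.walk(path)):
        for name in sorted(files):
            h.update(name.encode())
            with open(os.path.join(root, name), "rb") as f:
                h.update(f.read())
    return h.hexdigest()

d = tempfile.mkdtemp()
with open(os.path.join(d, "a.txt"), "w") as f:
    f.write("hello")
unchanged = digest_dir(d) == digest_dir(d)
print(unchanged)  # True: identical content, same digest, no new version
```

This is why logging the same dataset artifact from multiple instances is cheap: only changed content produces a new version.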
***
## Hyperparameter Sweeps (Multi‑GPU, Multi‑Instance)
Sweeps allow large-scale hyperparameter search across many Thunder Compute instances.
### Step 1 — Create `sweep.yaml`
```yaml theme={null}
program: train.py
project: thunder-resnet
method: bayes
metric:
  name: val/accuracy
  goal: maximize
parameters:
  learning_rate:
    min: 0.00001
    max: 0.001
  batch_size:
    values: [32, 64, 128]
  weight_decay:
    min: 0.0
    max: 0.1
  augment:
    values: ["none", "light", "heavy"]
```
### Step 2 — Initialize the sweep
```bash theme={null}
wandb sweep sweep.yaml
```
Output:
```
wandb: Creating sweep from: sweep.yaml
wandb: Creating sweep with ID: fgbkmk3q
wandb: View sweep at: https://wandb.ai/entity-name/thunder-resnet/sweeps/fgbkmk3q
wandb: Run sweep agent with: wandb agent entity-name/thunder-resnet/fgbkmk3q
```
### Step 3 — Run agents on Thunder GPU instances
```bash theme={null}
wandb agent <entity>/<project>/<sweep_id>
```
Each agent pulls new hyperparameters and launches a run automatically.
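For intuition, the agent loop amounts to: pull a config, run training with it, repeat. The sketch below enumerates a discrete grid locally; with the `bayes` method in the YAML above, the actual configs are chosen server-side by the W&B controller, so this is only a conceptual stand-in:

```python
import itertools

space = {
    "learning_rate": [1e-5, 1e-4, 1e-3],
    "batch_size": [32, 64, 128],
}

def configs(space):
    """Yield every combination of a discrete search space."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

all_configs = list(configs(space))
print(len(all_configs))  # 3 x 3 = 9 candidate runs
# each agent would effectively call wandb.init(config=cfg) and train with cfg
```

Running agents on several Thunder instances simply means several workers draining this queue of candidate configs in parallel.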
***
## Distributed Training (DDP, Lightning, DeepSpeed)
### PyTorch DDP Example
```python theme={null}
wandb.init(
    project="thunder-ddp",
    group="llama7b-a100x4",
    job_type="training",
)
```
Set run names per rank:
```python theme={null}
wandb.run.name = f"gpu-{rank}"
```
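If you prefer a single run per job rather than one grouped run per rank, a common pattern (an assumption here, not a wandb requirement) is to log from rank 0 and disable wandb everywhere else:

```python
def wandb_mode_for_rank(rank):
    """Hypothetical helper: rank 0 logs online, other ranks stay silent.
    In real code you would pass the result as wandb.init(mode=...)."""
    return "online" if rank == 0 else "disabled"

print([wandb_mode_for_rank(r) for r in range(4)])
# ['online', 'disabled', 'disabled', 'disabled']
```

This avoids duplicate metric streams while still capturing one authoritative record of the job.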
### PyTorch Lightning Example
```python theme={null}
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import WandbLogger
wandb_logger = WandbLogger(project="thunder-lightning-demo")
trainer = Trainer(
    logger=wandb_logger,
    accelerator="gpu",
    devices=4,
    strategy="ddp",
    max_epochs=50,
)
trainer.fit(model)
```
Lightning automatically:
* Logs metrics and gradients
* Tracks checkpoints
* Handles multi-GPU logging
***
## Offline Mode (Air‑Gapped or Firewalled Environments)
Thunder instances may have intermittent or restricted internet access.
### Run in offline mode:
```bash theme={null}
export WANDB_MODE=offline
python train.py
```
### Sync later:
```bash theme={null}
wandb sync /path/to/wandb/run-folder
```
### Fully disable wandb:
```bash theme={null}
export WANDB_MODE=disabled
```
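Your own scripts can honor the same variable, for example to skip optional logging setup when wandb is off. A minimal standard-library sketch (`wandb_enabled` is an illustrative helper):

```python
import os

def wandb_enabled():
    """False only when WANDB_MODE=disabled; 'offline' still records locally for later sync."""
    return os.environ.get("WANDB_MODE", "online") != "disabled"

os.environ["WANDB_MODE"] = "disabled"
print(wandb_enabled())  # False
```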
***
## Best Practices for Thunder Compute GPU Instances
### Run Management
* Use meaningful run names that include dataset + model + GPU type
* Log all hyperparameters in `wandb.config`
* Track system metrics to diagnose bottlenecks
* Organize multi-GPU runs using `group`
* Reduce logging overhead by batching logs
### Artifacts & Checkpointing
* Use meaningful artifact names (e.g. `llama7b-a100-epoch20`)
* Attach useful metadata (epoch, val metrics, dataset version)
* Log fewer but higher-quality checkpoints
* Always use artifacts for long or expensive runs
* Use `use_artifact(...).download()` to restore weights anywhere
* Use artifacts for datasets and checkpoints
### Experimentation
* Use sweeps for expensive experiments
* Compare runs systematically using the dashboard
* Monitor GPU utilization to optimize batch sizes
***
## Troubleshooting
### Authentication Issues
```bash theme={null}
wandb login --relogin
```
### GPU Metrics Not Showing
* Ensure `nvidia-smi` works inside the environment
* Use GPU-enabled containers (`--gpus all`)
* Call `wandb.init()` early
### Connection Issues
* Verify outbound internet access
* Firewalls must allow connections to `*.wandb.ai`
* Use offline mode if required
### Large File Uploads
* Always use artifacts for multi-GB files
* Compress large checkpoints
* Prune old versions
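For the compression tip, gzip from the standard library is often enough before logging a checkpoint as an artifact. Paths here are illustrative, and the stand-in file is all zeros, which compresses far better than real weights will:

```python
import gzip
import os
import shutil
import tempfile

# Stand-in checkpoint; a real one would come from torch.save(...)
src = os.path.join(tempfile.mkdtemp(), "model.pt")
with open(src, "wb") as f:
    f.write(b"\x00" * 1_000_000)

# Compress before artifact.add_file(...) to cut upload time
with open(src, "rb") as f_in, gzip.open(src + ".gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)

print(os.path.getsize(src + ".gz") < os.path.getsize(src))  # True
```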
***
## Need Help?
* W\&B Docs: [https://docs.wandb.ai](https://docs.wandb.ai)
* Thunder Compute Discord: [https://discord.com/invite/nwuETS9jJK](https://discord.com/invite/nwuETS9jJK)
* Email support: `support@thundercompute.com`
# Prototyping vs Production
Source: https://www.thundercompute.com/docs/prototyping-vs-production
Differentiate prototyping and production environments on Thunder Compute. Select hardware and configurations optimized for your project scale.
Thunder Compute offers two modes for running instances.
| Feature | Prototyping | Production |
| ------------------------- | ------------------------ | ----------------------- |
| Cost | Lower | Higher |
| Compatibility | Most ML workloads | Full CUDA compatibility |
| GPUs | A6000, A100, H100 | A100, H100 |
| Multi-GPU | A100, H100: up to 2 GPUs | Up to 8 GPUs |
| Graphics (OpenGL, Vulkan) | No | Yes |
## Prototyping Mode
Prototyping mode is currently in beta and exclusively available on Thunder Compute.
Prototyping mode applies CUDA-level optimizations to maximize GPU utilization, significantly reducing costs for AI/ML development workflows.
### Supported Software
* **PyTorch**: Fully supported (downgrading from the pre-installed version may cause issues)
* **TensorFlow**
* **JAX**
* **Jupyter Notebooks**
* **Model Serving**: ComfyUI, Ollama, vLLM, and others
* **Fine Tuning**: Unsloth and others
### Unsupported Workloads
* Graphics workloads (OpenGL, Vulkan, FFMPEG)
* Custom CUDA kernels (may exhibit unpredictable behavior)
* Hardware-specific profiling tools
If you encounter issues with an unsupported workload, switch to production mode by [modifying your instance](/vscode/operations/modifying-instances) for full compatibility.
## Production Mode
Production mode provisions a standard virtual machine with full CUDA compatibility and predictable performance.
### When to Choose Production
* Long-running training jobs
* Multi-GPU workloads (up to 8 GPUs)
* Graphics workloads (OpenGL, Vulkan, FFMPEG)
* Custom CUDA kernels
* Workloads requiring accurate hardware metrics
## Switching Between Modes
[Modify existing instances](/vscode/operations/modifying-instances) to switch between prototyping and production mode. This also lets you change GPU type, vCPUs, and RAM. Storage can be expanded but not reduced.
## Learn More
* [Technical Specifications](/technical-specs) - Hardware, networking, and storage details
# Restrictions
Source: https://www.thundercompute.com/docs/restrictions
Analyze platform restrictions for Thunder Compute services. Review technical limitations, usage policies, and resource constraints for accounts.
## Prohibited Activities
### Cryptocurrency Mining
Mining, staking, or otherwise interacting with cryptocurrency is strictly prohibited on Thunder Compute. If cryptocurrency-related activity is detected:
* The associated account is immediately banned
* Any billing credit is revoked
* The account is billed for the full amount of usage
## Geographic Availability
### B2B Requirements
Thunder Compute is only available for B2B customers and requires a VAT ID (or similar) in the following countries:
* United Arab Emirates
* Angola
* Bahrain
* Brazil
* Switzerland
* Côte d’Ivoire (Ivory Coast)
* Colombia
* Algeria
* Georgia
* Iraq
* Jordan
* Kazakhstan
* South Korea (Republic of Korea)
* Kuwait
* Morocco
* North Macedonia
* Oman
* Paraguay
* Qatar
* Saudi Arabia
* Tunisia
* Turkey (Türkiye)
* Tanzania
* Ukraine
* Uganda
* Uzbekistan
* Yemen
* India
* Moldova (Republic of Moldova)
### Restricted Countries
Thunder Compute is not currently available in the following countries:
* Belarus
* China
* Cuba
* Indonesia
* Iran
* Kenya
* North Korea
* Malaysia
* Mexico
* Nigeria
* Russia
* Sudan
* Syria
* Uruguay
If you're located in one of these countries and need access to Thunder Compute, please contact us to discuss potential alternatives.
## Usage Guidelines
### Acceptable Use
Thunder Compute instances are intended for legitimate computational workloads, particularly:
* AI/ML development and training
* Scientific computing
* Data processing and analysis
* Software development and testing
We have a strict one-account-per-user policy.
### Resource Usage
Users must comply with fair use policies and avoid activities that:
* Violate the terms of service
* Are illegal or unethical
## Support
If you have questions about restrictions or need clarification on acceptable use, contact our support team.
# Technical Specifications
Source: https://www.thundercompute.com/docs/technical-specs
Hardware specifications, networking details, and pre-installed software for Thunder Compute instances
## Instance Infrastructure
### Hardware Specifications
* **GPU and CPU configs**: Check [pricing page](https://www.thundercompute.com/pricing) for latest availability
* **Location**: North America
### Pre-installed Software
* **CUDA**: Version 13.0
* **CUDA Driver**: Version 580
* **PyTorch**: Version 2.9.0+cu128
* **JupyterLab**: Pre-installed
* Additional scientific Python libraries (NumPy, Pandas, etc.)
Do not attempt to reinstall CUDA. If compatibility issues arise, use a venv and change the versions of your other dependencies (e.g., PyTorch) rather than modifying the CUDA libraries.
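For example, a minimal sketch of pinning a different PyTorch build inside a venv instead of touching the system CUDA install (the commented index URL is the standard PyTorch cu128 wheel index; the venv path is illustrative):

```shell
# Keep dependency changes inside a venv; leave the system CUDA install alone.
python3 -m venv "$HOME/venvs/myproject"
source "$HOME/venvs/myproject/bin/activate"
python -c "import sys; print(sys.prefix)"  # confirms the venv is active
# then pin the versions you need, e.g.:
# pip install "torch==2.9.0" --index-url https://download.pytorch.org/whl/cu128
```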
## Storage
* **Persistent Disk**: Your home directory and OS. Preserved across modifications and included in snapshots. Can be expanded but not shrunk.
* **Ephemeral Storage**: Optional fast local NVMe disk mounted at `/ephemeral`. Not included in snapshots and lost when the instance is modified or deleted. Ideal for model weights, caches, and scratch files. See [Ephemeral Storage](/guides/ephemeral-storage).
| Storage Type | Prototyping Range | Production Range |
| ----------------- | ----------------- | ---------------- |
| Persistent Disk | 100 - 400 GB | 100 - 1000 GB |
| Ephemeral Storage | 0 - 300 GB | 0 - 500 GB |
## Networking
* **Egress/Ingress**: 7 Gbps
* **IP Address**: Dynamic
### Port Access
* **Public URLs (CLI)**: Use `tnr ports forward` to expose HTTP services at `https://<instance-id>-<port>.thundercompute.net` with automatic HTTPS and DDoS protection. See [Port Forwarding](/cli/operations/port-forwarding) for details.
* **Local tunneling (CLI)**: Use `tnr connect <instance_id> -t <port>` to tunnel ports to your local machine
* **VS Code**: Use the built-in [port forwarding](https://code.visualstudio.com/docs/debugtest/port-forwarding) feature
# Troubleshooting
Source: https://www.thundercompute.com/docs/troubleshooting
Troubleshoot common Thunder Compute errors. Find solutions for connection issues, function errors, SSH problems, and access logs. Get support via Discord.
## Common solutions
1. Disconnect with `ctrl + d`, then reconnect with `tnr connect <instance_id>`
2. Upgrade tnr. Depending on your install method, you may have to use `pip install tnr --upgrade` or re-download the binary from the website
3. Back up any important data, then delete and recreate the instance.
## Common errors
### Function not implemented
A common error you may encounter is some variant of "This function is not implemented." What this means is that your program touches a portion of the CUDA API that we do not currently support. Check our [Prototyping vs Production](/prototyping-vs-production) guide for supported features, and if you encounter this, please contact us.
### SSH errors
If you encounter SSH-related errors (like `Error reading SSH protocol banner` or permission issues), first retry the command.
For quick fixes, back up critical data and recreate the instance. Instances cannot be stopped or restarted.
For persistent SSH issues, see our [SSH on Thunder Compute guide](/cli/operations/ssh) for alternative connection methods.
## Recommended Guides
To help prevent common issues and get the most out of Thunder Compute, we recommend these guides:
* [Using Docker](/guides/using-docker-on-thundercompute) - Learn about GPU-enabled containers and troubleshooting Docker issues
* [Using Instance Templates](/guides/using-instance-templates) - Use pre-configured environments to minimize setup issues
## Production mode as a last resort
If you continue to experience compatibility issues or errors that cannot be resolved through the above methods, consider switching to production mode by modifying your instance ([VS Code](/vscode/operations/modifying-instances), [CLI](/cli/operations/modifying-instances), or [Console](/console/operations/modifying-instances)). Production mode provides maximum stability and reliability with all low-level optimizations disabled, ensuring complete compatibility for workloads that encounter persistent issues in the prototyping tier.
## Support
The fastest way to get support is to join [our discord](https://discord.com/invite/nwuETS9jJK). Our founding team will personally respond to help you as quickly as possible.