Important Change: Starting October 8, 2025, all new Thunder Compute instances use ephemeral storage. This means instances can only be created and deleted, not started and stopped. All data is lost when an instance is deleted.

What is Ephemeral Storage?

Ephemeral storage means that your instance’s disk is temporary and exists only for the lifetime of that instance. When you delete an instance, all data on it is permanently removed. This change enables several important benefits:
  • Better GPU availability and lower pricing
  • Access to new GPU types, including H100s and 8-GPU nodes
  • Faster instance creation and deletion
While we plan to reintroduce persistent storage in a more scalable form in the future, we recommend using external backup solutions to preserve your important data. With ephemeral storage, you’ll save money and avoid surprise bills from forgotten instances. Here’s how to manage your data effectively:

1. Use GitHub for Code and Configuration

GitHub should be your primary backup solution for:
  • Code and scripts
  • Configuration files
  • Requirements and dependencies
  • Jupyter notebooks
  • Documentation
This keeps your project versioned and easily recoverable on any new instance.

2. For Large Files, Choose What Works Best for You

For datasets, models, and checkpoints, you have two good options:

Option A: Download to Your Local Computer

The simplest approach - just download large files to your local machine when you’re done with them. This is:
  • Free - no storage costs
  • Fast - direct download/upload when you need it
  • Simple - no additional services to set up

Option B: Use Cloud Object Storage

If local storage isn’t practical, cloud services are much cheaper than our legacy persistent storage:
  • Cloudflare R2 - S3-compatible storage with zero egress fees (10GB free)
  • Google Drive - Simple and familiar interface (15GB free)
These cloud storage options cost significantly less than Thunder Compute’s legacy persistent storage and prevent you from accidentally leaving instances running and getting surprise bills.
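
If you use R2, its S3-compatible API means standard tools work out of the box. As a minimal sketch, assuming you’ve already created a bucket and an R2 API token (the account ID and bucket name below are placeholders):
# Configure an AWS CLI profile with your R2 access key and secret
aws configure --profile r2

# Sync a local directory to your R2 bucket (placeholder bucket/account)
aws s3 sync ./training_outputs s3://your-bucket/training_outputs \
  --profile r2 --endpoint-url https://your-account.r2.cloudflarestorage.com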

Setting Up Backups

Using GitHub

For your code and configuration:
# Initialize a git repository
git init

# Add your project files
git add .

# Commit your changes
git commit -m "Initial commit"

# Push to GitHub (create a repo on GitHub first)
git remote add origin https://github.com/yourusername/your-project.git
git push -u origin main
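
To keep pushes fast, leave large artifacts out of version control and back those up separately (see the options above). A .gitignore along these lines works; the paths are just examples:
# .gitignore - exclude large files from the repository
data/
checkpoints/
*.pth
*.tar.gz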

Downloading Files to Your Local Computer

The simplest way to preserve your data is to download it directly to your local machine using scp, or by dragging and dropping in VS Code. With scp:
# Download a single file
scp tnr-0:~/model_checkpoint.pth ./local_backups/

# Download an entire directory
scp -r tnr-0:~/training_outputs ./local_backups/

# Upload files when creating a new instance
scp ./dataset.tar.gz tnr-0:~/
Make sure you’ve connected to your instance with tnr connect first. This sets up the tnr-0 SSH alias.
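
For directories with many small files, archiving before transfer is usually much faster than copying files one by one. A quick sketch using the same tnr-0 alias:
# Bundle the directory into one compressed archive on the instance
ssh tnr-0 "tar czf training_outputs.tar.gz training_outputs/"

# Download the archive and extract it locally
scp tnr-0:~/training_outputs.tar.gz ./local_backups/
tar xzf ./local_backups/training_outputs.tar.gz -C ./local_backups/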

Best Practices

  1. Commit frequently - Push your code changes to GitHub regularly, especially before deleting an instance.
  2. Download important results - When you complete a training run or generate important outputs, download them to your local machine or upload to cloud storage right away.
  3. Separate data from code - Keep your code in GitHub and large datasets either on your local machine or in cloud storage (R2/Drive).
  4. Save checkpoints during long runs - For multi-day training jobs, periodically download checkpoints or upload them to cloud storage.
  5. Use automation - Create scripts that automatically save your outputs:
    # Example: Auto-upload checkpoints after each epoch
    import boto3
    import torch
    
    def save_checkpoint(epoch, model):
        # Save locally
        torch.save(model.state_dict(), f'checkpoint_{epoch}.pth')
        
        # Upload to R2 (endpoint and bucket are placeholders for your own)
        s3 = boto3.client('s3', endpoint_url='https://your-account.r2.cloudflarestorage.com')
        s3.upload_file(f'checkpoint_{epoch}.pth', 'your-bucket', f'checkpoints/checkpoint_{epoch}.pth')
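
    Note: boto3 picks up credentials from the standard AWS environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY); set these to your R2 API token values before running this.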
    
  6. Create setup scripts - Document your environment setup in a script that can quickly recreate your environment on a new instance:
    #!/bin/bash
    # setup.sh
    
    # Clone your code repository
    git clone https://github.com/yourusername/your-project.git
    cd your-project
    
    # Install dependencies
    pip install -r requirements.txt
    
    # Upload your dataset from local machine using scp before running this
    # Or pull data from cloud storage if needed:
    # aws s3 sync s3://your-bucket/datasets ./data --profile r2 --endpoint-url https://your-account.r2.cloudflarestorage.com
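
    You can copy setup.sh to each new instance with scp, then recreate your environment with a single bash setup.sh.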
    

Accessing Data from Old Instances

If you have existing instances with data you need to retrieve:
  1. Change your instance type to a T4, the lowest-cost option, to reduce costs
  2. Download your data using one of the backup methods above
  3. You have 30 days from October 8, 2025 to retrieve your data
After 30 days, data on old instances will be permanently deleted. Make sure to back up anything important before the deadline.
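
Before downloading, it can help to see what’s actually worth saving. Assuming the tnr-0 alias from tnr connect, something like this lists the largest items in your home directory:
# Show the size of each top-level item, largest first
ssh tnr-0 "du -sh ~/* | sort -rh"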

Need Help?

If you run into any issues setting up your backup workflow or have questions about ephemeral storage, reach out to us. We’re here to help you transition smoothly to ephemeral storage!