ComfyUI is the tool of choice for professionals creating AI images and videos. Its node-based workflows give you precise control over every stage of the generation pipeline, from model loading to final output. But to unlock its full potential, including video generation, high-resolution image synthesis, and multi-model pipelines, you need GPU hardware that can keep up.
This guide covers everything you need to know about using ComfyUI in the cloud: system requirements, notable models, and how to get started on a cloud GPU.
ComfyUI System Requirements
Running ComfyUI locally requires hardware and software that keep pace with the models themselves, and those requirements have grown. The good news is that ComfyUI is one of the most memory-efficient frontends available. Its dynamic VRAM management unloads unused models between steps, so it can run pipelines that would exhaust memory in other tools.
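<table>
<thead><tr><th>Component</th><th>Minimum</th><th>Recommended</th><th>Ideal</th></tr></thead>
<tbody>
<tr><td>VRAM</td><td>6 GB+</td><td>24 GB</td><td>40 GB+</td></tr>
<tr><td>System RAM</td><td>16 GB</td><td>32 GB</td><td>64 GB</td></tr>
<tr><td>Storage</td><td>20 GB</td><td colspan="2">100 GB+</td></tr>
<tr><td>Python</td><td>3.9</td><td colspan="2">3.13</td></tr>
<tr><td>PyTorch</td><td>2.4+</td><td colspan="2">Latest stable release</td></tr>
<tr><td>OS</td><td>Windows 10, macOS</td><td colspan="2">Linux</td></tr>
</tbody>
</table>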
The biggest limiting factor is VRAM. Running a basic Stable Diffusion XL (SDXL) workflow at 1024×1024 requires around 6–8 GB of VRAM. Add a ControlNet, a few LoRAs, and a hi-res upscaling pass, and that floor can climb past 16 GB.
Video generation models like Wan 2.2's 14B variant require 16–24 GB for 480p output and more than 24 GB for 720p, putting them well beyond most consumer GPUs.
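To see where figures like these come from, it helps to estimate the memory taken by the weights alone. The snippet below is a rough sketch, not a measurement: it multiplies the parameter count by the bytes per parameter at a given precision and ignores activations, the text encoder, the VAE, and any offloading ComfyUI performs. The function name and the precision table are illustrative.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # A 14-billion-parameter model at half precision and at 8-bit precision.
    for precision in ("fp16", "fp8"):
        gb = weight_memory_gb(14e9, precision)
        print(f"14B parameters at {precision}: about {gb:.0f} GB of weights")
```

By this estimate a 14B model needs about 28 GB for half-precision weights and about 14 GB at 8 bits, which is why quantized weights or offloading are typically involved when such a model runs on a 16–24 GB card.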
NVIDIA GPUs from the Ampere architecture onward (RTX 30-series and newer, RTX A6000, or A100) are best for ComfyUI because they support BF16 natively alongside FP16, the two precisions most modern models use. Older cards can run ComfyUI, but they lack hardware BF16 support and are likely to be deprecated in future CUDA releases.
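One way to check which side of that line a card falls on is its CUDA compute capability: Ampere parts report 8.0 or higher. The helper below is a minimal sketch with an illustrative name. It takes the capability as a tuple so it runs without a GPU; on a machine with PyTorch and CUDA installed, `torch.cuda.get_device_capability()` returns a tuple in this form, and `torch.cuda.is_bf16_supported()` gives a direct answer.

```python
def is_ampere_or_newer(compute_capability: tuple[int, int]) -> bool:
    """Return True for compute capability 8.0 and above (Ampere onward)."""
    major, _minor = compute_capability
    return major >= 8


if __name__ == "__main__":
    # (major, minor) pairs: Turing is 7.5, A100 is 8.0, RTX 30-series is 8.6.
    for name, capability in [("Turing", (7, 5)), ("A100", (8, 0)), ("RTX 30-series", (8, 6))]:
        print(name, capability, is_ampere_or_newer(capability))
```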
ComfyUI Models
The model you load into ComfyUI determines what you can generate and how much VRAM you need. ComfyUI supports every major model architecture, and the community produces new checkpoints constantly.
Image-to-Video with ComfyUI
Image-to-video (I2V) is among the most demanding uses of ComfyUI. You provide a reference image, and the model generates a short video clip that animates from that starting frame. The quality of modern I2V models has improved dramatically, but so have the VRAM requirements.
Wan 2.2 is the leading open-source I2V model. For lighter workloads, its 5B variant can run on 8 GB of VRAM with ComfyUI's native offloading enabled, making it accessible to a wider range of hardware. The 14B variant, which produces significantly better motion quality and temporal consistency, requires 16–24 GB for 480p output.
Text-to-Image with ComfyUI
Text-to-image (T2I) is the foundational ComfyUI workflow. Stable Diffusion remains a popular base architecture for T2I in 2026, with a massive ecosystem of LoRAs, ControlNets, and fine-tuned checkpoints available on Civitai and Hugging Face.
A simple Stable Diffusion workflow connects a checkpoint loader, two CLIP Text Encode nodes (one for the positive prompt and one for the negative prompt), a KSampler, and a VAE Decode node. Together, these nodes convert a text prompt into an image.
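The same pipeline can be written down as a graph in ComfyUI's API format, where each node has a `class_type` and its inputs either hold a value or point at another node's output. The sketch below is illustrative rather than a canonical workflow: the checkpoint filename, sampler settings, and node IDs are assumptions. It also adds an Empty Latent Image node and a Save Image node, which the sampler and the output step need in a complete graph.

```python
import json

# Hypothetical filename; use a checkpoint that exists in ComfyUI/models/checkpoints.
CHECKPOINT = "sd_xl_base_1.0.safetensors"


def build_text_to_image_graph(positive: str, negative: str, seed: int = 0) -> dict:
    """Return a minimal text-to-image graph in ComfyUI's API format."""
    # A link is written as [source_node_id, output_index].
    # CheckpointLoaderSimple outputs: 0 = model, 1 = CLIP, 2 = VAE.
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": CHECKPOINT}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "example"}},
    }


if __name__ == "__main__":
    graph = build_text_to_image_graph("a lighthouse at dusk", "blurry, low quality")
    print(json.dumps(graph, indent=2))
```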

For more on running diffusion models in the cloud, see our guide on
.
Image-to-Image with ComfyUI
Image-to-image (I2I) workflows take an existing image as input and transform it based on a prompt and a denoising strength parameter. Lower denoising preserves more of the original image; higher values give the model more freedom to reinterpret the scene.
This is particularly useful for style transfers, consistent character editing, and iterative refinement of images that are close to but not quite what you want.
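In graph terms, an I2I workflow usually swaps the empty latent for an encoded input image and lowers the sampler's `denoise` value. The sketch below returns only the nodes that differ, in ComfyUI's API format. It assumes the rest of the graph already defines node "1" as a checkpoint loader and nodes "2" and "3" as the positive and negative prompt encoders; those IDs, the function name, and the sampler settings are illustrative.

```python
def image_to_image_nodes(image_name: str, denoise: float) -> dict:
    """Return the nodes that turn a text-to-image graph into image-to-image.

    image_name is a file in ComfyUI's input folder. Node IDs "1", "2", and "3"
    are assumed to exist elsewhere in the graph (loader and prompt encoders).
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be greater than 0 and at most 1")
    return {
        # Load the source image and encode it into latent space with the VAE.
        "10": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "11": {"class_type": "VAEEncode",
               "inputs": {"pixels": ["10", 0], "vae": ["1", 2]}},
        # The sampler starts from the encoded image instead of an empty latent.
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["11", 0],
                         "seed": 0, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": denoise}},
    }


if __name__ == "__main__":
    # A lower value stays closer to the source image; a higher value changes more.
    subtle = image_to_image_nodes("portrait.png", denoise=0.3)
    heavy = image_to_image_nodes("portrait.png", denoise=0.8)
    print(subtle["5"]["inputs"]["denoise"], heavy["5"]["inputs"]["denoise"])
```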
I2I is also the foundation for inpainting (replacing a masked region of an image) and outpainting (extending the canvas beyond the original edges). ComfyUI handles all three modes natively through its node graph, and models like Flux.1 Fill-dev are specifically trained for high-quality inpainting.
How to Use ComfyUI
ComfyUI is a node graph editor served through your browser. Each node represents one operation in the generation pipeline: loading a model, encoding a prompt, sampling in latent space, or decoding the output to a visible image.
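You connect nodes by linking one node's output port to another node's input port. When you run the graph, each node executes once the nodes feeding it have finished, which in most layouts reads from left to right.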
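Left to right on the canvas usually mirrors dependency order: a node can only run after the nodes whose outputs it consumes. The snippet below illustrates that idea with Python's standard `graphlib` module. It is a sketch of the general technique (a topological sort), not ComfyUI's actual scheduler, and the node names are made up for the example.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose outputs it consumes.
DEPENDENCIES = {
    "load_checkpoint": set(),
    "empty_latent": set(),
    "encode_positive": {"load_checkpoint"},
    "encode_negative": {"load_checkpoint"},
    "sample": {"load_checkpoint", "encode_positive", "encode_negative", "empty_latent"},
    "decode": {"sample", "load_checkpoint"},
}


def execution_order(dependencies: dict) -> list:
    """Return one valid order in which every node runs after its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(DEPENDENCIES))
```

Several orders can be valid for the same graph, for example either prompt encoder may run first, but the sampler always comes after its inputs and the decoder always comes after the sampler.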
Workflows can be imported and exported as JSON files. You can load them by dragging a .json file directly onto the ComfyUI canvas, or by using Workflow > Open from the top menu. Sites like OpenArt and ComfyWorkflows provide hundreds of community-created workflows.
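Before loading a downloaded workflow, it can be useful to see which node types it uses, since unfamiliar types usually mean a custom node pack has to be installed first. The sketch below counts node types in a workflow file. It assumes the two layouts commonly produced by ComfyUI: an editor export with a top-level `nodes` list whose entries carry a `type`, and an API-format export that maps node IDs to objects with a `class_type`. The function name is illustrative.

```python
import json
from collections import Counter


def node_types(workflow: dict) -> Counter:
    """Count node types in an editor-format or API-format workflow."""
    if isinstance(workflow.get("nodes"), list):
        # Editor export: {"nodes": [{"id": 1, "type": "KSampler", ...}, ...]}
        return Counter(node["type"] for node in workflow["nodes"])
    # API export: {"1": {"class_type": "KSampler", "inputs": {...}}, ...}
    return Counter(
        node["class_type"]
        for node in workflow.values()
        if isinstance(node, dict) and "class_type" in node
    )


if __name__ == "__main__":
    # Inline sample data; in practice, load a file with json.load(open(path)).
    sample = json.loads('{"nodes": [{"id": 1, "type": "KSampler"}, {"id": 2, "type": "VAEDecode"}]}')
    print(node_types(sample))
```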
ComfyUI in the Cloud: Skip the Hardware Bottleneck
These hardware requirements are pushing many users to move their ComfyUI workflows to cloud GPUs. At the high end, video generation models alone can require 60–80 GB of VRAM, which is expensive to own. Even for image generation, the math changes significantly when you factor in the cost of buying, running, and upgrading hardware versus renting a GPU.
Thunder Compute lets you run the latest models with minimal setup and for as little as $0.35/hr. You get enterprise-grade GPU instances with persistent storage so your models and workflows stay ready between sessions. There's no local installation to manage, no driver conflicts, and no waiting for models to download.
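A quick way to frame the buy-versus-rent question is a break-even calculation: how many rented hours would cost as much as the hardware. The sketch below uses the $0.35/hr starting rate quoted above and a purchase price that is purely a hypothetical placeholder, not a quote for any real GPU. It ignores electricity, resale value, and the fact that the lowest hourly rate may not correspond to comparable hardware.

```python
def break_even_hours(purchase_price_usd: float, hourly_rate_usd: float) -> float:
    """Return the number of rented hours that cost as much as buying."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return purchase_price_usd / hourly_rate_usd


if __name__ == "__main__":
    HYPOTHETICAL_GPU_PRICE_USD = 2000.0  # placeholder value for illustration only
    hours = break_even_hours(HYPOTHETICAL_GPU_PRICE_USD, 0.35)
    print(f"Break-even after about {hours:.0f} rented hours")
```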
Once the CLI is installed, getting started takes three commands:
<ol><li><a href="https://www.thundercompute.com/docs/cli/quickstart">Install the tnr CLI</a></li><li>In your favorite terminal, run <code>tnr create --template comfy-ui</code></li><li>Then run <code>tnr connect 0</code> to connect to the remote instance</li><li>Finally, run <code>start comfyui</code>. This outputs a URL where you can use the tool.</li></ol>
Once it's running, open ComfyUI at {UUID}-8188.thundercompute.com, where {UUID} is the unique identifier for your instance. The URL is printed in the terminal after you run start comfyui, and it also appears collapsed under your instance in VS Code.
You can import any workflow JSON, install custom nodes, and download models from Hugging Face or Civitai.
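Besides the browser interface, ComfyUI's server accepts API-format workflows over HTTP: a POST to its `/prompt` route with the graph under a `prompt` key queues a job. The sketch below only builds the request, so it runs without a server. The base URL is a local placeholder, and whether the same route is reachable through your instance's public URL is an assumption to verify for your setup.

```python
import json
import urllib.request


def build_prompt_request(base_url: str, graph: dict) -> urllib.request.Request:
    """Build (but do not send) a request that queues an API-format workflow."""
    body = json.dumps({"prompt": graph}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder graph and URL; 8188 is the port ComfyUI listens on by default.
    graph = {"1": {"class_type": "CheckpointLoaderSimple",
                   "inputs": {"ckpt_name": "example.safetensors"}}}
    request = build_prompt_request("http://127.0.0.1:8188", graph)
    print(request.get_method(), request.full_url)
    # To actually queue the job against a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```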
Installing Models for ComfyUI in the Cloud
By default, the Thunder Compute ComfyUI template only includes the basic Stable Diffusion checkpoint. This is enough for some light tinkering, but it won't get you very far.
To get the most out of ComfyUI, you need to download models into your instance. If you are unsure where to begin, the Template Library has some preconfigured workflows to get you started.

Most templates include model links within a note block. Paste those links into the prompt below, give it to your favorite AI assistant, and it will return the bash commands you need to install the models.
Generate a set of bash commands to run at once and install models for ComfyUI on a Thunder Compute instance with Ubuntu. Install each model in the corresponding existing directories in '/home/ubuntu/ComfyUI/models'.
{{MODEL_URLs}}
Run these commands within your instance to download the models to the correct subdirectories under ComfyUI/models/. Once it finishes, you should be able to run the workflow template.
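If you would rather build the commands yourself, the sketch below shows the general shape: map each model URL to a subdirectory of the models folder and emit one `wget` command per file. The URLs are placeholders, and the function name is illustrative. The subdirectory names `checkpoints`, `loras`, and `vae` follow ComfyUI's standard models layout, but check the note block in your template for where each file belongs.

```python
import shlex
from pathlib import PurePosixPath

# Models directory named in the prompt above.
MODELS_ROOT = PurePosixPath("/home/ubuntu/ComfyUI/models")


def download_commands(models: dict[str, str]) -> list[str]:
    """Turn {url: subdirectory} into one wget command per model file."""
    commands = []
    for url, subdir in models.items():
        target = MODELS_ROOT / subdir
        # -c resumes partial downloads, -P sets the destination directory.
        commands.append(f"wget -c -P {shlex.quote(str(target))} {shlex.quote(url)}")
    return commands


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    example = {
        "https://example.com/base_model.safetensors": "checkpoints",
        "https://example.com/style.safetensors": "loras",
    }
    print("\n".join(download_commands(example)))
```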

On Thunder Compute, model downloads are fast because instances run in data centers with high-bandwidth connections. Downloading the roughly 20 GB of models needed to run Z-image-turbo takes around 2 minutes.
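As a sanity check on that figure, the snippet below converts "about 20 GB in about 2 minutes" into a sustained transfer rate. It is plain arithmetic using decimal units (1 GB = 1000 MB) on the rounded numbers above, so treat the result as an approximation. The function name is illustrative.

```python
def required_throughput(size_gb: float, seconds: float) -> tuple[float, float]:
    """Return the sustained rate as (megabytes per second, gigabits per second)."""
    megabytes_per_second = size_gb * 1000 / seconds
    gigabits_per_second = size_gb * 8 / seconds
    return megabytes_per_second, gigabits_per_second


if __name__ == "__main__":
    mb_s, gbit_s = required_throughput(20, 2 * 60)
    print(f"about {mb_s:.0f} MB/s, or {gbit_s:.2f} Gbit/s")
```

That works out to roughly 167 MB/s, or about 1.3 Gbit/s sustained.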
Last Thoughts on How to Use ComfyUI
ComfyUI is the most capable and flexible frontend for AI image and video generation available in 2026.
<ul><li>Node-based architecture makes complex pipelines reproducible and shareable</li><li>Efficient VRAM management lets you run larger models</li><li>Active community produces new workflows and custom nodes</li></ul>
The main barrier is hardware. Running large models at full quality, or building pipelines that stack multiple ControlNets and LoRAs, takes more VRAM than most consumer GPUs provide. Cloud GPUs remove that constraint.
Thunder Compute gives you enterprise-grade GPU instances for as little as $0.35/hr, with a pre-configured ComfyUI template that gets you generating in minutes rather than hours. No driver conflicts, no local environment setup, no hardware to buy. Just a fast GPU and a ComfyUI instance ready to run whatever model you want to try next.
Pick the
FAQ
What is ComfyUI?
ComfyUI is an open-source, node-based interface for building and running AI image and video generation pipelines. You connect processing nodes in a visual graph, each one representing a step in the generation process: loading a model, encoding a prompt, sampling, or decoding the output. This gives you control over every parameter and makes complex workflows easy to share as JSON files.
How to use ComfyUI?
ComfyUI can be used as a standalone app. To get started, choose a workflow from the template library or import one from a JSON file. Then enter a text prompt in the CLIP Text Encode nodes and click "Run". Each node executes once its inputs are ready, which in most layouts reads from left to right. Most workflows require you to download and install the relevant model files first.
