Z-Image Turbo ComfyUI: Install Guide and VRAM Requirements (July 2026)

Carl Peterson | July 16, 2026 | 10 min read

Z-Image Turbo is an open-weight text-to-image model released by Alibaba Tongyi Lab in November 2025. It quickly gained popularity for its speed, image fidelity, and low hardware requirements relative to the quality it produces.

Understanding Z-Image Turbo: 6B Parameters, 8-Step Speed

Z-Image Turbo uses a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture. It processes text, visual semantic, and image VAE tokens simultaneously to achieve strong output quality at only 6 billion parameters. On an RTX 4090, it generates a 1024x1024 image in roughly 2.3 seconds with 8 inference steps.

The full training workflow took 314K H800 GPU hours. The model excels at photorealistic portrait generation, cinematic lighting, natural skin textures, and bilingual text rendering. It is fully open-weight under the Apache 2.0 license, making it suitable for commercial use with few restrictions.

An anime-focused variant, Z-Anime, is also available in the same ecosystem and uses the same ComfyUI workflow pattern. The setup steps below apply to both.

Z-Image Turbo vs. Other AI Image Generation Models

Z-Image Turbo competes directly with FLUX.1, SDXL, and Midjourney, standing out through parameter efficiency and inference speed.

Model	Parameters	Min VRAM	Steps	License	Photorealism
Z-Image Turbo	6B	~6GB (GGUF)	8	Apache 2.0	Excellent
FLUX.1 Dev	12B	~12GB (FP8)	20–50	Non-commercial	Excellent
FLUX.2 Klein 4B	4B	~8GB	4	Apache 2.0	Very good
SDXL	3.5B	~8GB	20–40	CreativeML Open RAIL++-M	Good
Midjourney v7	Closed	Cloud only	N/A	Subscription	Excellent

VRAM figures are approximate and vary by resolution and quantization format. The FLUX.1 Dev FP8 figure includes the text encoder loaded in VRAM.

Z-Image Turbo delivers output quality comparable to FLUX.1 at 80% lower compute cost. For users who want local image generation without a subscription and without the VRAM overhead of FLUX.1 Dev, it is one of the most accessible options available today.

Z-Image Turbo VRAM Requirements

The VRAM you need depends on which precision format you choose. The three available formats cover every GPU tier from consumer 6GB cards to professional workstation GPUs.

Format	File	Min VRAM	System RAM	Quality
BF16	z_image_turbo_bf16.safetensors	14–16GB	16GB+	Best
FP8	z_image_turbo_fp8_e4m3fn.safetensors	8GB	16GB+	Excellent (near BF16)
GGUF (Q4/Q3)	Community GGUF builds	5–6GB	32GB recommended	Good

VRAM figures include the Qwen 3 4B text encoder (~7GB) and ae.safetensors VAE (~335MB). System RAM requirements increase for GGUF variants because weights are streamed between RAM and VRAM during inference.

System RAM requirements: Z-Image Turbo requires at least 16GB of system RAM for BF16 and FP8 variants. For GGUF variants with CPU offloading, 32GB of system RAM is recommended. On 6GB cards using the GGUF Q3 variant, insufficient system RAM is the most common cause of crashes during generation.
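
As a rough illustration of the table above, the format choice can be written as a small lookup. This is a minimal sketch: the function names `pick_format` and `recommended_ram_gb` are hypothetical, and the thresholds are simply the approximate minimums quoted in this section, not values read from any tool.

```python
def pick_format(vram_gb: float) -> str:
    """Return the highest-quality format whose minimum VRAM fits the card."""
    # Thresholds are the approximate minimums from the table above.
    if vram_gb >= 14:
        return "BF16"
    if vram_gb >= 8:
        return "FP8"
    if vram_gb >= 5:
        return "GGUF"
    return "unsupported"


def recommended_ram_gb(fmt: str) -> int:
    """System RAM guidance: 32GB for GGUF offloading, 16GB otherwise."""
    return 32 if fmt == "GGUF" else 16
```

For example, `pick_format(6)` returns `"GGUF"`, and `recommended_ram_gb("GGUF")` returns `32`, which matches the guidance for 6GB cards.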

Beyond VRAM and RAM, you also need Python 3.10 or higher, CUDA 12.x (recommended), and at least 30GB of free disk space.
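
The Python and disk-space requirements can be checked before downloading anything. The sketch below uses only the standard library; the helper names are hypothetical, and it does not check CUDA, which would need a GPU library such as PyTorch.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum Python version stated above
MIN_FREE_GB = 30      # minimum free disk space stated above


def version_ok(version=None):
    """True if the (major, minor) version meets the Python 3.10 minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version)[:2] >= MIN_PYTHON


def free_disk_gb(path="."):
    """Free space on the drive holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def check_prerequisites(path="."):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not version_ok():
        problems.append("Python 3.10 or higher is required")
    if free_disk_gb(path) < MIN_FREE_GB:
        problems.append(f"less than {MIN_FREE_GB}GB free at {path}")
    return problems
```

Calling `check_prerequisites("ComfyUI")` from the folder that contains your installation returns an empty list when both requirements are met.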

Starting at $0.35/hr, Thunder Compute lets you spin up a ComfyUI instance ready for Z-Image Turbo in minutes.

Downloading Z-Image Turbo: Models, Text Encoders, and VAE

Z-Image Turbo requires three model files, all hosted on Hugging Face. Each goes into a specific subdirectory inside your ComfyUI installation.

File	Type	Destination Folder	Size
z_image_turbo_bf16.safetensors	Diffusion model	ComfyUI/models/diffusion_models/	~12GB
z_image_turbo_fp8_e4m3fn.safetensors (8GB cards)	Diffusion model	ComfyUI/models/diffusion_models/	~6GB
qwen_3_4b.safetensors	Text encoder	ComfyUI/models/text_encoders/	~7GB
ae.safetensors	VAE	ComfyUI/models/vae/	~335MB

The text encoder and VAE files are the same regardless of which diffusion model variant (BF16, FP8, or GGUF) you download. Only the diffusion model file changes between precision levels.

If your GPU has 8GB or less of VRAM, download the FP8 diffusion model (z_image_turbo_fp8_e4m3fn.safetensors in the table above) instead of the BF16 version. For GGUF variants, download from community repositories on Hugging Face or CivitAI and place the .gguf file in ComfyUI/models/unet/. You will also need the ComfyUI-GGUF custom node by city96.
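
To make the file placement concrete, here is a small sketch that maps each file name to its destination and adds up the approximate download size. The folders and sizes come from the table above; the names `MODEL_FILES`, `destination`, and `download_size_gb` are hypothetical, and nothing here downloads files.

```python
from pathlib import Path

# file name -> (destination subfolder, approximate size in GB), from the table above
MODEL_FILES = {
    "z_image_turbo_bf16.safetensors": ("models/diffusion_models", 12.0),
    "z_image_turbo_fp8_e4m3fn.safetensors": ("models/diffusion_models", 6.0),
    "qwen_3_4b.safetensors": ("models/text_encoders", 7.0),
    "ae.safetensors": ("models/vae", 0.335),
}


def destination(comfy_root: str, filename: str) -> Path:
    """Return where a downloaded file belongs inside the ComfyUI folder."""
    if filename.endswith(".gguf"):
        # Community GGUF builds go in models/unet/ as described above.
        return Path(comfy_root) / "models/unet" / filename
    subdir, _size = MODEL_FILES[filename]
    return Path(comfy_root) / subdir / filename


def download_size_gb(diffusion_file: str) -> float:
    """Approximate total for one diffusion model plus text encoder and VAE."""
    needed = [diffusion_file, "qwen_3_4b.safetensors", "ae.safetensors"]
    return round(sum(MODEL_FILES[name][1] for name in needed), 3)
```

With these figures the BF16 set comes to about 19.3GB and the FP8 set to about 13.3GB, which is why 30GB of free disk space leaves comfortable headroom.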

How to Install Z-Image Turbo in ComfyUI

Setting Up Your ComfyUI Directory Structure

After placing the three model files, your ComfyUI folder structure should look like this:

ComfyUI/
└── models/
    ├── diffusion_models/
    │   └── z_image_turbo_bf16.safetensors
    ├── text_encoders/
    │   └── qwen_3_4b.safetensors
    └── vae/
        └── ae.safetensors
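
The layout above can be verified with a short script before launching ComfyUI. This is a sketch for the BF16 layout shown; `REQUIRED` and `missing_files` are hypothetical names, and you would swap in the FP8 file name if that is the variant you downloaded.

```python
from pathlib import Path

# Relative paths from the directory tree above (BF16 variant).
REQUIRED = [
    "models/diffusion_models/z_image_turbo_bf16.safetensors",
    "models/text_encoders/qwen_3_4b.safetensors",
    "models/vae/ae.safetensors",
]


def missing_files(comfy_root: str) -> list[str]:
    """Return the required files that are not present under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in REQUIRED if not (root / rel).is_file()]
```

Running `missing_files("ComfyUI")` returns an empty list when all three files are in place, and otherwise lists the paths that still need attention.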

Before continuing, make sure ComfyUI is updated to the latest version. Open ComfyUI Manager, click Update ComfyUI in the top toolbar, then restart. An outdated installation is the most common reason Z-Image nodes fail to appear or show errors on load.

Loading the Z-Image Turbo Workflow in ComfyUI

The official Z-Image Turbo workflow JSON is maintained by Comfy-Org and comes with all nodes pre-wired. Download it from the Comfy-Org GitHub repository and drag the JSON file onto the ComfyUI canvas.

If the canvas shows nodes highlighted in red, ComfyUI needs to install missing custom node packages. Use ComfyUI Manager to install them, restart, and reload the workflow. In most cases, the official workflow needs no extra custom nodes beyond a current ComfyUI installation.

Configuring Nodes: Sampler, Steps, and CFG Settings

Z-Image Turbo is distilled for 8-step inference. Start with 8 steps and a CFG scale between 1.5 and 2.0. Unlike with non-distilled models, high CFG values (4 and above) produce worse results. This is the most important setting to get right: raising CFG to "improve prompt adherence" on a distilled model degrades the image. If a prompt isn't landing, change the wording or try a different seed before touching CFG.
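
The guidance above can be expressed as a simple sanity check on your sampler settings. This is an illustrative sketch only: `check_sampler_settings` is a hypothetical helper, not part of ComfyUI, and the ranges are the ones recommended in this guide.

```python
def check_sampler_settings(steps: int, cfg: float) -> list[str]:
    """Return warnings for settings outside the ranges recommended above."""
    warnings = []
    if steps != 8:
        warnings.append(f"steps={steps}: the model is distilled for 8 steps")
    if cfg >= 4.0:
        warnings.append(f"cfg={cfg}: values of 4 and above degrade the image")
    elif not 1.5 <= cfg <= 2.0:
        warnings.append(f"cfg={cfg}: recommended range is 1.5 to 2.0")
    return warnings
```

`check_sampler_settings(8, 1.8)` returns an empty list, while typical SDXL-style settings such as `check_sampler_settings(20, 7.0)` return two warnings.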

For resolution, generate at 1024x1024 for the best native quality. Generating directly at 2K can introduce distortion. If you need higher resolution, use an upscale node followed by a second KSampler pass at a denoise value of around 0.3 to stay close to the original image. Raise denoise toward 0.6 to 0.7 if you want more creative variation in the upscaled result.
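
The two-pass approach can be summarized as a small planning function. This is a sketch under stated assumptions: `upscale_plan` is a hypothetical helper for square images, 0.65 is just the midpoint of the 0.6 to 0.7 range mentioned above, and the result is a description of the settings rather than a ComfyUI workflow.

```python
def upscale_plan(target_px: int, creative: bool = False, base_px: int = 1024) -> dict:
    """Describe a generate-then-upscale pipeline for a square image."""
    if target_px <= base_px:
        # No second pass needed: generate at the native resolution.
        return {"base_px": base_px, "upscale_factor": 1.0, "second_pass_denoise": None}
    # Around 0.3 keeps the result close to the first pass;
    # 0.6 to 0.7 allows more variation (0.65 is an assumed midpoint).
    denoise = 0.65 if creative else 0.3
    return {
        "base_px": base_px,
        "upscale_factor": target_px / base_px,
        "second_pass_denoise": denoise,
    }
```

For a 2048x2048 target, `upscale_plan(2048)` gives an upscale factor of 2.0 and a second-pass denoise of 0.3.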

How to Use Z-Image Turbo in ComfyUI

Running Your First Text-to-Image Generation

Enter your prompt in the text conditioning node and click Queue Prompt. Z-Image Turbo does not need the heavy prompt engineering that older models like SDXL require. Natural-language descriptions outperform keyword lists. Skip terms like "masterpiece, best quality, 8k" since the model already understands stylistic intent from context.

For portraits, describe the lighting style ("soft window light"), skin texture, and background to give the model strong anchors. Make sure all three model files are selected in their loader nodes before running. Missing one of them produces a gray, noisy output with no error message.

Tips for Getting the Best Results

Quality beats length when writing prompts. Make focused descriptions that include lighting and environment explicitly, and avoid contradictory instructions. For portraits, photography terms like "85mm portrait lens, shallow depth of field" produce more grounded, photorealistic results than abstract aesthetic descriptors.

Z-Image Turbo responds particularly well to prompts that describe:

Lighting source and quality ("soft overcast daylight", "golden hour rim light", "studio three-point lighting")
Camera and lens characteristics ("medium close-up", "wide angle", "macro detail")
Subject texture and material ("weathered leather jacket", "smooth porcelain skin", "rough concrete background")
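
One way to keep prompts focused on these three anchors is to assemble them from named parts. The sketch below is purely illustrative: `build_prompt` is a hypothetical helper, and the phrasing it produces is one reasonable pattern, not a format the model requires.

```python
def build_prompt(subject: str, lighting: str, camera: str, texture: str = "") -> str:
    """Join subject, texture, lighting, and camera into one natural-language prompt."""
    parts = [subject, texture, f"lit by {lighting}", f"shot as a {camera}"]
    # Skip any empty part so optional fields do not leave stray commas.
    return ", ".join(p for p in parts if p) + "."


prompt = build_prompt(
    subject="portrait of a fisherman",
    lighting="soft overcast daylight",
    camera="medium close-up",
    texture="weathered leather jacket",
)
# prompt == "portrait of a fisherman, weathered leather jacket, "
#           "lit by soft overcast daylight, shot as a medium close-up."
```

Keeping each element in its own field makes it easy to vary one thing at a time, for example changing only the lighting while comparing seeds.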

Using LoRAs with Z-Image Turbo for Consistent Style

LoRAs (Low-Rank Adaptation files) let you steer Z-Image Turbo toward a specific style, character, or visual theme across multiple generations. The key requirement is that the LoRA must be trained specifically on the Z-Image architecture. SDXL-trained LoRA weights do not transfer effectively, and using them produces inconsistent or degraded output.

Community Z-Image LoRAs are available on CivitAI and Hugging Face. To use one, place the .safetensors file in ComfyUI/models/loras/, then add a Load LoRA node between the diffusion model loader and the KSampler. A strength value of 0.6 to 1.0 is typical for style LoRAs. When stacking multiple LoRAs, reduce each to 0.3 to 0.5 to avoid compounding artifacts.

You can stack up to three LoRAs through ComfyUI's LoRA loader nodes for finer stylistic control. This is particularly useful for maintaining character consistency across a series of portrait images with different backgrounds or lighting setups.
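
The strength guidance above reduces to a simple rule: a higher value for a single LoRA, lower values when stacking, and no more than three at once. The sketch below encodes that rule; `lora_strengths` and its default values (0.8 and 0.4, picked from within the ranges given) are assumptions for illustration.

```python
MAX_LORAS = 3  # stacking limit suggested above


def lora_strengths(names: list[str], single: float = 0.8, stacked: float = 0.4) -> dict[str, float]:
    """Assign a starting strength to each LoRA based on how many are stacked."""
    if len(names) > MAX_LORAS:
        raise ValueError(f"stack at most {MAX_LORAS} LoRAs, got {len(names)}")
    # One LoRA: 0.6 to 1.0 is typical. Several: 0.3 to 0.5 each.
    strength = single if len(names) == 1 else stacked
    return {name: strength for name in names}
```

Treat the returned values as starting points and adjust each strength in its Load LoRA node after a few test generations.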

Troubleshooting Z-Image Turbo in ComfyUI

Gray or noisy output: At least one of the three required model files (diffusion model, text encoder, VAE) is missing from its loader node or placed in the wrong directory. Open each loader node and confirm the correct file is selected in the dropdown.

Nodes highlighted in red after loading the workflow: The ComfyUI version is outdated or missing a required custom node pack. Open ComfyUI Manager, click Update ComfyUI, then Install Missing Custom Nodes, and restart.

CUDA out of memory error: The most likely cause is running the BF16 model on a card with less than 14GB of VRAM. Switch to the FP8 variant, or use a GGUF checkpoint with the --lowvram launch flag and at least 32GB of system RAM.

Oversaturated or blocky output: CFG is set too high. Drop it to 1.5 to 2.0. Z-Image Turbo is a distilled model trained with specific low-CFG settings. Standard SD values of 6 to 9 are incompatible with its inference recipe.

Text encoder not appearing in dropdown: The qwen_3_4b.safetensors file is in the wrong directory. It must be in ComfyUI/models/text_encoders/, not in ComfyUI/models/checkpoints/ or directly in ComfyUI/models/.

Last Thoughts on Z-Image Turbo

Z-Image Turbo delivers FLUX-level photorealism from a 6B architecture in just 8 inference steps, and it runs on consumer hardware. Setup in ComfyUI is straightforward, and on Thunder Compute the 20GB of models download in about 2 minutes, so you can start generating in under 5 minutes from account creation.

Start generating with Z-Image Turbo on Thunder Compute from $0.35/hr.

FAQ

What are the Z-Image Turbo VRAM requirements?

Z-Image Turbo BF16 requires 14–16GB of VRAM. The FP8 variant runs on 8GB. GGUF quantized variants scale down to 6GB for lower-end cards. All figures assume enough system RAM: 16GB minimum, with 32GB recommended for GGUF offloading.

How many inference steps does Z-Image Turbo use?

Z-Image Turbo is distilled for 8-step inference. Start with 8 steps and a CFG scale of 1.5 to 2.0. Unlike with non-distilled models, high CFG values (4 and above) produce worse results. The model is sensitive to these settings, and changing them significantly degrades output quality.

Do SDXL LoRAs work with Z-Image Turbo?

No. SDXL-trained LoRA weights do not transfer effectively to Z-Image Turbo. Use only LoRAs trained specifically on the Z-Image architecture. Community Z-Image LoRAs are available on CivitAI and Hugging Face.

Can Z-Image Turbo run on 6GB VRAM?

Yes, with GGUF quantization. Download a GGUF variant from community repositories and load it using the ComfyUI-GGUF custom node by city96. The Q3 or Q4 variants typically run on 6GB cards, with a moderate quality reduction compared to FP8 or BF16.

Why is my Z-Image Turbo output gray or noisy?

At least one of the three required model files (diffusion model, text encoder, VAE) is missing from its loader node or in the wrong directory. Open each loader node and confirm the correct file is selected in the dropdown.