[Commercial Use OK] Z-Image_clear_photoreal Released! A Fast, High-Quality Photorealistic Model for Image Generation AI

  • Z-Image (Base) → rich in variation
  • Z-Image-Turbo → excellent stability & high speed
  • Z-Image_clear_photoreal → best of both worlds

Introduction

Hi everyone, this is Easygoing.

Today I’m excited to introduce Z-Image_clear_photoreal, an image generation AI model that delivers both high speed and high quality and is fully usable for commercial purposes.

A highly detailed, realistic portrait of a woman generated with Z-Image_clear_photoreal. Featuring vivid lighting and impressive skin texture.
Z-Image_clear_photoreal

What is Z-Image?

Z-Image is a generative AI model released by Alibaba Group on January 27, 2026, under the developer-friendly Apache-2.0 license.


gantt
    title Lightweight Image Generative AIs
    dateFormat YYYY-MM-DD
    axisFormat %Y
    tickInterval 12month

    section Alibaba
        Z-Image-Turbo :  done, 2025-11-25, 2026-02-06
        Z-Image :  2026-01-27, 2026-02-06

    section Black Forest Labs
        Flux.2 [klein]: crit, 2026-01-15, 2026-02-06

The Z-Image series first gained attention in November 2025 with the release of the fast distilled version Z-Image-Turbo, which became popular as an image generation model that runs smoothly even on mid-range PCs.

The newly released Z-Image (Base), by contrast, is the original core model. It represents the AI's raw "brain" exactly as it was trained, before any distillation process.

Why Merge? (Distilled vs. Base)

Distilled models are great for efficiency, but the gains come with trade-offs:

  • Benefit: enhanced stability and speed
  • Cost: potential loss of fine detail
  • Cost: reduced diversity in outputs

To prevent generation failures, distilled models are tuned for high stability, allowing images to converge in fewer steps.

However, this can lead to muted color palettes and residual noise. Furthermore, excessive stability often results in repetitive faces and compositions.

A highly detailed portrait of a woman in a blue outfit with cake and wine, generated by Z-Image_clear_photoreal.
Distilled models produce less variety

Merging the core model into the distilled base

For Z-Image_clear_photoreal, I started with the previously introduced distilled model Z-Image-Turbo_clear and blended in layers from the original Z-Image (Base) at specific points, with the goal of improving both quality and variation.

This approach maintains the high-speed generation without CFG that Z-Image-Turbo offers, while pushing for even higher image quality.
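
For context on why skipping CFG matters for speed: classifier-free guidance, in its standard form, combines two model evaluations per sampling step. The notation below is the textbook formulation, not something taken from the Z-Image code. Here $v_\theta$ is the model's prediction, $c$ the prompt conditioning, $\varnothing$ the empty or negative prompt, and $s$ the guidance scale.

```latex
\hat{v}(x_t, c) = v_\theta(x_t, \varnothing)
  + s \,\bigl( v_\theta(x_t, c) - v_\theta(x_t, \varnothing) \bigr)
```

With $s = 1$ the expression reduces to $v_\theta(x_t, c)$, so a sampler can skip the unconditional pass and run roughly one model evaluation per step instead of two.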

Actual generation examples!

Let’s compare some real outputs.

Left: the newly tuned Z-Image_clear_photoreal model
Right: the original Z-Image-Turbo_clear

Fireplace room

Comparison using the “room with fireplace” prompt
Z-Image_clear_photoreal  |  Z-Image-Turbo_clear

Late-night radio

Comparison using the “late-night radio studio” prompt
Z-Image_clear_photoreal  |  Z-Image-Turbo_clear

Close-up comparison

Close-up comparison of skin texture. The granular noise visible in the Turbo version on the right is noticeably reduced in the photoreal version on the left.

When comparing image quality, skin texture is one of the easiest ways to see the difference.
The right side (Z-Image-Turbo_clear) shows the characteristic granular noise typical of fast distilled models, while the left side (Z-Image_clear_photoreal) is noticeably cleaner and smoother.

Let’s compare variation!

Next, let’s look at diversity.

I generated eight images in a row using the exact same prompt, changing only the seed value.

Prompt (woman in traditional clothing)

realistic, photorealistic, 

a female wears a purple and gold traditional dress and jewelry, standing in front of a snowy village at sunset or sunrise, surrounded by snow-covered houses and a warm orange and pink sky with orange and pink hues.,

dynamic angle, dutch angle, upper body, close up, face close up, happy, smile, laugh, peaceful, wind, rouge, alizarin, burgundy, maroon, indigo, royal blue, deep blue, deep purple, royal purple, stylish, elegant, turn around
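
A seed-only comparison like this one can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `generate` is a hypothetical callable standing in for whatever workflow actually produces the image, `fake_generate` is a placeholder so the sketch runs on its own, and the use of consecutive seeds starting at 100 is my assumption, not the setup used for the images in this post.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def seed_sweep(
    generate: Callable[[str, int], T],
    prompt: str,
    first_seed: int = 0,
    count: int = 8,
) -> Dict[int, T]:
    """Run `generate` with a fixed prompt, varying only the seed.

    Returns a dict mapping each seed to whatever `generate` returned.
    """
    return {
        seed: generate(prompt, seed)
        for seed in range(first_seed, first_seed + count)
    }


def fake_generate(prompt: str, seed: int) -> str:
    """Placeholder for a real image generator (hypothetical)."""
    return f"image(seed={seed})"


# Same prompt, eight consecutive seeds (the starting seed is arbitrary).
results = seed_sweep(fake_generate, "realistic, photorealistic", first_seed=100)
```

Running the same sweep against each model with identical seeds keeps the comparison fair, since the only remaining difference is the model itself.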

Z-Image_clear_photoreal (this merged model)

[Images: Z-Image_clear_photoreal_1 through Z-Image_clear_photoreal_8]
Rich variation

Z-Image-Turbo_clear (distilled model)

[Images: Z-Image-Turbo_clear_1 through Z-Image-Turbo_clear_8]
Limited change in faces and composition

Comparing the two models clearly shows that the tuned Z-Image_clear_photoreal produces rich variation in faces and compositions, while the distilled Z-Image-Turbo_clear it started from tends to generate very similar-looking images with little diversity.

Merge recipe for Z-Image_clear_photoreal

Here is the merge recipe used to create Z-Image_clear_photoreal.

Screenshot of the “Model Merge Z-Image” node in ComfyUI showing per-layer merge settings between Base and Turbo models.

The Model Merge Z-Image node on the right mixes two Z-Image models.
A value of 0 uses the layer from Z-Image (Base), while 1 uses the layer from Z-Image-Turbo_clear.

In other words:

  • Early layers come from Z-Image (Base), which keeps the diversity of its training data → increases variation
  • Mid-to-late layers use the distilled Z-Image-Turbo_clear → preserves stability and generation speed
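
The per-layer mixing described above can be sketched in a few lines of Python. This is a minimal illustration, not the custom node's actual code. It assumes the node does plain linear interpolation per layer, the `layers.N.` key prefix is a hypothetical naming scheme, plain lists stand in for weight tensors, and the ratios shown are invented for the example rather than taken from the recipe in the screenshot.

```python
import re
from typing import Dict, List, Optional

Weights = Dict[str, List[float]]


def layer_index(key: str) -> Optional[int]:
    """Extract N from a key like 'layers.N.weight' (hypothetical naming)."""
    match = re.match(r"layers\.(\d+)\.", key)
    return int(match.group(1)) if match else None


def merge_state_dicts(
    base: Weights,
    turbo: Weights,
    ratios: Dict[int, float],
    default_ratio: float = 1.0,
) -> Weights:
    """Blend two models layer by layer.

    A ratio of 0 keeps the Base weights and 1 keeps the Turbo weights,
    matching the convention described in the text. Keys without a layer
    index, or layers missing from `ratios`, use `default_ratio`.
    """
    merged: Weights = {}
    for key, base_w in base.items():
        turbo_w = turbo[key]
        idx = layer_index(key)
        r = default_ratio if idx is None else ratios.get(idx, default_ratio)
        merged[key] = [(1.0 - r) * b + r * t for b, t in zip(base_w, turbo_w)]
    return merged


# Toy example: layer 0 fully from Base, layer 1 an even blend,
# everything else from Turbo. These ratios are illustrative only.
base = {"layers.0.weight": [0.0, 2.0], "layers.1.weight": [4.0, 4.0], "final.weight": [1.0]}
turbo = {"layers.0.weight": [10.0, 10.0], "layers.1.weight": [8.0, 0.0], "final.weight": [3.0]}
merged = merge_state_dicts(base, turbo, ratios={0: 0.0, 1: 0.5})
```

Setting early-layer ratios near 0 and later ones near 1 mirrors the idea in the list above: Base contributes variation early on, while Turbo keeps the fast, stable convergence.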

Custom node: Model Merge Z-Image

Use Z-Image_clear_vae as the VAE!

For the Z-Image_clear_photoreal model, I strongly recommend using the custom VAE I created: Z-Image_clear_vae.

  • Z-Image_natural_vae: lower saturation, more natural look
  • Z-Image_clear_vae: higher saturation, vivid and vibrant look

I’ve prepared several color-style variations, so feel free to choose the one that matches your preference.

Next time: Z-Image_clear_anime!

This time I created a photoreal-focused model because the tuning directions for photorealistic and anime-style illustrations turned out to be completely different.

Teaser image for Z-Image_clear_anime — vibrant anime-style character illustration.
Next up: Z-Image_clear_anime!

In the next post, I plan to introduce the anime-oriented Z-Image_clear_anime model.

Summary: Try Z-Image_clear_photoreal!

  • Z-Image (Base) → rich in variation
  • Z-Image-Turbo → excellent stability & high speed
  • Z-Image_clear_photoreal → best of both worlds

That’s the introduction to Z-Image_clear_photoreal.

When you actually start fine-tuning Z-Image, you realize how little redundant structure it has. Alibaba has clearly studied Qwen-Image in depth and achieved an outstandingly optimized, lightweight design.

Portrait of a woman standing in front of a drink refrigerator in a store — concept graphic for Z-Image_clear_photoreal.
Z-Image strikes an exquisite balance between speed and quality

Now that the core Z-Image (Base) model has been released, it has become much easier for the community to develop improved versions. I expect many excellent derivative models to appear from the community in the future.

If you haven’t tried it yet, why not take this opportunity to experience Z-Image’s combination of high quality and blazing-fast generation?

Thank you so much for reading to the end!