Announcing Z-Image-Turbo_clear! The Lightweight, High-Speed Next-Gen Model

Z-Image-Turbo_clear_cover_image
  • Z-Image is lightweight and high-speed.
  • Color expression is improved through model and VAE refinement.
  • It has the potential to replace SDXL.

Introduction

Hello everyone, I'm Easygoing.

Today, I'm excited to announce the release of the new, high-definition, lightning-fast next-generation model: Z-Image-Turbo_clear.

Anime-style illustration generated with Z-Image-Turbo_clear + Z-Image_clear_vivid_vae, showing a woman in a blue kimono standing in front of a temple with autumn leaves.
Z-Image-Turbo_clear + Z-Image_clear_vivid_vae

What is Z-Image-Turbo?

Z-Image is a new image generation AI model released by China's Alibaba Group on November 25, 2025.


gantt
    title Local Image Generating AIs
    dateFormat YYYY-MM-DD
    axisFormat %Y-%m

    section Stability AI
        Stable Diffusion XL: done, 2023-10-02, 2025-12-06
        Stable Diffusion 3: done, 2024-06-12, 2025-12-06

    section Black Forest Labs
        Flux.1: done, 2024-08-01, 2025-12-06
        Flux.2: done, 2025-11-25, 2025-12-06

    section Vivago AI
        HiDream: done, 2025-04-06, 2025-12-06

    section Alibaba
        Qwen-Image: done, 2025-08-04, 2025-12-06
        Z-Image :  2025-11-25, 2025-12-06

While the trend in image generation AI has been toward larger model sizes alongside improved performance, Z-Image bucks this trend by being a small, lightweight model.

Image Generation Model Sizes

So far, only the fast distilled model, Z-Image-Turbo, has been released in the Z-Image series. The foundation model, Z-Image-Base, and the image editing model, Z-Image-Edit, are scheduled for later release.

How Fast is Z-Image-Turbo?

Just how quickly can Z-Image-Turbo generate illustrations?

Let's measure the time it takes for various image generation AI models to produce an illustration.

Measurement Conditions

  • RTX 4060 Ti 16GB
  • RAM 64 GB
  • Measured twice using high-quality settings
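
For readers who want to run a similar comparison, here is a minimal timing sketch in Python. It is not the script used for the measurements in this article: `time_runs` and `fake_generate` are hypothetical names, and `fake_generate` is only a stand-in for whatever call actually produces an image in your setup.

```python
import time
from statistics import mean
from typing import Callable, List


def time_runs(generate: Callable[[], object], runs: int = 2) -> List[float]:
    """Call `generate` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        durations.append(time.perf_counter() - start)
    return durations


def fake_generate() -> None:
    # Hypothetical stand-in for a real image generation call.
    time.sleep(0.01)


if __name__ == "__main__":
    durations = time_runs(fake_generate, runs=2)
    print(f"runs: {[round(d, 3) for d in durations]}  mean: {mean(durations):.3f} s")
```

Keeping the individual run times, rather than only the average, makes it easy to see whether the first run was slowed down by model loading.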

Note that Flux.2-dev could not run in BF16 format on my hardware due to insufficient GPU performance, so I measured it in the computationally lighter FP8 format instead.
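
One likely factor behind a BF16-to-FP8 fallback is simply the size of the weights. The sketch below is back-of-the-envelope arithmetic, not a measurement: the function name is hypothetical, the 32-billion-parameter figure is an illustrative assumption rather than a number from this article, and real memory use also includes the text encoder, the VAE, and activations.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP8": 1}


def weight_size_gib(params_billions: float, bytes_per_param: int) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return params_billions * 1e9 * bytes_per_param / 2**30


if __name__ == "__main__":
    assumed_params_billions = 32  # illustrative assumption, not from the article
    for fmt in ("BF16", "FP8"):
        size = weight_size_gib(assumed_params_billions, BYTES_PER_PARAM[fmt])
        print(f"{fmt}: about {size:.1f} GiB of weights")
```

Halving the bytes per parameter halves the weight footprint, which can decide whether a large model fits in VRAM plus system RAM at all.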

Image Generation Time Required (FP16 / BF16 Format)

Z-Image-Turbo takes longer to generate an illustration than SDXL, but it is exceptionally fast among the next-generation models.

Generated Images from Each Model

Z-Image-Turbo is fast, but what about its image quality?

Let's compare the actual illustrations generated by each model.

SDXL: RealVisXL_v5.0

Illustration of a blonde woman relaxing in a cafe generated by SDXL (RealVisXL_v5.0): An example where the face and hands have noticeable inconsistencies.
Woman in a cafe: inconsistencies in facial feature placement and hand depiction.

Z-Image-Turbo: Z-Image-Turbo_clear

Illustration of a woman working in a cafe generated by Z-Image-Turbo_clear: Accurate realistic depiction, though some noise remains.
Some noise remains, but the accuracy is excellent.

Flux.1 [dev]: Flux1-Krea-dev

Illustration of a woman reading in a cafe generated by Flux.1 [dev] (Krea): High-quality realistic lighting and material texture expression.
High texture quality.

HiDream: HiDream-I1-Dev_clear

Illustration of a woman enjoying afternoon tea generated by HiDream-I1-Dev_clear: Artistic atmosphere and warm color tone.
Artistic atmosphere.

Qwen-Image: Qwen-Image-Edit-2509_clear

Illustration of a woman relaxing in a cafe generated by Qwen-Image-Edit-2509_clear: Natural and flowing expression.
Natural expression.

Flux.2 [dev]: Flux2-dev (FP8)

Illustration of a woman reading in a cafe generated by Flux.2 [dev] (FP8): Realistic, but resolution is insufficient due to FP8 format.
Realistic, but does not fully resolve in FP8 format.

Illustration quality is generally better with the higher-end models that need more generation time. Even so, Z-Image-Turbo strikes a strong balance between image quality and generation speed.

Introducing Z-Image-Turbo_clear

Now, let's introduce the custom model I created: Z-Image-Turbo_clear.

Z-Image-Turbo_clear is a custom-tuned version of Z-Image-Turbo, primarily focused on improving fine detail.

Woman in a Blue Dress

Z-Image-Turbo_clear_blue_dress. Z-Image-Turbo_original_blue_dress.
Z-Image-Turbo_clear | Z-Image-Turbo
Portrait of a woman in a blue dress generated with Z-Image-Turbo_clear: An example of detail improvement.

Because Z-Image-Turbo is itself a distilled model rather than a foundation model, there is limited room for adjustment.

The current adjustments slightly improve details like the eyelashes, but the color expression is not ideal; it still exhibits the yellowish, poor-complexion skin tone typical of distillation models (which I call "zombie skin").

Therefore, the next step is to further adjust the VAE to refine the colors.

The VAE Finalizes the Illustration

The VAE (Variational Autoencoder) is a separate model used in the final stage of image generation. It decodes the compressed representation the main model works in, known as the latent space, back into a normal pixel image.

flowchart LR

A1(Prompt)

subgraph Latent Space
B1("Transformer")
end

C1(Image)

A1-->|Text Encoder|B1
B1-->|VAE|C1

The VAE is involved in the "finishing touches" of the illustration. By adjusting it, you can fine-tune the details and colors.
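
To get a feel for how compressed the latent space is, here is a small shape calculation. It assumes an 8x spatial downscale and 16 latent channels, which are the figures commonly cited for the Flux-family autoencoder; treat both as assumptions and check the config of the VAE you actually use. The function names are hypothetical.

```python
from typing import Tuple


def latent_shape(width: int, height: int,
                 downscale: int = 8, channels: int = 16) -> Tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for a given image size."""
    if width % downscale or height % downscale:
        raise ValueError("image size should be a multiple of the downscale factor")
    return (channels, height // downscale, width // downscale)


def compression_ratio(width: int, height: int,
                      downscale: int = 8, channels: int = 16) -> float:
    """How many RGB pixel values correspond to one latent value."""
    c, h, w = latent_shape(width, height, downscale, channels)
    return (3 * width * height) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(1024, 1024))
    print(compression_ratio(1024, 1024))
```

Because the decoder has to reconstruct many pixel values from each latent value, small changes to the VAE can visibly shift fine detail and color in the final image.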

The default VAE for Z-Image is Flux1_ae.safetensors, which is also used by Flux.1, HiDream, and Wan. I adjusted it to produce a better-looking, healthier skin tone.

Examples of Z-Image_clear_vae!

Here is the VAE I adjusted:

The image on the left uses the adjusted model and VAE, while the image on the right is the original.

Night in the City

Comparison of anime illustrations of a woman walking in a night city: Z-Image-Turbo_clear + Z-Image_clear_vae. Comparison of anime illustrations of a woman walking in a night city: Original.
Z-Image-Turbo_clear + Z-Image_clear_vae | Z-Image-Turbo + Flux1_ae

Afternoon Tea

Illustration of a woman enjoying afternoon tea: Z-Image-Turbo + Z-Image_clear_vae. Illustration of a woman enjoying afternoon tea: Original.
Comparison of Z-Image-Turbo + Z-Image_clear_vae and the original, close-up of the face.

Adjusting the VAE increased contrast and saturation. This improves the skin color, reduces the dimness, and makes the illustration clearer overall.

These examples use Z-Image_clear_vae, but a higher-saturation version, Z-Image_clear_vivid_vae, is also available on the distribution page, so feel free to try whichever you prefer.
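
To make "more contrast and saturation" concrete, the sketch below applies both to a single RGB pixel. This is only an illustration of the visual effect in pixel space; it is not how the VAE was adjusted, and the function names and example values are hypothetical.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def clamp01(x: float) -> float:
    """Keep a channel value inside the valid 0..1 range."""
    return max(0.0, min(1.0, x))


def adjust_pixel(rgb: RGB, contrast: float = 1.0, saturation: float = 1.0) -> RGB:
    """Scale saturation around the pixel's luma, then contrast around mid-gray."""
    r, g, b = rgb
    luma = 0.2126 * r + 0.7152 * g + 0.0722 * b  # Rec. 709 luma weights
    saturated = [luma + saturation * (c - luma) for c in (r, g, b)]
    contrasted = [0.5 + contrast * (c - 0.5) for c in saturated]
    return tuple(clamp01(c) for c in contrasted)


if __name__ == "__main__":
    dull_skin = (0.70, 0.62, 0.50)  # made-up, slightly yellowish tone
    print(adjust_pixel(dull_skin, contrast=1.1, saturation=1.2))
```

A saturation factor above 1 pushes each channel further from the pixel's gray level, which roughly corresponds to the difference between a standard and a more vivid look.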

Comparison with Qwen-Image...

Although Z-Image-Turbo_clear improves details and color expression, it still shows more noise, and its skin texture is not quite as good as that of Alibaba's higher-end model, Qwen-Image.

Comparison of skin texture: Qwen-Image-Edit-2509_clear. Comparison of skin texture: Z-Image-Turbo_clear.
Qwen-Image-Edit-2509_clear | Z-Image-Turbo_clear

Nevertheless, considering that Z-Image-Turbo is much lighter and is a distilled model that runs with CFG disabled, its image quality is remarkably competitive.
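
For context on what "CFG disabled" means: classifier-free guidance normally blends a prompt-conditioned prediction with an unconditional one at every step. The sketch below shows the standard blending formula on plain lists of numbers. The function name and values are illustrative, and this is not code from any particular sampler.

```python
from typing import List


def cfg_combine(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [0.5, -1.0]    # toy "prediction with the prompt"
    uncond = [0.25, 0.0]  # toy "prediction without the prompt"
    print(cfg_combine(cond, uncond, scale=3.0))  # guidance exaggerates the difference
    print(cfg_combine(cond, uncond, scale=1.0))  # reduces to the conditional prediction
```

At a scale of 1 the result is just the conditional prediction, so the unconditional pass can typically be skipped. That leaves one model evaluation per step instead of two, which is part of why distilled models tuned for this setting are fast.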

I expect the quality and texture will further improve with the upcoming Z-Image-Base and Z-Image-Edit models, and I eagerly await their release.

Z-Image Has a Permissive License!

Finally, let's look at the Z-Image model's license.

Developer | Model | License | Commercial Use
Stability AI | SDXL | CreativeML Open RAIL++-M, etc. | Mostly 🟢
Black Forest Labs | Flux.1 [dev] | FLUX.1 [dev] Non-Commercial License | ❌~🟡
Black Forest Labs | Flux.2 [dev] | FLUX [dev] Non-Commercial License v2.0 | ❌~🟡
Vivago AI | HiDream | MIT | 🟢
Alibaba | Qwen-Image | Apache-2.0 | 🟢
Alibaba | Z-Image | Apache-2.0 | 🟢

Z-Image is licensed under Apache-2.0, just like Qwen-Image, which permits free development and commercial use.

Moreover, there are unofficial rumors that the development team is interested in large-scale anime illustration training.

Image Generation AI Model Licenses

As a lightweight model, Z-Image has the potential for community growth and could eventually replace SDXL.

The Z-Image series, including the forthcoming Base and Edit models, is arguably the most anticipated model family in the local image generation AI space.

Conclusion: Try Z-Image-Turbo_clear!

  • Z-Image is lightweight and high-speed.
  • Color expression is improved through model and VAE refinement.
  • It has the potential to replace SDXL.

In this article, I introduced the Z-Image series and the custom model Z-Image-Turbo_clear.

Z-Image is a welcome return to small, lightweight models, and combined with its permissive license, there is great anticipation for its future development.

Illustration symbolizing the future potential of the Z-Image series: Light particle effects representing the possibility of high-speed generation.
Looking forward to future developments!

The release of the even higher-quality Base and Edit models is on the horizon, but until then, why not give the Z-Image-Turbo_clear model a try and experience high-quality, high-speed generation for yourself?

Thank you for reading all the way to the end!


Reference Articles

Qwen-Image-Edit-2509_clear

HiDream-I1-Dev_clear