Does AI Illustration Turn Green? A Thorough Comparison of VAE Precision and Color Accuracy Between SDXL and Flux!
- SDXL’s VAE tends to shift toward green
- Flux’s VAE offers high precision
- Introduction of SDXL_anime_natural_vae
Introduction
Hello, this is Easygoing.
This time, I’d like to take a closer look at VAE, a topic that always comes up in image generation AI.
AI Image Generation Requires Massive Computation
AI image generation involves repeatedly performing matrix operations, which demands enormous computational power.
Modern image generation AI models use a specialized model called a VAE (Variational Autoencoder) to compress images into a smaller space, so the calculations can be performed more efficiently.
flowchart LR
A1(Original Image)
subgraph VAE
B1(VAE Encode)
D1(VAE Decode)
end
subgraph Latent Space
C1(Latent Image)
end
E1(Generated Image)
A1-->B1
B1-->C1
C1-->D1
D1-->E1
The compressed space created by the VAE is called the latent space, and the images processed within it are specifically referred to as latent images.
Standard Information Volume is 1024 x 1024 x 3
In image generation AI models from SDXL onward, the standard resolution is set to 1024 x 1024.
Furthermore, each pixel in the RGB color space we use has three color channels (Red, Green, and Blue), so the total information volume is 1024 x 1024 x 3.
VAE Compresses It to 1/48th!
Looking at the major VAEs in chronological order, they have evolved as follows: SD1.5_vae → SDXL_0.9_vae → Flux.1_vae → Flux.2_vae.
gantt
title VAE Roadmap
dateFormat YYYY-MM-DD
tickInterval 12month
axisFormat %Y
section Stability AI
Stable Diffusion 1 : done, 2022-08-22, 2026-04-18
Stable Diffusion XL 0.9 : done, 2023-06-22, 2026-04-18
section Black Forest Labs
Flux.1 : 2024-08-01, 2026-04-18
Flux.2 : 2025-11-25, 2026-04-18
| Model | SD_1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| Original Resolution | 512 x 512 x 3 | 1024 x 1024 x 3 | 1024 x 1024 x 3 | 1024 x 1024 x 3 |
| Latent Resolution | 64 x 64 x 4 | 128 x 128 x 4 | 128 x 128 x 16 | 128 x 128 x 32 |
| Compression Ratio | 1/48 | 1/48 | 1/12 | 1/6 |
| Representative Models | SD1.5 | SDXL | Flux.1, HiDream, Z-Image | Flux.2 |
Among them, SDXL_0.9_vae compresses the vertical and horizontal resolution to 128 x 128 (1/8th of the original) and converts the channels into 4 dedicated latent channels, thereby reducing the total information volume to 1/48th of the original.
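The compression ratios in the table can be checked with a few lines of arithmetic. This is only a sketch that counts the number of values per image, using the shapes from the table; it ignores the fact that pixels and latents may be stored at different bit depths. The names `VAES`, `element_count`, and `compression_denominator` are made up for this example.

```python
# (height, width, channels) of the original image and the latent image,
# taken from the table above.
VAES = {
    "SD1.5_vae": ((512, 512, 3), (64, 64, 4)),
    "SDXL_0.9_vae": ((1024, 1024, 3), (128, 128, 4)),
    "Flux.1_vae": ((1024, 1024, 3), (128, 128, 16)),
    "Flux.2_vae": ((1024, 1024, 3), (128, 128, 32)),
}


def element_count(shape):
    """Number of values in an image of the given (height, width, channels)."""
    height, width, channels = shape
    return height * width * channels


def compression_denominator(original, latent):
    """Return N such that the latent holds 1/N as many values as the original."""
    return element_count(original) // element_count(latent)


ratios = {
    name: compression_denominator(original, latent)
    for name, (original, latent) in VAES.items()
}

for name, n in ratios.items():
    print(f"{name}: 1/{n}")
```

For SDXL_0.9_vae this works out to (1024 x 1024 x 3) / (128 x 128 x 4) = 3,145,728 / 65,536 = 48.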
Images Change When Passed Through VAE!
So, how does image quality change when going through the VAE?
From here, I will actually run VAE encode → VAE decode in ComfyUI to observe the changes.
The Image Difference Checker node used in this analysis measures image differences using a difference map, MAE, and SSIM, and also allows color changes to be checked via the tone curve.
- Mean Absolute Error (MAE): Mainly detects color differences
- Structural Similarity Index (SSIM): Detects structural differences in black-and-white (luminance)
ComfyUI-easygoing-nodes custom node
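To give a rough idea of what these two metrics measure, here is a minimal sketch in plain Python. It is not the code of the Image Difference Checker node. I am assuming that the similarity percentage is 100 x (1 - MAE / 255) for 8-bit images, and the SSIM here is a simplified single-window version computed on luminance, whereas a real SSIM averages many small local windows. Images are represented as flat lists of (R, G, B) tuples, and all function names are hypothetical.

```python
def mae_similarity(original, decoded):
    """Percent similarity from the mean absolute error over all 8-bit values.

    Assumption: similarity = 100 * (1 - MAE / 255).
    """
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of pixels")
    total = 0
    count = 0
    for pixel_a, pixel_b in zip(original, decoded):
        for a, b in zip(pixel_a, pixel_b):
            total += abs(a - b)
            count += 1
    return 100.0 * (1.0 - (total / count) / 255.0)


def luminance(pixel):
    """Rec. 601 luma of an (R, G, B) pixel."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def global_ssim_similarity(original, decoded):
    """Simplified SSIM in percent: one window covering the whole image."""
    x = [luminance(p) for p in original]
    y = [luminance(p) for p in decoded]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * 255) ** 2  # usual SSIM stabilizing constants for 8-bit data
    c2 = (0.03 * 255) ** 2
    ssim = ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    )
    return 100.0 * ssim


# Tiny two-pixel "image" before and after a VAE round trip (made-up values).
before = [(10, 20, 30), (200, 100, 50)]
after = [(12, 20, 30), (200, 103, 50)]
print(mae_similarity(before, after))
print(global_ssim_similarity(before, after))
```

MAE looks at every channel of every pixel, so it reacts to color shifts. SSIM here only sees luminance, so it mostly reacts to changes in structure and contrast.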
Comparison with Actual Images!
Now let’s compare using real images. This time, I will examine the precision of four VAEs: SD1.5_vae, SDXL_0.9_vae, Flux.1_vae, and Flux.2_vae.
1. White Uniform (Anime)
Difference Map
Looking at the difference map, SD1.5_vae shows relatively large image quality degradation, especially around the character’s outlines.
SDXL_0.9_vae also shows degradation in the same areas, but the degree is considerably milder compared to SD1.5_vae.
With Flux.1_vae and Flux.2_vae, the changes from the original image are much smaller, indicating that both are excellent VAEs with minimal degradation.
MAE and SSIM
| Image1 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| MAE_similarity | 97.8 % | 98.3 % | 99.0 % | 98.8 % |
| SSIM_similarity | 98.8 % | 99.1 % | 99.8 % | 99.8 % |
I calculated MAE and SSIM to compare the precision of each VAE numerically.
VAE precision generally improves with each generation, but for this image Flux.2_vae’s MAE similarity (98.8 %) is slightly lower than Flux.1_vae’s (99.0 %), which suggests that Flux.1_vae reproduces colors more faithfully.
Color Changes
| Image1 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| Red | -0.5 % | 0.2 % | 0.0 % | 0.4 % |
| Green | -0.9 % | 0.2 % | 0.1 % | -0.3 % |
| Blue | 0.0 % | 0.0 % | 0.0 % | 0.7 % |
Next, looking at color changes: SD1.5_vae has less green compared to the original, while Flux.2_vae shows an increase in red and blue.
Since the tendency is not very clear with just one illustration, I compared four additional illustrations for each VAE.
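For reference, a per-channel color change like the one in the table could be computed along the following lines. This is a sketch under my own assumption that the percentage is the change in each channel's mean value relative to the full 8-bit range (255). The node derives its numbers from the tone curve, so its exact definition may differ. The function name `channel_shift_percent` is hypothetical.

```python
def channel_shift_percent(original, decoded):
    """Mean change of R, G, B as a percent of the 8-bit range (255).

    Images are flat lists of (R, G, B) tuples with values in 0..255.
    Positive values mean the channel got brighter after the round trip.
    """
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of pixels")
    n = len(original)
    shifts = []
    for channel in range(3):
        mean_before = sum(pixel[channel] for pixel in original) / n
        mean_after = sum(pixel[channel] for pixel in decoded) / n
        shifts.append(100.0 * (mean_after - mean_before) / 255.0)
    return tuple(shifts)


# Made-up example: a flat gray image whose green channel rises by 3 levels.
gray = [(100, 100, 100)] * 4
greenish = [(100, 103, 100)] * 4
red_shift, green_shift, blue_shift = channel_shift_percent(gray, greenish)
print(red_shift, green_shift, blue_shift)
```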
2. Night Harbor View (Anime)
| Image2 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| MAE_similarity | 98.8 % | 99.0 % | 99.4 % | 99.2 % |
| SSIM_similarity | 99.5 % | 99.6 % | 99.9 % | 99.9 % |
| Red | 0.0 % | 0.3 % | 0.0 % | 0.5 % |
| Green | -0.7 % | 0.5 % | 0.1 % | -0.4 % |
| Blue | -0.2 % | 0.0 % | -0.2 % | 0.3 % |
- MAE and SSIM show the same tendency
- SDXL_0.9_vae increases green
3. Red Flowers (Anime)
| Image3 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| MAE_similarity | 97.1 % | 97.8 % | 98.8 % | 98.9 % |
| SSIM_similarity | 98.0 % | 98.6 % | 99.7 % | 99.8 % |
| Red | -0.4 % | 0.0 % | -0.2 % | 0.4 % |
| Green | -0.8 % | 0.1 % | 0.0 % | -0.1 % |
| Blue | -0.4 % | 0.1 % | 0.0 % | 0.1 % |
- Flux.2_vae shows the smallest changes in both MAE and SSIM
4. Takeoff (Photorealistic)
| Image4 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| MAE_similarity | 98.0 % | 98.3 % | 99.0 % | 99.0 % |
| SSIM_similarity | 98.3 % | 98.8 % | 99.8 % | 99.8 % |
| Red | 0.0 % | 0.2 % | 0.1 % | 0.3 % |
| Green | -0.8 % | 0.3 % | 0.2 % | 0.0 % |
| Blue | -0.3 % | 0.3 % | 0.2 % | 0.2 % |
- Flux.1_vae and Flux.2_vae perform equally well
5. Oil Painting (Photorealistic)
| Image5 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| MAE_similarity | 97.9 % | 98.3 % | 98.7 % | 98.8 % |
| SSIM_similarity | 99.1 % | 99.3 % | 99.7 % | 99.8 % |
| Red | 0.1 % | -0.1 % | 0.0 % | 0.6 % |
| Green | -1.1 % | 0.4 % | 0.3 % | -0.3 % |
| Blue | -0.2 % | -0.1 % | -0.1 % | 0.4 % |
- Flux.1_vae and Flux.2_vae perform almost equally well, with Flux.2_vae ahead by 0.1 points in both metrics
Looking at the Average of All Five Illustrations
Now let’s examine the average across the five illustrations tested this time.
Image Changes
| Average | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| MAE_similarity | 97.9 % | 98.3 % | 99.0 % | 98.9 % |
| SSIM_similarity | 98.7 % | 99.1 % | 99.8 % | 99.8 % |
- Passing through VAE causes 0.2–2% degradation in the image
- Color accuracy (MAE): Flux.1 > Flux.2 > SDXL > SD1.5
- Structural accuracy (SSIM): Flux.2 ≈ Flux.1 > SDXL > SD1.5
Averaged over the five images, a single VAE encode → decode round trip lowers similarity by about 0.2–2%.
Performance improved steadily from SD1.5 to SDXL, but the step from SDXL to Flux.1 is a much larger leap.
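The averages above can be reproduced from the five per-image tables. The sketch below simply copies the published (already rounded) values and averages each column; the variable and function names are my own.

```python
from statistics import mean

VAE_NAMES = ("SD1.5_vae", "SDXL_0.9_vae", "Flux.1_vae", "Flux.2_vae")

# MAE_similarity (%) per image, in the column order of VAE_NAMES.
MAE = {
    "Image1": (97.8, 98.3, 99.0, 98.8),
    "Image2": (98.8, 99.0, 99.4, 99.2),
    "Image3": (97.1, 97.8, 98.8, 98.9),
    "Image4": (98.0, 98.3, 99.0, 99.0),
    "Image5": (97.9, 98.3, 98.7, 98.8),
}

# SSIM_similarity (%) per image, in the same column order.
SSIM = {
    "Image1": (98.8, 99.1, 99.8, 99.8),
    "Image2": (99.5, 99.6, 99.9, 99.9),
    "Image3": (98.0, 98.6, 99.7, 99.8),
    "Image4": (98.3, 98.8, 99.8, 99.8),
    "Image5": (99.1, 99.3, 99.7, 99.8),
}


def column_means(table):
    """Average each VAE column over all images, rounded to one decimal."""
    columns = zip(*table.values())
    return tuple(round(mean(column), 1) for column in columns)


mae_average = column_means(MAE)
ssim_average = column_means(SSIM)
for name, mae, ssim in zip(VAE_NAMES, mae_average, ssim_average):
    print(f"{name}: MAE {mae} %, SSIM {ssim} %")
```

Because the inputs are already rounded to one decimal, the results match the table only at that precision. The unrounded MAE averages for Flux.1_vae and Flux.2_vae are 98.98 and 98.94, so the gap between them is smaller than the rounded figures suggest.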
| Model | SD_1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| Original Resolution | 512 x 512 x 3 | 1024 x 1024 x 3 | 1024 x 1024 x 3 | 1024 x 1024 x 3 |
| Latent Resolution | 64 x 64 x 4 | 128 x 128 x 4 | 128 x 128 x 16 | 128 x 128 x 32 |
| Compression Ratio | 1/48 | 1/48 | 1/12 | 1/6 |
| Representative Models | SD1.5 | SDXL | Flux.1, HiDream, Z-Image | Flux.2 |
Flux.1 increased the number of latent space channels from 4 to 16, quadrupling the information volume compared to SDXL. This increase in information capacity is believed to be the main reason for the significant improvement in precision.
Color Changes
Next, let’s look at the color shifts for each VAE.
| Average | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
|---|---|---|---|---|
| Red | -0.2 % | 0.1 % | 0.0 % | 0.4 % |
| Green | -0.9 % | 0.3 % | 0.1 % | -0.2 % |
| Blue | -0.2 % | 0.1 % | 0.0 % | 0.3 % |
- SD1.5_vae: Overall darkening, shifts toward purple
- SDXL_0.9_vae: Shifts toward green
- Flux.1_vae: Almost perfectly accurate
- Flux.2_vae: Shifts toward purple
Regarding color, SD1.5_vae reduces all colors, making the entire illustration darker, with green being lost the most.
SDXL_0.9_vae shows less overall color change than SD1.5, but instead increases green.
Flux.1_vae excels at color reproduction, with almost no change.
Flux.2_vae boosts red and blue while slightly reducing green, which pushes the image toward purple. This may be an intentional correction to compensate for the blue and red tones that tend to be lost when removing noise in AI illustrations.
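One way to read the average color table is to split each shift into an overall brightness change and a green versus purple tint. The split below is my own simple heuristic, not something the Image Difference Checker node reports: brightness is the mean of the three shifts, and tint is green minus the mean of red and blue.

```python
# Average (red, green, blue) shift in percent, copied from the table above.
AVERAGE_SHIFT = {
    "SD1.5_vae": (-0.2, -0.9, -0.2),
    "SDXL_0.9_vae": (0.1, 0.3, 0.1),
    "Flux.1_vae": (0.0, 0.1, 0.0),
    "Flux.2_vae": (0.4, -0.2, 0.3),
}


def brightness_and_tint(shift):
    """Split an (R, G, B) shift into brightness and green/purple tint.

    brightness: mean of the three channel shifts (negative = darker).
    tint: green minus the mean of red and blue
          (positive = toward green, negative = toward purple).
    """
    red, green, blue = shift
    brightness = (red + green + blue) / 3
    tint = green - (red + blue) / 2
    return brightness, tint


summary = {name: brightness_and_tint(shift) for name, shift in AVERAGE_SHIFT.items()}
for name, (brightness, tint) in summary.items():
    print(f"{name}: brightness {brightness:+.2f}, tint {tint:+.2f}")
```

With this heuristic, SD1.5_vae comes out darker with a purple tint, SDXL_0.9_vae has a green tint, Flux.1_vae stays closest to zero on both axes, and Flux.2_vae has a purple tint, which matches the bullet points above.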
Creating an Improved SDXL VAE!
Since SDXL_0.9_vae turned out to shift toward green, I tuned a new VAE called SDXL_anime_natural_vae specifically for anime illustrations. It shows smaller MAE and SSIM changes than SDXL_0.9_vae and stays closer to the original colors.
SDXL_anime_natural_vae
| Image2 | SDXL_anime_natural_vae | SDXL_0.9_vae |
|---|---|---|
| MAE_similarity | 99.4 % | 99.0 % |
| SSIM_similarity | 99.9 % | 99.6 % |
| Red | -0.2 % | 0.3 % |
| Green | -0.1 % | 0.5 % |
| Blue | 0.0 % | 0.0 % |
Since SDXL_anime_natural_vae was fine-tuned while monitoring MAE and SSIM, its numerical precision is significantly improved compared to the original.
Because it deviates less from the original image, it is expected to improve image quality especially in workflows that encode and decode through the VAE repeatedly, such as Hires.fix or Detailer.
SDXL_anime_clear_vae
Also available on the same page is SDXL_anime_clear_vae, which darkens the overall illustration and boosts contrast to emphasize a heavy, rich atmosphere.
| Image2 | SDXL_anime_clear_vae | SDXL_0.9_vae |
|---|---|---|
| MAE_similarity | 99.1 % | 99.0 % |
| SSIM_similarity | 99.9 % | 99.6 % |
| Red | -0.4 % | 0.3 % |
| Green | -0.3 % | 0.5 % |
| Blue | -0.2 % | 0.0 % |
This version has a stronger “flavor” compared to SDXL_anime_natural_vae, so it may not suit every illustration. However, if you try it and find it matches your desired expression, it’s worth using.
Does Image Quality Differ Between FP32 and BF16 Formats?
All verifications so far were performed using the highest-precision FP32 format of the VAE.
In actual image generation, FP16 / BF16 formats are used most frequently, so as a final check I compared image quality between FP32 and BF16.
sdxl_anime_natural_vae_BF16
| Image2 | SDXL_anime_natural_vae_FP32 | SDXL_anime_natural_vae_BF16 |
|---|---|---|
| MAE_similarity | 99.40961 % | 99.35945 % |
| SSIM_similarity | 99.92685 % | 99.91910 % |
The FP32 format is slightly ahead of BF16: about 0.05 points in MAE similarity and 0.008 points in SSIM similarity.
If you are pursuing the highest possible image quality in illustration generation, it may be worth trying the FP32 version of the VAE.
When using an FP32 VAE in ComfyUI, download the FP32 model and launch ComfyUI with the --fp32-vae argument.
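The small gap between FP32 and BF16 comes from how many digits each format can hold. BF16 keeps the same exponent range as FP32 but only 7 explicit mantissa bits instead of 23, so every weight is rounded to roughly 2 to 3 significant decimal digits. The sketch below illustrates that rounding on single numbers using only the standard library. It is not how ComfyUI converts models, it does not handle NaN or infinity, and the function name `to_bf16` is my own.

```python
import struct


def to_bf16(value):
    """Round a float to the nearest BF16 value (round-to-nearest-even).

    BF16 is the upper 16 bits of the FP32 bit pattern, so the value is
    first converted to FP32 and then the lower 16 bits are rounded away.
    NaN and infinity are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]  # FP32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even neighbor
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


# Near 1.0, neighboring BF16 values are 2**-7 = 0.0078125 apart.
for weight in (1.0, 1.001, 1.005, 0.1234567):
    print(f"{weight} -> {to_bf16(weight)}")
```

Each individual rounding error is tiny, but a VAE contains millions of weights, and I suspect these errors add up to the small difference in MAE and SSIM measured above.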
What Makes Flux.2_vae So Good?
Finally, I’d like to introduce an article by Shiba*2 that explains Flux.2’s VAE in detail.
This verification only measured how faithfully each VAE reconstructs the original image, and by that measure the difference between Flux.1_vae and Flux.2_vae was not very large.
Flux.2_vae is a next-generation VAE that understands the “meaning” contained in images, so its performance may improve further with different verification methods or future enhancements.
Summary
- SDXL’s VAE tends to shift toward green
- Flux’s VAE offers high precision
- Introduction of SDXL_anime_natural_vae
This time, I investigated VAE.
Unlike UNet/Transformer or text encoders, it’s hard to understand how VAE affects illustrations, but using the Image Difference Checker node allowed me to objectively compare its precision.
The SDXL_anime_natural_vae introduced this time improves image quality simply by swapping out the regular VAE, making it a highly recommended VAE for everyone.
I will continue exploring AI illustration quality from various angles.
Thank you for reading until the end!