Does AI Illustration Turn Green? A Thorough Comparison of VAE Precision and Color Accuracy Between SDXL and Flux!

  • SDXL’s VAE tends to shift toward green
  • Flux’s VAE offers high precision
  • Introduction of SDXL_anime_natural_vae

Introduction

Hello, this is Easygoing.

This time, I’d like to take a closer look at the VAE, a topic that always comes up in AI image generation.

Anime illustration of a silver-haired girl wearing a white uniform
Today’s topic: VAE

AI Image Generation Requires Massive Computation

AI image generation involves repeatedly performing matrix operations, which demands enormous computational power.

Modern image generation AI models use a specialized model called a VAE (Variational Autoencoder) to compress the image space and perform calculations more efficiently.


flowchart LR

A1(Original Image)

subgraph VAE

B1(VAE Encode)
D1(VAE Decode)

end

subgraph Latent Space

C1(Latent Image)

end

E1(Generated Image)

A1-->B1
B1-->C1
C1-->D1
D1-->E1

The compressed space created by the VAE is called the latent space, and the images processed within it are specifically referred to as latent images.

Standard Information Volume is 1024 x 1024 x 3

In image generation AI models from SDXL onward, the standard resolution is set to 1024 x 1024.

Anime illustration of a silver-haired girl in a white uniform smiling at the viewer
1024 x 1024 x 3 channels is standard

Furthermore, the RGB color space we use has three color channels (red, green, and blue), so the total information volume is 1024 x 1024 x 3, or 3,145,728 values.

VAE Compresses It to 1/48th!

Looking at the major VAEs in chronological order, they have evolved as follows: SD1.5_vae → SDXL_0.9_vae → Flux.1_vae → Flux.2_vae.

gantt
    title VAE Roadmap
    dateFormat YYYY-MM-DD
    tickInterval 12month
	axisFormat %Y

    section Stability AI
        Stable Diffusion 1 : done, 2022-08-22, 2026-04-18
        Stable Diffusion XL 0.9 : done, 2023-06-22, 2026-04-18
        
    section Black Forest Labs
        Flux.1 : 2024-08-01, 2026-04-18
        Flux.2 : 2025-11-25, 2026-04-18
		
| Model | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| Original Resolution | 512 x 512 x 3 | 1024 x 1024 x 3 | 1024 x 1024 x 3 | 1024 x 1024 x 3 |
| Latent Resolution | 64 x 64 x 4 | 128 x 128 x 4 | 128 x 128 x 16 | 128 x 128 x 32 |
| Compression Ratio | 1/48 | 1/48 | 1/12 | 1/6 |
| Representative Models | SD1.5 | SDXL | Flux.1, HiDream, Z-Image | Flux.2 |

Among them, SDXL_0.9_vae reduces the height and width to 128 x 128 (1/8th of the original on each side) and re-encodes the 3 RGB channels as 4 latent channels. The total information volume therefore drops from 1024 x 1024 x 3 = 3,145,728 values to 128 x 128 x 4 = 65,536 values, which is 1/48th of the original.
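The compression ratios in the table can be checked with a few lines of arithmetic. The following is a minimal sketch that only uses the image and latent shapes listed above; the names `VAE_SHAPES`, `volume`, and `compression_ratio` are made up for this illustration.

```python
from fractions import Fraction

# (image shape, latent shape) as height x width x channels, taken from the table above.
VAE_SHAPES = {
    "SD1.5_vae": ((512, 512, 3), (64, 64, 4)),
    "SDXL_0.9_vae": ((1024, 1024, 3), (128, 128, 4)),
    "Flux.1_vae": ((1024, 1024, 3), (128, 128, 16)),
    "Flux.2_vae": ((1024, 1024, 3), (128, 128, 32)),
}


def volume(shape):
    """Number of values in a height x width x channels array."""
    height, width, channels = shape
    return height * width * channels


def compression_ratio(image_shape, latent_shape):
    """Latent volume divided by image volume, as an exact fraction."""
    return Fraction(volume(latent_shape), volume(image_shape))


for name, (image_shape, latent_shape) in VAE_SHAPES.items():
    ratio = compression_ratio(image_shape, latent_shape)
    print(f"{name}: {volume(image_shape)} -> {volume(latent_shape)} values, ratio {ratio}")
```

Using `Fraction` keeps the result exact, so the ratios print in the same 1/48, 1/12, 1/6 form as the table.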

Images Change When Passed Through VAE!

So, how does image quality change when going through the VAE?

From here, I will actually run VAE encode → VAE decode in ComfyUI to observe the changes.

ComfyUI workflow diagram connecting VAE Encode and VAE Decode to verify image changes
Connecting VAE encode → VAE decode
UI of the Image Difference Checker node showing difference map, MAE/SSIM values, and tone curve
Analyzing the difference map and tone curve

The Image Difference Checker node used in this analysis measures image differences using a difference map, MAE, and SSIM, and also allows color changes to be checked via the tone curve.

ComfyUI-easygoing-nodes custom node
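To give a feel for what the two metrics measure, here is a rough sketch in plain Python. It is not the node's actual code: it assumes that MAE similarity is defined as 100 % minus the mean absolute error relative to the 8-bit range, and it uses a simplified single-window ("global") SSIM instead of the usual sliding-window version. The function names are hypothetical.

```python
from statistics import fmean


def mae_similarity(original, decoded, max_value=255):
    """Similarity in percent; assumes similarity = 1 - MAE / max_value."""
    mae = fmean(abs(a - b) for a, b in zip(original, decoded))
    return (1 - mae / max_value) * 100


def global_ssim(original, decoded, max_value=255):
    """Simplified SSIM computed over the whole image as a single window."""
    c1 = (0.01 * max_value) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizer for the contrast/structure term
    mean_a, mean_b = fmean(original), fmean(decoded)
    var_a = fmean((a - mean_a) ** 2 for a in original)
    var_b = fmean((b - mean_b) ** 2 for b in decoded)
    cov = fmean((a - mean_a) * (b - mean_b) for a, b in zip(original, decoded))
    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Toy grayscale "images" flattened to lists of 8-bit values.
original = [10, 20, 30, 40, 200, 220]
decoded = [12, 18, 30, 44, 198, 222]

print(f"MAE similarity: {mae_similarity(original, decoded):.2f} %")
print(f"Global SSIM:    {global_ssim(original, decoded) * 100:.2f} %")
```

MAE reacts to any per-pixel value change, including a uniform color shift, while SSIM focuses on local structure, which is why the two numbers can rank VAEs differently.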

Comparison with Actual Images!

Now let’s compare using real images. This time, I will examine the precision of four VAEs: SD1.5_vae, SDXL_0.9_vae, Flux.1_vae, and Flux.2_vae.

1. White Uniform (Anime)

Test image: Anime-style female character wearing a white uniform

Difference Map

Comparison of difference maps for each VAE. SD1.5 and SDXL show noise around outlines, while Flux shows less.

Looking at the difference map, SD1.5_vae shows relatively large image quality degradation, especially around the character’s outlines.

SDXL_0.9_vae also shows degradation in the same areas, but the degree is considerably milder compared to SD1.5_vae.

With Flux.1_vae and Flux.2_vae, the changes from the original image are much smaller, indicating that both are excellent VAEs with minimal degradation.

MAE and SSIM

| Image1 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| MAE_similarity | 97.8 % | 98.3 % | 99.0 % | 98.8 % |
| SSIM_similarity | 98.8 % | 99.1 % | 99.8 % | 99.8 % |

I calculated MAE and SSIM to compare the precision of each VAE numerically.

VAE precision generally improves with each generation, but Flux.2_vae’s MAE similarity (98.8 %) is slightly lower than Flux.1_vae’s (99.0 %), which suggests that Flux.1_vae reproduces the colors of this image a little more faithfully.

Color Changes

| Image1 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| Red | -0.5 % | 0.2 % | 0.0 % | 0.4 % |
| Green | -0.9 % | 0.2 % | 0.1 % | -0.3 % |
| Blue | 0.0 % | 0.0 % | 0.0 % | 0.7 % |

Next, looking at color changes: SD1.5_vae has less green compared to the original, while Flux.2_vae shows an increase in red and blue.
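A per-channel change like the one in this table could be computed roughly as follows. This is only a sketch under an assumption: it treats the change as the difference in channel means, expressed as a percentage of the full 8-bit range. The article does not state the exact definition used by the node, and `channel_shift` is a hypothetical name.

```python
from statistics import fmean


def channel_shift(original, decoded, max_value=255):
    """Per-channel mean change, in percent of the full value range.

    `original` and `decoded` are lists of (R, G, B) pixels.
    """
    shifts = []
    for channel in range(3):
        mean_before = fmean(pixel[channel] for pixel in original)
        mean_after = fmean(pixel[channel] for pixel in decoded)
        shifts.append((mean_after - mean_before) / max_value * 100)
    return tuple(shifts)


# Toy example: the "decoded" pixels gain a little green and lose a little blue.
original = [(200, 180, 160), (40, 60, 80), (120, 120, 120)]
decoded = [(200, 183, 158), (40, 62, 79), (120, 121, 120)]

red, green, blue = channel_shift(original, decoded)
print(f"Red {red:+.1f} %  Green {green:+.1f} %  Blue {blue:+.1f} %")
```

A positive value means the channel became brighter on average after the VAE round trip, and a negative value means it became darker.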

Since the tendency is not very clear with just one illustration, I compared four additional illustrations for each VAE.

2. Night Harbor View (Anime)

Test image: Anime illustration of a harbor at night
Difference map comparison for each VAE on the night harbor illustration

| Image2 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| MAE_similarity | 98.8 % | 99.0 % | 99.4 % | 99.2 % |
| SSIM_similarity | 99.5 % | 99.6 % | 99.9 % | 99.9 % |
| Red | 0.0 % | 0.3 % | 0.0 % | 0.5 % |
| Green | -0.7 % | 0.5 % | 0.1 % | -0.4 % |
| Blue | -0.2 % | 0.0 % | -0.2 % | 0.3 % |

  • MAE and SSIM show the same tendency
  • SDXL_0.9_vae increases green

3. Red Flowers (Anime)

Test image: Vibrant red flower anime illustration
Difference map comparison for each VAE on the red flower illustration

| Image3 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| MAE_similarity | 97.1 % | 97.8 % | 98.8 % | 98.9 % |
| SSIM_similarity | 98.0 % | 98.6 % | 99.7 % | 99.8 % |
| Red | -0.4 % | 0.0 % | -0.2 % | 0.4 % |
| Green | -0.8 % | 0.1 % | 0.0 % | -0.1 % |
| Blue | -0.4 % | 0.1 % | 0.0 % | 0.1 % |

  • Flux.2_vae shows the smallest changes in both MAE and SSIM

4. Takeoff (Photorealistic)

Test image: Photorealistic image of an airplane taking off
Difference map comparison for each VAE on the airplane photo

| Image4 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| MAE_similarity | 98.0 % | 98.3 % | 99.0 % | 99.0 % |
| SSIM_similarity | 98.3 % | 98.8 % | 99.8 % | 99.8 % |
| Red | 0.0 % | 0.2 % | 0.1 % | 0.3 % |
| Green | -0.8 % | 0.3 % | 0.2 % | 0.0 % |
| Blue | -0.3 % | 0.3 % | 0.2 % | 0.2 % |

  • Flux.1_vae and Flux.2_vae perform equally well

5. Oil Painting (Photorealistic)

Test image: Heavy-textured oil painting style illustration
Difference map comparison for each VAE on the oil painting illustration

| Image5 | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| MAE_similarity | 97.9 % | 98.3 % | 98.7 % | 98.8 % |
| SSIM_similarity | 99.1 % | 99.3 % | 99.7 % | 99.8 % |
| Red | 0.1 % | -0.1 % | 0.0 % | 0.6 % |
| Green | -1.1 % | 0.4 % | 0.3 % | -0.3 % |
| Blue | -0.2 % | -0.1 % | -0.1 % | 0.4 % |

  • Flux.1_vae and Flux.2_vae perform equally well

Looking at the Average of All Five Illustrations

Now let’s examine the average across the five illustrations tested this time.

Image Changes

| Average | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| MAE_similarity | 97.9 % | 98.3 % | 99.0 % | 98.9 % |
| SSIM_similarity | 98.7 % | 99.1 % | 99.8 % | 99.8 % |

  • A VAE round trip changes the image by roughly 0.2–2% on average
  • Color accuracy (MAE): Flux.1 > Flux.2 > SDXL > SD1.5
  • Structural accuracy (SSIM): Flux.2 ≈ Flux.1 > SDXL > SD1.5

Overall, a round trip through a VAE changes the image by about 0.2–2% on average, depending on the VAE and the metric.
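The averages can be reproduced from the per-image values reported for the five test images. The sketch below simply copies those published numbers and averages each column; the variable and function names are illustrative only.

```python
from statistics import fmean

VAES = ["SD1.5_vae", "SDXL_0.9_vae", "Flux.1_vae", "Flux.2_vae"]

# Per-image similarity scores (%) copied from the five test images above,
# one row per image, one column per VAE.
MAE = [
    [97.8, 98.3, 99.0, 98.8],
    [98.8, 99.0, 99.4, 99.2],
    [97.1, 97.8, 98.8, 98.9],
    [98.0, 98.3, 99.0, 99.0],
    [97.9, 98.3, 98.7, 98.8],
]
SSIM = [
    [98.8, 99.1, 99.8, 99.8],
    [99.5, 99.6, 99.9, 99.9],
    [98.0, 98.6, 99.7, 99.8],
    [98.3, 98.8, 99.8, 99.8],
    [99.1, 99.3, 99.7, 99.8],
]


def column_means(rows):
    """Average each column (one VAE) across all rows (test images)."""
    return [round(fmean(column), 1) for column in zip(*rows)]


for label, rows in (("MAE", MAE), ("SSIM", SSIM)):
    print(label, dict(zip(VAES, column_means(rows))))
```

Because the inputs are already rounded to one decimal place, the averages are only meaningful to about that precision, which is why Flux.1_vae and Flux.2_vae end up tied on SSIM.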

While precision improved steadily from SD1.5 to SDXL (about 0.4 points on both metrics), the step from SDXL to Flux.1 is larger (about 0.7 points) and cuts the remaining SSIM error from 0.9 % to 0.2 %.

| Model | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| Original Resolution | 512 x 512 x 3 | 1024 x 1024 x 3 | 1024 x 1024 x 3 | 1024 x 1024 x 3 |
| Latent Resolution | 64 x 64 x 4 | 128 x 128 x 4 | 128 x 128 x 16 | 128 x 128 x 32 |
| Compression Ratio | 1/48 | 1/48 | 1/12 | 1/6 |
| Representative Models | SD1.5 | SDXL | Flux.1, HiDream, Z-Image | Flux.2 |

Flux.1 increased the number of latent space channels from 4 to 16, quadrupling the information volume compared to SDXL. This increase in information capacity is believed to be the main reason for the significant improvement in precision.

Color Changes

Next, let’s look at the color shifts for each VAE.

Color wheel: a 360-degree wheel displaying the full range of hues in a circular format
Color and complementary color relationships

| Average | SD1.5_vae | SDXL_0.9_vae | Flux.1_vae | Flux.2_vae |
| --- | --- | --- | --- | --- |
| Red | -0.2 % | 0.1 % | 0.0 % | 0.4 % |
| Green | -0.9 % | 0.3 % | 0.1 % | -0.2 % |
| Blue | -0.2 % | 0.1 % | 0.0 % | 0.3 % |

  • SD1.5_vae: Overall darkening, shifts toward purple
  • SDXL_0.9_vae: Shifts toward green
  • Flux.1_vae: Almost perfectly accurate
  • Flux.2_vae: Shifts toward purple

Regarding color, SD1.5_vae reduces all colors, making the entire illustration darker, with green being lost the most.

SDXL_0.9_vae shows less overall color change than SD1.5, but instead increases green.

Flux.1_vae excels at color reproduction, with almost no change.

Flux.2_vae boosts red and blue while reducing green, shifting the image toward magenta. This is likely an intentional correction to compensate for the blue and red tones that tend to be lost when noise is removed from AI illustrations.

Anime illustration of a silver-haired girl in a white uniform smiling confidently at the viewer
Flux.2 likely applies intentional color correction
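One way to read the average color table in terms of the color wheel is to project each RGB shift onto a hue angle (red = 0°, green = 120°, blue = 240°). The sketch below is just an interpretation aid using that standard projection, not the method used by the node; the function names are hypothetical. A uniform change in all three channels cancels out in this projection, so only the hue direction remains. The direction the article calls "purple" comes out as magenta here, the complement of green.

```python
import math

HUE_NAMES = ["red", "yellow", "green", "cyan", "blue", "magenta"]  # 0, 60, ..., 300 degrees


def shift_direction(d_red, d_green, d_blue):
    """Return (hue angle in degrees, strength) of an RGB shift on the color wheel."""
    x = d_red - 0.5 * d_green - 0.5 * d_blue
    y = math.sqrt(3) / 2 * (d_green - d_blue)
    angle = math.degrees(math.atan2(y, x)) % 360
    return angle, math.hypot(x, y)


def nearest_hue(angle):
    """Name of the closest of the six primary and secondary hues."""
    return HUE_NAMES[round(angle / 60) % 6]


# Average channel changes (%) from the table above.
AVERAGE_SHIFT = {
    "SD1.5_vae": (-0.2, -0.9, -0.2),
    "SDXL_0.9_vae": (0.1, 0.3, 0.1),
    "Flux.1_vae": (0.0, 0.1, 0.0),
    "Flux.2_vae": (0.4, -0.2, 0.3),
}

for name, shift in AVERAGE_SHIFT.items():
    angle, strength = shift_direction(*shift)
    print(f"{name}: toward {nearest_hue(angle)} ({angle:.0f} deg), strength {strength:.2f}")
```

The strength value matters as much as the direction: a shift with a very small strength, as for Flux.1_vae, is close to neutral even though it still has a nominal hue.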

Creating an Improved SDXL VAE!

Since SDXL_0.9_vae turned out to shift toward green, I fine-tuned a new VAE called SDXL_anime_natural_vae specifically for anime illustrations. In the Image2 test below, it shows smaller MAE and SSIM changes than SDXL_0.9_vae and stays closer to the original colors.

SDXL_anime_natural_vae

Output example using sdxl_anime_natural_vae

| Image2 | SDXL_anime_natural_vae | SDXL_0.9_vae |
| --- | --- | --- |
| MAE_similarity | 99.4 % | 99.0 % |
| SSIM_similarity | 99.9 % | 99.6 % |
| Red | -0.2 % | 0.3 % |
| Green | -0.1 % | 0.5 % |
| Blue | 0.0 % | 0.0 % |

Since SDXL_anime_natural_vae was fine-tuned while monitoring MAE and SSIM, its numerical precision is clearly better than that of SDXL_0.9_vae.

Because it deviates so little from the original image, it should improve image quality especially in workflows that pass the image through the VAE repeatedly, such as Hires.fix or Detailer.

SDXL_anime_clear_vae

Also available on the same page is SDXL_anime_clear_vae, which darkens the overall illustration and boosts contrast to emphasize a heavy, rich atmosphere.

Output sample using SDXL_anime_clear_vae

| Image2 | SDXL_anime_clear_vae | SDXL_0.9_vae |
| --- | --- | --- |
| MAE_similarity | 99.1 % | 99.0 % |
| SSIM_similarity | 99.9 % | 99.6 % |
| Red | -0.4 % | 0.3 % |
| Green | -0.3 % | 0.5 % |
| Blue | -0.2 % | 0.0 % |

This version has a stronger “flavor” compared to SDXL_anime_natural_vae, so it may not suit every illustration. However, if you try it and find it matches your desired expression, it’s worth using.

Does Image Quality Differ Between FP32 and BF16 Formats?

All tests so far used the FP32 version of each VAE, which is the highest-precision format.

In actual image generation, FP16 / BF16 formats are used most frequently, so as a final check I compared image quality between FP32 and BF16.

sdxl_anime_natural_vae_BF16

Workflow for measuring image quality difference between VAE data formats (FP32 vs BF16)

| Image2 | SDXL_anime_natural_vae_FP32 | SDXL_anime_natural_vae_BF16 |
| --- | --- | --- |
| MAE_similarity | 99.40961 % | 99.35945 % |
| SSIM_similarity | 99.92685 % | 99.91910 % |

FP32 is slightly ahead of BF16 in image quality, by about 0.05 points in MAE similarity and 0.008 points in SSIM similarity.

If you are pursuing the highest possible image quality in illustration generation, it may be worth trying the FP32 version of the VAE.
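Why the gap exists but stays small can be illustrated with the number formats themselves. BF16 keeps the same 8-bit exponent as FP32 but only 8 bits of significand precision instead of 24, so every stored weight is rounded to roughly two or three significant decimal digits. The sketch below emulates that rounding in plain Python; `to_bf16` is a hypothetical helper, and it only shows how individual values are rounded, not how ComfyUI or the VAE handle precision internally.

```python
import struct


def to_bf16(value):
    """Round a float to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 is the top 16 bits of an IEEE 754 float32, so the lower 16 bits
    are rounded away. NaN is not handled specially in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for weight in (0.1, 0.2345678, 1.0, 3.14159265):
    fp32 = struct.unpack("<f", struct.pack("<f", weight))[0]  # value as stored in FP32
    bf16 = to_bf16(weight)
    relative_error = abs(bf16 - fp32) / fp32
    print(f"FP32 {fp32:.8f} -> BF16 {bf16:.8f} (relative error {relative_error:.2e})")
```

Each individual rounding error is well under 1 %, which is consistent with the small but measurable difference in the table above.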

Screenshot explaining how to download models on Hugging Face
Click the model on Hugging Face
Screenshot showing where to check the model’s Precision (F32) in the file details
If Precision is F32, it is the FP32 format model

To use an FP32 VAE in ComfyUI, download the FP32 model and launch ComfyUI with the --fp32-vae argument.

What Makes Flux.2_vae So Good?

Finally, I’d like to introduce an article by Shiba*2 that explains Flux.2’s VAE in detail.

This test only measured how faithfully each VAE reconstructs the original image, so the difference between Flux.1_vae and Flux.2_vae came out small.

Flux.2_vae is described as a next-generation VAE that understands the “meaning” contained in images, so it may show a larger advantage with different verification methods or future enhancements.

Summary

  • SDXL’s VAE tends to shift toward green
  • Flux’s VAE offers high precision
  • Introduction of SDXL_anime_natural_vae

This time, I investigated VAEs.

Unlike the UNet/Transformer or text encoders, the VAE’s effect on an illustration is hard to see directly, but the Image Difference Checker node let me compare precision objectively.

Anime illustration of a silver-haired girl in a white uniform with clear eyes looking at the viewer
Try SDXL_anime_natural_vae!

The SDXL_anime_natural_vae introduced this time can improve image quality simply by swapping it in for the regular VAE, so I recommend it to anyone generating anime illustrations with SDXL.

I will continue exploring AI illustration quality from various angles.

Thank you for reading until the end!