Enhance Flux.1's Expression! How to Incorporate SDXL's Composition Techniques

Four-frame comic strip about a new female fighter pilot
  • Flux.1 struggles with composition
  • SDXL anime models excel in composition
  • ComfyUI allows flexible combinations

Introduction

Hello, I'm Easygoing.

This time, we'll explore ways to further enhance the expressiveness of the trending image generation AI, Flux.1.

Theme: Fighter Jet and Rookie Female Pilot

The theme for this project is a rookie female pilot taking on her first flight in a fighter jet.

We'll aim to capture both the tension of her first flight and the weighty presence of the fighter jet.

Flux.1's Amazing Texture Quality

Flux.1 is a new image generation AI that debuted in August 2024.

Compared to previous AIs, Flux.1 excels in texture quality.

Realistic illustration of a cat

The texture is so lifelike that the image could be mistaken for a photograph.

The Rookie Has Weaknesses

Flux.1 is a major newcomer that has significantly raised the bar for image generation quality, but it still has some weaknesses at this stage.

Namely, its lack of experience.

Comparing Flux.1 to the previous-generation model SDXL reveals several insights.

Superior Specs, But...

First, let's compare the text encoders (language understanding models) of Flux.1 and SDXL.

Each is equipped with two text encoders, which enhance its ability to understand prompts.

Flux.1

  • T5-XXL (Text-to-Text Transfer Transformer Extra Extra Large text encoder) 9.6 GB
  • CLIP-L (Contrastive Language-Image Pre-training Large text encoder) 0.3 GB

SDXL

  • OpenCLIP-ViT/G (Open-source Contrastive Language-Image Pre-training Vision Transformer Gigantic) 1.4 GB
  • CLIP-ViT/L (Contrastive Language-Image Pre-training Vision Transformer Large) 0.3 GB

What's the Difference?

Both sound impressive, but there's a significant difference in capacity.

The comprehension ability of a language understanding model generally grows with its size. In theory, Flux.1, with its much larger text encoders, should have far better prompt comprehension.
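To put numbers on that gap, the short sketch below totals the text encoder file sizes listed above for each model. The dictionary layout and the function name are illustrative only; the sizes are the ones quoted in the lists above, and file size is used here as a rough stand-in for capacity.

```python
# Text encoder file sizes in GB, as listed above.
ENCODER_SIZES_GB = {
    "Flux.1": {"T5-XXL": 9.6, "CLIP-L": 0.3},
    "SDXL": {"OpenCLIP-ViT/G": 1.4, "CLIP-ViT/L": 0.3},
}


def total_encoder_size(model: str) -> float:
    """Sum the encoder file sizes (GB) for one model, rounded to 0.1 GB."""
    return round(sum(ENCODER_SIZES_GB[model].values()), 1)


flux_total = total_encoder_size("Flux.1")
sdxl_total = total_encoder_size("SDXL")
ratio = flux_total / sdxl_total

print(f"Flux.1 encoders: {flux_total} GB")
print(f"SDXL encoders:   {sdxl_total} GB")
print(f"Flux.1 carries about {ratio:.1f}x the encoder data of SDXL")
```

By this measure, Flux.1's text encoders total 9.9 GB against 1.7 GB for SDXL, a ratio of nearly six.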

Illustration of a female rookie pilot flying a fighter jet for the first time, looking nervous

However, in practice, SDXL reproduces prompts more accurately.

This difference stems from experience. SDXL has been fine-tuned by many users, resulting in greater practical comprehension.

Flux.1 is like a gifted rookie, while SDXL is a seasoned veteran with rich real-world experience.

Is Flux.1 Boring?

How does the difference in comprehension between Flux.1 and SDXL manifest?

Let's first look at some images generated by Flux.1.

Realistic illustration of a front-facing cat
flux1-dev-Q8_0.gguf
Photo-like illustration of fighter jets flying in formation in the sky
Realistic illustration of a female pilot on the ground in front of a fighter jet, looking at the camera
FluxesCore-Dev - V1.0 - fp16

Each of Flux.1's images is excellent as a standalone piece.

However, when viewed in succession, they feel somewhat monotonous.

The Issue Lies in Composition

The images above all have simple compositions.

They feature the subject centered with the camera held level, a style known as the "Hinomaru composition" (akin to Japan's flag).

Japanese national flag (Hinomaru)

The Hinomaru composition is effective for simply drawing attention to the center, but it grows tiresome when used again and again.

Most Photos Use Hinomaru Composition

When we take photos, a tilted horizon is often considered a mistake.

In group photos, unless the photographer is a professional or a skilled amateur, the subject is typically centered.

Since much of the training data on the internet follows this composition, Flux.1 tends to replicate it.

SDXL Can Shoot Boldly

Next, let's look at images generated by SDXL.

Anime-style illustration of a cute cat’s face captured with a tilted camera
anima_pencil-XL-v5.0.0
Anime-style illustration of steaming coffee captured from a slightly tilted angle

While Flux.1 is superior in overall texture quality, SDXL uses dynamic techniques such as tilting the camera or cropping parts of the subject to create interesting compositions.

This playfulness brings movement to the images.

Among SDXL models, those fine-tuned for anime-style art, particularly Animagine-XL 3.0 and its derivatives, excel at such bold expressions.

Flux.1 Struggles to Reproduce Prompts

For this project, I used the following prompts to add movement to the images:

  • Dutch angle: Shooting with a tilted camera
  • Close-up: Framing the subject tightly, as when shooting with a telephoto lens

SDXL faithfully reproduced these prompts and produced varied compositions, whereas Flux.1 struggled to interpret them, owing to its limited practical fine-tuning.
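As a minimal sketch of how such composition keywords might be attached to a prompt, the helper below joins a base description with composition tags. The function name, the base prompt text, and the comma-separated format are assumptions made for illustration; the exact prompts used for this article are not reproduced here.

```python
# Composition keywords from the list above.
COMPOSITION_TAGS = ("dutch angle", "close-up")


def build_prompt(base: str, tags: list[str]) -> str:
    """Append composition tags to a base prompt, comma-separated.

    Unknown tags are rejected so that typos do not slip through silently.
    """
    for tag in tags:
        if tag not in COMPOSITION_TAGS:
            raise ValueError(f"unknown composition tag: {tag!r}")
    return ", ".join([base, *tags])


# Hypothetical base prompt, loosely matching this article's theme.
prompt = build_prompt(
    "rookie female pilot in a fighter jet cockpit, nervous",
    ["dutch angle", "close-up"],
)
print(prompt)
```

Keeping the composition tags separate from the subject description makes it easy to generate the same scene with and without them and compare how each model responds.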

A Veteran to the Rescue?

This led to the idea of compensating for Flux.1's weaknesses with SDXL.


flowchart LR
subgraph SDXL
A(Original<br>Sketch)
end
subgraph Flux.1
B(Redraw)
C(Upscaling)
end
subgraph SDXL2 [SDXL]
D(Final<br>Touch)
end
A-->B
B-->C
C-->D

The concept is to leverage Flux.1's strength in texture quality while letting SDXL handle composition and finishing touches.

Putting It into Practice

Let's look at the actual images.

Rough sketch of a female pilot flying a fighter jet for the first time, looking nervous
SDXL original sketch
High-resolution version of the female pilot flying a fighter jet for the first time, looking nervous
Flux.1 redraw
Finished illustration of the female pilot flying a fighter jet for the first time
SDXL final touch

First, SDXL creates a rough original image.

Next, Flux.1 redraws it to enhance resolution and texture, correcting details like fingers.

Since the character's texture can become overly realistic, SDXL is used again to soften the texture for the final touch.

Finding the right balance to express both the character's softness and the jet's texture required several attempts.
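The hand-off order can be sketched in a few lines of Python. This is only a structural outline, not the actual ComfyUI workflow: the `Stage` class and `run_pipeline` function are hypothetical names, and a plain string stands in for the image so the example runs without any model installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    model: str  # model family that handles this stage
    role: str   # what the stage contributes


# Stage order taken from the flowchart above.
PIPELINE = [
    Stage("SDXL", "original sketch"),
    Stage("Flux.1", "redraw"),
    Stage("Flux.1", "upscaling"),
    Stage("SDXL", "final touch"),
]


def run_pipeline(prompt: str, stages: list[Stage]) -> tuple[str, list[str]]:
    """Pass a placeholder image through each stage in order.

    Returns the nested placeholder and a log of the hand-offs. A real
    workflow would pass an image or latent between stages instead.
    """
    image = f"prompt[{prompt}]"
    log = []
    for step, stage in enumerate(stages, start=1):
        image = f"{stage.model}({image})"  # stand-in for actual generation
        log.append(f"{step}. {stage.model}: {stage.role}")
    return image, log


result, log = run_pipeline("rookie pilot, dutch angle", PIPELINE)
print("\n".join(log))
print(result)
```

The nested result string shows that each stage works on the previous stage's output rather than starting again from the prompt, which is what lets the SDXL composition survive the Flux.1 redraw.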

Trying ComfyUI!

Automating this process was not feasible with Stable Diffusion webUI Forge, which I've used so far, so I took this opportunity to adopt ComfyUI.

Learning ComfyUI was challenging, and I wrestled with error messages for about three days, but I finally managed to generate images.

ComfyUI Reveals VRAM Usage!

Using ComfyUI makes it clear which processes consume VRAM.

Managing VRAM is key to running Flux.1: using it efficiently can improve both image quality and generation speed.
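One rough way to reason about VRAM is to tally the size of each component against the memory available. The sketch below does only that; it is not how ComfyUI itself reports or manages memory. The text encoder sizes are the ones quoted earlier in this article, while the diffusion model size and the 16 GB budget are hypothetical placeholders, and working memory during generation is ignored.

```python
def vram_report(components: dict[str, float], budget_gb: float) -> tuple[float, bool]:
    """Print each component's size and return (total_gb, fits_in_budget)."""
    total = round(sum(components.values()), 1)
    # List the largest consumers first.
    for name, size in sorted(components.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<32}{size:>5.1f} GB")
    print(f"{'total':<32}{total:>5.1f} GB (budget {budget_gb:.1f} GB)")
    return total, total <= budget_gb


# Encoder sizes are from this article; the diffusion model size and the
# VRAM budget are hypothetical placeholders.
all_at_once = {
    "T5-XXL": 9.6,
    "CLIP-L": 0.3,
    "diffusion model (hypothetical)": 12.0,
}
encoders_only = {"T5-XXL": 9.6, "CLIP-L": 0.3}

total_all, fits_all = vram_report(all_at_once, budget_gb=16.0)
total_enc, fits_enc = vram_report(encoders_only, budget_gb=16.0)
```

In this made-up case, everything loaded together exceeds the budget while the text encoders alone fit, which is the kind of situation where seeing the per-component breakdown helps.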

Workflow

Here are the workflows used.

SDXL-Flux1-SDXL_anime 2025.4.11

SDXL-Flux1-SDXL for Anime

Anime illustration of a female fighter pilot panicking on her first sortie

Verified Models

This is the workflow used for the anime-style illustration.

SDXL-Flux1 for Semi-Realistic

Semi-realistic illustration of a cat sleeping comfortably on a futon

Verified Models

This is the workflow for semi-realistic illustrations incorporating anime-style compositions.

Summary

  • Flux.1 struggles with composition.
  • SDXL's anime models excel at composition.
  • ComfyUI enables flexible combinations.

I see great potential in Flux.1.

As Flux.1's training progresses, techniques like relying on SDXL, as introduced here, may become unnecessary.

Until then, the partnership with the veteran SDXL will likely continue.

Thank you for reading to the end!


Bonus

In fighter jets, the term "Fire!" is used when launching weapons...

Illustration of a female rookie pilot in a fighter jet cockpit lighting a fire

No! That's not what it means!


Update History

2025.4.14

Updated the workflow in line with ComfyUI's update.

2025.4.11

Updated the workflow.

2024.9.23

Fixed and re-uploaded the workflow.