Enhance Flux.1's Expression! How to Incorporate SDXL's Composition Techniques

2024-9-28 / 2025-6-17

Four-frame comic strip about a new female fighter pilot

Flux.1 struggles with composition
SDXL anime models excel in composition
ComfyUI allows flexible combinations

Introduction

Hello, I'm Easygoing.

This time, we'll explore ways to further enhance the expressiveness of the trending image generation AI, Flux.1.

Theme: Fighter Jet and Rookie Female Pilot

The theme for this project is a rookie female pilot taking on her first flight in a fighter jet.

We'll aim to capture both the tension of her first flight and the weighty presence of the fighter jet.

Flux.1's Amazing Texture Quality

Flux.1 is a new image generation AI that debuted in August 2024.

Compared to previous AIs, Flux.1 excels in texture quality.

Its overwhelming texture quality is so lifelike it could be mistaken for the real thing.

The Rookie Has Weaknesses

Flux.1 is a major newcomer that has significantly raised the bar for image generation quality, but it still has some weaknesses at this stage.

Namely, its lack of experience.

Comparing Flux.1 to the previous-generation model SDXL reveals several insights.

Superior Specs, But...

First, let's compare the text encoders (language understanding models) of Flux.1 and SDXL.

Each is equipped with two text encoders, which improves its ability to understand prompts.

Flux.1

T5-XXL (Text-to-Text Transfer Transformer Extra Extra Large text encoder) 9.6 GB
CLIP-L (Contrastive Language-Image Pre-training Large text encoder) 0.3 GB

SDXL

OpenCLIP-ViT/G (Open-source Contrastive Language-Image Pre-training Vision Transformer Gigantic) 1.4 GB
CLIP-ViT/L (Contrastive Language-Image Pre-training Vision Transformer Large) 0.3 GB

What's the Difference?

Both sound impressive, but there's a significant difference in capacity.

In general, a language model's comprehension tends to improve with its size. In theory, Flux.1, with its much larger text encoders, should therefore understand prompts far better.
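As a rough check on the capacity gap, the file sizes listed above can simply be added up. This is only back-of-the-envelope arithmetic; file size is a crude proxy for capacity, and the variable names are purely illustrative.

```python
# File sizes (GB) of the text encoders, as listed above.
flux1_encoders = {"T5-XXL": 9.6, "CLIP-L": 0.3}
sdxl_encoders = {"OpenCLIP-ViT/G": 1.4, "CLIP-ViT/L": 0.3}

# Round to one decimal to avoid floating-point noise in the sums.
flux1_total = round(sum(flux1_encoders.values()), 1)
sdxl_total = round(sum(sdxl_encoders.values()), 1)
ratio = flux1_total / sdxl_total

print(f"Flux.1 text encoders: {flux1_total} GB")
print(f"SDXL text encoders:   {sdxl_total} GB")
print(f"Flux.1 carries about {ratio:.1f}x more text-encoder data")
```

By this measure Flux.1's text encoders total 9.9 GB against SDXL's 1.7 GB, a gap of nearly six times.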

Illustration of a female rookie pilot flying a fighter jet for the first time, looking nervous

However, in practice, SDXL reproduces prompts more accurately.

This difference stems from experience. SDXL has been fine-tuned by many users, resulting in greater practical comprehension.

Flux.1 is like a gifted rookie, while SDXL is a seasoned veteran with rich real-world experience.

Is Flux.1 Boring?

How does the difference in comprehension between Flux.1 and SDXL manifest?

Let's first look at some images generated by Flux.1.

Realistic illustration of a front-facing cat — flux1-dev-Q8_0.gguf

Photo-like illustration of fighter jets flying in formation in the sky

Realistic illustration of a female pilot on the ground in front of a fighter jet, looking at the camera — FluxesCore-Dev - V1.0 - fp16

Each of Flux.1's images is excellent as a standalone piece.

However, when viewed in succession, they feel somewhat monotonous.

The Issue Lies in Composition

The images above all have simple compositions.

They place the subject in the center with the camera held level, a style known as the "Hinomaru composition" (named after the red disc in the center of Japan's flag).

The Hinomaru composition is effective for drawing attention to the center, but it becomes tiring when used over and over.

Most Photos Use Hinomaru Composition

When we take photos, a tilted horizon is often considered a mistake.

And unless the photographer is a professional or a skilled amateur, group photos typically place the subject in the center.

Since much of the training data on the internet follows this composition, Flux.1 tends to replicate it.

SDXL Can Shoot Boldly

Next, let's look at images generated by SDXL.

Anime-style illustration of a cute cat’s face captured with a tilted camera — anima_pencil-XL-v5.0.0

Anime-style illustration of steaming coffee captured from a slightly tilted angle

While Flux.1 is ahead in overall texture quality, SDXL uses dynamic techniques such as tilting the camera or cropping parts of the subject to create interesting compositions.

This playfulness brings movement to the images.

Among SDXL models, those fine-tuned for anime-style art, particularly Animagine-XL 3.0 and its derivatives, excel at such bold expressions.

Flux.1 Struggles to Reproduce Prompts

For this project, I used the following prompts to add movement to the images:

Dutch angle: Shooting with a tilted camera
Close-up: Framing the subject tightly, as with a telephoto lens

SDXL faithfully reproduced these prompts, resulting in varied compositions, whereas Flux.1 struggled to interpret them, which I attribute to its limited practical fine-tuning so far.
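A minimal sketch of how such composition keywords might be appended to a base prompt, so that the same subject can be generated with and without them and the results compared. The helper `build_prompt` and the base prompt text are hypothetical, written for illustration; only the two keywords come from the list above.

```python
# Composition keywords from the list above, with the effect each one requests.
COMPOSITION_TAGS = {
    "dutch angle": "shooting with a tilted camera",
    "close-up": "shooting with a telephoto lens",
}


def build_prompt(subject: str, tags: list[str]) -> str:
    """Join a subject description and composition keywords into one comma-separated prompt."""
    unknown = [t for t in tags if t not in COMPOSITION_TAGS]
    if unknown:
        raise ValueError(f"unknown composition tags: {unknown}")
    return ", ".join([subject, *tags])


# Hypothetical base prompt, used only for this example.
base = "rookie female pilot in a fighter jet cockpit"
plain = build_prompt(base, [])
dynamic = build_prompt(base, ["dutch angle", "close-up"])

print(plain)
print(dynamic)
```

Feeding both strings to each model with the same seed is one simple way to see whether a model actually responds to the composition keywords.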

A Veteran to the Rescue?

This led to the idea of compensating for Flux.1's weaknesses with SDXL.


flowchart LR
subgraph sdxl_sketch["SDXL"]
A(Original<br>Sketch)
end
subgraph flux1["Flux.1"]
B(Redraw)
C(Upscaling)
end
subgraph sdxl_final["SDXL"]
D(Final<br>Touch)
end
A-->B
B-->C
C-->D

The concept is to leverage Flux.1's strength in texture quality while letting SDXL handle composition and finishing touches.
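The flow above can also be written down as an ordered list of stages. This is only a structural outline under my own naming: the `Stage` class and the `describe` helper are hypothetical, and no model is actually called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    model: str  # which model handles this step
    task: str   # what the step contributes


# Order taken from the flowchart: SDXL -> Flux.1 -> Flux.1 -> SDXL.
PIPELINE = [
    Stage("SDXL", "original sketch"),
    Stage("Flux.1", "redraw"),
    Stage("Flux.1", "upscaling"),
    Stage("SDXL", "final touch"),
]


def describe(pipeline: list[Stage]) -> str:
    """Render the stages as a single left-to-right summary line."""
    return " -> ".join(f"{s.task} ({s.model})" for s in pipeline)


print(describe(PIPELINE))
```

Keeping the stages as plain data makes it easy to see that SDXL brackets the process at both ends, while Flux.1 handles the two middle steps where texture quality matters most.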