HiDream-O1-Image_clear_v1: Easy Local Image Editing with UiT Architecture
- HiDream-O1-Image_clear delivers clear and vibrant outputs
- Supports Image-to-Image and various image editing tasks
- MXFP8 runs natively only on the RTX 5000 series
Introduction
Hello, I'm Easygoing.
This time, I've released HiDream-O1-Image_clear_v1, a model that makes image editing easy to do locally, so let me introduce it.
HiDream-O1-Image is a versatile image generation and editing model
HiDream-O1-Image is a versatile AI model that handles both image generation and image editing.
It adopts the UiT architecture, which handles language and images in the same representation space. Thanks to this, it has strong image editing capabilities despite being a lightweight model.
Since the HiDream-O1-Image model is a newly released base model, its outputs tend to be slightly blurry. This time, I fine-tuned it to produce clearer illustrations.
Real illustration examples!
Let’s compare some actual illustrations.
The left side shows the fine-tuned HiDream-O1-Image_clear_v1, and the right side shows the output of the original HiDream-O1-Image.
Neon Sign
Paris at Dusk
You can see that the left side (HiDream-O1-Image_clear_v1) produces higher contrast and clearer illustrations compared to the right side (original HiDream-O1-Image).
On the other hand, since HiDream-O1-Image_clear_v1 was fine-tuned mainly on anime illustrations, photorealistic images may sometimes come out with slightly too much contrast. In that case, try lowering the CFG scale or noise_scale.
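Why does lowering the CFG scale tone things down? Assuming the sampler uses the standard classifier-free guidance formula (this post doesn't spell out the UiT Sampler's internals), the CFG scale multiplies the difference between the prompt-conditioned and unconditional predictions, so larger values exaggerate whatever the prompt pushes toward. A minimal sketch; `cfg_combine` is a hypothetical name:

```python
from typing import List


def cfg_combine(uncond: List[float], cond: List[float], cfg_scale: float) -> List[float]:
    """Standard classifier-free guidance (assumed, not confirmed for the UiT Sampler):
    start from the unconditional prediction and move cfg_scale times the distance
    toward the prompt-conditioned prediction."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


# cfg_scale = 1 reproduces the conditional prediction; larger values overshoot it.
print(cfg_combine([0.0, 0.5], [0.2, 0.5], 1.0))
print(cfg_combine([0.0, 0.5], [0.2, 0.5], 4.0))
```

Values the prompt does not change (the second element) stay put, while the ones it does change are pushed further as the scale grows, which is one intuition for the extra contrast.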
Let’s try image editing!
Now, let’s actually try image editing using the HiDream-O1-Image_clear_v1 model.
In these examples, the source images were also generated with HiDream-O1-Image_clear.
I’ve also attached the ComfyUI workflows I actually used.
Convert to Anime Illustration
change to anime illustration
Using as Refiner to Redraw and Enhance Details
Costume & Background Transformation
A full-length portrait of a man in a fantasy world, holding a long sword and wearing plate armor, beaming with excitement as he looks forward to his upcoming adventures. The background features a medieval village with a wheat field in the distance.
Convert to Black and White Manga
change to black and white manga artwork
Combining Two Images
draw image1 cat on image2 carpet
ComfyUI-uit-hidream-o1 Custom Node
For this workflow, I used the ComfyUI-uit-hidream-o1 custom node.
UIT Sampler Node
Inputs
- model: The HiDream-O1-Image model
- clip: Dummy input (required only to connect in ComfyUI)
- vae: Dummy input (required only to connect in ComfyUI)
- input_image: Starts generation from an input image instead of from pure noise
- reference_image: Converts the image directly into tokens, which can replace or complement the text prompt
Settings
- width, height: HiDream-O1's default resolution is 2048 x 2048
  - When an input_image is provided, the node resizes it to about 4 megapixels and uses that resolution. In that case, width and height are ignored.
- noise_scale: Strength of the noise
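The post doesn't say exactly how the node picks the resolution for an input_image, so here is a hypothetical sketch of "resize to about 4 megapixels". It assumes the aspect ratio is kept, the target area equals the default 2048 x 2048 (about 4.2 megapixels), and each side is rounded to a multiple of 16. The rounding multiple and the name `fit_to_target_area` are my own assumptions, not the custom node's code.

```python
import math
from typing import Tuple

# Assumption: the target area is HiDream-O1's default 2048 x 2048 (about 4.2 megapixels).
TARGET_PIXELS = 2048 * 2048


def fit_to_target_area(width: int, height: int,
                       target_pixels: int = TARGET_PIXELS,
                       multiple: int = 16) -> Tuple[int, int]:
    """Hypothetical helper: rescale (width, height) to roughly target_pixels,
    keeping the aspect ratio and rounding each side to a multiple of `multiple`
    (the rounding rule is an assumption, not the node's documented behavior)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = math.sqrt(target_pixels / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


# A 1024 x 1536 portrait is scaled up to roughly 4.2 megapixels with the same shape.
print(fit_to_target_area(1024, 1536))
```

Scaling both sides by the square root of the area ratio is what keeps the shape unchanged while hitting the pixel budget.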
More than half of the official workflow is dummy nodes
Comfy Org has published an official workflow for HiDream-O1-Image, but it uses many dummy nodes for components the model doesn't actually have. I don't recommend it if you want to understand the UiT architecture.
The UiT architecture has no CLIP, no VAE, no external conditioning, and no latent space.
ComfyUI, which appeared after Stable Diffusion 1, is specialized for inference pipelines built around CLIP and a VAE. It is therefore somewhat inevitable that implementing the simpler UiT architecture in ComfyUI ends up relatively complex.
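To make "no CLIP, no VAE, no latent" concrete, here is a toy sketch of the general idea: raw pixel patches become tokens directly, are projected to the same width as the text embeddings, and both go into one sequence for a single transformer. This is not HiDream-O1's code. The patch size, the plain linear projection, and the names `patchify` and `build_sequence` are hypothetical, and a real model also adds position information and many learned layers.

```python
from typing import List, Sequence

Vector = List[float]


def patchify(image: Sequence[Sequence[Sequence[float]]], patch: int) -> List[Vector]:
    """Cut an H x W x C pixel array into flattened patch x patch tokens (no VAE, no latent)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    tokens: List[Vector] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token: Vector = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    token.extend(image[y][x])  # append all channels of this pixel
            tokens.append(token)
    return tokens


def linear(vec: Vector, weight: List[Vector]) -> Vector:
    """Project vec with an (out_dim x in_dim) weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def build_sequence(text_ids: List[int], text_table: List[Vector],
                   image: Sequence[Sequence[Sequence[float]]], patch: int,
                   image_proj: List[Vector]) -> List[Vector]:
    """Hypothetical UiT-style input: text embeddings and projected pixel patches
    share one width, so they can be concatenated into a single token sequence."""
    text_tokens = [text_table[i] for i in text_ids]
    image_tokens = [linear(t, image_proj) for t in patchify(image, patch)]
    return text_tokens + image_tokens


# Toy example: a 4 x 4 RGB image, 2 x 2 patches, embedding width 8.
img = [[[float(y), float(x), 0.5] for x in range(4)] for y in range(4)]
table = [[float(i)] * 8 for i in range(10)]
proj = [[0.1] * 12 for _ in range(8)]
print(len(build_sequence([1, 2, 3], table, img, 2, proj)))  # 3 text + 4 image tokens
```

Because the image enters as pixel patches rather than VAE latents, there is nothing for separate CLIP, VAE, or latent nodes to do, which is why those nodes end up as dummies in ComfyUI.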
FP8_scaled vs MXFP8
On the release page for HiDream-O1-Image_clear_v1, I've published two FP8 models that use scaling to reduce precision loss: FP8_scaled and MXFP8.
- FP8_scaled: Better precision than plain FP8; runs fast on RTX 4000 series and newer
- MXFP8: Even better precision; runs fast on RTX 5000 series
FP8_scaled applies a scale factor to the weights so they make full use of FP8's limited range, which reduces precision loss.
MXFP8 goes further: it splits the data into blocks of 32 values, each with its own scale, so an outlier only affects its own block and precision improves even more. However, among consumer GPUs, hardware support is currently limited to NVIDIA's RTX 5000 series.
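A small simulation shows why per-block scaling helps with outliers. This is a toy sketch, not the code used to make the released files. It assumes FP8_scaled uses one scale for the whole tensor, and that MXFP8 stores E4M3 values (largest finite value 448) with a power-of-two scale shared by each block of 32. Choosing that shared scale as the smallest power of two that keeps the block's maximum in range is my assumption; implementations differ on this detail.

```python
import math
from typing import List

E4M3_MAX = 448.0                 # largest finite FP8 E4M3 value
E4M3_MIN_NORMAL = 2.0 ** -6      # smallest normal E4M3 value
E4M3_SUBNORMAL_STEP = 2.0 ** -9  # spacing of E4M3 subnormals


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448."""
    a = abs(x)
    if a == 0.0:
        return 0.0
    if a < E4M3_MIN_NORMAL:
        step = E4M3_SUBNORMAL_STEP   # subnormal range: fixed spacing
    else:
        _, exp = math.frexp(a)       # a = m * 2**exp with 0.5 <= m < 1
        step = 2.0 ** (exp - 1 - 3)  # 3 mantissa bits per power of two
    q = min(round(a / step) * step, E4M3_MAX)
    return math.copysign(q, x)


def fake_quant_per_tensor(values: List[float]) -> List[float]:
    """FP8 with one scale for the whole tensor (assumed FP8_scaled-style)."""
    scale = max(abs(v) for v in values) / E4M3_MAX
    if scale == 0.0:
        return list(values)
    return [quantize_e4m3(v / scale) * scale for v in values]


def fake_quant_blocks(values: List[float], block: int = 32) -> List[float]:
    """FP8 with a power-of-two scale shared by each block of 32 values (MX-style)."""
    out: List[float] = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        amax = max(abs(v) for v in chunk)
        if amax == 0.0:
            out.extend(chunk)
            continue
        # Smallest power of two that brings the block's largest value within E4M3 range
        scale = 2.0 ** math.ceil(math.log2(amax / E4M3_MAX))
        out.extend(quantize_e4m3(v / scale) * scale for v in chunk)
    return out


def mean_relative_error(original: List[float], approx: List[float]) -> float:
    """Average |error| / |value|; assumes no zero entries in `original`."""
    return sum(abs(o - a) / abs(o) for o, a in zip(original, approx)) / len(original)


# 128 small weights, with one large outlier placed in the first block
weights = [0.001 * (i % 7 + 1) for i in range(128)]
weights[0] = 1000.0

print("per-tensor:", mean_relative_error(weights, fake_quant_per_tensor(weights)))
print("per-block: ", mean_relative_error(weights, fake_quant_blocks(weights)))
```

With a single tensor-wide scale, the outlier stretches the scale so far that most small weights round to zero or to a coarse step. With per-block scales, only the first block suffers, and the other three keep small relative errors.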
```mermaid
gantt
title GPU Series Roadmap
dateFormat YYYY-MM-DD
tickInterval 12month
axisFormat %Y
section NVIDIA
GTX 1000 : 2016-05-27, 2026-05-30
RTX 2000 : 2018-09-20, 2026-05-30
RTX 3000 : 2020-09-17, 2026-05-30
RTX 4000 : 2022-10-12, 2026-05-30
RTX 5000 : 2025-01-30, 2026-05-30
section AMD
RX 5000 : 2019-07-07, 2026-05-30
RX 6000 : 2020-11-18, 2026-05-30
RX 7000 : 2022-12-13, 2026-05-30
RX 9000 : 2025-03-06, 2026-05-30
section Intel
Arc A : 2022-03-30, 2026-05-30
Arc B : 2024-12-13, 2026-05-30
```
| GPU | FP32 | FP16 | BF16 | FP8 | MXFP8 | FP4 |
|---|---|---|---|---|---|---|
| NVIDIA | | | | | | |
| RTX 5000 (Blackwell) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RTX 4000 (Ada Lovelace) | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| RTX 3000 (Ampere) | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| RTX 2000 (Turing) | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| GTX 1000 (Pascal) | ✅ | ⚠️ | ❌ | ❌ | ❌ | ❌ |
| AMD | | | | | | |
| RX 9000 (RDNA4) | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| RX 7000 (RDNA3) | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| RX 6000 (RDNA2) | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| RX 5000 (RDNA1) | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Intel | | | | | | |
| Arc B (Battlemage) | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Arc A (Alchemist) | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |

⚠️: supported, but much slower than FP32
Both FP8_scaled and MXFP8 will fall back to FP16 / BF16 format on unsupported GPUs, resulting in slower processing.
Which model format is fastest, and how long processing takes, vary greatly depending on your environment, so please try different models to find the best one for you.
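As a starting point for NVIDIA cards, the NVIDIA rows of the table can be encoded as a lookup on the GPU's CUDA compute capability. This is a hypothetical helper, not part of the release. It assumes the usual capability values (GTX 1000 = 6.1, RTX 2000 = 7.5, RTX 3000 = 8.6, RTX 4000 = 8.9, RTX 5000 = 12.0), and it ignores FP4 because no FP4 model is published here.

```python
from typing import Tuple


def fastest_native_format(capability: Tuple[int, int]) -> str:
    """Hypothetical helper: return the most compact format from the table
    (FP4 excluded) that an NVIDIA GPU runs natively, given its CUDA compute
    capability as (major, minor)."""
    if capability >= (10, 0):   # Blackwell (RTX 5000 series is 12.0)
        return "MXFP8"
    if capability >= (8, 9):    # Ada Lovelace (RTX 4000 series)
        return "FP8"
    if capability >= (8, 0):    # Ampere (RTX 3000 series is 8.6)
        return "BF16"
    if capability >= (7, 0):    # Volta / Turing (RTX 2000 series is 7.5)
        return "FP16"
    return "FP32"               # Pascal: FP16 works but is very slow


print(fastest_native_format((8, 9)))
```

With PyTorch installed, you can pass it `torch.cuda.get_device_capability()`. A result of "MXFP8" points to the MXFP8 model and "FP8" to FP8_scaled; anything else means either model will fall back to FP16 / BF16.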
Summary: Try HiDream-O1-Image_clear_v1!
- HiDream-O1-Image_clear delivers clear and vibrant outputs
- Supports Image-to-Image and various image editing tasks
- MXFP8 runs natively only on the RTX 5000 series
This time, I introduced the HiDream-O1-Image_clear_v1 model.
Previously, when I fine-tuned base models such as Z-Image-Base and Flux [klein] 4B-Base, I had to take VAE post-processing into account, which often made color adjustment particularly difficult.
Because HiDream-O1-Image uses the VAE-free UiT architecture, it was much easier to fine-tune, even as a base model.
I plan to keep following the latest developments in the UiT architecture.
Thank you for reading until the end!