HiDream-O1-Image_clear_v1: Easy Local Image Editing with UiT Architecture

HiDream-O1-Image_clear_v1_anime_LAB_adjust_00054_.png (1600×1600)
  • HiDream-O1-Image_clear delivers clear and vibrant outputs
  • Supports Image-to-Image and various image editing tasks
  • MXFP8 is a format exclusive to RTX 5000 series

Introduction

Hello, I'm Easygoing.

I've just released HiDream-O1-Image_clear_v1, a model that makes local image editing easy, so let me introduce it.

Anime-style woman generated with HiDream-O1-Image_clear_v1. Long black hair with a mature atmosphere
HiDream-O1-Image_clear_v1

HiDream-O1-Image is a versatile image generation and editing model

HiDream-O1-Image is a versatile AI-powered image generation and editing model. It adopts the UiT architecture, which represents language and images in the same space. As a result, it offers strong image editing capabilities even though it is a lightweight model.

HiDream-O1-Image is a newly released base model, and its outputs tend to be slightly blurry. For this release, I fine-tuned it to produce clearer illustrations.

Real illustration examples!

Let’s compare some actual illustrations.

The left side shows the adjusted HiDream-O1-Image_clear_v1, and the right side shows the output from the original HiDream-O1-Image.

Neon Sign

HiDream-O1-Image_clear_v1: Anime illustration of a brown-haired woman smiling at a neon sign bar. HiDream-O1-Image (original): Anime illustration of a brown-haired woman smiling at a neon sign bar.
HiDream-O1-Image_clear_v1  |  HiDream-O1-Image (Original)
HiDream-O1-Image_clear_v1_original_compare_anime_close_up.png (1600×797)

Paris at Dusk

HiDream-O1-Image_clear_v1: photorealistic illustration of a woman in a black hat against a Paris dusk background. HiDream-O1-Image (original): photorealistic illustration of a woman in a black hat against a Paris dusk background.
HiDream-O1-Image_clear_v1_original_compare_photoreal_close_up.png (1600×797)

You can see that the left side (HiDream-O1-Image_clear_v1) produces higher contrast and clearer illustrations compared to the right side (original HiDream-O1-Image).

On the other hand, since HiDream-O1-Image_clear_v1 was mainly fine-tuned on anime illustrations, photorealistic images may sometimes feel a bit too high in contrast. In such cases, try adjusting parameters by lowering the CFG scale or noise_level.

Let’s try image editing!

Now, let’s actually try image editing using the HiDream-O1-Image_clear_v1 model.

In these examples, the source images were also generated with HiDream-O1-Image_clear.

I’ve also attached the ComfyUI workflows I actually used.

Convert to Anime Illustration

Photo of a woman with long black hair in a blue dress Photo of a woman with long black hair in a blue dress
ComfyUI workflow for converting a photorealistic image to anime illustration using image-to-image
Prompt: change to anime illustration

Using It as a Refiner to Redraw and Enhance Details

Photorealistic illustration of a woman in an amphitheater Redrawn version with enhanced details
ComfyUI workflow using HiDream-O1-Image_clear_v1 as a refiner to enhance details

Costume & Background Transformation

Unimpressive middle-aged Japanese man wearing glasses Man smiling widely while holding a long sword and wearing plate armor
ComfyUI workflow for changing costume and background
Prompt: A full-length portrait of a man in a fantasy world, holding a long sword and wearing plate armor, beaming with excitement as he looks forward to his upcoming adventures. The background features a medieval village with a wheat field in the distance.

Convert to Black and White Manga

Photorealistic illustration of a supercar on a night coast Converted to black and white manga style
ComfyUI workflow for converting to black and white manga style
Prompt: change to black and white manga artwork

Combining Two Images

Photorealistic illustration of a cat curled up sleeping on a blanket Photorealistic illustration of a living room with a fireplace and carpet
Photorealistic illustration of a cat sleeping on a carpet
ComfyUI workflow for combining two images
Prompt: draw image1 cat on image2 carpet

ComfyUI-uit-hidream-o1 Custom Node

For this workflow, I used the ComfyUI-uit-hidream-o1 custom node.

Screenshot of searching for
Nodes Manager search screen

UIT Sampler Node

Screenshot of the UIT Sampler node

Inputs

  • model: Input the model
  • clip: Dummy input (required for ComfyUI connection)
  • vae: Dummy input (required for ComfyUI connection)
  • input_image: Use an input image instead of initial white noise
  • reference_image: Directly converts the input image into tokens (can be used as a replacement for or in combination with text prompts)

Settings

  • width, height: HiDream-O1’s default resolution is 2048 x 2048
    • When an input_image is provided, the node resizes it to 4 megapixels and uses that resolution. In that case, width and height are ignored.
  • noise_scale: Strength of the noise
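
To give a feel for what "resize to 4 megapixels" means for a non-square input, here is a minimal sketch. It is not the node's actual code: the function name `fit_to_target_area`, the 2048 x 2048 pixel target, and the snapping to multiples of 32 are all my assumptions for illustration.

```python
import math

TARGET_PIXELS = 2048 * 2048  # assumed target area, matching the default resolution
MULTIPLE = 32                # hypothetical alignment; the real node may differ


def fit_to_target_area(width: int, height: int,
                       target_pixels: int = TARGET_PIXELS,
                       multiple: int = MULTIPLE) -> tuple[int, int]:
    """Scale (width, height) to roughly target_pixels, keeping the aspect ratio."""
    scale = math.sqrt(target_pixels / (width * height))
    # Snap each side to the nearest multiple so both sides stay aligned.
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


print(fit_to_target_area(1024, 1024))  # a square input
print(fit_to_target_area(1600, 797))   # a wide input keeps its proportions
```

Under these assumptions a small square image is scaled up to the default resolution, while a wide image becomes wider than 2048 pixels and shorter in height, with about the same total pixel count.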

More than half of the official workflow is dummy nodes

HiDream-O1-Image has an official workflow published by Comfy Org, but it relies on many dummy nodes for components that the model does not actually have. I don't recommend it if you want to understand the UiT architecture.

Screenshot of the official Comfy Org HiDream-O1-Image workflow
More than half of the official Comfy Org workflow consists of dummy nodes.
The UiT architecture has no CLIP, VAE, external conditioning, or latents.

ComfyUI, which appeared after Stable Diffusion 1, is a tool specialized for inference with CLIP and a VAE. It is therefore somewhat inevitable that fitting the simpler UiT architecture into it ends up relatively complex.

FP8_scaled vs MXFP8

On the release page for HiDream-O1-Image_clear_v1, I have published two types of high-precision FP8 format models: FP8_scaled and MXFP8.

Screenshot of the model list in the Hugging Face HiDream-O1-Image_clear repository
  • FP8_scaled: Improved precision, runs fast on RTX 4000 series
  • MXFP8: Further improved precision, runs fast on RTX 5000 series

FP8_scaled is a format that applies a scale factor to the weights so that they make full use of the few bits FP8 has, which suppresses precision loss.

MXFP8 goes further and scales the data in blocks of 32 values, each with its own scale. This limits the influence of outliers to the block they appear in and improves precision even more. However, hardware support is currently limited to NVIDIA's RTX 5000 series GPUs.
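
The difference between the two scaling schemes is easier to see with numbers. The following is a rough pure-Python simulation, not the converter used for the released files. It assumes FP8 E4M3 elements (maximum value 448) and a power-of-two scale per block for the MXFP8 case. All function names are made up for this example.

```python
import math

E4M3_MAX = 448.0   # largest finite FP8 E4M3 value
E4M3_EMAX = 8      # exponent of the top E4M3 binade (448 = 1.75 * 2**8)
BLOCK = 32         # MXFP8 block size


def round_to_e4m3(x: float) -> float:
    """Roughly simulate FP8 E4M3 rounding (3 mantissa bits, saturating)."""
    if x == 0.0:
        return 0.0
    x = max(-E4M3_MAX, min(E4M3_MAX, x))  # values beyond the range saturate
    _, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= |m| < 1
    e = max(e, -5)                        # below 2**-6 the format is subnormal
    step = 2.0 ** (e - 4)                 # spacing between representable values
    return round(x / step) * step


def quantize_per_tensor(values):
    """One scale for the whole tensor (the FP8_scaled idea)."""
    scale = max(abs(v) for v in values) / E4M3_MAX or 1.0
    return [round_to_e4m3(v / scale) * scale for v in values]


def quantize_blockwise(values, block=BLOCK):
    """One power-of-two scale per block of 32 values (the MXFP8 idea)."""
    out = []
    for i in range(0, len(values), block):
        chunk = values[i:i + block]
        amax = max(abs(v) for v in chunk)
        # Pick the scale so the block maximum lands in the top E4M3 binade.
        scale = 2.0 ** (math.floor(math.log2(amax)) - E4M3_EMAX) if amax else 1.0
        out.extend(round_to_e4m3(v / scale) * scale for v in chunk)
    return out


def mean_abs_error(original, quantized):
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)


# A block of small weights followed by a block containing one large outlier.
small = [0.001 + 0.00001 * i for i in range(BLOCK)]
weights = small + [100.0] + [0.5] * (BLOCK - 1)

err_tensor = mean_abs_error(small, quantize_per_tensor(weights)[:BLOCK])
err_block = mean_abs_error(small, quantize_blockwise(weights)[:BLOCK])
print(f"per-tensor scale, error on the small block: {err_tensor:.2e}")
print(f"block-wise scale, error on the small block: {err_block:.2e}")
```

With a single scale, the outlier of 100 pushes the small weights down into the coarsest part of the FP8 grid. With one scale per block, the outlier only affects its own block, so the small weights keep more of their precision.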

GPU Series Roadmap (launch dates)

NVIDIA
  • GTX 1000: 2016-05-27
  • RTX 2000: 2018-09-20
  • RTX 3000: 2020-09-17
  • RTX 4000: 2022-10-12
  • RTX 5000: 2025-01-30
AMD
  • RX 5000: 2019-07-07
  • RX 6000: 2020-11-18
  • RX 7000: 2022-12-13
  • RX 9000: 2025-03-06
Intel
  • Arc A: 2022-03-30
  • Arc B: 2024-12-13

[Table: format support by GPU series. Columns: FP32, FP16, BF16, FP8, MXFP8, FP4. Rows listed below.]

NVIDIA
  • RTX 5000 (Blackwell)
  • RTX 4000 (Ada Lovelace)
  • RTX 3000 (Ampere)
  • RTX 2000 (Turing)
  • GTX 1000 (Pascal) ⚠️
AMD
  • RX 9000 (RDNA4)
  • RX 7000 (RDNA3)
  • RX 6000 (RDNA2)
  • RX 5000 (RDNA1)
Intel
  • Arc B (Battlemage)
  • Arc A (Alchemist)

Both FP8_scaled and MXFP8 will fall back to FP16 / BF16 format on unsupported GPUs, resulting in slower processing.
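
If you are not sure which case applies to your NVIDIA GPU, the compute capability is a useful hint. The sketch below is only a rule of thumb that I am assuming here: compute capability 8.9 corresponds to Ada Lovelace (RTX 4000) and 12.0 to the RTX 5000 series. The function name is hypothetical, and AMD and Intel GPUs are not covered.

```python
def native_fp8_path(compute_capability: tuple[int, int]) -> str:
    """Guess which published format an NVIDIA GPU can run natively.

    The thresholds are assumptions: 8.9 is Ada Lovelace (RTX 4000) and
    12.0 is the RTX 5000 series, following the article's description.
    """
    if compute_capability >= (12, 0):
        return "MXFP8"
    if compute_capability >= (8, 9):
        return "FP8_scaled"
    # Older GPUs have no native FP8 path and fall back to 16-bit math.
    return "FP16/BF16 fallback"


# With PyTorch installed, the tuple can be read from
# torch.cuda.get_device_capability(0).
for cc in [(12, 0), (8, 9), (8, 6)]:
    print(cc, "->", native_fp8_path(cc))
```

Treat the result as a starting point only. The actual speed also depends on your software versions and the rest of your system.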

The best model format and the resulting processing time vary greatly depending on your environment, so please try different models to find the one that works best for you.

Anime illustration of a woman with long black hair and a mature atmosphere indoors
The best format varies from person to person.

Summary: Try HiDream-O1-Image_clear_v1!

  • HiDream-O1-Image_clear delivers clear and vibrant outputs
  • Supports Image-to-Image and various image editing tasks
  • MXFP8 is a format exclusive to RTX 5000 series

This time, I introduced the HiDream-O1-Image_clear_v1 model.

Previously, when I fine-tuned base models such as Z-Image-Base and Flux [klein] 4B-Base, I had to consider VAE post-processing, and color adjustment was often particularly difficult.

Anime illustration of a woman with long black hair and a mature atmosphere outdoors

Because HiDream-O1-Image uses the VAE-free UiT architecture, the base model was much easier to fine-tune.

I plan to continue following the latest developments in the UiT architecture.

Thank you for reading to the end!