The Most Powerful Image Generation AI "HiDream" Runs on 12GB VRAM!

HiDream #ComfyUI #Text Encoder

2025-5-5 (updated 2025-6-17)

A male character with orange hair and blue eyes wearing a black suit and tie, set against a futuristic background

HiDream is the culmination of image generation AI
Fully open-source
Supports commercial use

Introduction

Hello, I'm Easygoing.

Today, I’m introducing HiDream, the most powerful local image generation AI.

a young male character with blue eyes and dark hair is dressed in a formal suit, standing in a dimly lit room filled with computer screens displaying various images, surrounded by silhouetted figures — HiDream-I1-Dev

What is HiDream?

HiDream is a new image generation AI released on April 8, 2025.

17 billion parameters, more than Flux.1’s 12 billion
Usable with 12GB VRAM
Open license, allowing commercial use

HiDream has the largest parameter count of any locally runnable image generation model, yet it runs on mid-range GPUs and fully permits commercial use!

Downloading the HiDream High-Quality Model Set

First, let me introduce the download page for the complete set of models to use HiDream at the highest quality.

easygoing0114/HiDream_HQ-models · Hugging Face

With the workflow described later, HiDream runs on 12GB of VRAM and 64GB of system RAM.

If system RAM is insufficient, using the lightweight versions below can reduce RAM usage.

HiDream Models

Text Encoder

FP8 Text Encoder
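Why so much system RAM? A rough estimate helps. Weight memory is about the parameter count multiplied by the bytes per parameter. The sketch below assumes BF16 at 2 bytes and FP8 at 1 byte per parameter, and uses the parameter counts from this article. It ignores activations, the VAE, and framework overhead, so treat the figures as ballpark only.

```python
"""Back-of-the-envelope weight memory for two HiDream components (rough sketch)."""

# Assumed storage precisions: bytes per parameter.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

# Parameter counts in billions, taken from this article.
COMPONENTS = {
    "HiDream diffusion model": 17.0,
    "Llama-3.1-8B-Instruct": 8.0,
}


def weights_gb(params_billion: float, precision: str) -> float:
    """Approximate weight size in decimal GB (1 GB = 1e9 bytes).

    Billions of parameters x bytes per parameter = billions of bytes = GB.
    Weights only; activations and overhead are not counted.
    """
    return params_billion * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for name, params in COMPONENTS.items():
        sizes = ", ".join(
            f"{prec}: {weights_gb(params, prec):.1f} GB" for prec in BYTES_PER_PARAM
        )
        print(f"{name}: {sizes}")
```

By this estimate, the diffusion model alone is about 34 GB in BF16 and about 17 GB in FP8. Either way, that is more than 12GB of VRAM, which is presumably why much of the data ends up in system RAM. Halving the bytes per parameter halves the footprint, so the FP8 versions use less RAM.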

History of Local Image Generation AI

Let’s take a look at the journey leading up to HiDream.

The timeline of major local image generation AIs is as follows:


```mermaid
gantt
title Image Generation AI Generations
dateFormat YYYY-MM-DD
tickInterval 6month
section 1st Generation
Stable Diffusion 1 :done, a1, 2022-08-22, 2025-05-01
section 2nd Generation
Stable Diffusion XL 1.0 :done, c2, 2023-07-27, 2025-05-01
section 3rd Generation
Stable Diffusion 3 : d1, 2024-06-12, 2025-05-01
AuraFlow : d2, 2024-07-12, 2025-05-01
section 3.1 Generation
Flux.1 : d3, 2024-08-01, 2025-05-01
section 3.5 Generation
HiDream : d4, 2025-04-06, 2025-05-01
```

Generation	Release	Parameters	Training Resolution	Text Encoders	Required VRAM
1st Generation	August 2022	1 billion	512 x 512	1	4 GB
2nd Generation	July 2023	3.5 billion	1024 x 1024	2	6 GB
3rd Generation onward	June 2024	6.8 billion or more	1024 x 1024	1 to 4	12 GB

※ The classification of image generation AI generations is based on the author’s perspective.

Third-Generation Models

Currently available third-generation models are all based on Stable Diffusion 3 technology.

Model	Country of Development	Parameters	Architecture	Open Source
Stable Diffusion 3	UK	8 billion	MMDiT	○
AuraFlow	USA	6.8 billion	DiT + MMDiT	○
Flux.1	Germany	12 billion	MMDiT + extras?	△
HiDream	China	17 billion	MoE (including DiT and MMDiT)	○

○ = yes, △ = partially, × = no

Stable Diffusion 3

Stable Diffusion 3.5 Large
Image-to-image from SDXL anime model, with face and eyes redrawn using SDXL and Flux.1

Stable Diffusion 3, introduced in June 2024, adopts an architecture called MMDiT, improving quality and prompt fidelity compared to previous models.

The Large model of Stable Diffusion 3 was initially private but was fully released alongside Stable Diffusion 3.5 in October 2024.

AuraFlow

Introduced in July 2024, AuraFlow is a lightweight model that uses a single text encoder and replaces parts of MMDiT with a simpler DiT.

Since Stable Diffusion 3 was not fully open initially, AuraFlow became the first open third-generation model.

Flux.1 (Generation 3.1)

Introduced in August 2024, Flux.1 was released by a new company founded by members involved in developing Stable Diffusion.

Flux.1 boasts extremely high quality.

However, only the distilled models Flux.1[dev] and Flux.1[schnell] were released, and the technical details remained private.

HiDream (Generation 3.5)

a young male animated character with blue eyes and short dark hair stands in front of a futuristic cityscape at night, surrounded by digital screens, with silhouettes of other characters in the background — HiDream-I1-Dev

Introduced in April 2025, HiDream adopts a Mixture-of-Experts (MoE) architecture that combines multiple models, including a conventional MMDiT model.

HiDream features a new mechanism that switches internal models based on the generated illustration.
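The general MoE idea can be sketched as top-k routing. A gate scores every expert, only the few best-scoring experts actually run, and their outputs are blended using the gate's weights. This is a generic sketch, not HiDream's actual implementation. The number of experts, k = 2, and the toy scalar "experts" are illustrative assumptions. Also, in typical MoE transformer layers the routing happens per token inside the network, and I have not confirmed HiDream's exact scheme.

```python
import math
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Send x to the top-k experts and mix their outputs by renormalized gate weights.

    In a real model the gate logits come from a learned router that looks at x;
    here they are passed in directly to keep the sketch small.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Only the selected experts are evaluated; the rest cost nothing for this input.
    return sum(probs[i] / norm * experts[i](x) for i in top)


# Four hypothetical toy experts; the gate strongly prefers experts 2 and 3.
experts = [lambda v: v, lambda v: 2 * v, lambda v: 3 * v, lambda v: 4 * v]
print(moe_forward(2.0, [0.0, 0.0, 10.0, 10.0], experts))  # 0.5*6 + 0.5*8 = 7.0
```

The appeal of this design is that total parameter count (capacity) can grow while each input only pays for the few experts it is routed to.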

Although it was released about eight months after Flux.1, it was fully released under an open license, including the non-distilled Full model.

HiDream vs. Flux.1 Comparison!

Let’s compare HiDream and Flux.1 in detail.

a young man with blonde hair and blue eyes stands in a high-tech control room surrounded by digital screens displaying various data and symbols, alongside a silhouetted figure

Flux.1 is Restricted

Flux.1 was developed by Black Forest Labs, a German company.

Variant	Parameters	Model Type	Open Source	Outputs Usable for Training	Commercial Use
Flux.1[pro]	12 billion	Non-distilled	×	×	×
Flux.1[dev]	12 billion	Distilled	△	×	×
Flux.1[schnell]	12 billion	Distilled	△	○	○

Flux.1’s detailed technical information is private, and both Flux.1[dev] and Flux.1[schnell] are distilled models, making development challenging.

Flux.1[dev] can generate high-quality illustrations, but its license prohibits using its outputs to train other models. This creates a one-way system in which Black Forest Labs can absorb technology developed by the open community.

HiDream is Fully Open

HiDream was developed by VIVAGO AI, a Hong Kong AI startup, and all models are fully open.

Variant	Parameters	Model Type	Open Source	Outputs Usable for Training	Commercial Use
HiDream-I1-Full	17 billion	Non-distilled	○	○	○
HiDream-I1-Dev	17 billion	Distilled	○	○	○
HiDream-I1-Fast	17 billion	Distilled	○	○	○
HiDream-E1-Full	17 billion	Unknown	○	○	○

AI Video & Image Creation Tools | vivago.ai Free Trial

The company originally offered a paid online AI generation service, and HiDream is likely the core model of that image generation service, now released for free.

Although HiDream’s technical paper has not yet been published, the release includes the non-distilled Full model in addition to the distilled Dev and Fast models. This makes community development easier than with Flux.1.

HiDream’s Four Models

HiDream has the following four models:

Model	CFG Scale	Negative Prompt	Recommended Steps	Use Case
HiDream-I1-Full	Enabled	○	50 steps	Development
HiDream-I1-Dev	Disabled	×	28 steps	High Quality
HiDream-I1-Fast	Disabled	×	16 steps	High Speed
HiDream-E1-Full	Enabled	○	28 steps	Redrawing Only

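CFG scale (classifier-free guidance) amplifies the effect of the input prompt. Enabling it doubles generation time, because the model has to run twice at every step: once with the prompt and once with the negative (or empty) prompt.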

HiDream-I1-Full runs with CFG enabled and also has a high recommended step count (50), which makes it impractical for everyday use.

The Dev and Fast models, on the other hand, run with CFG disabled. They cannot use negative prompts, but they generate much faster.
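Combining the step counts and CFG settings from the model table above gives a rough sense of relative cost. This sketch counts model evaluations per image as the number of steps, times two when CFG is on. It assumes every evaluation costs the same (all four models have 17 billion parameters) and ignores fixed costs such as text encoding and VAE decoding, so the ratios are approximate.

```python
# (recommended steps, CFG enabled) from the model table above
MODELS = {
    "HiDream-I1-Full": (50, True),
    "HiDream-I1-Dev": (28, False),
    "HiDream-I1-Fast": (16, False),
    "HiDream-E1-Full": (28, True),
}


def model_evaluations(steps: int, cfg_enabled: bool) -> int:
    """Forward passes per image: CFG runs the model twice per step."""
    return steps * (2 if cfg_enabled else 1)


baseline = model_evaluations(*MODELS["HiDream-I1-Dev"])
for name, (steps, cfg) in MODELS.items():
    n = model_evaluations(steps, cfg)
    print(f"{name}: {n} evaluations ({n / baseline:.2f}x Dev)")
```

By this count, Full needs 100 forward passes per image against Dev's 28, roughly 3.6 times the work. Fast needs about 0.57 times the work of Dev.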

Is Negative Prompt Necessary? Unleashing AI’s Creativity! | AI image journey

HiDream-E1-Full is a customized version of HiDream-I1-Full, dedicated to editing existing images according to prompt instructions.

an animated female character with blonde hair and blue eyes stands in a well-lit supermarket aisle wearing a blue blazer and white shirt, holding a cup with a red design, surrounded by shelves (close-up) — Instruction in Japanese: “Change the cup in hand to a beer”

HiDream-E1-Full image — Instruction in Japanese: “Change the cup in hand to a beer”

Using Four Text Encoders!

HiDream uses four text encoders.

Text Encoder	Release	Developer	Parameters	Comprehension
Llama-3.1-8B-Instruct	July 2024	Meta	8 billion	Very Long Text
T5-XXL (v1.1)	March 2022	Google	11 billion	Very Long Text
CLIP-G	January 2023	LAION + HuggingFace	340 million	Long Text
CLIP-L	January 2021	OpenAI	63 million	Short Text/Words

an animated female character with red-brown hair and blue eyes stands in the rain wearing a blue hooded jacket amidst a bustling nighttime cityscape with neon lights and silhouetted people in the background — Accurately reflects color-specified prompts (indigo, deep blue, burgundy)

Because at least one of the four text encoders can usually interpret any given prompt, prompt fidelity improves even further.

HiDream’s text encoders are large, but loading them into system RAM with the QuadrupleCLIPLoaderMultiGPU node from ComfyUI-MultiGPU (set device: cpu) keeps them from overloading VRAM!

Japanese Prompts Supported!

Among HiDream’s text encoders, Llama and T5-XXL understand Japanese, so HiDream handles Japanese prompts at a practical level.

日本の東京の夜景のアニメイラスト、遠くにスカイツリーも映っている — Anime illustration of Tokyo’s night view, with Sky Tree in the distance

空港の到着ロビーで「welcome to Japan」と書かれた大きなプラカードを持って笑顔で出迎える中年の日本人女性の写真 — Photo of a middle-aged Japanese woman smiling and welcoming people with a large placard reading “welcome to Japan” in the airport arrival lobby

Llama-3.1-8B-Instruct, which has strong multilingual comprehension among the Llama models, can faithfully reproduce long Japanese prompts.

Llama-3.1-Swallow-8B-Instruct-v0.3-BF16 — In the snowy depths of a winter mountain, the roar of an engine breaks the silence as a Subaru rally car speeds through a silver world. The driver grips the steering wheel tightly, focusing on the slippery snow-covered road while skillfully drifting through corners. Subaru’s AWD system delivers exceptional traction even on snow, stabilizing the vehicle. Along the road, warmly dressed spectators gather, holding their breath as they watch the powerful drive. The blue-bodied Subaru, kicking up snow dust, asserts its dominance as the king of the snowy terrain.

HiDream’s Stable Perspective!

HiDream excels at stable perspective rendering.

a young man with orange hair and blue eyes stands in a futuristic control room filled with computer monitors displaying various images, surrounded by a dark and moody atmosphere — HiDream-I1-Dev

The number of parameters greatly affects the accuracy of perspective in image generation AI.

One-Point Perspective Comparison

Left: HiDream-I1-Dev | Right: Flux.1[dev]


Flux.1 used to lead in perspective rendering, but HiDream, with more parameters (17 billion vs. 12 billion), matches or exceeds it in three-dimensional expressiveness.

Workflows

Now, let’s introduce the workflows for each HiDream model.

The workflows presented here modify the official versions by changing the text encoder loading to QuadrupleCLIPLoaderMultiGPU and disabling negative prompts.

Conclusion: HiDream is the Culmination of Image Generation AI!

HiDream is the culmination of image generation AI
Fully open-source
Supports commercial use

HiDream can be considered the culmination of open-source image generation AI.

Flux.1 models are highly polished, but that very polish made them resistant to change, and modifications often degraded their performance.

in a futuristic setting, a man in a suit gazes at a holographic display of a woman’s face amidst a sea of digital screens, all enveloped in hues of blue, orange, and black, evoking a cyberpunk vibe

HiDream’s current image quality is slightly below Flux.1’s, but with future optimizations it has the potential to surpass Flux.1.

HiDream has enormous potential and may come to embody the future of open-source image generation AI that the creators of Stable Diffusion envisioned.

Thank you for reading to the end!

Reference Article

When using HiDream, I also recommend optimizing ComfyUI’s VRAM management.

[ComfyUI Intermediate] Settings to Control VRAM and Unlock Peak Performance! | AI image journey

Update History

May 7, 2025

Removed the description of the Japanese fine-tuned Llama model, because the original Llama-3.1-8B-Instruct already has high Japanese comprehension and produces better image quality.