The Most Powerful Image Generation AI "HiDream" Runs on 12GB VRAM!


- HiDream is the culmination of image generation AI
- Fully open-source
- Supports commercial use
Introduction
Hello, I'm Easygoing.
Today, I’m introducing HiDream, the most powerful local image generation AI.

What is HiDream?
HiDream is a new image generation AI released on April 8, 2025.
- 17 billion parameters, surpassing Flux.1
- Usable with 12GB VRAM
- Open license, allowing commercial use

HiDream has the largest parameter count of any image generation model you can run locally, yet it works on mid-range GPUs and fully supports commercial use!
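As a rough sanity check on what 17 billion parameters means for memory, the sketch below estimates the size of the weights alone at a few common precisions. The bytes-per-parameter values are generic assumptions rather than anything specific to HiDream's released files, and the estimate ignores activations and the text encoders.

```python
# Rough weight-only memory estimate for a 17-billion-parameter model.
# The precisions listed are generic assumptions, not HiDream-specific formats.
PARAMS = 17e9  # parameter count stated in this article

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "fp8": 1.0,
    "4-bit": 0.5,
}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return the weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    gb = weight_gb(PARAMS, size)
    verdict = "fits in" if gb <= 12 else "exceeds"
    print(f"{name:>9}: {gb:5.1f} GB of weights ({verdict} 12 GB VRAM)")
```

Under these assumptions, 16-bit weights alone are far larger than 12GB, so running on a 12GB card presumably relies on lower-precision weights, offloading to system RAM, or both.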
Downloading the HiDream High-Quality Model Set
First, here is the download page for the complete set of models needed to run HiDream at its highest quality.
With the workflow described later, HiDream runs on 12GB of VRAM and 64GB of system RAM.
If system RAM is insufficient, using the lightweight versions below can reduce RAM usage.
HiDream Models
Text Encoder
History of Local Image Generation AI
Let’s take a look at the journey leading up to HiDream.
The timeline of major local image generation AIs is as follows:
```mermaid
gantt
title Image Generation AI Generations
dateFormat YYYY-MM-DD
tickInterval 6month
section 1st Generation
Stable Diffusion 1 :done, a1, 2022-08-22, 2025-05-01
section 2nd Generation
Stable Diffusion XL 1.0 :done, c2, 2023-07-27, 2025-05-01
section 3rd Generation
Stable Diffusion 3 : d1, 2024-06-12, 2025-05-01
AuraFlow : d2, 2024-07-12, 2025-05-01
section 3.1 Generation
Flux.1 : d3, 2024-08-01, 2025-05-01
section 3.5 Generation
HiDream : d4, 2025-04-06, 2025-05-01
```
Generation | Release | Parameters | Training Resolution | Text Encoders | Required VRAM |
---|---|---|---|---|---|
1st Generation | August 2022 | 1 billion | 512 x 512 | 1 | 4 GB |
2nd Generation | July 2023 | 3.5 billion | 1024 x 1024 | 2 | 6 GB |
3rd Generation ~ | June 2024 | 6.8 billion ~ | 1024 x 1024 | 1~4 | 12 GB |
Note: This classification of image generation AI generations reflects the author's own view.
Third-Generation Models
Currently available third-generation models are all based on Stable Diffusion 3 technology.
Model | Country of Development | Parameters | Architecture | Open Source |
---|---|---|---|---|
Stable Diffusion 3 | UK | 8 billion | MMDiT | ○ |
AuraFlow | USA | 6.8 billion | DiT + MMDiT | ○ |
Flux.1 | Germany | 12 billion | MMDiT + α? | △ |
HiDream | China | 17 billion | MoE (including DiT and MMDiT) | ○ |
Stable Diffusion 3

Image-to-image from SDXL anime model, with face and eyes redrawn using SDXL and Flux.1
Stable Diffusion 3, introduced in June 2024, adopts an architecture called MMDiT, improving quality and prompt fidelity compared to previous models.
The Large model of Stable Diffusion 3 was initially private but was fully released alongside Stable Diffusion 3.5 in October 2024.
AuraFlow

Introduced in July 2024, AuraFlow is a lightweight model that uses a single text encoder and replaces parts of MMDiT with a simpler DiT.
Since Stable Diffusion 3 was not fully open initially, AuraFlow became the first open third-generation model.
Flux.1 (Generation 3.1)

Introduced in August 2024, Flux.1 was released by a new company founded by members involved in developing Stable Diffusion.
Flux.1 boasts extremely high quality.
However, only the distilled models Flux.1[dev] and Flux.1[schnell] were released, and the technical details remained undisclosed.
HiDream (Generation 3.5)

Introduced in April 2025, HiDream adopts a Mixture-of-Experts (MoE) architecture that combines multiple sub-models, including the conventional MMDiT.
HiDream features a new mechanism that switches between these internal models depending on the illustration being generated.
Although it was released about eight months after Flux.1, it was fully released under an open license, including the non-distilled Full model.
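To give a feel for what "switching internal models" means, here is a minimal toy sketch of generic sparse MoE routing: a small gate scores each expert for the current input, only the top-scoring experts run, and their outputs are blended. Everything in it (the function names, the three toy experts, the gate weights) is hypothetical and for illustration only. HiDream's actual routing has not been documented here, so this should not be read as its implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k experts picked by a linear gate."""
    # One gate score per expert: dot product of x with that expert's gate row.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Keep only the top_k experts and renormalise their weights to sum to 1.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        expert_out = experts[i](x)
        for d, value in enumerate(expert_out):
            output[d] += (probs[i] / norm) * value
    return output, chosen


# Three hypothetical "experts", each a simple element-wise function.
experts = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
    lambda v: [-a for a in v],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

out, used = moe_forward([3.0, 1.0], gate_weights, experts, top_k=2)
print("experts used:", used)
print("output:", out)
```

The appeal of this design is that only the selected experts are computed for a given input, so total parameter count can grow without every parameter being used on every step.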
HiDream vs. Flux.1 Comparison!
Let’s compare HiDream and Flux.1 in detail.

Flux.1 is Restricted
Flux.1 was developed by Black Forest Labs, a German company.
Model | Parameters | Type | Open Source | Development Use of Outputs | Commercial Use |
---|---|---|---|---|---|
Flux.1[pro] | 12 billion | Non-distilled | × | × | × |
Flux.1[dev] | 12 billion | Distilled | △ | × | × |
Flux.1[schnell] | 12 billion | Distilled | △ | ○ | ○ |
Flux.1's detailed technical information is undisclosed, and both Flux.1[dev] and Flux.1[schnell] are distilled models, which makes community development challenging.
While the Flux.1[dev] model can generate high-quality illustrations, using its outputs to train other models is prohibited, creating a system where Black Forest Labs reclaims technology developed by the open community.
HiDream is Fully Open
HiDream was developed by VIVAGO AI, a Hong Kong AI startup, and all models are fully open.
Model | Parameters | Type | Open Source | Development Use of Outputs | Commercial Use |
---|---|---|---|---|---|
HiDream-I1-Full | 17 billion | Non-distilled | ○ | ○ | ○ |
HiDream-I1-Dev | 17 billion | Distilled | ○ | ○ | ○ |
HiDream-I1-Fast | 17 billion | Distilled | ○ | ○ | ○ |
HiDream-E1-Full | 17 billion | Unknown | ○ | ○ | ○ |
The company originally offered a paid online AI generation service, and HiDream is likely the core model behind that service, now released for free.
Although HiDream's technical paper has not yet been published, the release includes not only the distilled Dev and Fast models but also the non-distilled Full model, making it easier for the community to build on than Flux.1.
HiDream’s Four Models
HiDream has the following four models:
Model | CFG Scale | Negative Prompt | Recommended Steps | Use Case |
---|---|---|---|---|
HiDream-I1-Full | Enabled | ○ | 50 steps | Development |
HiDream-I1-Dev | Disabled | × | 28 steps | High Quality |
HiDream-I1-Fast | Disabled | × | 16 steps | High Speed |
HiDream-E1-Full | Enabled | ○ | 28 steps | Redrawing Only |
The CFG scale in the table amplifies the effect of input prompts in image generation AI but doubles the rendering time when enabled.
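For readers curious why CFG costs twice as much, here is a minimal sketch of the standard classifier-free guidance formula. Each step runs the model once with the prompt and once with the negative (or empty) prompt, then extrapolates from the second result toward the first. The `toy_model` function is a hypothetical stand-in so the example runs, and treating a scale of 1.0 as "CFG disabled" is an assumption of this sketch, not a statement about HiDream's code.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: move from the unconditional
    (negative-prompt) prediction toward the prompted one."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def denoise_step(model, latent, prompt, negative_prompt, scale):
    """One sampling step; returns (prediction, number_of_model_calls)."""
    cond = model(latent, prompt)
    if scale == 1.0:
        # CFG disabled: a single pass, and the negative prompt is never used.
        return cond, 1
    uncond = model(latent, negative_prompt)  # the extra pass that doubles the cost
    return cfg_combine(cond, uncond, scale), 2


# Hypothetical stand-in for the diffusion model, just to make the sketch runnable.
def toy_model(latent, prompt):
    return [x + 0.1 * len(prompt) for x in latent]


pred, calls = denoise_step(toy_model, [0.0, 1.0], "a cat", "blurry", scale=5.0)
print("model calls:", calls)
print("guided prediction:", pred)
```

This also shows why negative prompts only work when CFG is enabled: the negative prompt enters the calculation through the second pass, which is skipped otherwise.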
The HiDream-I1-Full model with CFG scale enabled has a high recommended step count, making it unsuitable for daily use.
On the other hand, the Dev and Fast models with CFG scale disabled cannot use Negative Prompts but process faster.
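Putting the recommended step counts and the CFG setting together gives a rough picture of relative rendering cost. The sketch below counts model evaluations per image using the values from the table above. It assumes every evaluation costs the same and ignores text encoding and image decoding, so treat the ratios as ballpark figures only.

```python
# Recommended settings copied from the table above.
MODELS = {
    "HiDream-I1-Full": {"steps": 50, "cfg": True},
    "HiDream-I1-Dev": {"steps": 28, "cfg": False},
    "HiDream-I1-Fast": {"steps": 16, "cfg": False},
    "HiDream-E1-Full": {"steps": 28, "cfg": True},
}


def model_passes(steps: int, cfg: bool) -> int:
    """Model evaluations per image: CFG needs two passes per step."""
    return steps * (2 if cfg else 1)


fast = model_passes(**MODELS["HiDream-I1-Fast"])
for name, settings in MODELS.items():
    passes = model_passes(**settings)
    print(f"{name:<16} {passes:>3} passes ({passes / fast:.2f}x Fast)")
```

By this count, the Full model needs 100 evaluations per image against 28 for Dev and 16 for Fast, which is why Full is better suited to development than to everyday generation.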
HiDream-E1-Full is a customized version of HiDream-I1-Full, dedicated to redrawing existing images according to prompt instructions.


Using Four Text Encoders!
HiDream uses four text encoders.
Text Encoder | Release | Developer | Parameters | Comprehension |
---|---|---|---|---|
Llama-3.1-8B-Instruct | July 2024 | Meta | 8 billion | Very Long Text |
T5-XXL (v1.1) | March 2022 | Google | 11 billion | Very Long Text |
CLIP-G | January 2023 | LAION + HuggingFace | 340 million | Long Text |
CLIP-L | January 2021 | OpenAI | 63 million | Short Text/Words |

Because at least one of the four text encoders can interpret almost any kind of prompt, prompt fidelity is further improved.

HiDream's text encoders are large, but loading them into system RAM with the QuadrupleCLIPLoaderMultiGPU node from ComfyUI-MultiGPU (set to device: cpu) prevents VRAM overload!
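To see why keeping the text encoders out of VRAM matters, the sketch below adds up their approximate weight size using the parameter counts from the table above. The 2 bytes per parameter (16-bit weights) is an assumption for illustration. The files you actually download may use a different precision, so the real figure can be smaller.

```python
# Parameter counts as listed in the text encoder table above.
ENCODER_PARAMS = {
    "Llama-3.1-8B-Instruct": 8e9,
    "T5-XXL (v1.1)": 11e9,
    "CLIP-G": 340e6,
    "CLIP-L": 63e6,
}
BYTES_PER_PARAM = 2  # assumption: 16-bit weights
VRAM_GB = 12  # the GPU size discussed in this article


def size_gb(params: float, bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, params in ENCODER_PARAMS.items():
    print(f"{name:<22} {size_gb(params):6.2f} GB")

total_gb = sum(size_gb(p) for p in ENCODER_PARAMS.values())
print(f"{'Total':<22} {total_gb:6.2f} GB (VRAM available: {VRAM_GB} GB)")
```

Under these assumptions the four encoders together are several times larger than a 12GB card, which is consistent with loading them into system RAM instead.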
Japanese Prompts Supported!
Among HiDream’s text encoders, Llama and T5-XXL have Japanese comprehension capabilities, responding to Japanese prompts at a practical level.


The Llama-3.1-8B-Instruct model, which has strong multilingual comprehension among Llama models, can faithfully reproduce long Japanese prompts.

HiDream’s Stable Perspective!
HiDream excels at stable perspective rendering.

The number of parameters greatly affects the accuracy of perspective in image generation AI.
One-Point Perspective Comparison


Model | Parameters | Type | Open Source | Development Use of Outputs | Commercial Use |
---|---|---|---|---|---|
Flux.1[pro] | 12 billion | Non-distilled | × | × | × |
Flux.1[dev] | 12 billion | Distilled | △ | × | × |
Flux.1[schnell] | 12 billion | Distilled | △ | ○ | ○ |
Reiterated: Parameter count comparison
Perspective rendering was previously dominated by Flux.1, but HiDream, with more parameters than Flux.1, offers equal or superior three-dimensional expressiveness.
Workflows
Now, let’s introduce the workflows for each HiDream model.
The workflows presented here modify the official versions by changing the text encoder loading to QuadrupleCLIPLoaderMultiGPU and disabling negative prompts.
HiDream-I1-Full

HiDream-I1-Dev

HiDream-I1-Fast

HiDream-E1-Full

Conclusion: HiDream is the Culmination of Image Generation AI!
- HiDream is the culmination of image generation AI
- Fully open-source
- Supports commercial use
HiDream can be considered the culmination of open-source image generation AI.
While Flux.1 models were highly polished, their perfection made them resistant to new changes, and modifications often worsened performance.

HiDream’s current quality is slightly below Flux.1, but with future optimizations, it has the potential to surpass Flux.1.
HiDream’s possibilities are limitless, potentially embodying the future of open-source image generation AI envisioned by the creators of Stable Diffusion.
Thank you for reading to the end!
Reference Article
When using HiDream, I also recommend optimizing ComfyUI’s VRAM management.
Update History
May 7, 2025
The original Llama-3.1-8B-Instruct model has high Japanese comprehension and produces better image quality, so the description of the Japanese fine-tuned Llama model was removed.