The Most Powerful Image Generation AI "HiDream" Runs on 12GB VRAM!

  • HiDream is the culmination of image generation AI
  • Fully open-source
  • Supports commercial use

Introduction

Hello, I'm Easygoing.

Today, I’m introducing HiDream, the most powerful local image generation AI.

HiDream-I1-Dev

What is HiDream?

HiDream is a new image generation AI released on April 8, 2025.

  • 17 billion parameters, surpassing Flux.1
  • Usable with 12GB VRAM
  • Open license, allowing commercial use

Chart: VRAM usage of image generation AI (minimum split capacity of models in ComfyUI)

HiDream has the largest parameter count of any local image generation model, yet it runs on mid-range GPUs and fully supports commercial use!
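
A rough estimate helps show why a 17-billion-parameter model cannot simply be loaded whole onto a 12GB card. The sketch below multiplies the parameter counts quoted in this article by an assumed number of bytes per parameter. The storage formats listed are illustrative assumptions, not the formats of any specific HiDream file, and activations and text encoders are ignored.

```python
# Rough weight-size estimate: parameters x bytes per parameter.
# Parameter counts are the ones quoted in this article.
PARAMS = {"HiDream": 17e9, "Flux.1": 12e9}

# Assumed storage formats (illustrative only).
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return the size of the weights in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for fmt, nbytes in BYTES_PER_PARAM.items():
    sizes = ", ".join(
        f"{name} {weight_gb(count, nbytes):.1f} GB" for name, count in PARAMS.items()
    )
    print(f"{fmt}: {sizes}")
```

Under these assumptions, HiDream's 16-bit weights alone come to about 34 GB, well above 12GB of VRAM, so the model has to be quantized or split between VRAM and system RAM.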

Downloading the HiDream High-Quality Model Set

First, let me introduce the download page for the complete set of models to use HiDream at the highest quality.

HiDream operates with 12GB VRAM and 64GB system RAM in the workflow described later.

If system RAM is insufficient, using the lightweight versions below can reduce RAM usage.

HiDream Models

Text Encoder

History of Local Image Generation AI

Let’s take a look at the journey leading up to HiDream.

The timeline of major local image generation AIs is as follows:


gantt
title Image Generation AI Generations
dateFormat YYYY-MM-DD
tickInterval 6month
section 1st Generation
Stable Diffusion 1 :done, a1, 2022-08-22, 2025-05-01
section 2nd Generation
Stable Diffusion XL 1.0 :done, c2, 2023-07-27, 2025-05-01
section 3rd Generation     
Stable Diffusion 3 : d1, 2024-06-12, 2025-05-01
AuraFlow : d2, 2024-07-12, 2025-05-01
section 3.1 Generation  
Flux.1   : d3, 2024-08-01, 2025-05-01
section 3.5 Generation  
HiDream   : d4, 2025-04-06, 2025-05-01

| Generation | Release | Parameters | Training Resolution | Text Encoders | Required VRAM |
| --- | --- | --- | --- | --- | --- |
| 1st Generation | August 2022 | 1 billion | 512 x 512 | 1 | 4 GB |
| 2nd Generation | July 2023 | 3.5 billion | 1024 x 1024 | 2 | 6 GB |
| 3rd Generation onward | June 2024 | 6.8 billion or more | 1024 x 1024 | 1 to 4 | 12 GB |

Note: The classification of image generation AI into generations reflects the author’s own perspective.

Third-Generation Models

Currently available third-generation models are all based on Stable Diffusion 3 technology.

| Model | Country of Development | Parameters | Architecture | Open Source |
| --- | --- | --- | --- | --- |
| Stable Diffusion 3 | UK | 8 billion | MMDiT | |
| AuraFlow | USA | 6.8 billion | DiT + MMDiT | |
| Flux.1 | Germany | 12 billion | MMDiT + α? | |
| HiDream | China | 17 billion | MoE (including DiT and MMDiT) | |

Stable Diffusion 3

Stable Diffusion 3.5 Large
Image-to-image from SDXL anime model, with face and eyes redrawn using SDXL and Flux.1

Stable Diffusion 3, introduced in June 2024, adopts an architecture called MMDiT, improving quality and prompt fidelity compared to previous models.

The Large model of Stable Diffusion 3 was initially private but was fully released alongside Stable Diffusion 3.5 in October 2024.

AuraFlow

AuraFlow_0.3

Introduced in July 2024, AuraFlow is a lightweight model that uses a single text encoder and replaces parts of MMDiT with a simpler DiT.

Since Stable Diffusion 3 was not fully open initially, AuraFlow became the first open third-generation model.

Flux.1 (Generation 3.1)

Flux.1[dev]

Introduced in August 2024, Flux.1 was released by a new company founded by members involved in developing Stable Diffusion.

Flux.1 boasts extremely high quality.

However, only the distilled models Flux.1[dev] and Flux.1[schnell] were released, and technical details remained private.

HiDream (Generation 3.5)

HiDream-I1-Dev

Introduced in April 2025, HiDream adopts a Mixture-of-Experts (MoE) architecture that combines multiple internal models, including the conventional MMDiT.

HiDream features a new mechanism that switches between these internal models depending on the illustration being generated.
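
The general idea of MoE routing can be sketched in a few lines. This is a generic toy illustration, not HiDream’s actual implementation (its technical details are not public at the time of writing): the gate weights, the three toy experts, and the function names `softmax` and `moe_forward` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw gate scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> Tuple[List[float], List[int]]:
    """Route x to the top_k experts and blend their outputs."""
    # The gate scores each expert with a simple dot product.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Only the best-scoring experts are run; the rest are skipped.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        for j, value in enumerate(experts[i](x)):
            out[j] += (probs[i] / norm) * value
    return out, chosen


# Three hypothetical experts, each a trivial transformation of the input.
experts: List[Expert] = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

out, chosen = moe_forward([3.0, 0.5], gate_weights, experts, top_k=1)
print(chosen, out)
```

The point of the design is that only the selected experts run for a given input, so total parameter count can grow without every parameter being used on every step.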

HiDream arrived about eight months after Flux.1, but it was released in full under an open license, including the non-distilled Full model.

HiDream vs. Flux.1 Comparison!

Let’s compare HiDream and Flux.1 in detail.


Flux.1 is Restricted

Flux.1 was developed by Black Forest Labs, a German company.

Parameters Model Open Source Development Use of Outputs Commercial Use
Flux.1[pro] 12 billion Non-distilled × × ×
Flux.1[dev] 12 billion Distilled × ×
Flux.1[schnell] 12 billion Distilled

Flux.1’s detailed technical information is private, and both Flux.1[dev] and Flux.1[schnell] are distilled models, making development challenging.

While the Flux.1[dev] model can generate high-quality illustrations, using its outputs for training other models is prohibited, creating a system where Black Forest Labs reclaims technology developed by the open community.

HiDream is Fully Open

HiDream was developed by VIVAGO AI, a Hong Kong AI startup, and all models are fully open.

Parameters Model Open Source Development Use of Outputs Commercial Use
HiDream-I1-Full 17 billion Non-distilled
HiDream-I1-Dev 17 billion Distilled
HiDream-I1-Fast 17 billion Distilled
HiDream-E1-Full 17 billion Unknown

The company originally offered a paid online AI generation service, and HiDream is likely the core model of that service, now released for free.

Although HiDream’s technical paper has not yet been published, the release includes not only the distilled Dev and Fast models but also the non-distilled Full model, which makes community development easier than with Flux.1.

HiDream’s Four Models

HiDream has the following four models:

| Model | CFG Scale | Negative Prompt | Recommended Steps | Use Case |
| --- | --- | --- | --- | --- |
| HiDream-I1-Full | Enabled | | 50 steps | Development |
| HiDream-I1-Dev | Disabled | × | 28 steps | High Quality |
| HiDream-I1-Fast | Disabled | × | 16 steps | High Speed |
| HiDream-E1-Full | Enabled | | 28 steps | Redrawing Only |

The CFG scale in the table amplifies the effect of input prompts in image generation AI but doubles the rendering time when enabled.
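
The doubling comes from classifier-free guidance needing two model evaluations per step: one with the prompt and one with the negative (or empty) prompt. Below is a minimal sketch of that idea using a stand-in model. `CountingModel` and `predict_noise` are hypothetical names, and treating a scale of 1.0 as "CFG disabled" is an assumption of this sketch.

```python
from typing import List, Sequence


class CountingModel:
    """Stand-in for a diffusion model that counts how often it is evaluated."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, latent: Sequence[float], prompt: str) -> List[float]:
        self.calls += 1
        # Toy prediction that depends on the prompt length.
        return [0.5 * v + 0.1 * len(prompt) for v in latent]


def predict_noise(
    model: CountingModel,
    latent: Sequence[float],
    prompt: str,
    negative_prompt: str = "",
    cfg_scale: float = 1.0,
) -> List[float]:
    """One denoising prediction, with or without classifier-free guidance."""
    cond = model(latent, prompt)
    if cfg_scale == 1.0:
        # CFG disabled: a single evaluation, and the negative prompt is never used.
        return cond
    uncond = model(latent, negative_prompt)
    # Push the prediction away from the negative prompt and toward the prompt.
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


model = CountingModel()
predict_noise(model, [1.0, 2.0], "cat", cfg_scale=1.0)
print("evaluations without CFG:", model.calls)

model = CountingModel()
predict_noise(model, [1.0, 2.0], "cat", negative_prompt="blurry", cfg_scale=5.0)
print("evaluations with CFG:", model.calls)
```

The same structure shows why the negative prompt has no effect when CFG is disabled: the second evaluation that would consume it never runs.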

The HiDream-I1-Full model with CFG scale enabled has a high recommended step count, making it unsuitable for daily use.

On the other hand, the Dev and Fast models with CFG scale disabled cannot use Negative Prompts but process faster.
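
To put the difference in numbers, the recommended steps from the table can be turned into a count of model evaluations per image. This is a simplified estimate that assumes one evaluation per step without CFG, two with CFG, and the same cost per evaluation for every model.

```python
# (recommended steps, CFG enabled) taken from the table above.
MODELS = {
    "HiDream-I1-Full": (50, True),
    "HiDream-I1-Dev": (28, False),
    "HiDream-I1-Fast": (16, False),
    "HiDream-E1-Full": (28, True),
}


def model_evaluations(steps: int, cfg_enabled: bool) -> int:
    """Evaluations per image: CFG runs the model twice per step."""
    return steps * (2 if cfg_enabled else 1)


baseline = model_evaluations(*MODELS["HiDream-I1-Dev"])
for name, (steps, cfg_enabled) in MODELS.items():
    evals = model_evaluations(steps, cfg_enabled)
    print(f"{name}: {evals} evaluations ({evals / baseline:.1f}x Dev)")
```

Under these assumptions, HiDream-I1-Full needs 100 evaluations per image against 28 for Dev, or roughly 3.6 times the work.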

HiDream-E1-Full is a customized version of HiDream-I1-Full dedicated to redrawing images according to prompt instructions.

HiDream-E1-Full
Instruction (given in Japanese): “Change the cup in hand to a beer”

Using Four Text Encoders!

HiDream uses four text encoders.

| Text Encoder | Release | Developer | Parameters | Comprehension |
| --- | --- | --- | --- | --- |
| Llama-3.1-8B-Instruct | July 2024 | Meta | 8 billion | Very Long Text |
| T5-XXL (v1.1) | March 2022 | Google | 11 billion | Very Long Text |
| CLIP-G | January 2023 | LAION + HuggingFace | 340 million | Long Text |
| CLIP-L | January 2021 | OpenAI | 63 million | Short Text/Words |

Accurately reflects color-specified prompts (indigo, deep blue, burgundy)

Because at least one of the four text encoders can respond to any given prompt, prompt fidelity improves further.

QuadrupleCLIPLoaderMultiGPU node

HiDream’s text encoders are large, but loading them into system RAM with the QuadrupleCLIPLoaderMultiGPU node from ComfyUI-MultiGPU (set to device: cpu) prevents VRAM overload!
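
A quick estimate shows why the text encoders are better kept in system RAM. The sketch below sums the parameter counts from the table above and assumes 16-bit weights (2 bytes per parameter). Both are simplifications: the actual files may be smaller if they are quantized or contain only part of a model.

```python
# Parameter counts as listed in this article's text encoder table.
TEXT_ENCODER_PARAMS = {
    "Llama-3.1-8B-Instruct": 8e9,
    "T5-XXL (v1.1)": 11e9,
    "CLIP-G": 340e6,
    "CLIP-L": 63e6,
}

BYTES_PER_PARAM = 2  # assumption: 16-bit weights


def total_encoder_gb(bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Combined size of all four text encoders in GB (1 GB = 1e9 bytes)."""
    return sum(TEXT_ENCODER_PARAMS.values()) * bytes_per_param / 1e9


print(f"All four encoders at 16-bit: {total_encoder_gb():.1f} GB")
print(f"All four encoders at 8-bit:  {total_encoder_gb(1):.1f} GB")
```

At 16-bit the four encoders add up to roughly 38.8 GB under these assumptions, far more than 12GB of VRAM but within 64GB of system RAM.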

Japanese Prompts Supported!

Among HiDream’s text encoders, Llama and T5-XXL can understand Japanese, so HiDream responds to Japanese prompts at a practical level.

日本の東京の夜景のアニメイラスト、遠くにスカイツリーも映っている
Anime illustration of Tokyo’s night view, with the Skytree visible in the distance
空港の到着ロビーで「welcome to Japan」と書かれた大きなプラカードを持って笑顔で出迎える中年の日本人女性の写真
Photo of a middle-aged Japanese woman smiling and welcoming people with a large placard reading “welcome to Japan” in the airport arrival lobby

The Llama-3.1-8B-Instruct model has strong multilingual comprehension among Llama models and can faithfully reproduce long Japanese prompts.

Llama-3.1-Swallow-8B-Instruct-v0.3-BF16_test
In the snowy depths of a winter mountain, the roar of an engine breaks the silence as a Subaru rally car speeds through a silver world. The driver grips the steering wheel tightly, focusing on the slippery snow-covered road while skillfully drifting through corners. Subaru’s AWD system delivers exceptional traction even on snow, stabilizing the vehicle. Along the road, warmly dressed spectators gather, holding their breath as they watch the powerful drive. The blue-bodied Subaru, kicking up snow dust, asserts its dominance as the king of the snowy terrain.

HiDream’s Stable Perspective!

HiDream excels at stable perspective rendering.

HiDream-I1-Dev

The number of parameters greatly affects the accuracy of perspective in image generation AI.

One-Point Perspective Comparison

Left: HiDream-I1-Dev | Right: Flux.1[dev]
Parameters Model Open Source Development Use of Outputs Commercial Use
Flux.1[pro] 12 billion Non-distilled × × ×
Flux.1[dev] 12 billion Distilled × ×
Flux.1[schnell] 12 billion Distilled

Reiterated: Parameter count comparison

Perspective rendering was previously dominated by Flux.1, but HiDream, with more parameters than Flux.1, offers equal or superior three-dimensional expressiveness.

Workflows

Now, let’s introduce the workflows for each HiDream model.

The workflows presented here modify the official versions by changing the text encoder loading to QuadrupleCLIPLoaderMultiGPU and disabling negative prompts.

HiDream-I1-Full

HiDream-I1-Full_workflow

HiDream-I1-Dev

HiDream-I1-Dev_workflow

HiDream-I1-Fast

HiDream-I1-Fast_workflow

HiDream-E1-Full

HiDream-E1-Full_workflow

Conclusion: HiDream is the Culmination of Image Generation AI!

  • HiDream is the culmination of image generation AI
  • Fully open-source
  • Supports commercial use

HiDream can be considered the culmination of open-source image generation AI.

While the Flux.1 models were highly polished, that polish made them resistant to change, and modifications often degraded performance.


HiDream’s current quality is slightly below Flux.1, but with future optimizations, it has the potential to surpass Flux.1.

HiDream’s possibilities are limitless, potentially embodying the future of open-source image generation AI envisioned by the creators of Stable Diffusion.

Thank you for reading to the end!


Reference Article

When using HiDream, I also recommend optimizing ComfyUI’s VRAM management.


Update History

May 7, 2025

The original Llama-3.1-8B-Instruct model has high Japanese comprehension and produces better image quality, so the description of the Japanese fine-tuned Llama model was removed.