TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Figure: multi-step (50×2 NFEs) vs. few-step (2 NFEs). 2-NFE visualization of TwinFlow-Qwen-Image.

Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built upon multi-step frameworks such as diffusion and flow matching, which inherently limits their inference efficiency, requiring 40-100 function evaluations (NFEs). While various few-step methods aim to accelerate inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or show significant degradation at very few steps (<4 NFEs). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to enhance performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary trained models. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need for fixed pretrained teacher models and avoids standard adversarial networks during training, making it ideal for building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 at 1 NFE, outperforming strong baselines such as SANA-Sprint (a GAN-loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B, transforming it into an efficient few-step generator. With just 1 NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by 100× with minor quality degradation.
Recent large multi-modal models have achieved stunning generative capabilities, but this comes at a steep cost in inference efficiency: standard diffusion and flow-matching models typically require 50-100 NFEs to generate a single image, as sketched below.
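To make the cost concrete, here is a minimal sketch (not the authors' code) contrasting multi-step flow-matching sampling with 1-step generation. `velocity_model` and `generator` are hypothetical networks, and we assume the common convention that t = 0 is noise and t = 1 is data.

```python
import torch

def sample_multistep(velocity_model, noise, steps=50):
    """Euler integration of the flow ODE from noise (t=0) to data (t=1).
    Costs `steps` NFEs, or 2x that if classifier-free guidance doubles
    each forward pass (which would account for the 50x2 NFEs quoted below)."""
    x, dt = noise, 1.0 / steps
    for i in range(steps):
        t = torch.full((noise.shape[0],), i * dt, device=noise.device)
        x = x + dt * velocity_model(x, t)   # one NFE per step
    return x

def sample_onestep(generator, noise):
    """A 1-step generator maps noise to data in a single forward pass: 1 NFE."""
    return generator(noise)
```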
While researchers have raced to solve this with few-step distillation, existing solutions force a compromise between complexity, stability, and quality. As shown below, current SoTA methods rely heavily on "baggage"—auxiliary discriminators/fake scores or frozen teacher models, which inflate memory costs and destabilize training.
| Method | Generation Type | Auxiliary Trained Models (e.g., discriminator, fake score) | Frozen Teacher Models | Major Drawback |
|---|---|---|---|---|
| GANs | 1-step | 1 | 0 | Unstable training dynamics |
| Consistency Models | 1-step / Few-step | 0 | 1 | Quality degradation at < 4 steps |
| DMD / DMD2 | 1-step / Few-step | 1-2 | 1 | High GPU Memory, Complex pipeline |
| TwinFlow (Ours) | 1-step / Few-step | 0 | 0 | - |

We introduce TwinFlow, a framework that realizes high-quality 1-step and few-step generation without the pipeline bloat.
Instead of relying on external discriminators or frozen teachers, TwinFlow creates an internal "twin trajectory". By extending the time interval to t ∈ [−1, 1], we utilize the negative-time branch to map noise to "fake" data, creating a self-adversarial signal directly within the model.
The model can then rectify itself by minimizing the difference between the velocity fields of the real and fake trajectories, i.e., Δv. This rectification casts distribution matching as velocity matching, gradually transforming the model into a 1-step/few-step generator; a rough training-step sketch is given below.
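As an illustration only, the sketch below shows one TwinFlow-style training step following the description above. The exact parameterization of the negative-time branch, the stop-gradient placement, and the loss weighting are our assumptions, not the paper's definition; a single network `model(x, t)` is evaluated over t ∈ [−1, 1].

```python
import torch

def twinflow_step(model, real_images, optimizer):
    """One self-adversarial training step (illustrative sketch, see lead-in)."""
    noise = torch.randn_like(real_images)
    t = torch.rand(real_images.shape[0], device=real_images.device)
    t_ = t.view(-1, 1, 1, 1)

    # Real trajectory: linear interpolation between noise (t=0) and real data (t=1).
    x_real = (1 - t_) * noise + t_ * real_images
    v_real = model(x_real, t)

    # Fake trajectory: the negative-time branch of the same network maps noise to
    # "fake" samples (sketched here as one evaluation at t = -1), so no separate
    # discriminator or frozen teacher is needed.
    with torch.no_grad():
        fake_images = model(noise, -torch.ones_like(t))
    x_fake = (1 - t_) * noise + t_ * fake_images
    v_fake = model(x_fake, t)

    # Self-rectification: shrink the velocity gap delta_v between the two
    # trajectories, alongside a standard flow-matching loss on real data.
    delta_v = v_real.detach() - v_fake
    loss = (delta_v ** 2).mean() + ((v_real - (real_images - noise)) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```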
Key Advantages:
The true power of TwinFlow's "one-model" efficiency is demonstrated by its scalability. Prior methods such as VSD/SiD/DMD and DMD2 are hard to scale to massive models because loading multiple models (fake score + teacher (real score) + generator, plus an optional discriminator) causes out-of-memory (OOM) failures; see the back-of-envelope estimate below.
TwinFlow is the first framework to successfully enable full-parameter 1-step/few-step training on the massive Qwen-Image-20B model, reducing inference cost by nearly 100× while maintaining strong generation quality.
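A rough back-of-envelope estimate (ours, not from the paper) of why the multi-model setups above run out of memory while a single self-contained 20B model fits:

```python
# Back-of-envelope memory estimate (ours, not from the paper).
params = 20e9            # 20B parameters per model
bytes_per_param = 2      # bf16 weights

one_model_gb = params * bytes_per_param / 1e9    # ~40 GB of weights
three_models_gb = 3 * one_model_gb               # ~120 GB for generator + teacher + fake score
print(f"{one_model_gb:.0f} GB vs {three_models_gb:.0f} GB")
# Three 20B models exceed an 80 GB GPU before gradients, optimizer states,
# and activations are even counted; TwinFlow keeps only the one trainable model.
```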
| Method | Scale | Training Tradeoffs | NFE ⬇️ | GenEval ⬆️ | DPG-Bench ⬆️ | WISE ⬆️ |
|---|---|---|---|---|---|---|
| Qwen-Image (Original) | 20B | - | 50×2 | 0.87 | 88.32 | 0.62 |
| VSD / DMD / SiD | Generator (20B), Teacher (20B), Fake score (20B) | OOM (>80GB) | - | - | - | - |
| VSD | Generator (20B), Teacher (20B), Fake score (LoRA, ~420M) | Fake score as LoRA & Small batch size | 1 | 0.67 | 84.44 | 0.22 |
| SiD | Generator (20B), Teacher (20B), Fake score (LoRA, ~420M) | Fake score as LoRA & Small batch size | 1 | 0.77 | 87.05 | 0.42 |
| DMD | Generator (20B), Teacher (20B), Fake score (LoRA, ~420M) | Fake score as LoRA & Small batch size | 1 | 0.81 | 84.31 | 0.47 |
| sCM (JVP-free) | 20B | JVP through finite differences / Special JVP kernels | 8 | 0.60 | 85.54 | 0.45 |
| MeanFlow (JVP-free) | 20B | JVP through finite differences / Special JVP kernels | 8 | 0.49 | 83.81 | 0.37 |
| TwinFlow | 20B | No tradeoffs | 1 | 0.85 | 85.44 | 0.51 |
| TwinFlow | 20B | No tradeoffs | 2 | 0.86 | 86.35 | 0.55 |
| TwinFlow (longer training) | 20B | No tradeoffs | 1 | 0.89 | 87.54 | 0.57 |
| TwinFlow (longer training) | 20B | No tradeoffs | 2 | 0.90 | 87.80 | 0.59 |
@article{cheng2025twinflow,
title={TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows},
author={Cheng, Zhenglin and Sun, Peng and Li, Jianguo and Lin, Tao},
journal={arXiv preprint arXiv:2512.05150},
year={2025}
}