TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

¹Inclusion AI   ²Shanghai Innovation Institute   ³Westlake University   ⁴Zhejiang University
*Equal contribution.   †Corresponding authors.

[Interactive demo: generation speed comparison at 1328×1328. Original Qwen-Image (multi-step, 50×2 NFEs, ~36.55 s/img) vs. TwinFlow-Qwen-Image (few-step, 2 NFEs, ~1.21 s/img).]

2-NFE visualization of TwinFlow-Qwen-Image.

Abstract

Recent advances in large multi-modal generative models have demonstrated impressive generative capabilities, including image and video generation. These models are typically built upon multi-step frameworks like diffusion and flow matching, which inherently limits their inference efficiency, requiring 40-100 function evaluations (NFEs). While various few-step methods aim to accelerate inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or show significant degradation at very few steps (< 4 NFEs). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to enhance performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary trained models. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need for fixed pretrained teacher models and avoids standard adversarial networks during training, making it ideal for building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 at 1 NFE, outperforming strong baselines like SANA-Sprint (a GAN loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B, transforming it into an efficient few-step generator. With just 1 NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by roughly 100× with only minor quality degradation.

News

  • We release TwinFlow-Qwen-Image-v1.0! We are also working on Z-Image-Turbo to make it even faster!

Overview

The Trilemma of Few-Step Generation

Recent large multi-modal models have achieved stunning generative capabilities, but they come at a steep cost in inference efficiency: standard diffusion and flow-matching models typically require 50-100 NFEs to generate a single image.

While researchers have raced to solve this with few-step distillation, existing solutions force a compromise between complexity, stability, and quality. As shown below, current SoTA methods rely heavily on "baggage"—auxiliary discriminators/fake scores or frozen teacher models, which inflate memory costs and destabilize training.

| Method | Generation Type | Requires Auxiliary Trained Model? (e.g., discriminator, fake score) | Requires Frozen Teacher Model? | Major Drawback |
|---|---|---|---|---|
| GANs | 1-step | 1 | 0 | Unstable training dynamics |
| Consistency Models | 1-step / Few-step | 0 | 1 | Quality degradation at < 4 steps |
| DMD / DMD2 | 1-step / Few-step | 1-2 | 1 | High GPU memory, complex pipeline |
| TwinFlow (Ours) | 1-step / Few-step | 0 | 0 | - |

TwinFlow: Simplicity via Self-Adversarial Flows

[Figure: The TwinFlow architecture.]

We introduce TwinFlow, a framework that realizes high-quality 1-step and few-step generation without the pipeline bloat.

Instead of relying on external discriminators or frozen teachers, TwinFlow creates an internal "twin trajectory". By extending the time interval to $t \in [-1, 1]$, we utilize the negative time branch to map noise to "fake" data, creating a self-adversarial signal directly within the model.

The model then rectifies itself by minimizing the difference between the velocity fields of the real and fake trajectories, i.e., $\Delta_v$. This rectification casts distribution matching as velocity matching, gradually transforming the model into a 1-step/few-step generator.
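To make the rectification concrete, the sketch below shows how one such training step could look in PyTorch. It is only our reading of the description above, not the released TwinFlow recipe: the names `model` and `twinflow_step`, the single-Euler-step fake sampler at $t = -1$, and the unweighted sum of the two losses are all assumptions.

```python
# Minimal PyTorch sketch of one TwinFlow-style training step (illustrative only).
# Assumptions: `model(x, t)` returns a velocity field, images are 4D tensors, and the
# negative-time branch produces a fake sample via a single Euler step; the actual
# parameterization and loss weighting in TwinFlow may differ.
import torch

def twinflow_step(model, x_real, noise):
    b = x_real.size(0)
    t = torch.rand(b, device=x_real.device).view(-1, 1, 1, 1)  # t in (0, 1)

    # Real trajectory: interpolate noise (t=0) -> real data (t=1), query at time t.
    x_t_real = (1 - t) * noise + t * x_real
    v_real = model(x_t_real, t.flatten())

    # Fake trajectory: the negative-time branch maps noise to a "fake" sample
    # (here via one Euler step at t = -1), then the same interpolation is built on it.
    x_fake = noise + model(noise, -torch.ones(b, device=noise.device))
    x_t_fake = (1 - t) * noise + t * x_fake.detach()
    v_fake = model(x_t_fake, t.flatten())

    # Flow-matching loss on the real trajectory plus the Delta_v rectification term
    # that pulls the fake-trajectory velocity toward the real-trajectory velocity.
    loss_fm = ((v_real - (x_real - noise)) ** 2).mean()
    loss_delta_v = ((v_fake - v_real.detach()) ** 2).mean()
    return loss_fm + loss_delta_v
```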

Key Advantages:

  • One-model Simplicity. We eliminate the need for any auxiliary networks. The model learns to rectify its own flow field, acting simultaneously as the generator and as both the real and fake score. No extra GPU memory is spent on frozen teachers or discriminators during training.
  • Scalability on Large Models. Thanks to this one-model simplicity, TwinFlow scales easily to 20B full-parameter training. In contrast, methods like VSD, SiD, and DMD/DMD2 require maintaining three separate models during distillation, which not only significantly increases memory consumption (often leading to OOM) but also introduces substantial complexity at large training scales; see the rough accounting below.
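As a rough illustration of the memory argument (assuming bf16 weights at 2 bytes per parameter and ignoring gradients, optimizer state, and activations), the parameter storage alone already separates the one-model and three-model setups:

```python
# Back-of-the-envelope weight memory at 20B parameters in bf16 (2 bytes/param).
# Gradients, optimizer state, and activations would add substantially more on top.
PARAMS = 20e9
BYTES_PER_PARAM = 2  # bf16

one_model_gb = PARAMS * BYTES_PER_PARAM / 1e9         # TwinFlow: generator only
three_models_gb = 3 * PARAMS * BYTES_PER_PARAM / 1e9  # generator + teacher + fake score

print(f"one 20B model:    ~{one_model_gb:.0f} GB of weights")     # ~40 GB
print(f"three 20B models: ~{three_models_gb:.0f} GB of weights")  # ~120 GB, well past a single 80 GB GPU
```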

Scalability: Unlocking Full-Parameter Training on Qwen-Image-20B

The true power of TwinFlow's "one-model" efficiency is demonstrated by its scalability. Prior methods such as VSD, SiD, and DMD/DMD2 are hard to scale to massive models because loading multiple models (fake score + teacher (real score) + generator, plus an optional discriminator) causes OOM.

TwinFlow is the first framework to successfully enable full-parameter 1-step/few-step training on the massive Qwen-Image-20B model, reducing inference cost by nearly 100× while maintaining strong generation quality.
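For reference, a plain Euler sampler is enough to see where the 1-NFE / 2-NFE cost comes from: each step is exactly one forward pass of the velocity model. The snippet below is a generic flow-matching sampler sketch, not the released TwinFlow-Qwen-Image pipeline; `model(x, t)` returning a velocity is an assumption.

```python
# Generic few-step Euler sampler for a velocity-field model (illustrative sketch).
import torch

@torch.no_grad()
def sample_few_step(model, noise, nfe=2):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (image) in `nfe` Euler steps."""
    x = noise
    ts = torch.linspace(0.0, 1.0, nfe + 1, device=noise.device)
    for i in range(nfe):
        t = ts[i].expand(x.size(0))
        v = model(x, t)                  # one function evaluation (NFE)
        x = x + (ts[i + 1] - ts[i]) * v
    return x
```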

| Method | Scale | Memory Tradeoffs | NFE ⬇️ | GenEval ⬆️ | DPG-Bench ⬆️ | WISE ⬆️ |
|---|---|---|---|---|---|---|
| Qwen-Image (Original) | 20B | - | 50×2 | 0.87 | 88.32 | 0.62 |
| VSD / DMD / SiD | Generator (20B), Teacher (20B), Fake score (20B) | OOM (>80GB) | - | - | - | - |
| VSD | Generator (20B), Teacher (20B), Fake score (LoRA, ~420M) | Fake score as LoRA & small batch size | 1 | 0.67 | 84.44 | 0.22 |
| SiD | Generator (20B), Teacher (20B), Fake score (LoRA, ~420M) | Fake score as LoRA & small batch size | 1 | 0.77 | 87.05 | 0.42 |
| DMD | Generator (20B), Teacher (20B), Fake score (LoRA, ~420M) | Fake score as LoRA & small batch size | 1 | 0.81 | 84.31 | 0.47 |
| sCM (JVP-free) | 20B | JVP through finite differences / special JVP kernels | 8 | 0.60 | 85.54 | 0.45 |
| MeanFlow (JVP-free) | 20B | JVP through finite differences / special JVP kernels | 8 | 0.49 | 83.81 | 0.37 |
| TwinFlow | 20B | No tradeoffs | 1 | 0.85 | 85.44 | 0.51 |
| TwinFlow | 20B | No tradeoffs | 2 | 0.86 | 86.35 | 0.55 |
| TwinFlow (longer training) | 20B | No tradeoffs | 1 | 0.89 | 87.54 | 0.57 |
| TwinFlow (longer training) | 20B | No tradeoffs | 2 | 0.90 | 87.80 | 0.59 |

Citation

@article{cheng2025twinflow,
  title={TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows},
  author={Cheng, Zhenglin and Sun, Peng and Li, Jianguo and Lin, Tao},
  journal={arXiv preprint arXiv:2512.05150},
  year={2025}
}