TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Figure: multi-step (50×2 NFEs) vs. few-step (2 NFEs). 2-NFE visualization of TwinFlow-Qwen-Image.

Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built upon multi-step frameworks such as diffusion and flow matching, which inherently limits their inference efficiency, requiring 40-100 function evaluations (NFEs). While various few-step methods aim to accelerate inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or show significant degradation at very few steps (<4 NFEs). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to enhance performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary trained models. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need for fixed pretrained teacher models and avoids standard adversarial networks during training, making it ideal for building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 at 1 NFE, outperforming strong baselines such as SANA-Sprint (a GAN-loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B, transforming it into an efficient few-step generator. With just 1 NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by 100× with minor quality degradation.
Recent large multi-modal models have achieved stunning generative capabilities, but this comes at a steep cost in inference efficiency: standard diffusion and flow-matching models typically require 50-100 NFEs to generate a single image, as sketched below.
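To make the cost concrete, here is a minimal sketch (not the authors' code) contrasting multi-step flow-matching sampling with 1-step generation. `velocity_model` and `generator` are hypothetical networks, and we assume the common convention that t = 0 is noise and t = 1 is data.

```python
import torch

def sample_multistep(velocity_model, noise, steps=50):
    """Euler integration of the flow ODE from noise (t=0) to data (t=1).
    Costs `steps` NFEs, or 2x that if classifier-free guidance doubles
    each forward pass (which would account for the 50x2 NFEs quoted below)."""
    x, dt = noise, 1.0 / steps
    for i in range(steps):
        t = torch.full((noise.shape[0],), i * dt, device=noise.device)
        x = x + dt * velocity_model(x, t)   # one NFE per step
    return x

def sample_onestep(generator, noise):
    """A 1-step generator maps noise to data in a single forward pass: 1 NFE."""
    return generator(noise)
```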
While researchers have raced to solve this with few-step distillation, existing solutions force a compromise between complexity, stability, and quality. As shown below, current SoTA methods rely heavily on "baggage"—auxiliary discriminators/fake scores or frozen teacher models, which inflate memory costs and destabilize training.
| Method | Generation Type | Auxiliary Trained Models (e.g., discriminator, fake score) | Frozen Teacher Models | Major Drawback |
|---|---|---|---|---|
| GANs | 1-step | 1 | 0 | Unstable training dynamics |
| Consistency Models | 1-step / Few-step | 0 | 1 | Quality degradation at < 4 steps |
| DMD / DMD2 | 1-step / Few-step | 1-2 | 1 | High GPU Memory, Complex pipeline |
| TwinFlow (Ours) | 1-step / Few-step | 0 | 0 | - |

We introduce TwinFlow, a framework that realizes high-quality 1-step and few-step generation without the pipeline bloat.
Instead of relying on external discriminators or frozen teachers, TwinFlow creates an internal "twin trajectory". By extending the time interval to t ∈ [−1, 1], we utilize the negative-time branch to map noise to "fake" data, creating a self-adversarial signal directly within the model.
The model can then rectify itself by minimizing the difference between the velocity fields of the real and fake trajectories, i.e., Δv. This rectification casts distribution matching as velocity matching, gradually transforming the model into a 1-step/few-step generator; a rough training-step sketch is given below.
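As an illustration only, the sketch below shows one TwinFlow-style training step following the description above. The exact parameterization of the negative-time branch, the stop-gradient placement, and the loss weighting are our assumptions, not the paper's definition; a single network `model(x, t)` is evaluated over t ∈ [−1, 1].

```python
import torch

def twinflow_step(model, real_images, optimizer):
    """One self-adversarial training step (illustrative sketch, see lead-in)."""
    noise = torch.randn_like(real_images)
    t = torch.rand(real_images.shape[0], device=real_images.device)
    t_ = t.view(-1, 1, 1, 1)

    # Real trajectory: linear interpolation between noise (t=0) and real data (t=1).
    x_real = (1 - t_) * noise + t_ * real_images
    v_real = model(x_real, t)

    # Fake trajectory: the negative-time branch of the same network maps noise to
    # "fake" samples (sketched here as one evaluation at t = -1), so no separate
    # discriminator or frozen teacher is needed.
    with torch.no_grad():
        fake_images = model(noise, -torch.ones_like(t))
    x_fake = (1 - t_) * noise + t_ * fake_images
    v_fake = model(x_fake, t)

    # Self-rectification: shrink the velocity gap delta_v between the two
    # trajectories, alongside a standard flow-matching loss on real data.
    delta_v = v_real.detach() - v_fake
    loss = (delta_v ** 2).mean() + ((v_real - (real_images - noise)) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```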
Key Advantages:
The true power of TwinFlow's "one-model" efficiency is demonstrated by its scalability. Prior methods such as VSD/SiD/DMD and DMD2 are hard to scale to massive models because loading multiple models (fake score + teacher (real score) + generator, plus an optional discriminator) causes out-of-memory (OOM) failures; see the back-of-envelope estimate below.
TwinFlow is the first framework to successfully enable full-parameter 1-step/few-step training on the massive Qwen-Image-20B model, reducing inference cost by nearly 100× while maintaining strong generation quality.
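A rough back-of-envelope estimate (ours, not from the paper) of why the multi-model setups above run out of memory while a single self-contained 20B model fits:

```python
# Back-of-envelope memory estimate (ours, not from the paper).
params = 20e9            # 20B parameters per model
bytes_per_param = 2      # bf16 weights

one_model_gb = params * bytes_per_param / 1e9    # ~40 GB of weights
three_models_gb = 3 * one_model_gb               # ~120 GB for generator + teacher + fake score
print(f"{one_model_gb:.0f} GB vs {three_models_gb:.0f} GB")
# Three 20B models exceed an 80 GB GPU before gradients, optimizer states,
# and activations are even counted; TwinFlow keeps only the one trainable model.
```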
| Method | Scale | Training Tradeoffs | NFE ⬇️ | GenEval ⬆️ | DPG-Bench ⬆️ | WISE ⬆️ |
|---|---|---|---|---|---|---|
| Qwen-Image (Original) | 20B | - | 50×2 | 0.87 | 88.32 | 0.62 |
| VSD / DMD / SiD | Generator (20B), Teacher (20B), Fake score (20B) | OOM (>80GB) | - | - | - | - |
| VSD | Generator (20B), Teacher (20B), Fake score (LoRA, ~420M) | Fake score as LoRA & Small batch size | 1 | 0.67 | 84.44 | 0.22 |
| SiD | Generator (20B), Teacher (20B), Fake score (LoRA, ~420M) | Fake score as LoRA & Small batch size | 1 | 0.77 | 87.05 | 0.42 |
| DMD | Generator (20B), Teacher (20B), Fake score (LoRA, ~420M) | Fake score as LoRA & Small batch size | 1 | 0.81 | 84.31 | 0.47 |
| sCM (JVP-free) | 20B | JVP through finite differences / Special JVP kernels | 8 | 0.60 | 85.54 | 0.45 |
| MeanFlow (JVP-free) | 20B | JVP through finite differences / Special JVP kernels | 8 | 0.49 | 83.81 | 0.37 |
| TwinFlow | 20B | No tradeoffs | 1 | 0.85 | 85.44 | 0.51 |
| TwinFlow | 20B | No tradeoffs | 2 | 0.86 | 86.35 | 0.55 |
| TwinFlow (longer training) | 20B | No tradeoffs | 1 | 0.89 | 87.54 | 0.57 |
| TwinFlow (longer training) | 20B | No tradeoffs | 2 | 0.90 | 87.80 | 0.59 |
@article{cheng2025twinflow,
title={TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows},
author={Cheng, Zhenglin and Sun, Peng and Li, Jianguo and Lin, Tao},
journal={arXiv preprint arXiv:2512.05150},
year={2025}
}