About Me

May oneko lead you to my latest work!

Hello! I am Zhenglin Cheng, a Ph.D. student in the LINs Lab at Westlake University (through a joint program with ZJU), advised by Prof. Tao LIN. I am also honored to be affiliated with the Shanghai Innovation Institute (SII), a new force in the GenAI era. Before that, I received my bachelor's degree in Software Engineering from Zhejiang University (ZJU).

I love writing and posting things, from technical notes to everyday life. I also practice traditional Chinese calligraphy to relax occasionally.

Research Interests

My long-term research goal is to build multimodal models and agents that can understand the physical world, reason over any type of problem, and create novel content, while also learning from experience and evolving in constantly changing environments.

At present, I focus on:

  • Unified multimodal autoregressive models, such as Emu3, VILA-U, Transfusion, and the Janus series.
  • LLM reasoning in the deep-thinking era, such as OpenAI's o-series and DeepSeek-R1.

Publications/Manuscripts (* denotes equal contribution)

ICLR'25

📖 Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Yongxin Guo*, Zhenglin Cheng*, Xiaoying Tang, Zhaopeng Tu, Tao Lin

GitHub Repo · HF Checkpoints

👉 DynMoE lifts the burden of pivotal hyper-parameter selection in MoE training by letting each token activate a different number of experts and adjusting the total number of experts automatically, achieving stronger sparsity while maintaining performance.

arXiv'24

📖 GMem: A Modular Approach for Ultra-Efficient Generative Models

Yi Tang*, Peng Sun*, Zhenglin Cheng*, Tao Lin

GitHub Repo · HF Checkpoints

👉 GMem decouples diffusion modeling into a network for generalization and an external memory bank for memorization, achieving a 50× training speedup over SiT and a 25× speedup over REPA.

EMNLP'24 (Main)

📖 Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model (extended from my undergraduate thesis)

Wenqi Zhang*, Zhenglin Cheng*, Yuanyu He, Mengna Wang, …, Weiming Lu, Yueting Zhuang

Project Page · GitHub Repo · HF Datasets

👉 Multimodal Self-Instruct leverages LLMs and their coding abilities to synthesize massive abstract images and visual reasoning instructions across daily scenarios such as charts, graphs, and visual puzzles.

News

Education

  • 2024/09 - 2029/06, Westlake University, College of Engineering.
  • 2020/09 - 2024/06, Zhejiang University, College of Computer Science and Technology.