About Me

May oneko lead you to my latest work!

Hello! I am Zhenglin Cheng, a Ph.D. student in the LINs Lab at Westlake University (through a joint program with ZJU), advised by Prof. Tao LIN. I am also honored to be affiliated with the Shanghai Innovation Institute (SII), a new force in the GenAI era. Before that, I received my bachelor's degree in Software Engineering from Zhejiang University (ZJU).

I love writing and posting things, from technical notes to everyday life. I also practice traditional Chinese calligraphy to relax occasionally.

Research Interests

My long-term research goal is to build multimodal models and agents that can understand the physical world, reason over any type of problem, and create novel content, while also learning from experience and evolving in constantly changing environments.

At present, I focus on:

  • Unified multimodal autoregressive models, such as Emu3, VILA-U, Transfusion, and the Janus series.
  • LLM reasoning in the deep-thinking era, such as OpenAI's o-series and DeepSeek-R1.

Publications/Manuscripts (* denotes equal contribution)

ICLR'25

📖 Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Yongxin Guo*, Zhenglin Cheng*, Xiaoying Tang, Zhaopeng Tu, Tao Lin

GitHub Repo · HF Checkpoints

👉 DynMoE lifts the burden of pivotal hyper-parameter selection in MoE training by letting each token activate a different number of experts and adjusting the total number of experts automatically, achieving stronger sparsity while maintaining performance.

arXiv'24

📖 GMem: A Modular Approach for Ultra-Efficient Generative Models

Yi Tang*, Peng Sun*, Zhenglin Cheng*, Tao Lin

GitHub Repo · HF Checkpoints

👉 GMem decouples diffusion modeling into a network for generalization and an external memory bank for memorization, achieving a 50× training speedup over SiT and a 25× speedup over REPA.

EMNLP'24 (Main)

📖 Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model (extended from my undergraduate thesis)

Wenqi Zhang*, Zhenglin Cheng*, Yuanyu He, Mengna Wang, …, Weiming Lu, Yueting Zhuang

Project Page · GitHub Repo · HF Datasets

👉 Multimodal Self-Instruct leverages LLMs and their coding abilities to synthesize massive abstract images and visual reasoning instructions across daily scenarios such as charts, graphs, and visual puzzles.

News

Education

  • 2024/09 - 2029/06, Westlake University, College of Engineering.
  • 2020/09 - 2024/06, Zhejiang University, College of Computer Science and Technology.