Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
ICML, 2025
Aligning LLMs at test time via textual reward.
Postdoctoral Researcher, The Chinese University of Hong Kong
I am a Postdoctoral Researcher at The Chinese University of Hong Kong, working with Prof. Yu Cheng, and a Researcher at Shanghai AI Laboratory. My research focuses on scalable reasoning and alignment for large language models, spanning reinforcement learning with verifiable rewards, off-policy training, inference-time optimization, and rigorous evaluation.
I received my Ph.D. in Computer Science through a joint program between Zhejiang University and Westlake University, advised by Prof. Yue Zhang. Prior to that, I earned my M.Sc. in Artificial Intelligence from the University of Edinburgh under the supervision of Prof. Alex Lascarides, and my B.Eng. from Wuhan University.
My research focuses on reasoning, trustworthy AI, and multilinguality. * denotes equal contribution; † denotes project lead or corresponding author.
NeurIPS, 2025
Boosting reasoning performance using off-policy guidance.
ICLR, 2026
Boosting reasoning performance with the model's own experience.
ICLR, 2026
Encouraging global exploration.
ICLR, 2026
Mining implicit signals via group comparison.
preprint
A survey of efficient reasoning for Large Reasoning Models.
preprint
A framework for robust and scalable evaluation of complex reasoning.
ACL, 2026
A tension between scaling up reasoning capacity and maintaining controllability.
ACL Findings, 2026
Training models to aggregate multiple responses.
ACL Findings, 2026
Aggregating multiple agents for complex reasoning.
preprint
Reasoning over safety and behavioural boundaries before answering.
ACL, 2025
Unveiling attractor cycles in LLMs through dynamical systems analysis.
ACL, 2024
Assessing machine-generated text detectors in real-world scenarios.
Computational Linguistics
A survey of hallucination in LLMs.
ACL, 2025
A systematic study of the origin of translationese in LLMs and mitigation methods.
ACL, 2023; Best Paper Nomination
A neural-symbolic method that guides generation with rules.
EMNLP, 2022
Optimizing non-autoregressive translation with multi-granularity policy gradient.
ACL, 2021; Oral
Neural machine translation suffers from poor compositionality.
I closely supervise and mentor research interns and students. Selected mentees and representative publications include:
We are looking for full-time researchers, research interns, and joint Ph.D. students with THU, PKU, SJTU, FDU, and other universities to work on cutting-edge research on large language models.
ACL 2025; EMNLP 2025; ACL 2026; ACL ARR (February 2025, May 2025, January 2026, March 2026).
ACL, EMNLP, COLING, ACL ARR, IJCAI, NeurIPS, ICLR, ICML, CVPR.
TMLR, JAIR, TACL, TASLP, TBD, TALLIP.