Publications
You can also find my articles on Google Scholar. * denotes equal contribution and † denotes project lead or corresponding author.
Reasoning and RL

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Yafu Li, Runzhe Zhan, Haoran Zhang, Shunkai Zhang, Yizhuo Li, Zhilin Wang, Jiacheng Chen, Futing Wang, Xuyang Hu, Yuchen Fan, Bangjie Xu, Yucheng Su, Xinmiao Han, Chenxi Li, Haodi Lei, Yufeng Zhao, Zejin Lin, Qianjia Cheng, Tong Zhu, Xiaoye Qu, Ganqu Cui, Peng Ye, Yun Luo, Zhouchen Lin, Yu Qiao, Bowen Zhou, Ning Ding, Yu Cheng
Technical Report, 2026
SU-01, a 30B-AB model achieving gold-medal-level performance on IMO 2025/USAMO 2026 and IPhO 2024/2025 with a simple and unified recipe.

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng
ICML, 2025
Aligning LLMs at test time via textual reward.

Learning to Reason under Off-Policy Guidance
Jianhao Yan*, Yafu Li*†, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang
NeurIPS, 2025
Boosting reasoning performance using off-policy guidance.

ExGRPO: Learning to Reason from Experience
Runzhe Zhan, Yafu Li†, Zhi Wang, Xiaoye Qu, Dongrui Liu, Jing Shao, Derek F. Wong, Yu Cheng
ICLR, 2026
Boosting reasoning performance with the model's own experience.

Diversity-Incentivized Exploration for Versatile Reasoning
Zican Hu, Shilin Zhang, Yafu Li†, Jianhao Yan, Xuyang Hu, Leyang Cui, Xiaoye Qu, Chunlin Chen, Yu Cheng, Zhi Wang
ICLR, 2026
Encouraging global exploration.

Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
Guanxu Chen, Yafu Li†, Yuxian Jiang, Chen Qian, Qihan Ren, Jingyi Yang, Yu Cheng, Dongrui Liu, Jing Shao
ICLR, 2026
Mining implicit signals via group comparison.

Characterizing, Evaluating, and Optimizing Complex Reasoning
Haoran Zhang, Yafu Li†, Zhi Wang, Zhilin Wang, Shunkai Zhang, Xiaoye Qu, Yu Cheng
ICML, 2026
A framework for robust and scalable evaluation of complex reasoning.

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
Haoran Zhang, Yafu Li†, Xuyang Hu, Dongrui Liu, Zhilin Wang, Bo Li, Yu Cheng
ICML, 2026
Reasoning over safety and behavioral boundaries before answering.

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Tingchen Fu, Yafu Li†, Jiawei Gu, Xiaoye Qu, Yu Cheng
ACL, 2026
Revealing a tension between scaling up reasoning capacity and maintaining controllability.

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu*†, Yafu Li*†, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng
Preprint
A survey of efficient reasoning for large reasoning models.

From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
Yafu Li*, Zhilin Wang*, Tingchen Fu, Ganqu Cui, Sen Yang, Yu Cheng
Findings of ACL, 2026
Training models to aggregate multiple responses.

Multi-LLM Collaborative Search for Complex Problem Solving
Sen Yang, Yafu Li†, Wai Lam, Yu Cheng
Findings of ACL, 2026
Aggregating multiple agents for complex reasoning.
Trustworthy AI

Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing
Zhilin Wang*, Yafu Li*†, Jianhao Yan, Yu Cheng, Yue Zhang
ACL, 2025
Unveiling attractor cycles in LLMs through dynamical systems analysis.

MAGE: Machine-generated Text Detection in the Wild
Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang
ACL, 2024
Assessing the proficiency of machine-generated text detectors in real-world scenarios.

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi
Computational Linguistics
A survey of hallucination in LLMs.
Multilinguality

Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
Yafu Li*, Ronghao Zhang*, Zhilin Wang, Huajiang Zhang, Leyang Cui, Yongjing Yin, Tong Xiao, Yue Zhang
ACL, 2025
A systematic study of the origin of translationese in LLMs and mitigation methods.

Explicit Syntactic Guidance for Neural Text Generation
Yafu Li, Leyang Cui, Jianhao Yan, Yongjing Yin, Wei Bi, Shuming Shi, Yue Zhang
ACL, 2023, Best Paper Nomination (1.6%)
A neuro-symbolic method that guides text generation with explicit syntactic rules.

Multi-Granularity Optimization for Non-Autoregressive Translation
Yafu Li, Leyang Cui, Yongjing Yin, Yue Zhang
EMNLP, 2022
Optimizing non-autoregressive translation with multi-granularity policy gradient.

On Compositional Generalization of Neural Machine Translation
Yafu Li, Yongjing Yin, Yulong Chen, Yue Zhang
ACL, 2021, Oral
Neural machine translation suffers from poor compositional generalization.
