Yafu Li
I am a postdoctoral researcher at The Chinese University of Hong Kong, working under the supervision of Prof. Yu Cheng. I earned my PhD through a joint program between Zhejiang University and Westlake University, advised by Prof. Yue Zhang.
I received my Bachelor's degree from Wuhan University, followed by a Master's degree from the University of Edinburgh, where I was supervised by Prof. Alex Lascarides.
During my PhD, I interned at Tencent AI Lab and collaborated closely with Dr. Leyang Cui and Dr. Wei Bi.
Email / Google Scholar / Twitter / Github
Open Positions
We are looking for full-time researchers, research interns and joint PhD students (with THU, PKU, SJTU, FDU, etc.) to work on cutting-edge research in large language models.
Research Areas
My research focuses on reasoning, trustworthy AI, and multilinguality. *: equal contribution. †: project lead or corresponding author.
ExGRPO: Learning to Reason from Experience
Runzhe Zhan, Yafu Li†, Zhi Wang, Xiaoye Qu, Dongrui Liu, Jing Shao, Derek F. Wong, Yu Cheng
preprint
Github / Paper
Boosting reasoning performance with the model's own experience.
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Tingchen Fu, Jiawei Gu, Yafu Li†, Xiaoye Qu, Yu Cheng
preprint
Github / Paper
Revealing a tension between scaling up reasoning capability and maintaining instruction-following controllability.
Learning to Reason under Off-Policy Guidance
Jianhao Yan*, Yafu Li*†, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang
NeurIPS, 2025
Github / Paper
An RL framework to boost reasoning performance using off-policy guidance.
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu*†, Yafu Li*†, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng
preprint
Github / Paper
A survey of efficient reasoning for Large Reasoning Models.
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng
ICML, 2025
Github / Paper
Test-Time Preference Optimization (TPO) aligns LLMs during inference via iterative textual feedback.
Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing
Zhilin Wang*, Yafu Li*†, Jianhao Yan, Yu Cheng, Yue Zhang
ACL, 2025
Paper
Unveiling attractor cycles in LLMs through dynamical systems analysis.
MAGE: Machine-generated Text Detection in the Wild
Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang
ACL, 2024
Github / Paper
Assessing the proficiency of machine-generated text detectors in real-world scenarios.
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi
Computational Linguistics
Github / Paper
A survey of hallucination in LLMs.
Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
Yafu Li*, Ronghao Zhang*, Zhilin Wang, Huajiang Zhang, Leyang Cui, Yongjing Yin, Tong Xiao, Yue Zhang
ACL, 2025
Github / Paper
A systematic study of the origin of translationese in LLMs and mitigation methods.
Explicit Syntactic Guidance for Neural Text Generation
Yafu Li, Leyang Cui, Jianhao Yan, Yongjing Yin, Wei Bi, Shuming Shi, Yue Zhang
ACL, 2023, Best Paper Nomination (1.6%)
Github / Paper
A neural-symbolic method that guides generation with syntactic rules.
Multi-Granularity Optimization for Non-Autoregressive Translation
Yafu Li, Leyang Cui, Yongjing Yin, Yue Zhang
EMNLP, 2022
Github / Paper
Optimizing non-autoregressive translation with multi-granularity policy gradient.
On Compositional Generalization of Neural Machine Translation
Yafu Li, Yongjing Yin, Yulong Chen, Yue Zhang
ACL, 2021
Github / Paper
Neural machine translation suffers from poor compositional generalization.
Talks & Lectures
September 4, 2025 – Invited lecture at Tencent: Evolving Reasoning Abilities of LLMs: RLVR, Off-Policy Learning, and Test-Time Reinforcement Learning
August 12, 2025 – Invited speaker at CCL 2025 Forum on Large Model Reasoning and Reinforcement Learning
Service
Area Chair: ACL 2025, EMNLP 2025
Conference Reviewer: ACL, EMNLP, COLING, ACL ARR, IJCAI, NeurIPS.
Journal Reviewer: TMLR, JAIR, TACL, TASLP, TBD, TALLIP.
Honors
Outstanding Student Scholarship (Silver medal, Tencent Rhino-Bird Elite Program, 2024).
National Scholarship (Ministry of Education, 2023).
Dean's Medal (Westlake University, 2023).