Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
ICML, 2025
Aligning LLMs at test time via textual reward.
Postdoctoral Researcher, The Chinese University of Hong Kong
I am a Postdoctoral Researcher at The Chinese University of Hong Kong, working with Prof. Yu Cheng, and a Researcher at Shanghai AI Laboratory. My research focuses on scalable reasoning and alignment for large language models, spanning reinforcement learning with verifiable rewards, off-policy training, inference-time optimization, and rigorous evaluation.
I received my Ph.D. in Computer Science through a joint program between Zhejiang University and Westlake University, advised by Prof. Yue Zhang. Prior to that, I earned my M.Sc. in Artificial Intelligence from the University of Edinburgh under the supervision of Prof. Alex Lascarides, and my B.Eng. from Wuhan University.
My research focuses on reasoning, trustworthy AI, and multilinguality. * denotes equal contribution; † denotes project lead or corresponding author.
NeurIPS, 2025
Boosting reasoning performance using off-policy guidance.
ICLR, 2026
Boosting reasoning performance with the model's own experience.
ICLR, 2026
Encouraging global exploration.
ICLR, 2026
Mining implicit signals via group comparison.
preprint
A survey of efficient reasoning for Large Reasoning Models.
preprint
A framework for robust and scalable evaluation of complex reasoning.
ACL, 2026
A tension between scaling up reasoning capacity and maintaining controllability.
ACL Findings, 2026
Training models to aggregate multiple responses.
ACL Findings, 2026
Aggregating multiple agents for complex reasoning.
preprint
Reasoning over safety and behavioural boundaries before answering.
ACL, 2025
Unveiling attractor cycles in LLMs through dynamical systems analysis.
ACL, 2024
Assessing machine-generated text detectors in real-world scenarios.
Computational Linguistics
A survey of hallucination in LLMs.
ACL, 2025
A systematic study of the origin of translationese in LLMs and mitigation methods.
ACL, 2023; Best Paper Nomination
A neural-symbolic method that guides generation with rules.
EMNLP, 2022
Optimizing non-autoregressive translation with multi-granularity policy gradient.
ACL, 2021; Oral
Neural machine translation suffers from poor compositionality.
I closely supervise and mentor research interns and students. Selected mentees and representative publications include:
We are looking for full-time researchers, research interns, and joint Ph.D. students with THU, PKU, SJTU, FDU, and other universities to work on cutting-edge research on large language models.
ACL 2025; EMNLP 2025; ACL 2026; ACL ARR (February 2025, May 2025, January 2026, March 2026).
ACL, EMNLP, COLING, ACL ARR, IJCAI, NeurIPS, ICLR, ICML, CVPR.
TMLR, JAIR, TACL, TASLP, TBD, TALLIP.