I’m currently a second-year Ph.D. candidate at REAL Lab, Zhejiang University, advised by Yongliang Shen. Before that, I received my B.E. degree from Chu Kochen Honors College, Zhejiang University (浙江大学竺可桢学院) in 2024.

My research focuses on AI agents and LLM post-training, including reinforcement learning (RL). My earlier work in 2025 centered on RL for GUI agents; my current research investigates post-training techniques for general agents, including agent skills, on-policy distillation (OPD), and RL.

📢 I’m actively seeking research-internship opportunities in industry on the topics above. Feel free to reach out if there might be a fit.

🐈 Our lab is also recruiting remote and on-site interns; undergraduate and Master’s students are warmly welcome! See Join.

🔥 News

  • 2026.05:  🔥🔥 Our new work SDAR was released, featured as 🤗 HF Daily Paper #2!
  • 2026.05:  🔥🔥 Our new work SKILL1 was released, featured as 🤗 HF Daily Paper #2!
  • 2026.04:  🎉🎉 Four papers were accepted by ACL 2026. See you in San Diego, US!
  • 2026.04:  🔥🔥 Our new work SKILL0 was released, featured as 🤗 HF Daily Paper #2!
  • 2026.02:  🎉🎉 One paper was accepted by CVPR 2026.
  • 2025.11:  🎉🎉 Three papers were accepted by AAAI 2026.

📝 Publications

🤖 Agentic RL

Preprint

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

[Paper]

  • We propose an in-context agentic RL framework that internalizes external tool-use skills into the policy itself, enabling agents to retain reusable behaviors across tasks without repeated demonstrations.
Preprint

SDAR: Self-Distilled Agentic Reinforcement Learning

Zhengxi Lu, Zhiyuan Yao, Zhuowen Han, Zi-Han Wang, Jinyang Wu, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

[Paper]

  • A self-distillation pipeline that lets an agent improve through its own high-reward trajectories, bridging on-policy distillation and RL to stabilize long-horizon, multi-step training.
Preprint

SKILL1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

Yaorui Shi, Yuxin Chen, Zhengxi Lu, Yuchun Miao, Shugui Liu, Qi Gu, Xunliang Cai, Xiang Wang, An Zhang

[Paper]

  • Jointly evolves the agent policy and its skill library through RL, allowing newly discovered skills and the controller to co-adapt instead of being optimized in isolation.

📱 MLLM Agents

AAAI 2026

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning

Zhengxi Lu, Yuxiang Chai, Yaxuan Guo, Xi Yin, Liang Liu, Hao Wang, Han Xiao, Shuai Ren, Guanjing Xiong, Hongsheng Li

[Paper]

  • The first work to apply rule-based reinforcement learning to GUI action prediction, improving the data efficiency and grounding accuracy of MLLM-based GUI agents.
ACL 2026

UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

Zhengxi Lu, Jiabo Ye, Fei Tang, Yongliang Shen, Haiyang Xu, Ziwei Zheng, Weiming Lu, Ming Yan, Fei Huang, Jun Xiao, Yueting Zhuang

[Paper]

  • A semi-online RL paradigm that mixes offline trajectories with on-policy rollouts to combine the stability of imitation with the exploration benefits of online RL for GUI agents.
Tech Report

Mobile-Agent-v3: Fundamental Agents for GUI Automation

Jiabo Ye, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Zhaoqing Zhu, Ziwei Zheng, Feiyu Gao, Junjie Cao, Zhengxi Lu, Jitong Liao, Qi Zheng, Fei Huang, Jingren Zhou, Ming Yan

[Paper]

  • A foundation-agent framework for mobile GUI automation that unifies perception, planning, and execution roles, achieving strong performance across long-horizon real-device tasks.
AAAI 2026

GUI-G²: Gaussian Reward Modeling for GUI Grounding

Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang

[Paper]

  • Replaces binary hit/miss rewards with a Gaussian reward field over click coordinates, providing smoother gradients and substantially improving GUI grounding accuracy under RL.
ACL 2026

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

Zhengxi Lu, Fei Tang, Guangyi Liu, Kaitao Song, Xu Tan, Jin Ma, Wenqi Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

[Paper]

  • Tool-integrated policy optimization that lets GUI agents call auxiliary tools mid-trajectory, extending the effective horizon and improving credit assignment for long, multi-screen workflows.

🎨 Multimodal AI

🎖 Honors and Awards

  • Second-Class Scholarship of Zhejiang University, 2021, 2022, 2023.

📖 Education

  • 2020.09 - 2024.06: B.E. student at Chu Kochen Honors College, Zhejiang University (浙江大学竺可桢学院).
  • 2024.09 - now: Ph.D. candidate at REAL Lab, Zhejiang University.

💬 Misc

  • Invited Talks:
    • 2026.05.24: Gave an invited talk on skills at ZJU AI Talk. Link.
  • Reviewer:
    • 2025: ACM MM 2025, AAAI 2026, ICLR 2026.
    • 2026: CVPR 2026, ECCV 2026, NeurIPS 2026.

💻 Internships