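At the invitation of Associate Professor Li Peng (李朋) of the School of Mathematics and Statistics, Lanzhou University, Professor Ke Wei (魏轲) of the School of Data Science, Fudan University, will visit Lanzhou University from May 22 to 24, 2025, and will give an in-person academic talk on Friday, May 23, from 14:00 to 15:00.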
Title: Elementary Analysis of Policy Gradient Methods
Venue: Room 631, 6th Floor, Lingyun Building (凌云楼), Chengguan Campus
Time: Friday, May 23, 2025, 14:00-15:00
Abstract: Projected policy gradient under the simplex parameterization, and policy gradient and natural policy gradient under the softmax parameterization, are fundamental algorithms in reinforcement learning. There has been a flurry of recent activity in studying these algorithms from a theoretical perspective. Despite this, their convergence behavior is still not fully understood, even given access to exact policy evaluations. In this talk, we give a systematic study of the aforementioned policy optimization methods. Several novel results are presented, including 1) sublinear and finite iteration convergence of projected policy gradient for any constant step size, 2) sublinear convergence of softmax policy gradient for any constant step size, 3) global linear convergence of softmax natural policy gradient for any constant step size, 4) global linear convergence of entropy regularized softmax policy gradient for a wider range of constant step sizes than existing results, 5) a tight local linear convergence rate of entropy regularized natural policy gradient, and 6) a new and concise local quadratic convergence rate of soft policy iteration without the assumption on the stationary distribution under the optimal policy. New and elementary analysis techniques have been developed to establish these results.
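About the Speaker
Ke Wei (魏轲) is a professor and doctoral supervisor at the School of Data Science, Fudan University. He received his doctorate from the University of Oxford in 2014, and then carried out postdoctoral research at the Hong Kong University of Science and Technology (2014-2015) and the University of California, Davis (2015-2017). His current research interests include high-dimensional signal and data processing, and reinforcement learning algorithms and theory. His work has been published in leading journals in these fields, including ACHA, the SIAM journals, IEEE TIT, MP, and JMLR.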
School of Mathematics and Statistics
Gansu Center for Applied Mathematics
Cuiying Honors College
May 22, 2025