Zu Chongzhi Mathematics Research Seminar
Date and Time (China Standard Time): Tuesday, June 25, 2:00 – 3:00 pm
Zoom: 953 6918 3849, Passcode: dkumath
Title: PhiBE: A PDE-based Bellman Equation for Continuous Time Reinforcement Learning
Speaker: Yuhua Zhu
Abstract:
In this talk, we address the problem of continuous-time reinforcement learning in scenarios where the dynamics follow a stochastic differential equation. When the underlying dynamics are unknown and we have access only to discrete-time information, how can we effectively conduct policy evaluation? We first highlight that the commonly used Bellman equation (BE) is not always a reliable approximation to the true value function. We then introduce a new Bellman equation, PhiBE, which integrates the discrete-time information into a PDE formulation. The new Bellman equation offers a more accurate approximation to the true value function, especially in scenarios where the underlying dynamics change slowly. Moreover, we extend PhiBE to higher orders, providing increasingly accurate approximations. We conduct the error analysis for both BE and PhiBE with explicit dependence on the discount coefficient, the reward, and the dynamics. Additionally, we present a model-free algorithm to solve PhiBE when only discrete-time trajectory data is available. Numerical experiments are provided to validate the theoretical guarantees.
Bio:
Yuhua Zhu is an Assistant Professor in the Department of Statistics and Data Science at UCLA. Previously, she was an Assistant Professor at UCSD, where she held a joint appointment in the Halicioğlu Data Science Institute (HDSI) and the Department of Mathematics. She received her Ph.D. from UW-Madison under the supervision of Shi Jin, and was later a Postdoctoral Fellow at Stanford University under the supervision of Lexing Ying. Her work builds a bridge between differential equations and machine learning, spanning the areas of reinforcement learning, stochastic optimization, sequential decision-making, and uncertainty quantification.