Sean Meyn: Zap Q-learning with Nonlinear Function Approximation
Abstract: Zap Q-learning is a recent class of reinforcement learning algorithms, initially motivated as a means to accelerate convergence. It is now understood that Zap Q-learning can be embedded in a larger theory of Zap stochastic approximation for root finding. Stability theory is based on ODE approximation, and in this case the ODE is the Newton-Raphson flow, for which stability holds under minimal assumptions. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case and is tested on examples from OpenAI Gym. The new algorithm is found to converge quickly and to be robust to the choice of function approximation architecture.
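To make the Zap idea concrete, here is a minimal Python sketch of Zap stochastic approximation on a toy linear root-finding problem. It is not the Zap Q-learning algorithm from the paper and not the authors' code: the names zap_sa and sample, the i.i.d. noise model, the step sizes alpha_n = 1/n and beta_n = n^(-rho) with rho = 0.85, and the toy matrices are illustrative assumptions. The recursion keeps a running estimate A_hat of the mean Jacobian, updated on a faster time scale, and takes the step theta <- theta - alpha_n A_hat^{-1} f(theta, phi), which can be read as a noisy discretization of the Newton-Raphson flow d/dt theta = -A(theta)^{-1} fbar(theta).

```python
import numpy as np


def zap_sa(f, jac, sample, theta0, n_iter=20000, rho=0.85, seed=0):
    """Sketch of Zap stochastic approximation for solving E[f(theta, Phi)] = 0.

    f(theta, phi)   -> vector, a noisy observation of the mean field fbar(theta)
    jac(theta, phi) -> matrix, a noisy observation of its Jacobian in theta
    sample(rng)     -> phi, one draw of the noise variable (i.i.d. here, an assumption)
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    A_hat = np.eye(theta.size)  # overwritten at n = 1, since beta_1 = 1
    for n in range(1, n_iter + 1):
        alpha = 1.0 / n        # slower step size for theta
        beta = 1.0 / n**rho    # faster step size for the matrix gain, rho in (1/2, 1)
        phi = sample(rng)
        # Track the mean Jacobian on the faster time scale.
        A_hat += beta * (jac(theta, phi) - A_hat)
        # Newton-Raphson-like step: theta <- theta - alpha * A_hat^{-1} f(theta, phi).
        theta -= alpha * np.linalg.solve(A_hat, f(theta, phi))
    return theta


# Hypothetical toy problem: f(theta, phi) = A_phi @ theta - b_phi, where A_phi and
# b_phi are noisy versions of A and b, so the root of the mean field is A^{-1} b.
A = np.array([[2.0, 0.5], [0.3, 1.5]])
b = np.array([1.0, -1.0])


def sample(rng):
    return A + 0.1 * rng.standard_normal((2, 2)), b + 0.1 * rng.standard_normal(2)


theta = zap_sa(f=lambda th, ph: ph[0] @ th - ph[1],
               jac=lambda th, ph: ph[0],
               sample=sample,
               theta0=np.zeros(2))
theta_star = np.linalg.solve(A, b)
print(theta, theta_star)  # the two should be close
```

Because each step is preconditioned by the estimated inverse Jacobian, the sketch does not rely on A being positive definite; what it does need is that A_hat stays invertible, loosely analogous to the non-degeneracy assumption mentioned in the abstract.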
References:
Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Busic and Sean Meyn. Zap Q-learning with nonlinear function approximation. Advances in Neural Information Processing Systems, Vol. 33, pages 16879-16890, 2020. Also arXiv e-prints 1910.05405.
Sean Meyn. Control Systems and Reinforcement Learning. Cambridge University Press (to appear), Cambridge, 2021.
https://meyn.ece.ufl.edu/2021/08/01/control-systems-and-reinforcement-learning/
Bio: Sean Meyn is well known for his research on stochastic processes and their applications. His award-winning monograph “Markov Chains and Stochastic Stability” with R.L. Tweedie is now a standard reference. In 2015, he and Prof. Ana Busic received a Google Research Award recognizing research on renewable energy integration. He is an IEEE Fellow and an IEEE Control Systems Society Distinguished Lecturer on topics related to both reinforcement learning and energy systems. Following 20 years as a professor of ECE at the University of Illinois, he joined the University of Florida, where he is a professor and holds the Robert C. Pittman Eminent Scholar Chair in the Department of Electrical and Computer Engineering.