Reinforcement Learning

Frank L. Lewis of UTA is ranked first by ScholarGPS in both reinforcement learning and optimal control, which marks him as a leading authority in these fields.

📜 Certificate in Machine Learning | Stanford University (Coursera)
📜 Cadence Cerebrus Online Course Certificate

I have studied several publications on the mathematical principles of reinforcement learning and analyzed their foundations in detail. The posts below document that analysis, including several drafts of mathematical derivations.

  1. Reinforcement Learning with Soft State Aggregation, Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan, MIT. Advances in neural information processing systems, vol. 7, 1994.
  2. Policy Gradient Methods for Reinforcement Learning with Function Approximation, Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour, AT&T Labs – Research, 180 Park Avenue, Florham Park, NJ 07932. Advances in neural information processing systems, vol. 12, 1999.
  3. Actor-Critic Algorithms, Vijay R. Konda, John N. Tsitsiklis, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA, 02139. Advances in neural information processing systems, vol. 12, 1999.
  4. Hierarchical Actor-Critic, Andrew Levy, Robert Platt, Kate Saenko. Department of Computer Science, Boston University, Boston, MA, USA (Levy, Saenko); College of Information and Computer Science, Northeastern University, Boston, MA, USA (Platt). arXiv preprint arXiv:1712.00948, vol. 12, p. 438, 2017.
  5. Hierarchical Policy Gradient Algorithms, Mohammad Ghavamzadeh, Sridhar Mahadevan, Department of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003-4610, USA. in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 226-233.
  6. Decentralized Stabilization for a Class of Continuous-Time Nonlinear Interconnected Systems Using Online Learning Optimal Approach, Derong Liu, Fellow, IEEE, Ding Wang, and Hongliang Li. IEEE transactions on neural networks and learning systems, vol. 25, pp. 418-428, 2013.
  7. Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics, Derong Liu, Chao Li, Hongliang Li, Ding Wang, Hongwen Ma. Neurocomputing, vol. 165, pp. 90-98, 2015.
  8. Reinforcement Learning is Direct Adaptive Optimal Control, Richard S. Sutton, Andrew G. Barto, and Ronald J. Williams, IEEE control systems magazine, vol. 12, pp. 19-22, 1992.
  9. Decentralized Optimal Control of Distributed Interdependent Automata With Priority Structure, Olaf Stursberg, Member, IEEE, and Christian Hillmann, IEEE Transactions on Automation Science and Engineering, vol. 14, pp. 785-796, 2017.
  10. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, Tejas D Kulkarni, DeepMind, London, Karthik R. Narasimhan, CSAIL, MIT, Ardavan Saeedi, CSAIL, MIT, Joshua B. Tenenbaum, BCS, MIT. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain. Advances in neural information processing systems, vol. 29, 2016.
  11. Meta Learning Shared Hierarchies, Kevin Frans, Henry M. Gunn High School, work done as an intern at OpenAI, Jonathan Ho, Xin Chen, Pieter Abbeel, UC Berkeley, Department of Electrical Engineering and Computer Science, John Schulman, OpenAI. ICLR 2018. arXiv preprint arXiv:1710.09767, 2017.
  12. Actor-critic Algorithm for Hierarchical Markov Decision Processes, Shalabh Bhatnagar, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, J. Ranjan Panigrahi, SoftJin Technologies Private Limited, India. 2005. Automatica, vol. 42, pp. 637-644, 2006.
  13. Analysis II: Metric Spaces, Continuous functions on metric spaces, Uniform convergence. Terence Tao, UCLA.
  14. Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations. Dimitri P. Bertsekas, MIT. IEEE/CAA Journal of Automatica Sinica, vol. 6, pp. 1-31, 2018.
  15. Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion, J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng, Department of Computer Science, Stanford University. Advances in neural information processing systems, vol. 20, 2007.
  16. The Asymptotic Convergence-Rate of Q-learning, Cs. Szepesvari, Research Group on Artificial Intelligence, “Jozsef Attila” University, Szeged, Aradi vrt. tere 1, Hungary, H-6720. 1998. Advances in neural information processing systems, vol. 10, 1997.
  17. Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Run Time, Mengdi Wang, Department of Operations Research and Financial Engineering, Princeton University, 2017. Mathematics of Operations Research, vol. 45, pp. 517-546, 2020.
  18. Solving H-horizon, Stationary Markov Decision Problems In Time Proportional To Log(H), Paul Tseng, Laboratory for Information and Decision Systems, MIT. Operations Research Letters, vol. 9, pp. 287-297, 1990.
  19. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, Michael Kearns and Satinder Singh, AT&T Labs, 180 Park Avenue, Florham Park, NJ 07932. Advances in neural information processing systems, vol. 11, 1998.

Other useful RL references

Athena Scientific is a small publisher specializing in textbooks written by professors at the Massachusetts Institute of Technology and used in their courses.
Special discount: Order directly from Athena Scientific electronically, by email, by mail, or by fax, three or more different titles (i.e., ISBN numbers) in a single order, and you will receive an automatic discount of 10% from the list prices.

  1. Neuro-Dynamic Programming, Dimitri Bertsekas, John N. Tsitsiklis. Publisher: Athena Scientific; 1st edition (May 1, 1996). ISBN: 1-886529-10-8 Publication: September 1996, 512 pages, hardcover.
  2. Reinforcement Learning and Optimal Control, Dimitri Bertsekas. Publisher: Athena Scientific. ISBN: 978-1-886529-39-7 Publication: 2019, 388 pages, hardcover.
  3. Stochastic Optimal Control: The Discrete-Time Case, Dimitri Bertsekas and Steven E. Shreve. Publisher: Athena Scientific. ISBN: 1-886529-03-5 Publication: 1996, 330 pages, softcover.
  4. Dynamic Programming and Optimal Control, Dimitri Bertsekas. Publisher: Athena Scientific. ISBNs: 1-886529-43-4 (Vol. I, 4th edition), 1-886529-44-2 (Vol. II, 4th edition), 1-886529-08-6 (two-volume set, i.e., Vol. I, 4th edition and Vol. II, 4th edition). Vol. I, 4th edition, 2017, 576 pages, hardcover. Vol. II, 4th edition: Approximate Dynamic Programming, 2012, 712 pages, hardcover.

Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto. ISBN: 978-0-262-03924-6. 2nd edition, 2018.
