Reinforcement Learning
Math, Software and Application
Athena Scientific
Athena Scientific is a small publisher specializing in textbooks written by professors at the Massachusetts Institute of Technology and used in their courses.
http://www.athenasc.com/ordering.html
Special discount: Order three or more different titles (i.e., ISBN numbers) in a single order directly from Athena Scientific (electronically, by email, by mail, or by fax) and you will receive an automatic discount of 10% from the list prices.

Neuro-Dynamic Programming, Dimitri Bertsekas, John N. Tsitsiklis. Publisher: Athena Scientific; 1st edition (May 1, 1996). ISBN: 1886529108 Publication: September 1996, 512 pages, hardcover.

Reinforcement Learning and Optimal Control, Dimitri Bertsekas. Publisher: Athena Scientific. ISBN: 9781886529397 Publication: 2019, 388 pages, hardcover.

Stochastic Optimal Control: The Discrete-Time Case, Dimitri Bertsekas and Steven E. Shreve. Publisher: Athena Scientific. ISBN: 1886529035 Publication: 1996, 330 pages, softcover.

Dynamic Programming and Optimal Control, Dimitri Bertsekas. Publisher: Athena Scientific; ISBNs: 1886529434 (Vol. I, 4th Edition), 1886529442 (Vol. II, 4th Edition), 1886529086 (Two-Volume Set, i.e., Vol. I, 4th Edition and Vol. II, 4th Edition). Vol. I, 4th Edition, 2017, 576 pages, hardcover. Vol. II, 4th Edition: Approximate Dynamic Programming, 2012, 712 pages, hardcover.

Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto. ISBN: 9780262193986. 2nd edition 2018.

Reinforcement Learning with Soft State Aggregation, Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan, MIT.

Policy Gradient Methods for Reinforcement Learning with Function Approximation, Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour, AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932.

Actor-Critic Algorithms, Vijay R. Konda, John N. Tsitsiklis, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA, 02139.

Hierarchical Actor-Critic, Andrew Levy^{1}, Robert Platt^{2}, Kate Saenko^{1}, ^{1}Department of Computer Science, Boston University, Boston, MA, USA, ^{2}College of Information and Computer Science, Northeastern University, Boston, MA, USA.

Hierarchical Policy Gradient Algorithms, Mohammad Ghavamzadeh, Sridhar Mahadevan, Department of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003-4610, USA. 20th International Conference on Machine Learning (ICML-2003), Washington DC, 2003.

Decentralized Stabilization for a Class of Continuous-Time Nonlinear Interconnected Systems Using Online Learning Optimal Approach, Derong Liu, Fellow, IEEE, Ding Wang, and Hongliang Li. IEEE Transactions on Neural Networks and Learning Systems, Vol. 25, No. 2, February 2014.

Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics, Derong Liu, Chao Li, Hongliang Li, Ding Wang, Hongwen Ma, Neurocomputing 165 (2015) 90-98.

Reinforcement Learning is Direct Adaptive Optimal Control, Richard S. Sutton, Andrew G. Barto, and Ronald J. Williams, IEEE Control Systems, April 1992.

Decentralized Optimal Control of Distributed Interdependent Automata With Priority Structure, Olaf Stursberg, Member, IEEE, and Christian Hillmann, IEEE Transactions on Automation Science and Engineering, Vol. 14, No. 2, April 2017.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, Tejas D. Kulkarni, DeepMind, London, Karthik R. Narasimhan, CSAIL, MIT, Ardavan Saeedi, CSAIL, MIT, Joshua B. Tenenbaum, BCS, MIT. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Meta Learning Shared Hierarchies, Kevin Frans, Henry M. Gunn High School, work done as an intern at OpenAI, Jonathan Ho, Xin Chen, Pieter Abbeel, UC Berkeley, Department of Electrical Engineering and Computer Science, John Schulman, OpenAI. ICLR 2018.

Actor-critic Algorithm for Hierarchical Markov Decision Processes, Shalabh Bhatnagar, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, J. Ranjan Panigrahi, SoftJin Technologies Private Limited, India. 2005.

Analysis II: Metric Spaces, Continuous functions on metric spaces, Uniform convergence. Terence Tao, UCLA.

Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations. Dimitri P. Bertsekas, MIT.

Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion, J. Zico Kolter, Pieter Abbeel, Andrew Y. Ng, Department of Computer Science, Stanford University.

The Asymptotic Convergence-Rate of Q-learning, Cs. Szepesvari, Research Group on Artificial Intelligence, "Jozsef Attila" University, Szeged, Aradi vrt. tere 1, Hungary, H-6720. 1998.

Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Run Time, Mengdi Wang, Department of Operations Research and Financial Engineering, Princeton University, 2017.

Solving H-horizon, Stationary Markov Decision Problems In Time Proportional To Log(H), Paul Tseng, Laboratory for Information and Decision Systems, MIT. Operations Research Letters 9 (1990) 287-297.

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, Michael Kearns and Satinder Singh, AT&T Labs, 180 Park Avenue, Florham Park, NJ 07932.