Reinforcement Learning is Direct Adaptive Optimal Control
How should reinforcement learning be viewed from a control systems perspective?
Control problems can be divided into two classes:
- regulation and tracking problems, in which the objective is to follow a reference trajectory.
- optimal control problems, in which the objective is to extremize a functional of the controlled system’s behavior that is not necessarily defined in terms of a reference trajectory.
In some problems, the control objective is defined in terms of a reference level or reference trajectory that the controlled system’s output should match or track as closely as possible. Stability is the key issue in these regulation and tracking problems.
In other problems, the control objective is to extremize a functional of the controlled system’s behavior that is not necessarily defined in terms of a reference level or trajectory. The key issue in the latter problems is constrained optimization; here optimal-control methods based on the calculus of variations and dynamic programming have been extensively studied.
When a detailed and accurate model of the system to be controlled is not available, adaptive control methods can be applied. The overwhelming majority of adaptive control methods address regulation and tracking problems. However, adaptive methods for optimal control problems would be widely applicable if methods could be developed that were computationally feasible and that could be applied robustly to nonlinear systems.
For example, trajectory planning is a key and difficult problem in robot navigation tasks, as it is in other robot control tasks. To design a robot capable of walking bipedally, one may not be able to specify a desired trajectory for the limbs a priori, but one can specify the objective of moving forward, maintaining equilibrium, not damaging the robot, etc.
For both tracking and optimal control, it is usual to distinguish between indirect and direct adaptive control methods. An indirect method relies on a system identification procedure to form an explicit model of the controlled system and then determines the control rule from the model. Direct methods determine the control rule without forming such a system model.
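The indirect route described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not something taken from the source: a hypothetical four-state chain serves as the controlled system, `identify_model` stands in for system identification, and `plan` derives a control rule from the estimated model by value iteration.

```python
import random
from collections import defaultdict

N_STATES = 4      # hypothetical chain: states 0..3, state 3 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right
GAMMA = 0.9       # discount factor (assumed)


def step(state, action):
    """Hypothetical plant: the true dynamics, unknown to the controller."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def identify_model(n_samples=500, seed=0):
    """System identification: estimate transitions and rewards from samples."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for _ in range(n_samples):
        s = rng.randrange(N_STATES - 1)  # sample only non-goal states
        a = rng.choice(ACTIONS)
        nxt, r = step(s, a)
        counts[(s, a)][nxt] += 1
        reward_sum[(s, a)] += r
    model = {}
    for key, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs = {n: c / total for n, c in nxt_counts.items()}
        model[key] = (probs, reward_sum[key] / total)
    return model


def q_from_model(model, v, s, a):
    """One-step lookahead using the estimated model, not the plant."""
    probs, mean_reward = model[(s, a)]
    return mean_reward + GAMMA * sum(p * v[n] for n, p in probs.items())


def plan(model, sweeps=100):
    """Value iteration on the estimated model; the goal state keeps value 0."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            v[s] = max(q_from_model(model, v, s, a) for a in ACTIONS)
    policy = [
        max(ACTIONS, key=lambda a: q_from_model(model, v, s, a))
        for s in range(N_STATES - 1)
    ]
    return v, policy


if __name__ == "__main__":
    values, policy = plan(identify_model())
    print("estimated values:", values)
    print("control rule (action per non-goal state):", policy)
```

The control rule here never touches `step` directly; it is computed entirely from the identified model, which is what makes the approach indirect.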
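| Control problem class | Regulation and tracking problems | Optimal control problems |
| --- | --- | --- |
| Reference trajectory | Required | Not necessarily |
| Attention from adaptive control methods | Overwhelming majority of methods | Comparatively little |
| Kinds of adaptive methods | Direct or indirect | Direct or indirect |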
From this perspective, reinforcement learning methods can be presented as a direct approach to adaptive optimal control.
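As a minimal sketch of what "direct" means here, the example below uses tabular Q-learning, one well-known reinforcement learning method, chosen for illustration rather than taken from the source. The four-state chain, the constants, and the function names are all hypothetical. The learner adjusts its action-value estimates from observed transitions and never builds a model of the system.

```python
import random

N_STATES = 4      # hypothetical chain: states 0..3, state 3 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right
GAMMA = 0.9       # discount factor (assumed)
ALPHA = 0.5       # learning rate (assumed)


def step(state, action):
    """Hypothetical plant: the learner only observes its outputs."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def q_learning(episodes=500, seed=0):
    """Learn action values directly from experience, with no system model."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES - 1)  # start in a non-goal state
        for _ in range(200):             # cap on episode length
            a = rng.choice(ACTIONS)      # purely exploratory behavior
            nxt, r = step(s, a)
            # The goal row of q is never updated, so it stays at 0.
            target = r + GAMMA * max(q[nxt])
            q[s][a] += ALPHA * (target - q[s][a])
            s = nxt
            if s == N_STATES - 1:
                break
    return q


def greedy_policy(q):
    """Control rule read off the learned action values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q = q_learning()
    print("learned action values:", q)
    print("control rule (action per non-goal state):", greedy_policy(q))
```

The only quantities stored are the action values in `q`; no transition probabilities or reward estimates are formed, in contrast to an indirect method that would identify a model first.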