Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics


Problem formulation

Consider a continuous-time nonlinear large-scale system ∑ composed of N interconnected subsystems described by

$\begin{align*} \sum : {\color{Red} \dot{{\color{Blue} x}}}_i(t)&=f_i[{\color{Blue} x}_i(t)]+{\color{Magenta} g}_i[{\color{Blue} x}_i(t)]\left \{ {\color{Red} u}_i[x_i(t)] +Z_i[{\color{Blue} x}(t)] \right \} \\ i&=1,2,...,N \end{align*}$ (1)

where

$x_i(t) \in \mathbb{R}^{n_i}$ : state vector of the ith subsystem.

The overall state of the large-scale system ∑ is denoted by $x=[{\color{Blue} x}_1^T\ x_2^T\ \dots\ x_N^T]^T \in \mathbb{R}^n$, where $n=\sum_{i=1}^N n_i$.

$u_i[x_i(t)] \in \mathbb{R}^{m_i}$ : control input vector of the ith subsystem.

$f_i : \mathbb{R}^{{\color{Red} n}_i} \to \mathbb{R}^{{\color{Red} n}_i}$ : continuous nonlinear internal dynamics function, with $f_i(0)=0$.

$g_i : \mathbb{R}^{{\color{Red} n}_i} \to \mathbb{R}^{{\color{Red} n}_i\times {\color{Magenta} m}_i}$ : input gain function, evaluated as $g_i[x_i(t)]$.

$Z_i[x(t)]$ : interconnection term for the ith subsystem. It depends on the overall state x, not only on $x_i$.

The ith isolated subsystem, obtained from Eq. (1) by dropping the interconnection term $Z_i$, is

$\begin{align*} \sum_{\color{Red} i} : {\color{Red} \dot{{\color{Blue} x}}}_i(t)&=f_i[{\color{Blue} x}_i(t)]+{\color{Magenta} g}_i[{\color{Blue} x}_i(t)]\left \{ {\color{Red} u}_i[{\color{Blue} x}_i(t)] \right \} \\ i&=1,2,...,N \end{align*}$ (2)
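The difference between Eqs. (1) and (2) can be made concrete with a small simulation. The following is only a sketch: the two scalar subsystems, the functions `f`, `g`, `Z`, the local feedback `u`, and the forward-Euler integrator are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical two-subsystem instance of Eq. (1) with n_1 = n_2 = 1, m_1 = m_2 = 1.
# Every function below is an illustrative assumption, not the paper's system.
f = [lambda xi: -xi, lambda xi: -2.0 * xi]                    # internal dynamics, f_i(0) = 0
g = [lambda xi: np.eye(1), lambda xi: np.eye(1)]              # input gains (n_i x m_i)
Z = [lambda x: 0.1 * np.sin(x[1:2]), lambda x: 0.1 * x[0:1]]  # interconnection terms
u = [lambda xi: -xi, lambda xi: -xi]                          # local feedback u_i(x_i)

slices = [slice(0, 1), slice(1, 2)]                           # position of x_i inside x


def x_dot(x, interconnected=True):
    """Right-hand side of Eq. (1), or of Eq. (2) when interconnected=False."""
    dx = np.zeros_like(x)
    for i, s in enumerate(slices):
        xi = x[s]
        drive = u[i](xi)
        if interconnected:
            drive = drive + Z[i](x)   # Z_i enters through the same channel as u_i
        dx[s] = f[i](xi) + g[i](xi) @ drive
    return dx


def simulate(x0, t_final=10.0, dt=0.01, interconnected=True):
    """Forward-Euler integration; returns the state at t_final."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t_final / dt))):
        x = x + dt * x_dot(x, interconnected)
    return x


x_final = simulate([1.0, -0.5])                                 # Eq. (1)
x_final_isolated = simulate([1.0, -0.5], interconnected=False)  # Eq. (2)
```

Note that each `u[i]` only reads the local state `x_i`, while `Z[i]` reads the full state `x`; that asymmetry is what makes the control law decentralized.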

Decentralized control law

Optimal control

———————-

Reinforcement Learning and Optimal Control Methods for Uncertain Nonlinear Systems

Pages 27-29, Section 2.3 (Infinite Horizon Optimal Control Problem): the formulation is the same as Definition 1.

Notation:

$x(t) \in \chi \subseteq \mathbb{R}^n$ : state.

$u(t) \in U \subseteq \mathbb{R}^m$ : control input.

$\dot{x}=F(x,u)$ (2-5)

Cost function for the system Eq. 2-5:

$J(x(t),u(\tau)_{t\leq \tau < {\infty}} )=\int_{t}^{\infty}r(x(s),u(s))ds$ (2-6)

where t : initial time.

r(x,u) ∈ R : immediate or local cost for the state and control.

${\color{Blue} r}(x,u)=Q(x)+u^TRu$ (2-7)

where $Q(x) \in \mathbb{R}$ is continuously differentiable and positive definite.

$R \in \mathbb{R}^{m \times m}$ : positive-definite symmetric matrix.
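To see how Eqs. 2-6 and 2-7 fit together, the cost of one fixed feedback law can be evaluated numerically and compared with a closed form. This is a sketch under assumptions: the scalar linear system, the gain `k`, the weights `q` and `r`, and the finite horizon of 20 time units standing in for infinity are all hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical scalar example: x_dot = a*x + b*u under the fixed feedback u = -k*x.
a, b, k = 1.0, 1.0, 3.0
q, r = 3.0, 1.0          # Q(x) = q*x^2 and R = r in Eq. 2-7
x0 = 1.0


def local_cost(x, u):
    """Eq. 2-7 for scalar x and u."""
    return q * x**2 + u * r * u


# The closed-loop trajectory is known exactly: x(s) = x0 * exp((a - b*k) * s).
s = np.linspace(0.0, 20.0, 20001)     # finite horizon standing in for infinity
x = x0 * np.exp((a - b * k) * s)
u = -k * x

# Trapezoidal approximation of the integral in Eq. 2-6 (with t = 0).
c = local_cost(x, u)
J_numeric = float(np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(s)))

# Closed form for comparison: (q + r*k^2) * x0^2 / (2*(b*k - a)).
J_exact = (q + r * k**2) * x0**2 / (2.0 * (b * k - a))
```

The closed form is finite only when `b*k > a`, that is, when the feedback stabilizes the system; otherwise the integral in Eq. 2-6 diverges.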

Optimal value function:

$V^*(x(t))=min_{{u(\tau)\in \Psi (\chi)},\ {t\leq \tau < \infty}}\int_{t}^{\infty}r \{ x(s),u[x(s)] \}ds$ (2-8)

where

$\Psi(\cdot)$ : set of admissible controls.

Bellman’s principle of optimality can be used to derive the following optimality condition

${\color{Blue} 0=min_{u(t)\in \Psi (\chi)} \left [ r(x,u) + \frac{\partial V^*(x)}{\partial x} F(x,u) \right ]}$ (2-9)

which is a nonlinear partial differential equation (PDE) known as the Hamilton-Jacobi-Bellman (HJB) equation.

Optimal control (obtained by using the convex local cost of Eq. 2-7 in Eq. 2-9):

$u^*(x)=-\frac{1}{2}R^{-1} {\color{Magenta} \frac{\partial F(x,u)^T}{\partial u}}\frac{\partial V^*(x)^T}{\partial x}$ (2-10)
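Eq. 2-10 follows from the first-order stationarity condition of the bracketed term in Eq. 2-9. As a sketch of that step, assuming the minimizer lies in the interior of the admissible set, $V^*$ is differentiable, and $R$ is symmetric as stated for Eq. 2-7 (the gradient with respect to u is written as a column vector):

$\begin{align*} 0 &= \frac{\partial}{\partial u}\left [ Q(x)+u^TRu+\frac{\partial V^*(x)}{\partial x}F(x,u) \right ] \\ &= 2Ru+\frac{\partial F(x,u)^T}{\partial u}\frac{\partial V^*(x)^T}{\partial x} \\ \Rightarrow\ u &= -\frac{1}{2}R^{-1}\frac{\partial F(x,u)^T}{\partial u}\frac{\partial V^*(x)^T}{\partial x} \end{align*}$

When F is not affine in u, the right-hand side still depends on u, so this is an implicit condition on the optimal control rather than a closed-form expression.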

For the control-affine dynamics of the form

$\dot x={\color{Golden} f(x)+g(x)u}=F(x,u)$ (2-11)

$\partial F(x,u)/\partial u = g(x)$, so Eq. 2-10 gives the optimal control explicitly in terms of the system state:

${\color{Red} u}^*(x)=-\frac{1}{2}R^{-1} {\color{Magenta} g^T(x)}\frac{\partial V^*(x)^T}{\partial x}$ (2-12)
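Eq. 2-12 is straightforward to evaluate once a gradient of the value function is available. The sketch below assumes a hypothetical quadratic candidate $V(x)=x^TPx$ and a hypothetical constant input gain; `P`, `R`, `g`, and the helper name `optimal_control` are illustrative and `P` is not claimed to be the optimal value function of any particular system.

```python
import numpy as np


def optimal_control(grad_V, g, R, x):
    """Evaluate Eq. 2-12: u*(x) = -1/2 R^{-1} g(x)^T (dV/dx)^T."""
    # Solve R u = g^T grad_V rather than forming R^{-1} explicitly.
    return -0.5 * np.linalg.solve(R, g(x).T @ grad_V(x))


# Hypothetical data (assumptions for illustration only).
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])                          # V(x) = x^T P x
R = np.array([[2.0]])                               # symmetric positive definite
g = lambda x: np.array([[0.0], [1.0]])              # n x m input gain, here constant
grad_V = lambda x: 2.0 * P @ x                      # (dV/dx)^T as a column vector

x = np.array([1.0, -1.0])
u_star = optimal_control(grad_V, g, R, x)

# For a quadratic V, Eq. 2-12 collapses to the linear feedback -R^{-1} g^T P x.
u_linear = -np.linalg.solve(R, g(x).T @ P @ x)
```

The function takes `grad_V` as a callable so that the same code could be reused with an approximated gradient, for example one produced by a critic network.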

The HJB in Eq. 2-9 can be rewritten in terms of the optimal value function by substituting for the local cost in Eq. 2-7, the system in Eq. 2-11 and the optimal control in Eq. 2-12, as

$\begin{align*} 0 &=min_{u(t)\in \Psi (\chi)} \left [ {\color{Blue} r(x,u)} + \frac{\partial V^*(x)}{\partial x} {\color{Golden} F(x,u)} \right ]\\ &=min_{u(t)\in \Psi (\chi)} \left [ {\color{Blue} Q(x)+u^TRu} + \frac{\partial V^*(x)}{\partial x} \left [{\color{Golden} f(x)+g(x)u }\right ] \right ]\\ &=Q(x)+{\color{Red} u^*}^TR{\color{Red} u^*} + \frac{\partial V^*(x)}{\partial x} \left [ f(x) +g(x){\color{Red} u^*} \right ]\\ &=Q(x)+ \left [ {\color{Red} -\frac{1}{2}R^{-1} g^T(x)\frac{\partial V^*(x)^T}{\partial x}} \right ]^TR \left[ {\color{Red} -\frac{1}{2}R^{-1} g^T(x)\frac{\partial V^*(x)^T}{\partial x}} \right ] +\frac{\partial V^*(x)}{\partial x} \left \{ f(x)+g(x) \left[ {\color{Red} -\frac{1}{2}R^{-1} g^T(x)\frac{\partial V^*(x)^T}{\partial x}} \right ]\right \} \\ \end{align*}$

$\xrightarrow[C^TB^TA^T]{(ABC)^T=}\\ \begin{align*} 0&=Q(x)+ \left \{ -\frac{1}{2} \left [ \frac{\partial V^*(x)^T}{\partial x} \right ]^T [g^T(x)] ^T \left [ R^{-1} \right]^T \right \} R \left[ {\color{Red} -\frac{1}{2}R^{-1} g^T(x)\frac{\partial V^*(x)^T}{\partial x}} \right ] +\frac{\partial V^*(x)}{\partial x} \left \{ f(x)+g(x) \left[ {\color{Red} -\frac{1}{2}R^{-1} g^T(x)\frac{\partial V^*(x)^T}{\partial x}} \right ]\right \} \\ &=Q(x) + \frac{1}{4}\frac{\partial V^*(x)}{\partial x}g(x){R^{-1}}^T g^T(x)\frac{\partial V^*(x)^T}{\partial x} + \frac{\partial V^*(x)}{\partial x}f(x)-\frac{1}{2}\frac{\partial V^*(x)}{\partial x}g(x) R^{-1}g^T(x)\frac{\partial V^*(x)^T}{\partial x} \end{align*}$

$\xrightarrow[R: symmetric]{R^T=R} {\color{Blue} {R^{-1}}^T=R^{-1}}$

$\begin{align*} 0 &=min_{u(t)\in \Psi (\chi)} \left [ r(x,u) + \frac{\partial V^*(x)}{\partial x} F(x,u) \right ]\\ &=min_{u(t)\in \Psi (\chi)} \left [ Q(x)+u^TRu + \frac{\partial V^*(x)}{\partial x} \left [f(x)+g(x)u \right ] \right ]\\ &=Q(x)+{\color{Red} u^*}^TR{\color{Red} u^*} + \frac{\partial V^*(x)}{\partial x} \left [ f(x) +g(x){\color{Red} u^*} \right ]\\ &=Q(x) + \frac{1}{4}\frac{\partial V^*(x)}{\partial x}g(x){\color{Blue} {R^{-1}}^T} g^T(x)\frac{\partial V^*(x)^T}{\partial x} + \frac{\partial V^*(x)}{\partial x}f(x)-\frac{1}{2}\frac{\partial V^*(x)}{\partial x}g(x) R^{-1}g^T(x)\frac{\partial V^*(x)^T}{\partial x} \\ &=Q(x) + \frac{1}{4}\frac{\partial V^*(x)}{\partial x}g(x){\color{Blue} R^{-1}} g^T(x)\frac{\partial V^*(x)^T}{\partial x} + \frac{\partial V^*(x)}{\partial x}f(x)-\frac{1}{2}\frac{\partial V^*(x)}{\partial x}g(x) R^{-1}g^T(x)\frac{\partial V^*(x)^T}{\partial x} \\ \end{align*}$

$\begin{align*} {\color{Blue} 0}&{\color{Blue} = }{\color{Blue} Q(x)+\frac{\partial V^*(x)}{\partial x}f(x)-\frac{1}{4}\frac{\partial V^*(x)}{\partial x}g(x) R^{-1} g^T(x)\frac{\partial V^*(x)^T}{\partial x}}\\ 0&=V^*(0) \end{align*}$ (2-13)
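Eq. 2-13 can be checked on a case where the value function is known. The following sketch assumes a hypothetical scalar linear system $\dot x = ax+bu$ with $Q(x)=qx^2$ and $R=r$; the numerical values are illustrative. With the candidate $V^*(x)=px^2$, Eq. 2-13 reduces to the scalar algebraic equation $q+2ap-\frac{b^2}{r}p^2=0$.

```python
import numpy as np

# Hypothetical scalar instance of Eq. 2-11: x_dot = a*x + b*u, Q(x) = q*x^2, R = r.
a, b, q, r = 1.0, 1.0, 3.0, 1.0

# With V*(x) = p*x^2, Eq. 2-13 becomes q + 2*a*p - (b^2/r)*p^2 = 0.
# Take the positive root so that V* is positive definite.
p = r * (a + np.sqrt(a**2 + q * b**2 / r)) / b**2


def V(x):
    return p * x**2


def dV(x):
    return 2.0 * p * x


def hjb_residual(x):
    """Right-hand side of Eq. 2-13 evaluated at x."""
    return q * x**2 + dV(x) * a * x - 0.25 * dV(x) * b * (1.0 / r) * b * dV(x)


def u_star(x):
    """Eq. 2-12 for this scalar system."""
    return -0.5 * (1.0 / r) * b * dV(x)


xs = np.linspace(-2.0, 2.0, 9)
residuals = hjb_residual(xs)   # should vanish up to rounding
```

For these values p = 3, so the optimal feedback is u*(x) = -3x and the closed loop is $\dot x=-2x$. The boundary condition $V^*(0)=0$ holds by construction.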

———————-

Pei
