Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Run Time


Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Run Time

Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Run Time


The nonlinear Bellman equation = \left | S \right | \times \left (\left | S \right |\left | A \right | \right ) linear programming problem: Primal-Dual LP

Primal LP

min. \left ( 1-\gamma \right )\mathbf{q}^T \mathbf{v} \\ \indent s.t. \left ( I-\gamma P_a \right )\mathbf{v}-\mathbf{r}_a \geq 0, \forall a \in A, (1)

\forall i \in S \\ \indent r_a(i)=\sum_{j\in S}p_{ij}(a)r_{ij}(a)

Dual LP

max. \sum_{a \in A} \mu_{a}^{T}\mathbf{r}_a \\ \indent s.t. \sum_{a \in A} \left ( I- \gamma P_a^T \right ) \mu_a = \left ( 1-\gamma \right )\mathbf{q}, \mu_a \geq 0, \forall a \in A. (2)

 

Minmax Problem

min_{\mathbf{v} \in V}\ max_{\mu \in U_{\theta, \mathbf{q}}} \left ( 1-\gamma \right ) \mathbf{q}^T \mathbf{v} + \sum_{a \in A} \mu_a^T \left ( \left ( \gamma P_a - I \right )\mathbf{v} + \mathbf{r}_a \right ) (3)

 

V=\left \{ \left \| \mathbf{v} \right \|_{\infty } \leq \frac{1}{1-\gamma} , \mathbf{v} \geq 0\right \}, U_{\theta,\mathbf{q}}=\left \{ \mathbf{e}^T \mu =1, \mu \geq 0, \sum_{a \in A} \mu_a \geq \theta \mathbf{q} \right \}


 

Sidebar