#### Blog Stats

• 133,302 hits

# Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Run Time

### Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Run Time

Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Run Time

The nonlinear Bellman equation = $\left&space;|&space;S&space;\right&space;|&space;\times&space;\left&space;(\left&space;|&space;S&space;\right&space;|\left&space;|&space;A&space;\right&space;|&space;\right&space;)$ linear programming problem: Primal-Dual LP

Primal LP

$min.&space;\left&space;(&space;1-\gamma&space;\right&space;)\mathbf{q}^T&space;\mathbf{v}&space;\\&space;\indent&space;s.t.&space;\left&space;(&space;I-\gamma&space;P_a&space;\right&space;)\mathbf{v}-\mathbf{r}_a&space;\geq&space;0,&space;\forall&space;a&space;\in&space;A,$ (1)

$\forall&space;i&space;\in&space;S&space;\\&space;\indent&space;r_a(i)=\sum_{j\in&space;S}p_{ij}(a)r_{ij}(a)$

Dual LP

$max.&space;\sum_{a&space;\in&space;A}&space;\mu_{a}^{T}\mathbf{r}_a&space;\\&space;\indent&space;s.t.&space;\sum_{a&space;\in&space;A}&space;\left&space;(&space;I-&space;\gamma&space;P_a^T&space;\right&space;)&space;\mu_a&space;=&space;\left&space;(&space;1-\gamma&space;\right&space;)\mathbf{q},&space;\mu_a&space;\geq&space;0,&space;\forall&space;a&space;\in&space;A.$ (2)

Minmax Problem

$min_{\mathbf{v}&space;\in&space;V}\&space;max_{\mu&space;\in&space;U_{\theta,&space;\mathbf{q}}}&space;\left&space;(&space;1-\gamma&space;\right&space;)&space;\mathbf{q}^T&space;\mathbf{v}&space;+&space;\sum_{a&space;\in&space;A}&space;\mu_a^T&space;\left&space;(&space;\left&space;(&space;\gamma&space;P_a&space;-&space;I&space;\right&space;)\mathbf{v}&space;+&space;\mathbf{r}_a&space;\right&space;)$ (3)

$V=\left&space;\{&space;\left&space;\|&space;\mathbf{v}&space;\right&space;\|_{\infty&space;}&space;\leq&space;\frac{1}{1-\gamma}&space;,&space;\mathbf{v}&space;\geq&space;0\right&space;\},&space;U_{\theta,\mathbf{q}}=\left&space;\{&space;\mathbf{e}^T&space;\mu&space;=1,&space;\mu&space;\geq&space;0,&space;\sum_{a&space;\in&space;A}&space;\mu_a&space;\geq&space;\theta&space;\mathbf{q}&space;\right&space;\}$

Sidebar