May 30, 2019
Policy
Hierarchical Policy Gradient Algorithms

Math Notation

M: the overall task MDP.
{M0, M1, M2, M3, ..., Mn}: a finite set of subtask MDPs.
Mi: subtask, models a subtask in the hierarchy.
M0: root task and solving it solves the entire MDP M.
i: non-primitive subtask, paper uses…
There is no excerpt because this is a protected post.