Policy Gradient Methods

i

Policy  

Hierarchical Policy Gradient Algorithms

Hierarchical Policy Gradient Algorithms Math Notation M : the overall task MDP. {M0, M1, M2 , M3 , . . . , Mn } : a finite set of subtask MDPs. Mi : subtask, models a subtask in the hierarchy. M0 : root task and solving it solves the entire MDP M. i : non-primitive subtask, paper uses… read more »

Sidebar



×

Learn more