Hierarchical RL

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

When the environment's rewards are sparse and delayed, the paper proposes a solution: there is a single agent from start to finish, but it operates in two stages: (1) a meta-controller stage that selects a goal; (2) a controller stage that outputs an action given the current state and goal, while a critic judges whether the goal has been achieved or a terminal state reached. Stages 1 and 2 repeat: the meta-controller selects a new goal, the controller outputs actions again, and so on. My understanding is that this splits the environment into N small sub-environments along the time axis, with one goal corresponding to each sub-environment; within such a sub-environment the agent itself can be treated as a point. The key is that the agent chooses the policy over goals πg that maximizes the expected discounted Q-value, i.e., if the goal sequence g1-g3-g2-… has the maximum Q-value among all possible goal sequences, the agent should… read more »
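
A minimal sketch of that two-stage loop may help. All names here (`env`, `meta_controller`, `controller`, `goal_reached`) are illustrative assumptions, not the paper's code:

```python
def run_episode(env, meta_controller, controller, goal_reached, max_steps=1000):
    """One episode of the two-stage loop described above (hypothetical API):
    the meta-controller picks a goal, the controller acts toward it until the
    critic reports success or the episode ends, then a new goal is picked."""
    state = env.reset()
    done, steps, total_reward = False, 0, 0.0
    while not done and steps < max_steps:
        goal = meta_controller.select_goal(state)          # stage 1: choose a goal
        while not done and not goal_reached(state, goal):  # stage 2: act on that goal
            action = controller.select_action(state, goal)
            state, reward, done = env.step(action)         # env.step assumed to
            total_reward += reward                         # return (s', r, done)
            steps += 1
    return total_reward
```

In the paper's scheme the controller is trained on an intrinsic reward from the critic, while the meta-controller learns over goals from the extrinsic reward; the sketch above only shows the control flow.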

Meta Learning Shared Hierarchies

Notation. S: state space. A: action space. MDP: transition function P(s', r | s, a), where (s', r) is the next state and reward and (s, a) is the state and action. P_M: distribution over MDPs M sharing the same state-action space (S, A). Agent: a function mapping a multi-episode history (s0, a0, r0, s1, a1, r1, …… read more »
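
Read literally, that notation says the agent is a function from the whole multi-episode history to its next action, evaluated on a task sampled from P_M. A toy sketch under that reading (the function names and env interface are assumptions for illustration):

```python
import random

def evaluate_agent(agent, mdp_distribution, episodes=5, horizon=100):
    """Sample a task M ~ P_M (here: pick from a list of envs), then run an
    agent that maps the multi-episode history (s0, a0, r0, s1, a1, r1, ...)
    to its next action. The history persists across episodes of the same M."""
    env = random.choice(mdp_distribution)  # stand-in for sampling M ~ P_M
    history = []                           # shared across episodes of this M
    total = 0.0
    for _ in range(episodes):
        state = env.reset()
        for _ in range(horizon):
            action = agent(history + [state])    # agent: history -> action
            next_state, reward, done = env.step(action)
            history += [state, action, reward]
            total += reward
            state = next_state
            if done:
                break
    return total
```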

Hierarchical Policy Gradient Algorithms

Math Notation. M: the overall task MDP. {M0, M1, M2, M3, . . ., Mn}: a finite set of subtask MDPs. Mi: a subtask, modeling one subtask in the hierarchy. M0: the root task; solving it solves the entire MDP M. i: a non-primitive subtask; the paper uses… read more »
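
A small sketch of what such a hierarchy of subtask MDPs can look like as a data structure, where running the root task M0 recursively runs child subtasks until each subtask's termination condition holds (the class, field names, and env interface are illustrative assumptions, not the paper's):

```python
class Subtask:
    """Illustrative subtask M_i: a primitive subtask wraps a single action;
    a non-primitive subtask owns a policy over its child subtasks."""

    def __init__(self, name, children=None, policy=None, is_done=None):
        self.name = name
        self.children = children or []  # child subtasks of M_i
        self.policy = policy            # state -> index into children
        self.is_done = is_done          # termination predicate for M_i

    def run(self, env, state):
        """Execute M_i to completion; running the root M0 solves M."""
        if not self.children:                        # primitive subtask
            next_state, _, _ = env.step(self.name)   # env.step assumed to
            return next_state                        # return (s', r, done)
        while not self.is_done(state):
            child = self.children[self.policy(state)]
            state = child.run(env, state)
        return state
```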

Hierarchical Actor-Critic

Download: Hierarchical_Actor-Critic Flowchart

Terminology (Artificial intelligence = Optimization/decision/control):
a. Agent = Controller or decision maker
b. Action = Control
c. Environment = System
d. Reward of a stage = (Opposite of) Cost of a stage
e. State value = (Opposite of) Cost of a state
f. Value (or state-value) function = (Opposite of) Cost function
g. Maximizing the value function… read more »
