Hierarchical

Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion

The key point of this paper is that the task itself, a quadruped robot walking over rough terrain to reach a goal, makes the choice of hierarchy natural: the low level is the four legs and their contact with the ground, and the high level is the body's center of mass and its straight-line distance to the goal (this is where expert advice is given at each level). An analysis follows later in the notes. Figure 5 shows the robot's footsteps before and after learning, and the difference is large: with only footstep-level constraints (the four legs), the robot takes detours. My understanding is that the footstep level cares mostly about terrain roughness, stepping wherever it is least likely to get stuck or fall, while the body path planner plans an approximate trajectory of the center of mass (above the terrain) toward the goal, so the path level cares mostly about the straight-line distance to the goal. On the test terrain the robot cannot get through with path-level demonstrations alone; in other words, if it only minimizes the distance from the center of mass to the goal and ignores how the four legs contact the ground, it never reaches the goal, because it falls over or gets stuck on the terrain.
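To make the two-level structure concrete, here is a minimal sketch, under my own assumptions, of learning a linear terrain cost at each level from expert advice. The feature functions, dimensions, and the perceptron-style update are illustrative placeholders, not the paper's exact formulation.

```python
# Minimal sketch (my own illustration, not the authors' code) of the two-level
# idea: each level has a linear cost over hand-designed features, and an expert
# demonstration at that level says "the expert's choice should cost less than
# any alternative the planner considers", enforced by a perceptron-style update.
# Feature functions and dimensions here are hypothetical placeholders.

import numpy as np

def perceptron_update(w, phi_expert, phi_alternatives, lr=0.1):
    """If some alternative looks cheaper than the expert's choice under the
    current weights w, shift w so the expert's choice gains margin over it."""
    costs = phi_alternatives @ w
    if phi_expert @ w >= costs.min():                 # constraint violated
        worst = phi_alternatives[np.argmin(costs)]    # the most "tempting" alternative
        w = w + lr * (worst - phi_expert)             # lower expert cost relative to it
    return w

# High level: body-path features (e.g. progress toward the goal in a straight line).
w_path = np.zeros(4)
# Low level: footstep features (e.g. local terrain roughness at a foothold).
w_foot = np.zeros(6)

# Training sketch: the expert demonstrates full body paths at the high level but
# only individual foot placements at the low level, which is cheaper to provide.
# The feature arrays below stand in for features computed by the planners.
rng = np.random.default_rng(0)
for _ in range(100):
    w_path = perceptron_update(w_path, rng.normal(size=4), rng.normal(size=(20, 4)))
    w_foot = perceptron_update(w_foot, rng.normal(size=6), rng.normal(size=(20, 6)))
```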

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

When the environment's rewards are sparse and delayed, the paper offers a solution. There is only one agent throughout, but it operates in two alternating stages: (1) a meta-controller picks a goal; (2) a controller, given the current state and that goal, outputs actions, while a critic judges whether the goal has been reached or a terminal state hit. Steps 1 and 2 then repeat: the meta-controller picks a new goal, the controller outputs actions again, and so on. My understanding is that this effectively splits the environment into N small sub-environments along the time axis, each associated with one goal; in such an environment the agent itself can be treated as a point. The key is that the policy over goals πg chosen by the agent is the one that maximizes the expected discounted Q-value, i.e., if the goal sequence g1-g3-g2-… has the highest Q-value among all possible goal sequences, the agent should follow that sequence.
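To make the two-stage loop concrete, here is a minimal sketch of my own, not the paper's implementation. `env`, `meta_controller`, `controller`, and `critic` are hypothetical stand-ins for the paper's two Q-networks and its hand-designed critic, each assumed to provide the methods called below.

```python
# Minimal sketch (my own pseudocode, not the paper's implementation) of the
# two-stage loop described above: the meta-controller picks a goal, the
# controller acts toward it under an intrinsic reward, and the critic decides
# when the goal is reached.

def run_episode(env, meta_controller, controller, critic, max_steps=1000):
    state = env.reset()
    done = False
    steps = 0
    episode_return = 0.0
    while not done and steps < max_steps:
        goal = meta_controller.select_goal(state)            # stage 1: pick a goal
        goal_state = state                                   # state where this goal was chosen
        extrinsic_since_goal = 0.0
        goal_reached = False
        while not (done or goal_reached) and steps < max_steps:
            action = controller.select_action(state, goal)   # stage 2: act toward the goal
            next_state, extrinsic_reward, done = env.step(action)
            goal_reached = critic.goal_reached(next_state, goal)
            intrinsic_reward = 1.0 if goal_reached else 0.0  # critic's intrinsic signal
            controller.update(state, goal, action, intrinsic_reward, next_state)
            extrinsic_since_goal += extrinsic_reward
            state = next_state
            steps += 1
        # The meta-controller learns from the extrinsic reward accumulated while
        # pursuing this goal, so over time it picks the goal sequence with the
        # highest expected discounted return (the policy over goals πg above).
        meta_controller.update(goal_state, goal, extrinsic_since_goal, state)
        episode_return += extrinsic_since_goal
    return episode_return
```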
