AI

Policy Gradient Methods

Policy Gradient Methods In summary, my guess is: because 1) the policy (the probability of an action) has a particular parametric form, and 2) a 'math trick' is applied in the gradient equation of the objective function (i.e., the value function) to obtain an 'Expectation' form, the 'ln' is attached to the policy before taking the gradient, for convenience of analysis. Notation J(θ):… read more »
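The 'math trick' this excerpt alludes to is presumably the log-derivative (score-function) identity ∇π = π ∇ln π. A minimal sketch of the derivation, assuming the standard trajectory-based objective J(θ) = E_{τ∼π_θ}[R(τ)]:

```latex
% Score-function (log-derivative) trick: why ln appears before the gradient.
\begin{aligned}
\nabla_\theta J(\theta)
  &= \nabla_\theta \int \pi_\theta(\tau)\, R(\tau)\, d\tau \\
  &= \int \pi_\theta(\tau)\, \nabla_\theta \ln \pi_\theta(\tau)\, R(\tau)\, d\tau
     \qquad \text{using } \nabla_\theta \pi_\theta = \pi_\theta \nabla_\theta \ln \pi_\theta \\
  &= \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\, \nabla_\theta \ln \pi_\theta(\tau)\, R(\tau) \,\right].
\end{aligned}
```

The expectation form is what makes the gradient estimable from sampled trajectories, which is the 'analysis convenience' the note mentions.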

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation When the environment's rewards are sparse and delayed, the paper offers a solution: there is only ever one agent, but it operates in two stages: 1) the meta-controller stage, which selects a goal; 2) the controller stage, which outputs an action given the current state and the goal, while a critic judges whether the goal has been completed or a terminal state reached. Stages 1 and 2 repeat: the meta-controller selects a new goal, the controller outputs actions again, and so on. My understanding is that this "partitions" the environment into N small temporal sub-environments, each paired with one goal; in such an environment the agent itself can be treated as a single point. The key is that the agent chooses the policy over goals πg that maximizes the expected discounted Q-value, i.e., if the goal sequence g1-g3-g2-… has the maximum Q-value among all possible goal sequences, the agent should… read more »
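A minimal runnable sketch of the two-stage loop described above, on a toy sparse-reward chain. The environment, the random goal/action choices, and all names here are illustrative stand-ins, not the paper's implementation:

```python
import random

class ChainEnv:
    """Toy chain: states 0..9, reward only at the far end (sparse and delayed)."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, action):  # action in {-1, +1}
        self.s = max(0, min(9, self.s + action))
        done = (self.s == 9)
        return self.s, (1.0 if done else 0.0), done

def run_episode(env, goals=(3, 6, 9), max_steps=200):
    state, done, steps = env.reset(), False, 0
    while not done and steps < max_steps:
        goal = random.choice(goals)          # stage 1: meta-controller selects a goal
        while not done and steps < max_steps:
            action = random.choice((-1, 1))  # stage 2: controller acts on (state, goal)
            state, reward, done = env.step(action)
            steps += 1
            if state == goal:                # critic: goal reached -> back to stage 1
                break
    return state

print(run_episode(ChainEnv()))
```

Learning (DQN updates for both levels, intrinsic reward for the controller) would slot into the two marked stages; the structure of the loop is the point here.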

Meta Learning Shared Hierarchies

Meta Learning Shared Hierarchies Notation S: state space. A: action space. MDP: transition function P(s′, r | s, a), where (s′, r) is the next state and reward and (s, a) is the state and action. P_M: distribution over MDPs M with the same state-action space (S, A). Agent: a function mapping from a multi-episode history (s0, a0, r0, s1, a1, r1, …… read more »
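A compact statement of the meta-learning objective this notation implies; a sketch only, with φ for parameters shared across tasks and θ for per-task parameters, following MLSH's convention:

```latex
% Shared parameters \phi should maximize expected return across tasks
% M drawn from P_M, after per-task adaptation of \theta.
\max_{\phi}\; \mathbb{E}_{M \sim P_M}
  \Big[\, \mathbb{E}\Big[\textstyle\sum_{t=0}^{T-1} r_t \;\Big|\; M,\ \pi_{\phi,\theta}\Big] \Big]
```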

RL Math

Draft titles: Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics; Global Value vs. Sub-goals by Policy Gradient; Neuro-Dynamic Programming; Gradient Methods Framework; Policy Gradient Method for Hierarchical RL; Policy Gradient HRL; Policy Gradient HRL and Neuro-Dynamic Programming; Policy Gradient Method for HRL. The scanned draft files above contain handwritten mathematical formulas and tools, including… read more »

Decentralized Optimal Control of Distributed Interdependent Automata With Priority Structure

Decentralized Optimal Control of Distributed Interdependent Automata With Priority Structure Data Flowchart. Notation: P_i — subsystem model (the plant), a deterministic finite-state automaton, defined by equations (1)–(4). Transition relation, equation (5): P_i can be transitioned from one state into another if the input l is applied; it encodes that the transition is possible with at least… read more »
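A minimal sketch of the transition structure the notation describes: a deterministic finite-state automaton whose transition table records that P_i can move from a state into another when input l is applied. State and input names below are hypothetical:

```python
# Deterministic finite-state automaton P_i as a transition table:
# delta[(state, input)] -> next state. A missing key means that
# transition is not possible under that input.
delta = {
    ("z1", "l1"): "z2",
    ("z2", "l1"): "z3",
    ("z2", "l2"): "z1",
}

def step(state, inp):
    """Apply input `inp` in `state`; return the next state, or None if impossible."""
    return delta.get((state, inp))

print(step("z1", "l1"))  # z2: the transition z1 -> z2 is possible under l1
print(step("z1", "l2"))  # None: no such transition
```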

Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics

Neural-network-based decentralized control of continuous-time nonlinear interconnected systems with unknown dynamics – Math and Optimal Control. Problem formulation: consider a continuous-time nonlinear large-scale system ∑ composed of N interconnected subsystems described by equation (1), where x_i(t) ∈ R^{n_i}: state of the i-th subsystem; the overall state of the large-scale system ∑ is denoted by x(t). u_i(x_i(t)) ∈ R^{m_i}: control input vector of the ith… read more »
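Equation (1) itself is elided in the excerpt; in this literature the subsystems are commonly written in the input-affine form below. This is a sketch of the usual class, not necessarily the paper's exact equation, with Z_i denoting the interconnection term and f_i, g_i the unknown subsystem dynamics:

```latex
% i-th subsystem of the large-scale system \Sigma (common input-affine form):
\dot{x}_i(t) = f_i\big(x_i(t)\big)
  + g_i\big(x_i(t)\big)\Big( u_i\big(x_i(t)\big) + Z_i\big(x(t)\big) \Big),
\qquad i = 1, \dots, N.
```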

Reinforcement Learning is Direct Adaptive Optimal Control

Reinforcement Learning is Direct Adaptive Optimal Control Stanford_cs229-notes12_Andrew_Ng Reinforcement Learning and Control. How should reinforcement learning be viewed from a control-systems perspective? Control problems can be divided into two classes: 1) regulation and tracking problems, in which the objective is to follow a reference trajectory, and 2) optimal control problems, in which the objective is to extremize a… read more »
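The two classes can be stated side by side; a sketch using generic symbols, with y_ref for the reference trajectory and r for the performance measure:

```latex
% Regulation/tracking: drive the output to a reference trajectory.
\text{regulation/tracking:}\quad y(t) \to y_{\mathrm{ref}}(t) \ \text{as}\ t \to \infty
% Optimal control: extremize an integral performance criterion.
\text{optimal control:}\quad \operatorname*{extremize}_{u(\cdot)}\; J = \int_0^\infty r\big(x(t), u(t)\big)\, dt
```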

Decentralized Stabilization for a Class of Continuous-Time Nonlinear Interconnected Systems Using Online Learning Optimal Control Approach

Decentralized Stabilization for a Class of Continuous-Time Nonlinear Interconnected Systems Using Online Learning Optimal Control Approach Topics: neural-network-based online learning optimal control; decentralized control strategy; cost functions (critic neural networks) – local optimal controllers; feedback gains of the optimal control policies – decentralized control strategy; the optimal control problem (stabilization); Hamilton-Jacobi-Bellman (HJB) equations; apply online policy iteration… read more »
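A sketch of the two objects named in the excerpt, for a generic input-affine system ẋ = f(x) + g(x)u with cost rate Q(x) + uᵀRu; the symbols are generic, not the paper's exact definitions:

```latex
% HJB equation for the optimal cost V^*:
0 = \min_{u}\Big[\, Q(x) + u^\top R u
  + \big(\nabla V^*(x)\big)^{\!\top}\big(f(x) + g(x)u\big) \Big]
% Online policy iteration alternates:
% 1) policy evaluation: solve for V^{(k)} along trajectories under u^{(k)};
% 2) policy improvement:
u^{(k+1)}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{(k)}(x)
```

The critic neural networks mentioned above approximate the V^{(k)} in the evaluation step, from which the feedback gains of the improvement step follow.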

Hierarchical Policy Gradient Algorithms

Hierarchical Policy Gradient Algorithms Math Notation M: the overall task MDP. {M0, M1, M2, M3, …, Mn}: a finite set of subtask MDPs. Mi: a subtask MDP, modeling one subtask in the hierarchy. M0: the root task; solving it solves the entire MDP M. i: a non-primitive subtask; the paper uses… read more »
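A small sketch of the data structure this notation describes: the set {M0, …, Mn} arranged as a tree with M0 at the root. The particular hierarchy and helper below are hypothetical, for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    """One node Mi of the task hierarchy; M0 is the root of the overall MDP M."""
    name: str
    primitive: bool = False              # primitive subtasks execute a single action
    children: list = field(default_factory=list)

# Hypothetical hierarchy: M0 decomposes into M1 and M2; M1 into primitives M3, M4.
m3, m4 = Subtask("M3", primitive=True), Subtask("M4", primitive=True)
m1 = Subtask("M1", children=[m3, m4])
m2 = Subtask("M2", primitive=True)
m0 = Subtask("M0", children=[m1, m2])    # solving M0 solves the entire MDP M

def leaves(task):
    """Primitive subtasks reachable from `task`, depth-first."""
    if task.primitive:
        return [task.name]
    return [name for child in task.children for name in leaves(child)]

print(leaves(m0))  # ['M3', 'M4', 'M2']
```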
