
Is there any convergence proof of the Policy Gradient algorithm with "general" value/Q-function approximation? The seminal papers (Sutton1999 & Tsitsiklis1999) prove the theorem under a compatibility assumption (i.e. the Q-function approximation is linear w.r.t. the policy's features). Later improvements such as DPG (Silver2014) also rely on similar assumptions.
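
For reference, the compatibility condition in (Sutton1999) requires, roughly, that the critic $f_w$ uses the policy's score function as its features,

$$\nabla_w f_w(s,a) = \nabla_\theta \log \pi_\theta(a \mid s),$$

and that $w$ is a (local) minimizer of the mean-squared error $\mathbb{E}_{s,a}\!\left[\big(Q^\pi(s,a) - f_w(s,a)\big)^2\right]$. Under these two conditions, $f_w$ can replace the true $Q^\pi$ in the policy gradient theorem without biasing the gradient.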

Yet in practice this compatibility assumption is not satisfied: the policy network and the Q-function network have their own, independent sets of parameters.
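
To make that concrete, a typical deep actor-critic setup looks something like the sketch below (PyTorch; the architecture, sizes, and learning rates are placeholders I made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, purely for illustration.
obs_dim, act_dim, hidden = 4, 2, 64

# Policy network pi_theta: its own parameter vector theta.
policy = nn.Sequential(
    nn.Linear(obs_dim, hidden), nn.Tanh(),
    nn.Linear(hidden, act_dim),
)

# Q-function network Q_w: a completely separate parameter vector w,
# not constrained to use grad_theta log pi_theta as features, so the
# compatibility condition above is not enforced.
q_net = nn.Sequential(
    nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
    nn.Linear(hidden, 1),
)

# Independent optimizers for the two parameter sets.
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
q_opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
```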

Hence I wonder to what extent these methods are supported by theoretical guarantees.

Thanks,

(Sutton1999): Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton et al., 1999
(Silver2014): Deterministic Policy Gradient Algorithms, Silver et al., 2014
(Tsitsiklis1999): Actor-Critic Algorithms, Konda & Tsitsiklis, 1999
