Abstract:
Iterative trajectory optimization techniques for
non-linear dynamical systems are among the most powerful and
sample-efficient methods of model-based reinforcement learning
and approximate optimal control. By leveraging time-variant
local linear-quadratic approximations of system dynamics and
reward, such methods can find both a target-optimal trajectory
and time-variant optimal feedback controllers. However, the
local linear-quadratic assumptions are a major source of optimization bias that leads to catastrophic greedy updates, raising
the issue of proper regularization. Moreover, the approximate
models' disregard for any physical state-action limits of the
system further aggravates the problem, as the optimization
moves towards unreachable areas of the state-action space.
In this paper, we address the issue of constrained systems
in the setting of online-fitted stochastic linear dynamics.
We propose modeling physical state and action limits
as probabilistic chance constraints linear in both state and
action, and introduce a new trajectory optimization technique
that integrates these probabilistic constraints by optimizing a
relaxed quadratic program. Our empirical evaluations show a
significant improvement in learning robustness, which enables
our approach to perform more effective updates and avoid
the premature convergence observed in state-of-the-art algorithms.
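As a minimal sketch of how a linear chance constraint can be made tractable, assume the state $x_t$ follows a Gaussian distribution, $x_t \sim \mathcal{N}(\mu_t, \Sigma_t)$, and consider a single linear limit $a^\top x_t \le b$ that must hold with probability at least $1-\epsilon$; this Gaussian assumption and the scalar constraint are illustrative, not necessarily the exact formulation used in the paper. The probabilistic constraint then admits the standard deterministic equivalent
\[
\Pr\!\left(a^\top x_t \le b\right) \ge 1-\epsilon
\;\Longleftrightarrow\;
a^\top \mu_t + \Phi^{-1}(1-\epsilon)\,\sqrt{a^\top \Sigma_t\, a} \le b,
\]
where $\Phi^{-1}$ denotes the standard normal quantile function. Analogous constraints on the action can be collected alongside the local linear-quadratic model into the relaxed quadratic program solved at each iteration.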