Tempo Adaptation in Non-stationary Reinforcement Learning

We raise and tackle a time synchronization issue between agent and environment in non-stationary RL, where the environment changes over wall-clock time rather than with episode count. We propose a Proactively Synchronizing Tempo (PST) framework that computes a suboptimal sequence of interaction/training durations by minimizing an upper bound on the dynamic regret. The resulting schedule trades off the agent's training tempo against the environment's tempo of change and yields a sublinear dynamic regret bound. Experiments on high-dimensional non-stationary environments show that PST achieves a higher online return at the optimal non-zero tempo than existing methods.
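For context, the performance measure being bounded can be written in the standard non-stationary RL convention; the notation below (episode index $k$, start state $s_1$, per-episode value functions) is an assumption for illustration, not the paper's exact definition:

\[
\mathrm{Reg}^{\mathrm{d}}(K) \;=\; \sum_{k=1}^{K} \Bigl( V^{*}_{k}(s_1) \;-\; V^{\pi_k}_{k}(s_1) \Bigr),
\]

where $V^{*}_{k}$ is the value of the optimal policy for the (time-varying) MDP in episode $k$ and $V^{\pi_k}_{k}$ is the value of the agent's policy in that episode. A sublinear bound, $\mathrm{Reg}^{\mathrm{d}}(K) = o(K)$, means the per-episode gap to the moving optimum vanishes on average as $K$ grows.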

Authors

Hyunin Lee

Yuhao Ding

Jongmin Lee

Ming Jin

Javad Lavaei

Somayeh Sojoudi

Published

September 26, 2023