Optimization Solution Functions as Deterministic Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) faces challenges such as limited data coverage and value function overestimation. We propose an implicit actor-critic (iAC) framework that employs an optimization solution function as a deterministic policy (actor) and a monotone function of the optimization's optimal value as a critic. By encoding optimality into the actor policy, the learned policies are robust to suboptimality in the learned actor parameters via an exponentially decaying sensitivity (EDS) property. We derive performance guarantees for the proposed framework and validate it on two real-world applications, showing notable gains over state-of-the-art offline RL methods.
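
Below is a minimal sketch, not the authors' implementation, of the actor-critic structure the abstract describes: the actor is the minimizer of a small parameterized optimization problem (here a box-constrained quadratic program solved with cvxpy, an assumed choice), and the critic is a monotone function of that problem's optimal value. The parameterization `theta = (Q, q_map)` and the critic weights `w` are hypothetical placeholders for learned parameters.

```python
import numpy as np
import cvxpy as cp

def actor(state, theta):
    """Deterministic policy: the action is the solution of a parameterized QP,
    i.e., an 'optimization solution function' of the state."""
    Q, q_map = theta                      # hypothetical learned parameters (Q must be PSD)
    n = Q.shape[0]
    a = cp.Variable(n)
    objective = cp.Minimize(0.5 * cp.quad_form(a, Q) + (q_map @ state) @ a)
    constraints = [a >= -1, a <= 1]       # box-constrained actions (assumed)
    prob = cp.Problem(objective, constraints)
    prob.solve()
    return a.value, prob.value            # action and the QP's optimal value

def critic(opt_value, w):
    """Monotone (here, increasing affine) function of the optimal value;
    a softplus keeps the slope strictly positive."""
    slope = np.log1p(np.exp(w[0]))
    return slope * opt_value + w[1]

# Toy usage on a 2-D state with placeholder parameters
state = np.array([0.3, -0.7])
theta = (np.eye(2), np.eye(2))
action, opt_val = actor(state, theta)
value_estimate = critic(opt_val, np.array([0.0, 0.0]))
print(action, value_estimate)
```

The point of this structure, per the abstract, is that the actor's output depends on its parameters only through an optimization problem, which is what yields the exponentially decaying sensitivity to parameter error; the sketch above only illustrates the interface, not that analysis.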

Authors: Vanshaj Khattar, Ming Jin

Published: January 1, 2024