Monte Carlo Grid Dynamic Programming: Almost Sure Convergence and Probability Constraints
Dynamic Programming suffers from the well-known “curse of dimensionality”, further exacerbated in stochastic systems by the need to evaluate expectations. This paper presents a Monte Carlo-based approach to sampling the state and input spaces, together with an interpolation procedure for the resulting value function in a “self-approximating” fashion that eliminates the need for ordering or set-membership tests. We provide a proof of almost sure convergence for the value iteration (and, consequently, policy iteration) procedure. The proposed sampling and self-approximating algorithm alleviates the burden of gridding and interpolation traditionally required in DP. Moreover, we demonstrate that the proposed interpolation procedure is well-suited to handling probabilistic constraints by sampling both feasible and infeasible regions. The curse of dimensionality cannot be avoided entirely; however, this approach offers a convenient framework for addressing it.
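To make the idea concrete, the following is a minimal sketch of Monte Carlo value iteration over randomly sampled states, with a Gaussian-kernel stand-in for the self-approximating interpolation step. The 1-D linear-Gaussian dynamics, the quadratic stage cost, and all numeric choices (kernel bandwidth, sample counts, discount factor) are illustrative assumptions for this sketch, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D stochastic system: x' = a*x + b*u + w,  w ~ N(0, sigma^2).
a, b, sigma, gamma = 0.9, 1.0, 0.1, 0.95
N_STATES, N_INPUTS, N_NOISE = 100, 11, 20

# Monte Carlo samples of the state and input spaces (no grid is constructed).
xs = rng.uniform(-2.0, 2.0, size=N_STATES)   # sampled states
us = np.linspace(-1.0, 1.0, N_INPUTS)        # candidate inputs
ws = rng.normal(0.0, sigma, size=N_NOISE)    # noise samples for the expectation

def stage_cost(x, u):
    return x**2 + 0.1 * u**2

def interp_weights(z, centers, h=0.2):
    """Kernel stand-in for self-approximating interpolation: V at an
    off-sample point z is a normalized weighted average of V at the
    sampled states, so no ordering or set-membership tests are needed."""
    w = np.exp(-0.5 * ((z[..., None] - centers) / h) ** 2)
    return w / w.sum(axis=-1, keepdims=True)

# Next states for every (state, input, noise) triple; weights are fixed,
# so they are computed once outside the value-iteration loop.
xn = a * xs[:, None, None] + b * us[None, :, None] + ws[None, None, :]
W = interp_weights(xn, xs)                   # shape (N_STATES, N_INPUTS, N_NOISE, N_STATES)

V = np.zeros(N_STATES)
for _ in range(600):
    EV = (W @ V).mean(axis=-1)               # interpolated E[V(x')] per (state, input)
    Q = stage_cost(xs[:, None], us[None, :]) + gamma * EV
    V_new = Q.min(axis=1)                    # Bellman backup at the sampled states
    if np.max(np.abs(V_new - V)) < 1e-8:     # sup-norm convergence check
        V = V_new
        break
    V = V_new

policy = us[Q.argmin(axis=1)]                # greedy input at each sampled state
```

Because the discounted Bellman operator is a contraction, the iterates converge at the sampled points; the kernel weights then extend the value function (and the greedy policy) to unsampled states without any grid bookkeeping.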