Towards Off-Policy Evaluation as a Prerequisite for Real-World Reinforcement Learning in Building Control
Off-policy evaluation (OPE) estimates a policy's performance without online interaction, enabling safety and performance checks before deployment in building control. We review OPE methods in light of the characteristics of building operation data (deterministic behavior policies and limited state-action coverage), adopt an approximate-model approach, and use bootstrapping to quantify estimation uncertainty and correct for bias. Simulation results highlight practical considerations for applying RL in real-world building operation.
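The sketch below is a minimal illustration, not the paper's implementation, of the workflow the abstract describes: fit an approximate model from logged transitions, estimate the target policy's value by rolling the model forward, then bootstrap over the logged data to obtain an uncertainty interval and a bias-corrected estimate. The toy one-dimensional thermal dynamics, the linear model, and all function names (`fit_model`, `rollout_value`, `target_policy`) are illustrative assumptions.

```python
# Hypothetical sketch: approximate-model OPE with bootstrap uncertainty
# quantification and bias correction on logged data from a deterministic
# behavior policy. Dynamics, reward, and model class are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)

# --- Toy logged dataset from a deterministic behavior policy --------------
# state: room temperature [degC]; action: heating power chosen by a fixed rule.
n, horizon = 500, 20
states = rng.uniform(18.0, 26.0, size=n)
actions = np.clip(22.0 - states, 0.0, 2.0)            # deterministic behavior policy
next_states = states + 0.5 * actions - 0.1 * (states - 20.0) + rng.normal(0, 0.05, n)
rewards = -np.abs(next_states - 21.0) - 0.1 * actions  # comfort plus energy penalty

def fit_model(s, a, s_next, r):
    """Fit a linear approximate dynamics and reward model by least squares."""
    dyn_coef, *_ = np.linalg.lstsq(np.column_stack([s, a, np.ones_like(s)]),
                                   s_next, rcond=None)
    rew_coef, *_ = np.linalg.lstsq(np.column_stack([s, a, s_next, np.ones_like(s)]),
                                   r, rcond=None)
    return dyn_coef, rew_coef

def rollout_value(model, policy, s0, horizon, gamma=0.99):
    """Estimate the target policy's return by rolling the approximate model forward."""
    dyn_coef, rew_coef = model
    s, value = s0.copy(), np.zeros_like(s0)
    for t in range(horizon):
        a = policy(s)
        s_next = np.column_stack([s, a, np.ones_like(s)]) @ dyn_coef
        r = np.column_stack([s, a, s_next, np.ones_like(s)]) @ rew_coef
        value += gamma**t * r
        s = s_next
    return value.mean()

target_policy = lambda s: np.clip(21.5 - s, 0.0, 2.0)  # policy to be evaluated

# Point estimate from the model fit on the full logged dataset.
theta_hat = rollout_value(fit_model(states, actions, next_states, rewards),
                          target_policy, states, horizon)

# --- Bootstrap: resample transitions, refit the model, re-evaluate --------
boot = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    model_b = fit_model(states[idx], actions[idx], next_states[idx], rewards[idx])
    boot.append(rollout_value(model_b, target_policy, states[idx], horizon))
boot = np.array(boot)

bias_corrected = 2 * theta_hat - boot.mean()           # simple bootstrap bias correction
lo, hi = np.percentile(boot, [2.5, 97.5])              # percentile uncertainty interval
print(f"value: {theta_hat:.2f}, bias-corrected: {bias_corrected:.2f}, "
      f"95% interval: [{lo:.2f}, {hi:.2f}]")
```

The bootstrap interval reflects how sensitive the value estimate is to which transitions were logged, which matters when a deterministic behavior policy yields limited coverage; the bias correction offsets the optimism or pessimism introduced by fitting and evaluating on the same limited data.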