Reinforcement learning-based controllers (RL-controllers) in self-driving datacenters have evolved into complex dynamic systems that require continuous tuning to outperform hand-crafted expert heuristics. The operating environment of these controllers poses an additional challenge: it can change significantly over time, so the controllers must be adapted to new external conditions. To obtain trustworthy RL-controllers for self-driving datacenters, it is essential to guarantee that controllers trained continuously in these changing environments behave according to the designer's notions of reliability and correctness. Traditionally, RL-controllers are evaluated by comparing their reward function statistics. However, reward statistics do not capture all desired properties of a controller, e.g., stability. In this work, we propose enhancing the evaluation criteria for RL-controllers with a set of novel metrics that quantify how well a controller performs with respect to user-defined properties, and we leverage formal methods to compute these metrics. Our work thus takes a step toward improving the trustworthiness of RL-controllers. We show that these metrics are useful both for evaluating a standalone controller and for comparing multiple controllers that achieve the same reward.