Policy Representation Learning for Multiobjective Reservoir Policy Design with Different Objective Dynamics

Proceedings of the 2nd International Symposium on Water Syst...

Author(s): M. Zaniolo; M. Giuliani; A. Castelletti

Keywords: Forecasting & real-time operation; Multi-purpose water systems; Multi-objective optimization; Information selection; Direct policy search

Abstract: Most water reservoir operators use forecasts to inform their decisions and to enhance the flexibility and resilience of water systems by anticipating hydrological extremes. Yet, although numerous candidate hydro-meteorological variables and forecast horizons may benefit operations, the best information set for a given problem is often not evident a priori. Additionally, in multi-purpose systems, characterized by multiple demands, this information set might change according to the objective tradeoff. For instance, common operating targets, e.g., flood protection and water supply, can be vastly heterogeneous in their dynamics and vulnerabilities. Flood events are generally caused by the onset of fast and intense wet meteorological extremes, and flood-conservative policies benefit from short-lead-time information that conveys peak inflow magnitude and timing; conversely, water supply shortages are caused by slow-developing droughts, and an effective policy seeks predictors of prolonged water shortages in order to activate hedging strategies in a timely manner. The tradeoff space between these two opposite policies is populated by policies that balance the opposing control targets in different ways. In this work [1], we contribute a novel method to learn the optimal policy representation (i.e., policy input set) for varying objective tradeoffs, by combining a feature selection routine with a multi-objective Direct Policy Search framework. The selected policy search routine is the Neuro-Evolutionary Multi-Objective Direct Policy Search (NEMODPS) [2]. NEMODPS is a multi-objective neuro-evolutionary algorithm that uses evolutionary techniques to evolve a policy architecture along with its parameters. The flexible policy architectures generated via NEMODPS are used here to dynamically evolve the policy input set.
This approach is demonstrated on the case study of Lake Como (Italy), where the operating objectives are highly heterogeneous in their dynamics (fast and slow) and vulnerabilities (wet and dry extremes). Our results demonstrate that a single policy input set is inadequate to represent the entire space of control behaviors that may emerge for alternative tradeoffs: different objectives, and different tradeoffs among them, benefit from different policy representations, which ultimately mitigates conflicts between different users. Moreover, more informed policies show higher robustness when re-evaluated across a suite of different hydrological conditions.
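
The idea of searching over the policy input set together with the policy parameters can be illustrated with a minimal sketch. This is not NEMODPS and not the authors' implementation: plain random sampling stands in for the neuro-evolutionary search, a linear rule stands in for the evolved network, and the reservoir model, the three candidate inputs, the function names (`simulate`, `dominates`, `pareto_front`, `search`) and every numerical value are hypothetical assumptions chosen only for illustration.

```python
import random


def simulate(mask, weights, inflow, fc_short, fc_long,
             capacity=100.0, flood_level=80.0, demand=5.0):
    """Toy single-reservoir mass balance; returns (flood_cost, deficit_cost)."""
    storage, flood, deficit = 50.0, 0.0, 0.0
    for t in range(len(inflow)):
        # Candidate policy inputs: storage, short- and long-lead forecasts.
        candidates = [storage / capacity, fc_short[t], fc_long[t]]
        # Only the inputs switched on by the mask reach the policy.
        signal = sum(w * x for m, w, x in zip(mask, weights, candidates) if m)
        release = min(max(demand * (1.0 + signal), 0.0), storage + inflow[t])
        storage = min(storage + inflow[t] - release, capacity)
        flood += max(storage - flood_level, 0.0)   # wet-extreme objective
        deficit += max(demand - release, 0.0)      # dry-extreme objective
    return flood, deficit


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o["obj"], c["obj"]) for o in candidates)]


def search(n_samples=200, horizon=60, seed=0):
    """Randomly sample (input mask, weights) pairs; return the non-dominated ones."""
    rng = random.Random(seed)
    # Synthetic inflow series (assumption, not Lake Como data).
    inflow = [max(0.0, 5.0 + rng.gauss(0.0, 3.0)) for _ in range(horizon)]
    # Perfect "forecasts" built from the series itself, scaled to roughly [0, 1].
    fc_short = [inflow[min(t + 1, horizon - 1)] / 10.0 for t in range(horizon)]
    fc_long = [sum(inflow[t:t + 10]) / len(inflow[t:t + 10]) / 10.0
               for t in range(horizon)]
    population = []
    for _ in range(n_samples):
        mask = tuple(rng.random() < 0.5 for _ in range(3))
        weights = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
        obj = simulate(mask, weights, inflow, fc_short, fc_long)
        population.append({"mask": mask, "weights": weights, "obj": obj})
    return pareto_front(population)


if __name__ == "__main__":
    for policy in sorted(search(), key=lambda p: p["obj"]):
        print(policy["mask"], policy["obj"])
```

Printing the input mask next to the objective values of each non-dominated policy shows which inputs each tradeoff solution uses. In this toy setting nothing guarantees that the masks will differ along the front; the sketch only shows the mechanics of treating the input set as part of the search space.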

Year: 2021