Author(s): Federico Vilaseca; Christian Chreties; Alberto Castro; Angela Gorgoglione
Linked Author(s): Christian Chreties
Abstract: This paper proposes a data-augmentation methodology to improve the performance of data-driven hydrological models during high flows. Previous research has observed that data-driven models represent high discharges poorly. The authors attribute this, in part, to the scarcity of high-flow observations in the training data. This scarcity creates an imbalance problem that biases the learning process toward the representation of low flows. The proposed methodology was tested on models built with the Random Forest machine learning algorithm in two incremental watersheds of the Santa Lucia Chico basin in Uruguay. Results showed an average increase of 18 % in Nash-Sutcliffe efficiency and 37 % in peak-flow Nash-Sutcliffe efficiency. The authors conclude that, under certain conditions, class imbalance is a relevant issue affecting the performance of data-driven rainfall-runoff models, and that the proposed methodology helps to tackle it, potentially improving model performance for high flows.
Year: 2024
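The abstract reports results in terms of Nash-Sutcliffe efficiency (NSE) and a peak-flow variant, and frames the problem as an imbalance between abundant low-flow samples and scarce high-flow samples. The following is a minimal, standard-library sketch of those ideas, written under explicit assumptions. NSE uses its standard definition. The peak-flow NSE here is assumed to be ordinary NSE restricted to time steps whose observed flow is at or above a chosen threshold, because the abstract does not define it. The augmentation step is a generic naive oversampling technique that replicates high-flow training samples. It is not necessarily the authors' data-augmentation method. All function names (`nse`, `peak_nse`, `oversample_high_flows`) are hypothetical.

```python
from statistics import mean


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of observations from their mean)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("observed and simulated must be non-empty and of equal length")
    q_bar = mean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - q_bar) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - sse / sst


def peak_nse(observed, simulated, threshold):
    """Hypothetical peak-flow NSE (assumption): NSE over time steps with observed flow >= threshold."""
    pairs = [(o, s) for o, s in zip(observed, simulated) if o >= threshold]
    if not pairs:
        raise ValueError("no observations at or above the threshold")
    obs, sim = zip(*pairs)
    return nse(list(obs), list(sim))


def oversample_high_flows(features, targets, threshold, factor):
    """Hypothetical augmentation (naive oversampling, not necessarily the paper's method).

    Every training sample whose target discharge is >= threshold is appended
    (factor - 1) extra times, so high flows carry more weight during fitting.
    Inputs are copied, not modified in place.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    new_x = [list(row) for row in features]
    new_y = list(targets)
    for row, y in zip(features, targets):
        if y >= threshold:
            for _ in range(factor - 1):
                new_x.append(list(row))
                new_y.append(y)
    return new_x, new_y


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 10.0]
    print(nse(obs, obs))          # perfect simulation
    print(nse(obs, [4.0] * 4))    # simulation equal to the observed mean
    X, y = oversample_high_flows([[0.1], [0.2], [5.0]], [1.0, 2.0, 10.0], 5.0, 3)
    print(len(X), y)
```

A perfect simulation gives an NSE of 1, and a simulation that always predicts the observed mean gives 0. In a real workflow, the oversampled `X` and `y` would be passed to a regressor such as a Random Forest, while the evaluation period would be left unaugmented so that the reported NSE values remain comparable.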