Addressing Class Imbalance Problems in Data-Driven Rainfall-Runoff Modelling

Proceedings of the 8th IAHR Europe Congress (Lisbon, 2024)

Author(s): Federico Vilaseca; Christian Chreties; Alberto Castro; Angela Gorgoglione

Linked Author(s): Christian Chreties

Abstract: This paper proposes a methodology based on data augmentation to improve the performance of data-driven hydrological models during high flows. Problems in the representation of high discharges by data-driven models were observed in previous research, which the authors of this work attribute, in part, to the shortage of high-flow observations in the training data. This creates an imbalance problem that biases the learning process towards the representation of low flows. The proposed methodology was tested for models generated with the Random Forest machine learning algorithm, implemented in two incremental watersheds of the Santa Lucia Chico basin in Uruguay. Results showed an average increase in performance of 18 % for Nash-Sutcliffe efficiency and 37 % for peak-flow Nash-Sutcliffe efficiency. The work allows us to conclude that class imbalance is a relevant issue affecting the performance of data-driven rainfall-runoff models under certain conditions and that the proposed methodology is useful to tackle it, potentially improving model performance for high flows.
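The abstract reports results as Nash-Sutcliffe efficiency (NSE) and peak-flow NSE but gives no formulas or augmentation details, so the following is only an illustrative sketch under stated assumptions. `nse` uses the standard definition (one minus the ratio of squared model error to the squared deviation of observations from their mean). `peak_flow_nse` assumes that the peak-flow variant is the same metric restricted to time steps whose observed discharge exceeds a threshold; the paper may define it differently. `oversample_high_flows` is a hypothetical stand-in for the data augmentation step: it simply duplicates randomly chosen high-flow training samples, which is not necessarily the authors' method. All function names, the threshold, and the sample layout are assumptions made for this example.

```python
import random
from statistics import mean


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    obs_mean = mean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / spread


def peak_flow_nse(observed, simulated, threshold):
    """NSE over time steps with observed flow above `threshold` (assumed definition)."""
    pairs = [(o, s) for o, s in zip(observed, simulated) if o > threshold]
    obs, sim = zip(*pairs)
    return nse(obs, sim)


def oversample_high_flows(samples, threshold, copies, seed=0):
    """Return `samples` plus random duplicates of the high-flow ones.

    `samples` is a list of (features, flow) tuples. For every high-flow
    sample, `copies` extra samples are drawn with replacement from the
    high-flow subset. This is a hypothetical augmentation, used only to
    illustrate rebalancing a training set dominated by low flows.
    """
    high = [s for s in samples if s[1] > threshold]
    rng = random.Random(seed)
    extra = [rng.choice(high) for _ in range(copies * len(high))]
    return samples + extra


# Toy discharge series: mostly low flows with one high-flow event.
observed = [1.0, 2.0, 3.0, 10.0]
simulated = [1.2, 1.8, 3.5, 7.0]

overall = nse(observed, simulated)
peaks = peak_flow_nse(observed, simulated, threshold=2.5)

# Toy training set of (features, flow) pairs, rebalanced toward high flows.
training = [((0.1,), 1.0), ((0.2,), 2.0), ((0.9,), 3.0), ((2.5,), 10.0)]
augmented = oversample_high_flows(training, threshold=2.5, copies=2)
```

In this sketch the augmented set would be passed to the model's training routine in place of the original one, and both metrics would be computed on an untouched validation period so that duplicated samples do not inflate the scores.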

Year: 2024