Author(s): Karen Schulz; Andre Niemann
Linked Author(s):
Keywords: Water resources management; sensor data; CNN-LSTM; RF; imbalanced data; precipitation; data fusion; error correction; sub-daily
Abstract: Data quality is fundamental to innovative, data-driven applications. Current advances in artificial intelligence have not yet been fully exploited for data quality control in the water sector, where most methodologies rely on conventional or relatively simple machine learning algorithms. Deep learning, however, can generally leverage spatio-temporal associations, and model generalization can widen the area of application. This study compares a standard machine learning method, random forest, with a CNN-LSTM-based deep learning model. Furthermore, individual (single-station) and generalized (catchment) models are contrasted over a 1,500 km² catchment for the correction of sub-daily, highly imbalanced rain gauge data. The generalized random forest model performed best. Reasoning inductively from this result, we propose using less complex AI-based models and catchment models rather than single-station models, which also benefits practical applicability.
Year: 2025