A GAUSSIAN PROCESS MODEL APPLIED TO PREDICTION OF THE WATER LEVELS IN VENICE LAGOON

 

 

Vladan Babovic and Maarten Keijzer

DHI - Water & Environment

 

 

Abstract: Forecasting the water level in Venice Lagoon has been the object of extensive studies in the past. For example, a numerical model (MIKE 21) based on deterministic equations has been set up for operational water level forecasting. The model includes all the fundamental modelling components necessary for use in operational mode and has been tested against a number of historical storms.

    This paper describes an alternative approach that combines observations and numerical model results in order to produce a more accurate forecast routine. The errors made by a deterministic model are corrected by a Gaussian process model. The paper concludes with an analysis of the forecast skill of the resulting hybrid model, which provides good forecast skill that can be extended over a forecasting horizon of considerable length.

Keywords: data assimilation, forecast, Gaussian process.

1    Introduction

The desire to predict the future and understand the past governs the search for laws that explain the behaviour of observed phenomena. If the underlying deterministic equations are known, in principle they can be solved to forecast the outcome based on knowledge of the initial conditions and the evolution of the forcing terms. In hydraulic modelling, for example, the governing laws may be represented by the Navier-Stokes equations, and the forcing term by the evolution of the state of the atmosphere. Initial conditions describe the state of the sea in the entire computational domain at the beginning of the computation. Once the initial conditions and forcing terms are precisely specified, it should be possible to calculate precisely how the state of the sea evolves from those initial conditions under the applied forcing.

However, even under these almost ideal circumstances, the model results are not precise. Every model is only a model of reality. In numerical modelling one discretises a domain and is therefore unable to resolve numerous sub-grid phenomena. Errors in the model parametrisation also contribute to the errors of numerical models. Finally, it is impossible to define the initial conditions and forcing terms precisely in the entire computational domain. All these imprecisions and uncertainties can accumulate and result in fairly poor model results, despite our "perfect" knowledge of the governing laws.

It is an established and accepted fact that numerical models are far from perfect. Various schemes may be used to make models more accurate. When observations of the modelled phenomena are available, data assimilation methods can be used to improve the model solution.

2    Data Assimilation

Data assimilation is a methodology which can optimise the extraction of reliable information from observations and combine it with, or assimilate it in, numerical models. Data assimilation procedures may be classified according to the variables modified during the updating process. The WMO report [8] defines four different methodologies, as follows:

2.1    Updating of input parameters

This is the classical method justified by the fact that input uncertainties may be the dominant error source in operational forecasting.

2.2    Updating of state variables

The theoretically most comprehensive methodology is based on Kalman filtering [3]. Kalman filtering is the optimal updating procedure for linear systems, but with some modifications it can also provide an approximate solution for non-linear hydrodynamic systems.

2.3    Updating of model parameters

Updating of model parameters, for example through continuous re-calibration, is a possibility, yet it remains a matter of ongoing debate.

2.4    Updating of output variables (error prediction)

The deviations between the simulation-mode forecasts and the observed variables are the model errors. Forecasting these errors and superimposing them on the simulation-mode forecasts usually gives more accurate results. This method is employed in the present study.

If forecasting interest is limited to only a few variables at some specific locations, with a high degree of accuracy required and a considerably long lead time, a data assimilation scheme based on updating of output variables (error prediction) may be the most suitable approach. For model error prediction, techniques such as artificial neural networks [1] or an approach based on chaos theory [4] have demonstrated very good forecast skill.
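The error prediction idea can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the sign convention (error = observed minus modelled) and the numbers are assumptions made for illustration, and a simple persistence rule stands in for the error model.

```python
def residuals(observed, modelled):
    """Model errors, here taken as observed minus modelled (assumed sign convention)."""
    return [o - m for o, m in zip(observed, modelled)]


def corrected_forecast(model_forecast, predicted_error):
    """Superimpose the predicted errors on the simulation-mode forecast."""
    return [m + e for m, e in zip(model_forecast, predicted_error)]


# Hypothetical water levels in metres (illustrative numbers only).
observed = [0.42, 0.55, 0.61, 0.58]
modelled = [0.40, 0.50, 0.60, 0.60]
past_errors = residuals(observed, modelled)

# Placeholder error model: persistence, i.e. the last known error is carried
# forward. Any error forecaster could be substituted here.
next_model_forecast = [0.52, 0.47]
predicted_error = [past_errors[-1]] * len(next_model_forecast)
updated = corrected_forecast(next_model_forecast, predicted_error)
```

Only the error forecaster changes between methods; the superposition step stays the same.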

3    Case Study

The city of Venice is frequently flooded due to large sea level rises in the Northern Adriatic Sea, resulting from the interaction of a number of phenomena. The most dominant are meteorological, namely the strong SE winds known as Scirocco and the Bora winds from the NE. Due to these winds and the associated low pressure system, storm surges of more than 1 m above mean sea level are likely to be observed at the coastline of the Venice Lagoon. These surges are superimposed on the astronomical tide, and their interaction under unfavourable conditions (high water at spring equinoctial tides) may lead to catastrophic flooding. Moreover, given that the elevation of some streets and shops is less than 0.8 m above mean sea level and that the range of an average spring tide is +/- 0.6 m, even a rather frequent 0.3 m storm surge can cause significant damage if it occurs at spring high water. Furthermore, if for instance the winds stop blowing shortly after the highest levels occur, free oscillations (seiches) are triggered all over the Adriatic, with amplitudes at Venice large enough to cause new floods after the main storm has passed. Significant flooding in Venice occurs on average 5-10 times a year, and the situation has been aggravated over the years by the non-negligible sinking of the town.

 

The deterministic model

A deterministic modelling system has been developed and implemented by DHI Water & Environment. The system includes all the fundamental modelling components necessary for use in operational mode and has been tested against eight historical storms defined in the terms of reference. It is composed of three nested models: a regional model of the whole of the Adriatic Sea, an intermediate model covering the Northern Adriatic Sea and two local models of Venice Lagoon, a coarse and a fine grid model. The fundamental components of the deterministic modelling system are a hydrodynamic model and a meteorological model. The hydrodynamic model applied, MIKE 21 HD, is the basic component of DHI's generalized 2D mathematical modelling system, MIKE 21. The operational HIRLAM (High Resolution Limited Area Model) Forecasting System of the Danish Meteorological Institute (DMI) has been used for computing pressure and wind fields. This GRV-HIRLAM forecasting module has been applied within the deterministic modelling system.

4    The Gaussian Process Model

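A Gaussian process model is a data interpolation model which can be used to infer the relationship between a set of input parameters and a set of target, or output, parameters. Given a training set of N pairs of inputs and targets {x, t}, the objective is to predict the value of $t_{N+1}$ given $x_{N+1}$. The main assumption made in this type of model is that the joint distribution of the augmented targets $\mathbf{t}_{N+1} = (t_1, \ldots, t_N, t_{N+1})$ is approximately Gaussian, with a covariance matrix $\Sigma$ and noise level $\sigma$. Then:

$$P(\mathbf{t}_{N+1}) \propto \exp\left(-\tfrac{1}{2}\,\mathbf{t}_{N+1}^{\top}\left(\Sigma + \sigma^{2} I\right)^{-1}\mathbf{t}_{N+1}\right) \qquad (1)$$

To predict the value of $t_{N+1}$ we need to condition this distribution on the N observed training targets $\mathbf{t}_{N}$. After some algebraic manipulation this gives:

$$P(t_{N+1} \mid \mathbf{t}_{N}) \propto \exp\left(-\frac{\left(t_{N+1} - \mathbf{k}^{\top}\mathbf{C}^{-1}\mathbf{t}_{N}\right)^{2}}{2\left(b - \mathbf{k}^{\top}\mathbf{C}^{-1}\mathbf{k}\right)}\right) \qquad (2)$$

where $\mathbf{C}$ is the N by N training covariance matrix (with noise), $\mathbf{k}$ is the vector of covariances between the training cases and the test case, and $b = C(x_{N+1}, x_{N+1}) + \sigma^{2}$. One of the interesting properties of this model is that it produces a computable expression for the error the model makes, namely the predictive variance $b - \mathbf{k}^{\top}\mathbf{C}^{-1}\mathbf{k}$. What is left to specify is the covariance function $C(x, x')$. This can be any function that produces a positive definite matrix. In this paper we used: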

       (3)

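with parameters $\theta = (v_0, v_1, a_1, \ldots, a_D, w_1, \ldots, w_D)$. These parameters need to be optimized. With $\mathbf{C}$ the N by N training covariance matrix (with noise) and $\mathbf{t}_{N}$ the vector of the N training targets, the log-likelihood of these parameters is given by:

$$\mathcal{L}(\theta) = -\tfrac{1}{2}\log\det\mathbf{C} - \tfrac{1}{2}\,\mathbf{t}_{N}^{\top}\mathbf{C}^{-1}\mathbf{t}_{N} - \tfrac{N}{2}\log 2\pi \qquad (4)$$

with partial derivatives:

$$\frac{\partial\mathcal{L}}{\partial\theta_i} = -\tfrac{1}{2}\,\mathrm{tr}\left(\mathbf{C}^{-1}\frac{\partial\mathbf{C}}{\partial\theta_i}\right) + \tfrac{1}{2}\,\mathbf{t}_{N}^{\top}\mathbf{C}^{-1}\frac{\partial\mathbf{C}}{\partial\theta_i}\mathbf{C}^{-1}\mathbf{t}_{N} \qquad (5)$$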

Equations 4 and 5 are used in a conjugate gradient method in order to find the most likely parameters. As the computational complexity of inverting the covariance matrix is $O(N^3)$, practical applications of this approach are limited to approximately 1000 training points. However, approximate inversion methods are being developed. Mathematically, the Gaussian process model corresponds to an artificial neural network with an infinite number of hidden nodes and a Gaussian prior on the weights [6]. For more information on this model, the reader is referred to MacKay [5].
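The prediction step and the log-likelihood described in this section can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the code used in the study: the covariance is an assumed squared-exponential form with hypothetical parameters `v0` (signal variance), `w` (one weight per input) and `noise_var`, which is not necessarily the covariance of Equation 3. The gradient of Equation 5 and the conjugate gradient search are omitted.

```python
import math


def covariance(x1, x2, v0, w):
    """Assumed squared-exponential covariance between two input vectors."""
    sq = sum(wd * (a - b) ** 2 for wd, a, b in zip(w, x1, x2))
    return v0 * math.exp(-0.5 * sq)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve(L, b):
    """Solve (L L^T) x = b by forward and back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, t, v0, w, noise_var):
    """Build the training covariance C (with noise) and precompute C^{-1} t."""
    n = len(X)
    C = [[covariance(X[i], X[j], v0, w) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(C)
    return L, solve(L, t)


def gp_predict(X, L, alpha, x_new, v0, w, noise_var):
    """Predictive mean k^T C^{-1} t and variance b - k^T C^{-1} k."""
    k = [covariance(xi, x_new, v0, w) for xi in X]
    b = covariance(x_new, x_new, v0, w) + noise_var
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = b - sum(ki * ci for ki, ci in zip(k, solve(L, k)))
    return mean, var


def log_likelihood(L, alpha, t):
    """Log-likelihood of the parameters, using log det C = 2 * sum(log L_ii)."""
    n = len(t)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    fit = sum(ti * ai for ti, ai in zip(t, alpha))
    return -0.5 * log_det - 0.5 * fit - 0.5 * n * math.log(2.0 * math.pi)


# Toy one-dimensional data (illustrative numbers only).
X = [[0.0], [1.0], [2.0]]
t = [0.0, 0.8, 0.9]
L, alpha = gp_fit(X, t, v0=1.0, w=[1.0], noise_var=1e-6)
mean, var = gp_predict(X, L, alpha, [1.0], v0=1.0, w=[1.0], noise_var=1e-6)
```

A Cholesky factorisation is used instead of an explicit matrix inverse; it has the same cubic cost in N but is numerically more stable and yields the log-determinant directly.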

5    Results

The data used here were also used previously in [4]. They consist of a time series of the observed water levels at Punta della Salute in Venice Lagoon, together with the water levels modelled by MIKE 21. The same pre-processing of the data was performed as in [2]: water levels are predicted for lead times varying from 1 hour to 12 hours ahead; inputs were time-lagged vectors {x_t, x_{t-4}, …, x_{t-20}}, taken either from the raw observed water levels or from a time series of residual errors between MIKE 21 and the observed water levels. 75% of the data was used for training; the results are presented using the remaining 25%.
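The construction of the time-lagged inputs can be sketched as follows. The function names are hypothetical, the intermediate lags are assumed to advance in steps of 4 (only the first two and the last are given in the text), and the chronological 75/25 split is an assumption, since the text does not state how the training portion was selected.

```python
def lagged_dataset(series, lags=(0, 4, 8, 12, 16, 20), lead=1):
    """Build (input, target) pairs from a time series.

    Each input is [x_t, x_{t-4}, ..., x_{t-20}] and the target is x_{t+lead}.
    """
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series) - lead):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t + lead])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.75):
    """Use the first part of the record for training, the rest for testing."""
    n_train = int(len(inputs) * train_fraction)
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:], targets[n_train:])
    return train, test


# Toy series standing in for hourly water levels or model residuals.
series = list(range(40))
inputs, targets = lagged_dataset(series, lead=3)
train, test = chronological_split(inputs, targets)
```

One such data set would be built per lead time, from 1 to 12 hours.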

Figure 2 shows the results of applying the Gaussian process model to the testing data for lead times of 1-12 hours. A reasonable reduction of the error is achieved, particularly for short lead times. It is also possible to generate a forecast on the basis of observed water levels alone (without utilising MIKE 21). Such a forecast is more accurate than the deterministic model only for lead times of 1 and 2 hours. The combined, hybrid model, however, provides the best of both worlds: high-quality forecast skill that can also be extended to long lead times.

 

Acknowledgments

This work was in part funded by the Danish Technical Research Council (STVF) under the Talent Project N 9800463 entitled “Data to Knowledge D2K”. More information on the project can be obtained through http://www.d2k.dk

References

[1]    V. Babovic, R. Canizares, H. R. Jensen, and A. Klinting. Artificial neural networks as a routine for updating of numerical models. 
        ASCE Journal of Hydraulic Engineering, to appear, 2001.

[2]    V. Babovic, M. Keijzer, and M. Bundzel. From global to local modelling: A case study in error correction of deterministic models.
         In Proceedings of the Fourth International Conference on Hydroinformatics. Balkema, Rotterdam, 2000.

[3]    A. Gelb. Applied Optimal Estimation. MIT Press, Cambridge, 1974.

[4]    M. Keijzer and V. Babovic. Error correction of a deterministic model in Venice Lagoon by local linear models. In Proceedings of
        the  “Modelli complessi e metodi computazionali intensivi per la stima e la previsione” conference. Universita Ca' Foscari, Venice,
        September 1999.

[5]    D. MacKay. Introduction to Gaussian processes. Tutorial at ICANN'97, http://wol.ra.phy.cam.ac.uk/pub/mackay/gpB.ps.gz, 1997.

[6]    R. M. Neal. Bayesian learning for neural networks. In Lecture Notes in Statistics, number 118. Springer, New York, 1996.

[7]    J. C. Refsgaard. Validation and intercomparison of different updating procedures for real-time forecasting. Nordic Hydrology, 28:65-
        84, 1997.

[8]    WMO. Simulated real-time intercomparison of hydrological models. WMO operational hydrology report no 38 WMO no 779,
         World Meteorological Organisation, Geneva, 1992.