RIVER FLOW MODELLING USING SUPPORT VECTOR MACHINES

 

 

Dawei Han and Zhiping Yang

Department of Civil Engineering, University of Bristol, BS8 1TR, UK

Tel: 0117 928 9738 E-mail: d.han@bristol.ac.uk

 

 

Abstract: The Support Vector Machine (SVM), a recent tool from the field of Artificial Intelligence, has gained popularity in the Machine Learning community. It has been applied successfully to classification tasks such as pattern recognition and OCR and, more recently, to regression and time series problems. In recent years, a number of non-linear classification and regression SVMs have been developed and benchmarked against artificial neural networks (ANNs). The empirical performance of SVMs has been found to be generally as good as the best ANN solutions. Compared with traditional artificial neural networks, learning in SVMs is very robust with respect to the precision of the computations. This paper describes an attempt to apply the Support Vector Machine approach to river flow modelling. Mathematically, SVMs are a range of classification and regression algorithms formulated from the principles of statistical learning theory. The non-linearity and learning abilities of the SVM technique are useful features that could be applied to many areas of hydraulic and hydrological engineering in future.

 

Keywords: hydroinformatics, artificial intelligence, river flow, support vector machines

1  INTRODUCTION

Modelling of dendritic river systems is complex and at present involves both rainfall-runoff and river flow routing simulations. Over the past decades, a great deal of research has been focussed on rainfall-runoff and river flow routing models. As a result, numerous hydraulic/hydrological models have been produced, and they play important roles in modern river flow modelling systems. However, many natural river systems tend to be dendritic, and so far insufficient research has been conducted to allow integrated dendritic river systems to be modelled.

Traditional river routing models are mainly SISO (single input, single output) models, i.e. one input from upstream and one output downstream. Examples are hydraulic river routing models derived from the St. Venant equations and hydrological models such as the Muskingum model. (Hydrological routing models usually lack the flexibility to predict flows at multiple ungauged sites along the river since, unlike their hydraulic counterparts, they are not distributed models.) Therefore, for practical dendritic river systems, it is necessary to couple rainfall-runoff models with hydraulic routing models so that an integrated modelling system can be built. However, developing such an integrated model requires a large number of sample sites to represent adequately the spatio-temporal hydraulic characteristics of a river system, over which these characteristics may change (Lane et al., 1999). This process is very time-consuming and, because river routing typically involves very long reaches, the cost of obtaining sufficient cross-section data is generally prohibitive. Some researchers have therefore focussed on establishing linear MISO (multiple input, single output) models to tackle this problem. Generally, a MISO model can be expressed in terms of two explicit characters: a parallel character representing the multiple inputs and a cascade character representing the different hydrographs at separate cascade points along the river (Zhang, 1998). Several such systems have been established (Jakeman, Littlewood and Whitehead, 1990; Liang and Kachroo, 1999). Owing to restrictions in the model identification, these systems are all linear models, and some of the impulse response functions tend to become unstable or negative when operated in real time. Further research is therefore required to establish a non-linear modelling methodology for dendritic river systems, which are highly non-linear and non-stationary.

Over the past few years, neural networks, one of the branches of Artificial Intelligence technology, have gained popularity in the hydrological and hydraulic engineering community (Dibike, Solomatine and Abbott, 1999; Bernd, Kleutges and Kroll, 1999; Campolo, Andreussi and Soldati, 1999) and some encouraging results have been achieved. Recently, a new tool from the Artificial Intelligence field called the Support Vector Machine (SVM) has gained popularity in the Machine Learning community (Cristianini, Campbell and Taylor, 1999). It has been applied successfully to classification tasks such as pattern recognition and OCR and, more recently, to regression and time series problems. Mathematically, SVMs are a range of classification and regression algorithms formulated from the principles of statistical learning theory developed by Vapnik (Vapnik, 1995). In recent years, a number of non-linear classification and regression SVMs have been developed and benchmarked against artificial neural networks (ANNs). The empirical performance of SVMs has been found to be generally as good as the best ANN solutions (Hearst et al., 1998). It has been hypothesised that this is because there are fewer model parameters to optimise in the SVM approach, which reduces the possibility of over-fitting the training data and thus improves the actual performance (Brown, Gunn and Lewis, 1999). Compared with traditional artificial neural networks, learning in SVMs is very robust with respect to the precision of the computations (Anguita, Boni and Ridella, 1999).

A major distinction between the two approaches is the training algorithm. Both SVMs and ANNs can be represented as two-layer networks (where the weights are non-linear in the first layer and linear in the second layer). However, while ANNs generally adapt all the parameters (using gradient or clustering-based approaches), SVMs choose the parameters for the first layer to be the training input vectors because this minimises the VC-dimension (Cherkassky and Mulier, 1998). These features of SVMs make the new technology a candidate for river modelling problems.

2  BASIC THEORY OF SUPPORT VECTOR REGRESSION ALGORITHM

This section will introduce the basic theory of support vector regression (Vapnik 1995; Smola et al. 1998). It starts with the conventional regression model, and converts the conventional regression concept to the support vector regression algorithm.

2.1  Linear support vector regression

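Suppose we have training data $\{(x_1, y_1), \ldots, (x_l, y_l)\}$, with $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$. Our goal is to find a function $f(x)$ that predicts the response in the best possible way. For the case of linear functions, we have

$$f(x) = \langle w, x \rangle + b \qquad (1)$$

where $\langle \cdot , \cdot \rangle$ denotes the dot product, here between w and x, and b is a constant offset.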

The selection is based on a training set of l independent and identically distributed (i.i.d.) observations $(x_1, y_1), \ldots, (x_l, y_l)$ drawn according to (Vapnik, 1999)

$$P(x, y) = P(x) \, P(y \mid x) \qquad (2)$$

In order to choose the best available approximation, a loss function $L(y, f(x))$ is usually applied to measure the discrepancy between the response y to a given input x and the response $f(x)$ provided by the model. We then try to find the function that minimises the risk functional

$$R(f) = \int L(y, f(x)) \, dP(x, y) \qquad (3)$$

Since the probability measure $P(x, y)$ is unknown, we can only use empirical data to estimate a function that minimises the risk $R(f)$. The expected risk functional is therefore replaced by the empirical risk functional $R_{emp}(f)$:

$$R_{emp}(f) = \frac{1}{l} \sum_{i=1}^{l} L(y_i, f(x_i)) \qquad (4)$$

If the squared deviation between the observed and predicted data is used as the loss function, the result is the conventional regression model fitted by least squares.

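The empirical risk obtained converges to the minimal possible value of the risk as l goes to infinity:

$$\lim_{l \to \infty} R_{emp}(f_l) = \inf_{f} R(f) \qquad (5)$$

where $f_l$ denotes the function that minimises the empirical risk for a sample of size l.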

However, this implies that if the regression deals with few data in a very high-dimensional space, it may lead to over-fitting and thus poor generalisation (Smola et al., 1996; Gunn, 1998). For instance, the least-squares regression model contains no means of capacity control (other than choosing a smaller set of functions), which makes it very sensitive to over-fitting and to noisy data, especially when the dimension of the input space is high and the training data are scarce. Hence a capacity control term is introduced, which for SVMs is $\|w\|^2$, i.e. the squared Euclidean norm of w. This leads to the regularised risk functional (Smola and Schölkopf, 1998):

$$R_{reg}(f) = R_{emp}(f) + \frac{\lambda}{2} \|w\|^2 \qquad (6)$$

where $\lambda$ is called the regularisation constant. In other words, we seek a small empirical loss over all the training data $(x_i, y_i)$ together with a small w. A small w in this case implies increased flatness, i.e. it penalises over-complexity.
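
As a rough illustration of the regularised risk functional, the following Python sketch evaluates the empirical risk of a linear function f(x) = <w, x> + b under a squared loss and adds the capacity control term (lam / 2) * ||w||^2. The function names, the symbol `lam` for the regularisation constant and the toy data are assumptions made for this example only; they are not taken from the study.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def squared_loss(y, fx):
    """Squared deviation between observed and predicted response."""
    return (y - fx) ** 2


def empirical_risk(w, b, xs, ys, loss):
    """Mean loss of f(x) = <w, x> + b over the training pairs."""
    return sum(loss(y, dot(w, x) + b) for x, y in zip(xs, ys)) / len(xs)


def regularised_risk(w, b, xs, ys, loss, lam):
    """Empirical risk plus the capacity control term (lam / 2) * ||w||^2."""
    return empirical_risk(w, b, xs, ys, loss) + 0.5 * lam * dot(w, w)


# Hypothetical toy data that follow y = 2 * x exactly.
xs = [[0.0], [1.0], [2.0]]
ys = [0.0, 2.0, 4.0]

# The exact fit has zero empirical risk but a larger norm of w.
exact_fit = regularised_risk([2.0], 0.0, xs, ys, squared_loss, lam=1.0)
# A flatter function pays less for ||w||^2 but more for the residuals.
flatter_fit = regularised_risk([1.0], 0.0, xs, ys, squared_loss, lam=1.0)
```

Comparing `exact_fit` and `flatter_fit` for different values of `lam` shows how the regularisation constant shifts the balance between fitting the data and keeping w small.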

The question is now which loss function should be used in equation (6). Vapnik (1995) suggested an $\varepsilon$-insensitive loss function (see Figure 1):

$$|y - f(x)|_{\varepsilon} = \max\{0, \; |y - f(x)| - \varepsilon\}$$

In the regression case, this loss function typically leads to a sparse representation of the decision rule, giving significant algorithmic and representational advantages (Cristianini and Shawe-Taylor, 2000).
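
The behaviour of an insensitive loss of this kind can be sketched in a few lines of Python. The sketch assumes the usual definition, in which deviations no larger than a tolerance `eps` cost nothing and larger deviations are penalised linearly; the function name and the sample values are hypothetical.

```python
def eps_insensitive_loss(y, fx, eps):
    """Zero inside the tube |y - f(x)| <= eps, linear outside it."""
    return max(0.0, abs(y - fx) - eps)


# Hypothetical observed and predicted values.
observed = [1.0, 1.0, 1.0, 1.0]
predicted = [1.05, 0.95, 1.5, 0.2]

losses = [eps_insensitive_loss(y, fx, eps=0.1)
          for y, fx in zip(observed, predicted)]

# Only the points with a non-zero loss would influence the fitted function.
penalised = [i for i, value in enumerate(losses) if value > 0.0]
```

The first two predictions lie within the tolerance and incur no loss, which is the mechanism behind the sparse representation mentioned above.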

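With this loss function, only the points outside the region $|y - f(x)| \le \varepsilon$ are penalised. Supposing $\lambda$ is unity, the problem corresponding to equation (6) is then (Vapnik, 1995):

Minimise    $\frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)$

Subject to    $y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i$,   $\langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*$,   $\xi_i, \xi_i^* \ge 0$,   $i = 1, \ldots, l$

where $\xi_i$ and $\xi_i^*$ are slack variables for the optimisation. The constant C > 0 determines the trade-off between the flatness of f and the amount up to which deviations larger than $\varepsilon$ are tolerated.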
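
To make the role of the slack variables concrete, the sketch below evaluates the objective (1/2) * ||w||^2 + C * sum(xi_i + xi_i*) for a given linear function, taking each slack variable as the smallest non-negative value that satisfies its constraint. It assumes the standard soft-margin form of support vector regression; the names and the toy numbers are hypothetical, and no optimisation is performed.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_objective(w, b, xs, ys, eps, C):
    """Return the objective value and the slack variables for fixed w and b."""
    xi, xi_star = [], []
    for x, y in zip(xs, ys):
        residual = y - (dot(w, x) + b)
        # Slack for observations lying above the eps-tube.
        xi.append(max(0.0, residual - eps))
        # Slack for observations lying below the eps-tube.
        xi_star.append(max(0.0, -residual - eps))
    slack_total = sum(a + s for a, s in zip(xi, xi_star))
    return 0.5 * dot(w, w) + C * slack_total, xi, xi_star


# Hypothetical one-dimensional data and a candidate function f(x) = x.
xs = [[0.0], [1.0], [2.0]]
ys = [0.0, 1.5, 1.0]

objective, xi, xi_star = primal_objective([1.0], 0.0, xs, ys, eps=0.25, C=10.0)
```

A larger C makes the slack term dominate, so deviations beyond the tolerance are punished more heavily relative to the flatness term.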

It should be noted that if we set $\varepsilon = 0$ and optimise the 2-norm of the margin slack vector, we recover general least-squares regression or ridge regression models. However, these approaches have the disadvantage of losing the sparseness of the representation.

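With this loss function, the optimisation problem is easily solved in its dual formulation. The key idea is to construct a Lagrange function from both the objective function and the corresponding constraints by introducing a dual set of variables $\alpha_i$, $\alpha_i^*$:

Maximise:    $-\frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*)$

Subject to:    $\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0$,   $\alpha_i, \alpha_i^* \in [0, C]$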

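The important feature of the solution of this optimisation problem is that only some of the coefficients $(\alpha_i - \alpha_i^*)$ differ from zero. The corresponding vectors $x_i$ are called Support Vectors, hence the name support vector regression.

Since    $w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) x_i$

Therefore,

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b \qquad (7)$$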
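
The support vector expansion can be checked numerically with a short Python sketch. It assumes that the weight vector is the sum of the training vectors weighted by the differences of the dual variables, so that a prediction can be computed either from w directly or from dot products with the support vectors alone. The coefficient values below are hypothetical and are not the result of an actual optimisation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def weight_vector(coeffs, xs):
    """w as the coefficient-weighted sum of the training vectors."""
    n = len(xs[0])
    return [sum(c * x[k] for c, x in zip(coeffs, xs)) for k in range(n)]


def predict_dual(coeffs, xs, b, x_new):
    """f(x) from dot products with the support vectors only."""
    return sum(c * dot(x, x_new)
               for c, x in zip(coeffs, xs) if c != 0.0) + b


# coeffs[i] stands for (alpha_i - alpha_i*); the values are hypothetical
# and chosen so that they sum to zero.
coeffs = [0.5, 0.0, -0.5]
xs = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0]]
b = 0.1
x_new = [2.0, 1.0]

support_indices = [i for i, c in enumerate(coeffs) if c != 0.0]
w = weight_vector(coeffs, xs)
f_primal = dot(w, x_new) + b
f_dual = predict_dual(coeffs, xs, b, x_new)
```

The second training vector has a zero coefficient, so it plays no part in the prediction; only the vectors listed in `support_indices` need to be stored.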

2.2  Non-linear support vector regression

The above discussion is based only on linear SVM regression. For non-linear regression, the SVM has the great advantage that it can represent a non-linear function in an arbitrary number of dimensions efficiently through a suitably defined kernel. The idea is to map the input variable x into a high-dimensional space (called the feature space) by a map $\Phi(x)$, while the regression on $\Phi(x)$ remains linear. The procedure is thus the same as for the linear regression model, except that the dot products $\langle x_i, x_j \rangle$ are replaced by $\langle \Phi(x_i), \Phi(x_j) \rangle$.

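Now suppose,    $k(x, x') = \langle \Phi(x), \Phi(x') \rangle$

where k is a symmetric function called a kernel.

By using Equation (10), the non-linear regression can be described as:

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \, k(x_i, x) + b$$

The corresponding problem for the SVM non-linear regression algorithm can be written as follows:

Maximise:    $-\frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \, k(x_i, x_j) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*)$

Subject to:    $\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0$,   $\alpha_i, \alpha_i^* \in [0, C]$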

The difference from the linear case is that w is no longer given explicitly. However, it is still uniquely defined in the weak sense by the dot products $\langle w, \Phi(x) \rangle$ (Smola and Schölkopf, 1998). Accordingly, the optimisation problem for non-linear regression is to find the flattest function in feature space, not in input space.

The use of kernels therefore makes it possible to map the data implicitly into a feature space. With the dual representation, the dimension of the feature space does not affect the computation. The key is to find a kernel function that can be evaluated efficiently. Two commonly used kernel functions for non-linear regression are:

(1) Radial Basis Function (RBF)    $k(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right)$

(2) Polynomial regression of order p    $k(x, x') = (\langle x, x' \rangle + 1)^p$
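
A minimal Python sketch of the two kernels and of a kernel-based prediction follows. It assumes the common parameterisations exp(-||x - z||^2 / (2 * sigma^2)) for the RBF kernel and (<x, z> + 1)^p for the polynomial kernel; other conventions exist, and the coefficients, offset and sample vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def rbf_kernel(x, z, sigma):
    """Radial basis function kernel with width sigma (assumed form)."""
    squared_distance = sum((a - c) ** 2 for a, c in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def poly_kernel(x, z, p):
    """Polynomial kernel of order p (assumed inhomogeneous form)."""
    return (dot(x, z) + 1.0) ** p


def predict_kernel(coeffs, xs, b, x_new, kernel):
    """Kernel expansion: dot products replaced by kernel evaluations."""
    return sum(c * kernel(x, x_new)
               for c, x in zip(coeffs, xs) if c != 0.0) + b


# Hypothetical coefficients and training vectors.
coeffs = [0.5, 0.0, -0.5]
xs = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0]]

f_rbf = predict_kernel(coeffs, xs, 0.1, [2.0, 1.0],
                       lambda x, z: rbf_kernel(x, z, sigma=1.0))
f_poly = predict_kernel(coeffs, xs, 0.1, [2.0, 1.0],
                        lambda x, z: poly_kernel(x, z, p=2))
```

Only the kernel function changes between the two predictions; the feature map itself is never computed, which is why the dimension of the feature space does not enter the calculation.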

3  APPLICATION OF SVM TO A SISO RIVER SYSTEM

Support Vector Regression has initially been tested on SISO rainfall-runoff data series, with part of the data used as training examples and the remainder as test data. In addition, various kernel functions have been tried in order to identify a suitable kernel for rainfall-runoff simulations. Both linear and non-linear SVM regression algorithms have been tested on a UK catchment. The input variables are time series of radar rainfall measurements and observed runoff.
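
One possible way of arranging such time series for regression is sketched below in Python. It is an assumption for illustration only, since the exact input structure used in the tests is not specified here: each input vector collects the previous `n_lags` rainfall and runoff values, the target is the current runoff, and the samples are split chronologically into a training part and a test part. All names and numbers are hypothetical.

```python
def make_lagged_samples(rain, flow, n_lags):
    """Pair past rainfall and flow values with the current flow."""
    inputs, targets = [], []
    for t in range(n_lags, len(flow)):
        # Input vector: the last n_lags rainfall values, then the last
        # n_lags flow values.
        inputs.append(rain[t - n_lags:t] + flow[t - n_lags:t])
        targets.append(flow[t])
    return inputs, targets


def split_series(inputs, targets, train_fraction):
    """Chronological split: the earlier part trains, the later part tests."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Hypothetical rainfall and runoff series of equal length.
rain = [0.0, 2.0, 5.0, 1.0, 0.0, 0.0, 3.0, 4.0]
flow = [1.0, 1.2, 2.5, 3.0, 2.2, 1.6, 1.9, 3.1]

inputs, targets = make_lagged_samples(rain, flow, n_lags=2)
(train_x, train_y), (test_x, test_y) = split_series(inputs, targets, 0.5)
```

Keeping the split chronological, rather than shuffling the samples, avoids testing the model on periods that overlap with the training data.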

It has been found that the SVM’s performance in calibration was very satisfactory, but its performance in simulation was less successful. A typical calibration result is illustrated in Figure 1, which demonstrates the close fit of the SVM model to the measured flow data. Further research will be carried out to establish suitable forms of SVMs for simulation.

4  CONCLUSION

In this paper, we have tried to illustrate the potential of Support Vector Machines for dendritic river modelling. In particular, we construct a time-series-based non-linear model structure to simulate the rainfall-runoff relationship. The purpose of the paper is not to give a final assessment of SV machines: since support vector machines are still developing, their potential for hydrological/hydraulic modelling needs further experiment and development. Hopefully, this paper will serve to introduce SV machines, a powerful new tool, to the hydrological/hydraulic modelling area.

Acknowledgement

The project is sponsored by EPSRC grant GR/N09336.

References

Anguita, D., Boni, A. and Ridella, S., 1999, Learning algorithm for nonlinear support vector machines suited for digital VLSI. ELECTRONICS LETTERS, 35(16) pp.1349-1350.

Bernd, T., Kleutges, M. and Kroll, A., 1999, Nonlinear black box modelling - Fuzzy networks versus neural networks. NEURAL COMPUTING & APPLICATIONS, 8(2) pp.151-162

Brown, M., Gunn, S.R. and Lewis, H.G., 1999, Support vector machines for optimal classification and spectral unmixing. ECOLOGICAL MODELLING, 120(2-3) pp.167-179

Campolo, M., Andreussi, P. and Soldati, A., 1999, River flood forecasting with a neural network model. WATER RESOURCES RESEARCH, 35(4) pp.1191-1197.

Cherkassky, V. and Mulier, F., 1998, Learning From Data: Concepts, Theory and Methods. Wiley, New York, 442 pp.

Cristianini, N., Campbell, C.  and Taylor, J.S., 1999, Dynamically Adapting Kernels in Support Vector Machines. Advances in Neural Information Processing Systems, 11.

Cristianini, N. and Shawe-Taylor, J., 2000, An introduction to support vector machines, Cambridge University Press, pp. 204.

Dibike, Y.B., Solomatine, D. and Abbott, M.B., 1999, On the encapsulation of numerical-hydraulic models in artificial neural network, JOURNAL OF HYDRAULIC RESEARCH, 37.

Gunn, S., (1998) Support vector machines for classification and regression, ISIS technical report.

Hearst, M.A., Schölkopf, B., Dumais, S., Osuna, E. and Platt, J., 1998. Trends and controversies support vector machines. IEEE Intell. Syst. 13, pp. 18-28.

Jakeman, A.J., Littlewood, L.G. and Whitehead, P.G., 1990, Computation of the instantaneous Unit Hydrograph and Identifiable component flows with applications to two small upland catchments. Journal of Hydrology, 117, pp.275-300.

Lane, S.N., Bradbrook, K.F., Richards, K.S., Biron, P.A. and Roy, A.G., 1999, The application of computational fluid dynamics to natural river channels: three-dimensional versus two-dimensional approaches, Geomorphology, 29(1-2), August.

Liang, G.C. and Kachroo, R.K., 1999, River flow forecasting. Part 4: Applications of linear techniques for flow routing on large catchment. Journal of Hydrology, 133, pp.99-140.

Smola, A.J. and Schölkopf, B., (1998) A tutorial on support vector regression, NeuroCOLT2 Technical Report Series, NC2-TR-1998-030.

Vapnik, V.N., (1998) Statistical learning theory, John Wiley&Sons, Inc.

Vapnik, V., 1995, The Nature of Statistical Learning Theory. Springer-Verlag, New York, 188 pp.

Zhang, L. 1998, Dendritic River Modelling. Ph.D. Thesis, Department of Civil Engineering, University of Salford.

Fig. 1  Calibration of SVM