Dawei Han and Zhiping Yang
Department of Civil Engineering, University of Bristol, BS8 1TR, UK
Tel: 0117 928 9738 E-mail: d.han@bristol.ac.uk
Abstract:
Recently, a new tool from the Artificial Intelligence field called the Support Vector Machine (SVM) has gained popularity in the Machine Learning community. It has been applied successfully to classification tasks such as pattern recognition and OCR, and more recently also to regression and time series. In recent years, a number of non-linear classification and regression SVMs have been developed and benchmarked against artificial neural networks (ANNs), and the empirical performance of SVMs has been found to be generally as good as that of the best ANN solutions. Compared with traditional ANNs, learning in SVMs is very robust with respect to the precision of the computations. Mathematically, SVMs are a range of classification and regression algorithms formulated from the principles of statistical learning theory. This paper describes an attempt to use the SVM approach for river flow modelling. The non-linearity and learning abilities of the SVM technique are useful features that could be applied to many areas of hydraulic and hydrological engineering in the future.
Keywords: hydroinformatics, artificial intelligence, river flow, support vector machines
Modelling of dendritic river systems is complex and at present involves both rainfall-runoff and river flow routing simulations. Over the past decades, a great deal of research has focussed on rainfall-runoff and river flow routing models. As a result, numerous hydraulic and hydrological models have been produced, and they play important roles in modern river flow modelling systems. However, many natural river systems are dendritic, and so far insufficient research has been conducted to allow the modelling of integrated dendritic river systems.
Traditional river routing models are mainly SISO (single input, single output) models, i.e. one input from upstream and one output downstream, such as hydraulic routing models derived from the St. Venant equations and hydrological models such as the Muskingum model. Hydrological routing models usually lack the flexibility to predict flows at multiple ungauged sites along the river since, unlike their hydraulic counterparts, they are not distributed models. Therefore, for practical dendritic river systems, it is necessary to couple rainfall-runoff models with hydraulic routing models so that an integrated modelling system can be built. However, to develop such an integrated model, a large number of sample sites are required to represent adequately the spatio-temporal hydraulic characteristics of a river system, which may change along its length (Lane et al., 1999). This process is very time-consuming, and because river routing typically involves very long reaches, the cost of obtaining sufficient cross-section data is generally prohibitive. Some researchers have therefore focussed on establishing linear MISO (multiple input, single output) models to tackle this problem. Generally, a MISO model can be expressed in terms of two explicit characteristics: a parallel characteristic representing the multiple inputs and a cascade characteristic representing the different hydrographs at separate cascade points along the river (Zhang, 1998). Several such systems have been established (Jakeman, Littlewood and Whitehead, 1990; Liang and Kachroo, 1999). Due to restrictions in model identification, these systems are all linear, and some of their impulse response functions tend to become unstable or negative when operated in real time. Further research is therefore required to establish a non-linear modelling methodology for dendritic river systems, which are highly non-linear and non-stationary.
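To make the linear MISO structure more concrete, the sketch below assumes a simple discrete-time form in which the downstream flow is the sum of each upstream input convolved with its own finite impulse response, $y_t = \sum_j \sum_{k=0}^{m-1} h_j(k)\,x_{j,t-k}$, and identifies the impulse responses by ordinary least squares. The memory length `m`, the function names and the synthetic data are illustrative assumptions, not the identification procedures of the studies cited above.

```python
import numpy as np

def lagged_design(inputs, m):
    """Stack lagged copies of each input series into a design matrix.

    inputs: array of shape (n_inputs, T); m: assumed impulse-response length.
    Row r corresponds to time t = r + m - 1; column j*m + k holds x_j[t - k].
    """
    n_inputs, T = inputs.shape
    rows = []
    for t in range(m - 1, T):
        row = []
        for j in range(n_inputs):
            # x_j[t], x_j[t-1], ..., x_j[t-m+1]
            row.extend(inputs[j, t - m + 1:t + 1][::-1])
        rows.append(row)
    return np.array(rows)

def fit_miso(inputs, y, m):
    """Least-squares estimate of the impulse responses h_j(k), one row per input."""
    X = lagged_design(inputs, m)
    h, *_ = np.linalg.lstsq(X, y[m - 1:], rcond=None)
    return h.reshape(inputs.shape[0], m)

# Synthetic example (hypothetical): two tributary inflows, known impulse responses.
rng = np.random.default_rng(0)
inputs = rng.random((2, 200))
h_true = np.array([[0.5, 0.3, 0.1],
                   [0.2, 0.4, 0.2]])
y = np.zeros(200)
for j in range(2):
    y += np.convolve(inputs[j], h_true[j])[:200]   # causal convolution

h_est = fit_miso(inputs, y, m=3)
print(np.round(h_est, 3))
```

With noise-free synthetic data the estimates reproduce `h_true`; nothing in this linear structure prevents the estimated responses from oscillating or turning negative on real, noisy records.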
Over the past few years, neural networks, one of the branches of Artificial Intelligence technology, have gained popularity in the hydrological and hydraulic engineering community (Dibike, Solomatine and Abbott, 1999; Bernd, Kleutges and Kroll, 1999; Campolo, Andreussi and Soldati, 1999), and some encouraging results have been achieved. Recently, a new tool from the Artificial Intelligence field called the Support Vector Machine (SVM) has gained popularity in the Machine Learning community (Cristianini, Campbell and Shawe-Taylor, 1999). It has been applied successfully to classification tasks such as pattern recognition and OCR, and more recently also to regression and time series. Mathematically, SVMs are a range of classification and regression algorithms formulated from the principles of statistical learning theory developed by Vapnik (1995). In recent years, a number of non-linear classification and regression SVMs have been developed and benchmarked against artificial neural networks (ANNs), and the empirical performance of SVMs has been found to be generally as good as that of the best ANN solutions (Hearst et al., 1998). It has been hypothesised that this is because there are fewer model parameters to optimise in the SVM approach, which reduces the possibility of overfitting the training data and thus improves the actual performance (Brown, Gunn and Lewis, 1999). Compared with traditional ANNs, learning in SVMs is very robust with respect to the precision of the computations (Anguita, Boni and Ridella, 1999).
A major distinction between the two approaches lies in the training algorithm. Both SVMs and ANNs can be represented as two-layer networks, in which the weights are non-linear in the first layer and linear in the second. However, while ANNs generally adapt all the parameters (using gradient-based or clustering-based approaches), SVMs choose the parameters of the first layer to be the training input vectors, because this minimises the VC-dimension (Cherkassky and Mulier, 1998). These features make SVMs an attractive candidate for river modelling problems.
This section introduces the basic theory of support vector regression (Vapnik, 1995; Smola and Schölkopf, 1998). It starts with the conventional regression model and then extends it to the support vector regression algorithm.
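Suppose we have training data $\{(x_1, y_1), \dots, (x_l, y_l)\}$ with $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$. Our goal is to find a function $f(x)$ that predicts the response $y$ in the best possible way. For the case of linear functions, we have

$$f(x) = \langle w, x \rangle + b, \qquad w \in \mathbb{R}^n,\ b \in \mathbb{R} \qquad (1)$$

where $\langle \cdot , \cdot \rangle$ denotes the dot product between $w$ and $x$.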
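The selection is based on a training set of $l$ random, independent and identically distributed (i.i.d.) observations $(x_1, y_1), \dots, (x_l, y_l)$ drawn according to the joint probability distribution (Vapnik, 1999)

$$P(x, y) = P(x)\,P(y \mid x) \qquad (2)$$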
To choose the best available approximation, a loss function $L(y, f(x))$ is usually applied to measure the discrepancy between the response $y$ to a given input $x$ and the response $f(x)$ of the model. We seek the function $f$ that minimises the risk functional

$$R(f) = \int L(y, f(x))\, dP(x, y) \qquad (3)$$
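Since the probability measure $P(x, y)$ is unknown, we can only use the empirical data to estimate the function $f$ that minimises $R(f)$. The expected risk functional $R(f)$ is therefore replaced by the empirical risk functional $R_{emp}(f)$:

$$R_{emp}(f) = \frac{1}{l}\sum_{i=1}^{l} L(y_i, f(x_i)) \qquad (4)$$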
If the squared deviation between the observed and predicted data, $L(y, f(x)) = (y - f(x))^2$, is used as the loss function, minimising (4) yields the conventional regression model fitted by the least-squares algorithm.
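As an illustration of (4) with the squared loss, the sketch below computes the empirical risk of a linear function $f(x) = \langle w, x\rangle + b$ and obtains its minimiser by ordinary least squares. The function names and the synthetic data are assumptions made for the example.

```python
import numpy as np

def empirical_risk(w, b, X, y, loss):
    """R_emp(f) = (1/l) * sum_i L(y_i, f(x_i)) for f(x) = <w, x> + b."""
    predictions = X @ w + b
    return np.mean(loss(y, predictions))

def squared_loss(y, f):
    return (y - f) ** 2

def least_squares_fit(X, y):
    """Minimise the squared-loss empirical risk over (w, b)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # extra column carries b
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]

# Hypothetical data: y = 2 x1 - x2 + 0.5 plus a little noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=50)

w, b = least_squares_fit(X, y)
print(w, b, empirical_risk(w, b, X, y, squared_loss))
```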
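As $l$ tends to infinity, the empirical risk of the function $f_l$ that minimises (4) converges (in probability) to the minimal possible value of the risk:

$$R_{emp}(f_l) \xrightarrow[l \to \infty]{} \inf_{f} R(f) \qquad (5)$$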
However, this convergence holds only in the limit: if the regression has to deal with few data in a very high-dimensional space, minimising the empirical risk alone may lead to overfitting and thus poor generalisation (Smola et al., 1996; Gunn, 1998). For instance, the least-squares regression model contains no means of capacity control (besides choosing a smaller set of functions), which makes it very sensitive to overfitting and to noisy data, especially when the input space is high-dimensional and the training data are scarce. Hence a capacity control term is introduced, which in SVMs takes the form $\|w\|^2$, i.e. the squared Euclidean norm of $w$. This leads to the regularised risk functional (Smola and Schölkopf, 1998):

$$R_{reg}(f) = R_{emp}(f) + \frac{\lambda}{2}\|w\|^2 \qquad (6)$$

where $\lambda > 0$ is called the regularisation constant. In other words, we seek a small empirical loss over all training data $(x_i, y_i)$ as well as a small $\|w\|$. A small $\|w\|$ increases the flatness of $f$, i.e. it penalises over-complexity.
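The effect of the capacity control term can be seen in a small experiment. The sketch below assumes the squared loss and, for brevity, a linear function without bias, $f(x) = \langle w, x\rangle$; setting the gradient of (6) to zero then gives the closed form $(X^\top X + \tfrac{\lambda l}{2} I)\,w = X^\top y$. The data and function names are illustrative assumptions.

```python
import numpy as np

def regularised_risk(w, X, y, lam):
    """R_reg = R_emp (squared loss) + (lam / 2) * ||w||^2, with f(x) = <w, x>."""
    r_emp = np.mean((y - X @ w) ** 2)
    return r_emp + 0.5 * lam * np.dot(w, w)

def minimise_regularised_risk(X, y, lam):
    """Closed-form minimiser: (X^T X + (lam * l / 2) I) w = X^T y."""
    l, n = X.shape
    return np.linalg.solve(X.T @ X + 0.5 * lam * l * np.eye(n), X.T @ y)

# Hypothetical data: few samples relative to the input dimension.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))
y = X @ rng.normal(size=5) + 0.3 * rng.normal(size=20)

for lam in (0.0, 0.1, 1.0, 10.0):
    w = minimise_regularised_risk(X, y, lam)
    print(f"lambda={lam:5.1f}  ||w||={np.linalg.norm(w):.3f}  "
          f"R_reg={regularised_risk(w, X, y, lam):.3f}")
```

As $\lambda$ grows, $\|w\|$ shrinks and the fitted function becomes flatter, trading some empirical loss for lower capacity.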
The question now is which loss function should be used in equation (6). Vapnik (1995) suggested the $\varepsilon$-insensitive loss function (see Figure 1):

$$L_\varepsilon(y, f(x)) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases}$$

In the regression case, this loss function typically leads to a sparse representation of the decision rule, giving significant algorithmic and representational advantages (Cristianini and Shawe-Taylor, 2000).
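The $\varepsilon$-insensitive loss is simple to evaluate. The sketch below (function name and sample values chosen for illustration) shows that residuals inside the tube contribute nothing, while those outside are penalised linearly.

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the tube |y - f| <= eps,
    |y - f| - eps outside it."""
    return np.maximum(0.0, np.abs(y - f) - eps)

y = np.array([1.0, 1.0, 1.0, 1.0])
f = np.array([1.05, 0.9, 1.5, 0.2])   # hypothetical model outputs
print(eps_insensitive_loss(y, f, eps=0.1))
```

Only the last two predictions lie outside the tube of half-width 0.1, so only they incur a loss; points inside the tube play no part in the fit, which is where the sparseness comes from.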
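With this loss function, only the points outside the $\varepsilon$-tube $|y - f(x)| \le \varepsilon$ are penalised. Taking the slack variables to the first power (i.e. penalising the 1-norm of the slack vector), the optimisation problem corresponding to (6) is (Vapnik, 1995):

Minimise
$$\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*)$$

Subject to
$$\begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases}$$

where $\xi_i$ and $\xi_i^*$ are slack variables for the optimisation. The constant $C > 0$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon$ are tolerated.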
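For a fixed candidate $(w, b)$, the smallest feasible slacks are $\xi_i = \max(0,\ y_i - f(x_i) - \varepsilon)$ and $\xi_i^* = \max(0,\ f(x_i) - y_i - \varepsilon)$, so $\xi_i + \xi_i^*$ equals the $\varepsilon$-insensitive loss of point $i$. The sketch below, with a hypothetical function name and made-up data, evaluates the primal objective this way.

```python
import numpy as np

def svr_primal(w, b, X, y, C, eps):
    """Primal objective and smallest feasible slacks for fixed f(x) = <w, x> + b."""
    f = X @ w + b
    xi = np.maximum(0.0, y - f - eps)        # points above the tube
    xi_star = np.maximum(0.0, f - y - eps)   # points below the tube
    objective = 0.5 * np.dot(w, w) + C * np.sum(xi + xi_star)
    return objective, xi, xi_star

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 1.0, 2.5, 2.9])
obj, xi, xi_star = svr_primal(np.array([1.0]), 0.0, X, y, C=10.0, eps=0.2)
print(obj, xi, xi_star)
```

Only the third point lies outside the tube (by 0.3), so the objective is $\tfrac{1}{2}\cdot 1^2 + 10 \times 0.3 = 3.5$.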
It should be noted that if we set $\varepsilon = 0$ and optimise the 2-norm of the slack vector instead, we recover least-squares regression with a norm penalty, i.e. ridge regression. However, this approach has the disadvantage of losing the sparseness of the representation.
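With this loss function, it is easier to solve the optimisation problem in its dual formulation. The key idea is to construct a Lagrange function from both the objective function and the corresponding constraints by introducing a dual set of variables $\alpha_i, \alpha_i^*$ (Lagrange multipliers). This leads to the dual problem:

Maximise
$$-\frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\langle x_i, x_j \rangle - \varepsilon\sum_{i=1}^{l}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i(\alpha_i - \alpha_i^*)$$

Subject to
$$\sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i,\ \alpha_i^* \in [0, C]$$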
An important feature of the solution of this optimisation problem is that only some of the coefficients $(\alpha_i - \alpha_i^*)$ differ from zero. The corresponding vectors $x_i$ are called support vectors, which gives support vector regression its name.
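Since

$$w = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)\,x_i,$$

it follows that

$$f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)\langle x_i, x \rangle + b \qquad (7)$$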
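Equation (7) means that, once the dual coefficients are known, the regression function can be evaluated either through the explicit weight vector $w$ or through dot products with the support vectors alone. The sketch below checks this numerically; the inputs, coefficients and bias are arbitrary illustrative numbers, not the output of an actual SVR fit.

```python
import numpy as np

# Hypothetical training inputs and dual coefficients (alpha_i - alpha_i*);
# most coefficients are zero, as is typical of an SVR solution.
X = np.array([[0.5, 1.0], [1.5, -0.5], [2.0, 2.0], [3.0, 0.5], [4.0, 1.5]])
coef = np.array([0.0, 0.8, 0.0, -0.3, 0.0])
b = 0.25

support = np.flatnonzero(coef)       # indices of the support vectors
w = coef[support] @ X[support]       # w = sum_i (alpha_i - alpha_i*) x_i

def f_primal(x):
    return w @ x + b                 # <w, x> + b

def f_dual(x):
    # Equation (7): only the support vectors enter the sum.
    return sum(coef[i] * (X[i] @ x) for i in support) + b

x_new = np.array([1.0, 2.0])
print(support, f_primal(x_new), f_dual(x_new))
```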
The above discussion is based only on linear SVM regression. For non-linear regression, the great advantage of the SVM is that it can represent a non-linear function in an arbitrarily high-dimensional space efficiently through a chosen kernel. The idea is to map the input variable $x$ into a high-dimensional space (called the feature space) via a mapping $\Phi(x)$, where the regression on $\Phi(x)$ remains linear. Thus the procedure is the same as for the linear regression model, except that the dot products $\langle x_i, x_j \rangle$ are replaced by $\langle \Phi(x_i), \Phi(x_j) \rangle$.
Now suppose

$$K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$$

where $K$ is a symmetric function called the kernel.
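To see why a kernel can stand in for the dot product, the sketch below takes two-dimensional inputs and the polynomial kernel $K(x, z) = (\langle x, z\rangle + 1)^2$ and compares it with the dot product of the explicit six-dimensional feature map $\Phi(x) = (x_1^2,\ x_2^2,\ \sqrt{2}x_1x_2,\ \sqrt{2}x_1,\ \sqrt{2}x_2,\ 1)$. The particular kernel and input values are chosen for illustration only.

```python
import numpy as np

def poly_kernel(x, z, p=2):
    """Polynomial kernel K(x, z) = (<x, z> + 1)^p."""
    return (np.dot(x, z) + 1.0) ** p

def phi(x):
    """Explicit feature map matching poly_kernel for p = 2 and 2-D inputs."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([x1**2, x2**2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0])

x = np.array([0.5, -1.2])
z = np.array([2.0, 0.3])
print(poly_kernel(x, z), np.dot(phi(x), phi(z)))
```

The kernel needs one two-dimensional dot product, whereas the explicit route builds two six-dimensional vectors; the gap widens rapidly with input dimension and polynomial order.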
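Using Equation (10), the non-linear regression function can be written as

$$f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)K(x_i, x) + b$$

and the corresponding dual problem for the non-linear SVM regression algorithm is:

Maximise
$$-\frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)K(x_i, x_j) - \varepsilon\sum_{i=1}^{l}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i(\alpha_i - \alpha_i^*)$$

Subject to
$$\sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i,\ \alpha_i^* \in [0, C]$$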
The difference from the linear case is that $w$ is no longer given explicitly. However, it is still uniquely defined in the weak sense by the dot products $\langle w, \Phi(x) \rangle = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)K(x_i, x)$ (Smola and Schölkopf, 1998). Accordingly, the optimisation problem for non-linear regression is to find the flattest function in feature space, not in input space.
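As a rough illustration of the non-linear dual, the sketch below fits samples of $\sin x$ with a Gaussian RBF kernel, $K(x, x') = \exp(-\|x - x'\|^2/(2\sigma^2))$. To keep it short it assumes there is no bias term $b$, which removes the equality constraint $\sum_i(\alpha_i - \alpha_i^*) = 0$, and it optimises $\beta_i = \alpha_i - \alpha_i^*$ directly (at the optimum $\alpha_i\alpha_i^* = 0$, so $\alpha_i + \alpha_i^* = |\beta_i|$). Each coordinate step maximises the dual exactly in one $\beta_i$ by soft-thresholding with $\varepsilon$ and clipping to $[-C, C]$. This simplified solver is written for this example only and is not the algorithm used in the paper; $\sigma$, $C$ and $\varepsilon$ are arbitrary choices.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for all row pairs of A and B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
               - 2 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2 * sigma**2))

def fit_kernel_svr(K, y, C, eps, max_sweeps=5000, tol=1e-10):
    """Coordinate ascent on the bias-free dual in beta_i = alpha_i - alpha_i*."""
    beta = np.zeros(len(y))
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(len(y)):
            # Residual of point i ignoring its own coefficient.
            g = y[i] - (K[i] @ beta - K[i, i] * beta[i])
            # Exact 1-D maximiser: soft-threshold by eps, then clip to the box.
            new = np.clip(np.sign(g) * max(abs(g) - eps, 0.0) / K[i, i], -C, C)
            largest_change = max(largest_change, abs(new - beta[i]))
            beta[i] = new
        if largest_change < tol:
            break
    return beta

X = np.linspace(0.0, 2 * np.pi, 25)[:, None]
y = np.sin(X[:, 0])
sigma, C, eps = 0.2, 100.0, 0.1

K = rbf_kernel(X, X, sigma)
beta = fit_kernel_svr(K, y, C, eps)

X_new = np.array([[1.0], [4.0]])
f_new = rbf_kernel(X_new, X, sigma) @ beta   # f(x) = sum_i beta_i K(x_i, x)
print(np.count_nonzero(beta), "support vectors;", f_new)
```

Note that $w$ never appears: predictions use only kernel evaluations $\sum_i \beta_i K(x_i, x)$ against the training inputs.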
Therefore, the use of kernels makes it possible to map the data implicitly into a feature space. By using the dual representation, the dimension of the feature space does not affect the computation directly. The key is to find a kernel function that can be evaluated efficiently. Two commonly used kernel functions for non-linear regression are:
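(1) Radial Basis Function (RBF): $K(x, x') = \exp\left(-\dfrac{\|x - x'\|^2}{2\sigma^2}\right)$
(2) Polynomial regression of order $p$: $K(x, x') = (\langle x, x' \rangle + 1)^p$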
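Both kernels cost only one pass over the $n$ input components per pair of points, even though the RBF feature space is infinite-dimensional and the polynomial one has $\binom{n+p}{p}$ dimensions. The sketch below builds their Gram matrices on random inputs and checks that they are symmetric and positive semi-definite (up to rounding), which is what allows them to act as dot products in a feature space. The function names, $\sigma = 1$ and $p = 3$ are illustrative assumptions.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """Gram matrix of the RBF kernel exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))

def poly_kernel_matrix(X, p):
    """Gram matrix of the polynomial kernel (<x, x'> + 1)^p."""
    return (X @ X.T + 1.0) ** p

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3))

for name, K in [("RBF", rbf_kernel_matrix(X, sigma=1.0)),
                ("polynomial, p=3", poly_kernel_matrix(X, p=3))]:
    eigenvalues = np.linalg.eigvalsh(K)   # K is symmetric, so eigvalsh applies
    print(f"{name}: symmetric={np.allclose(K, K.T)}, "
          f"smallest eigenvalue={eigenvalues.min():.2e}")
```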
Support vector regression was initially tested on a SISO rainfall-runoff data series, with part of the series used for training and the remainder for testing. In addition, various kernel functions were tried in order to identify a suitable kernel for rainfall-runoff simulation. Both linear and non-linear SVM regression algorithms were tested on a UK catchment. The input variables are time series of radar rainfall measurements and observed runoff.
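The paper does not state how the rainfall and runoff series were arranged as SVR inputs. One common arrangement, assumed here purely for illustration, forms each input vector from the current and recent rainfall values together with the previous observed flows and uses the current flow as the target, with a training/test split that preserves chronological order. The lag orders, the 70/30 split, the function names and the synthetic series are all hypothetical.

```python
import numpy as np

def make_lagged_dataset(rain, flow, rain_lags, flow_lags):
    """Inputs x_t = [R_t, ..., R_{t-rain_lags+1}, Q_{t-1}, ..., Q_{t-flow_lags}],
    target y_t = Q_t. rain_lags and flow_lags are hypothetical choices."""
    start = max(rain_lags - 1, flow_lags)
    X, y = [], []
    for t in range(start, len(flow)):
        rain_part = [rain[t - k] for k in range(rain_lags)]
        flow_part = [flow[t - k] for k in range(1, flow_lags + 1)]
        X.append(rain_part + flow_part)
        y.append(flow[t])
    return np.array(X), np.array(y)

def chronological_split(X, y, train_fraction=0.7):
    """First part of the record for training, the rest for testing."""
    n_train = int(len(y) * train_fraction)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

# Synthetic stand-in series (not the radar data used in the study).
rng = np.random.default_rng(5)
rain = rng.gamma(shape=0.5, scale=2.0, size=300)
flow = np.convolve(rain, [0.1, 0.3, 0.2, 0.1])[:300] + 1.0

X, y = make_lagged_dataset(rain, flow, rain_lags=3, flow_lags=2)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(X.shape, train_X.shape, test_X.shape)
```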
The SVM's performance in calibration was found to be very satisfactory, but its performance in simulation was less successful. A typical calibration result is illustrated in Figure 1, which demonstrates the close fit of the SVM model to the measured flow data. Further research will be carried out to establish suitable forms of SVM for simulation.
In this paper, we have tried to illustrate the potential of Support Vector Machines for dendritic river modelling. In particular, we construct a time-series-based non-linear model structure to simulate the rainfall-runoff relationship. The purpose of the paper is not to provide a final assessment of SV machines; since support vector machines are still developing, their potential for hydrological and hydraulic modelling needs further experiment and development. We hope this paper will serve to introduce SV machines, a powerful new tool, into the hydrological and hydraulic modelling area.
Acknowledgement
The project is sponsored by EPSRC grant GR/N09336.
References
Anguita, D., Boni, A. and Ridella, S., 1999. Learning algorithm for nonlinear support vector machines suited for digital VLSI. Electronics Letters, 35(16), pp. 1349-1350.
Bernd, T., Kleutges, M. and Kroll, A., 1999. Nonlinear black box modelling: fuzzy networks versus neural networks. Neural Computing & Applications, 8(2), pp. 151-162.
Brown, M., Gunn, S.R. and Lewis, H.G., 1999. Support vector machines for optimal classification and spectral unmixing. Ecological Modelling, 120(2-3), pp. 167-179.
Campolo, M., Andreussi, P. and Soldati, A., 1999. River flood forecasting with a neural network model. Water Resources Research, 35(4), pp. 1191-1197.
Cherkassky, V. and Mulier, F., 1998. Learning from Data: Concepts, Theory and Methods. Wiley, New York, 442 pp.
Cristianini, N., Campbell, C. and Shawe-Taylor, J., 1999. Dynamically adapting kernels in support vector machines. Advances in Neural Information Processing Systems, 11.
Cristianini, N. and Shawe-Taylor, J., 2000. An Introduction to Support Vector Machines. Cambridge University Press, 204 pp.
Dibike, Y.B., Solomatine, D. and Abbott, M.B., 1999. On the encapsulation of numerical-hydraulic models in artificial neural network. Journal of Hydraulic Research, 37.
Gunn, S., 1998. Support vector machines for classification and regression. ISIS Technical Report.
Hearst, M.A., Schölkopf, B., Dumais, S., Osuna, E. and Platt, J., 1998. Trends and controversies: support vector machines. IEEE Intelligent Systems, 13, pp. 18-28.
Jakeman, A.J., Littlewood, I.G. and Whitehead, P.G., 1990. Computation of the instantaneous unit hydrograph and identifiable component flows with applications to two small upland catchments. Journal of Hydrology, 117, pp. 275-300.
Lane, S.N., Bradbrook, K.F., Richards, K.S., Biron, P.A. and Roy, A.G., 1999. The application of computational fluid dynamics to natural river channels: three-dimensional versus two-dimensional approaches. Geomorphology, 29(1-2), August.
Liang, G.C. and Kachroo, R.K., 1999. River flow forecasting. Part 4: Applications of linear techniques for flow routing on large catchment. Journal of Hydrology, 133, pp. 99-140.
Smola, A.J. and Schölkopf, B., 1998. A tutorial on support vector regression. NeuroCOLT2 Technical Report Series, NC2-TR-1998-030.
Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 188 pp.
Vapnik, V.N., 1998. Statistical Learning Theory. John Wiley & Sons.
Zhang, L., 1998. Dendritic River Modelling. Ph.D. Thesis, Department of Civil Engineering, University of Salford.
Fig. 1 Calibration of SVM