Kim, Hung Soo¹ and Yoon, Yong Nam²
¹Assistant Prof., Dept. of Civil Engrg., Sun Moon Univ., Asan-Si, Chung-Nam, 336-840, Korea, Tel: 82-41-530-2325, Fax: 82-41-530-2839, E-mail: sookim@omega.sunmoon.ac.kr
²Prof., Dept. of Civil and Environmental Engrg., Korea Univ., Seoul, Korea
Abstract: The variability of daily streamflows has received less analysis than that of monthly or annual streamflows. Since daily streamflows are affected by individual rainstorms, characterizing their features is more complicated. In this work, we analyze the complex behavior of daily streamflows and search for evidence of deterministic nonlinear dynamics. We find no evidence of chaotic behavior in the investigated streamflows. We also investigate the effect of the sampling process on chaotic time series, and we suggest that the lack of evidence for nonlinear determinism in daily streamflows may be due to this sampling process.
Keywords: sampling, chaos, C-C method, correlation dimension, attractor
The modeling of daily streamflow time series has
attracted the attention of hydrologists for a long time. The main reason for such interest has
been the need to conduct synthetic simulation studies (data generation) of water
resources systems, and to forecast future flow events one or several days in
advance. For this purpose, several modeling approaches have been suggested in the literature (Salas, 1993; Lettenmaier and Wood, 1993).
However, in recent years, interest has grown in alternatives to conventional stochastic models, such as nonlinear stochastic models (Tong, 1990) and chaos. Rodriguez-Iturbe
et al. (1989) and Puente and Obregon (1996) reported evidence of chaotic
behavior in rainfall recorded with a time interval of 15 seconds. However, Ghilardi and Rosso (1990) discussed some technical issues
involving data size, and they pointed out that it is difficult to discriminate
between chaos and noise when the largest positive
Lyapunov exponent is small. Sharifi
et al. (1990) analyzed three rainfall data sets consisting of the times at which
rain gages signalled the collection of 0.01 mm of rain at a given location, and
they obtained correlation dimensions in the range 3.3 - 3.8. Wilcox et al.
(1991) analyzed daily snowmelt runoff data, but they could not find a saturated
value of the correlation dimension. Sangoyomi
et al. (1996) used the nearest-neighbor method to obtain a dimension of 3.44 for
biweekly data on the volume of the Great Salt Lake (GSL). Jeong and Rao (1996) analyzed tree ring series for chaotic behavior, but they could not find a saturated correlation dimension.
The first step in
the analysis of a chaotic time series is the embedding of the scalar time series
into an m-dimensional space. This is done using the method of delays
introduced by Packard et al. (1980) and Takens (1981), which has the advantage
of distributing the noise equally among the m components. A scalar time series $\{x_i\}$, $i = 1, 2, \ldots, N$, is embedded into m-dimensional space by constructing the vectors

$$X_i = (x_i, x_{i+t}, \ldots, x_{i+(m-1)t}), \qquad (1)$$

where t is the index lag, and m is the embedding dimension, both of which must be chosen appropriately. If the sampling time is $\tau_s$, then the delay time is $\tau_d = t\tau_s$, and the delay time window is $\tau_w = (m-1)\tau_d$, which is the entire time spanned by the components of each vector.
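The delay-vector construction of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming each vector is (x_i, x_{i+t}, ..., x_{i+(m-1)t}) and that the number of vectors is N - (m-1)t; the function name `delay_embed` and the toy series are hypothetical and use only the standard library.

```python
def delay_embed(x, m, t):
    """Return the delay vectors (x[i], x[i+t], ..., x[i+(m-1)t])."""
    M = len(x) - (m - 1) * t  # number of embedded points
    if M <= 0:
        raise ValueError("series too short for this (m, t)")
    return [tuple(x[i + k * t] for k in range(m)) for i in range(M)]

# Toy data (not streamflow): the integers 0..9 embedded with m = 3, t = 2.
series = list(range(10))
vectors = delay_embed(series, m=3, t=2)
print(len(vectors), vectors[0], vectors[-1])
```

With m = 3 and t = 2 each vector spans (m-1)t = 4 samples, which is the delay time window expressed in units of the sampling time.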
After the attractor has been reconstructed using Eq. (1), quantitative properties of the chaotic system can be determined. The correlation dimension introduced by Grassberger and Procaccia (1983) is widely used in many fields for the quantitative characterization of strange attractors. The correlation integral for the embedded time series is the following function:
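$$C(m, N, r, t) = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \Theta\left(r - \lVert X_i - X_j \rVert\right), \quad r > 0, \qquad (2)$$

where $\Theta(a) = 0$ if $a < 0$, $\Theta(a) = 1$ if $a \ge 0$, N is the size of the data set, $M = N - (m-1)t$ is the number of embedded points in m-dimensional space, and $\lVert \cdot \rVert$ denotes the sup-norm. $C(m, N, r, t)$ measures the fraction of the pairs of points $X_i$, $i = 1, 2, \ldots, M$, whose sup-norm separation is no greater than r. If the limit of $C(m, N, r, t)$ as $N \to \infty$ exists for each r, we write the fraction of all state vector points that are within r of each other as $C(m, r, t) = \lim_{N \to \infty} C(m, N, r, t)$, and the correlation dimension is defined as $D_2(m, t) = \lim_{r \to 0} [\log C(m, r, t) / \log r]$. In practice, N remains finite, and, thus, r cannot go to zero; instead, we look for a linear region of slope $D_2(m, t)$ in the plot of $\log C(m, N, r, t)$ vs. $\log r$.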
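As a rough illustration of the correlation integral and of reading a slope from the log-log plot, the sketch below counts pairs of delay vectors whose sup-norm distance is at most r and forms a two-point slope between two radii. The helper names, the two radii, and the uniform-noise test series are assumptions made for the example; a real analysis would fit the slope over a whole scaling region.

```python
import math
import random

def delay_embed(x, m, t):
    """Delay vectors (x[i], x[i+t], ..., x[i+(m-1)t]) for i = 0..M-1."""
    M = len(x) - (m - 1) * t
    return [tuple(x[i + k * t] for k in range(m)) for i in range(M)]

def correlation_integral(x, m, t, r):
    """Fraction of pairs of delay vectors whose sup-norm distance is <= r."""
    X = delay_embed(x, m, t)
    M = len(X)
    close = 0
    for i in range(M - 1):
        for j in range(i + 1, M):
            if max(abs(a - b) for a, b in zip(X[i], X[j])) <= r:
                close += 1
    return 2.0 * close / (M * (M - 1))

def local_slope(x, m, t, r1, r2):
    """Two-point estimate of d log C / d log r between radii r1 < r2."""
    c1 = correlation_integral(x, m, t, r1)
    c2 = correlation_integral(x, m, t, r2)
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))

# Four equally spaced points: 3 of the 6 pairs lie within r = 1.
tiny = correlation_integral([0, 1, 2, 3], m=1, t=1, r=1)

# Hypothetical test data: iid uniform noise, for which the slope tracks m
# instead of saturating.
rng = random.Random(0)
noise = [rng.random() for _ in range(400)]
slope_m2 = local_slope(noise, m=2, t=1, r1=0.05, r2=0.1)
print(tiny, round(slope_m2, 2))
```

For uncorrelated noise the slope stays close to the embedding dimension (about 2 for m = 2 here), which is the non-saturating behavior that signals the absence of a low-dimensional attractor.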
The components of the reconstructed state variables $X_i$ need to be independent, so the quality of the reconstructed attractor depends on the choice of the index lag t. If the delay time $\tau_d$ is too small, the reconstructed attractor is compressed along the identity line, and this is called redundance. If $\tau_d$ is too large, the attractor dynamics may become causally disconnected, which is called irrelevance, and which may cause the attractor to appear much more complex than it really is (Casdagli et al., 1991).
If the sampling time $\tau_s$ is considerably less than the delay time $\tau_d$, then successive points $X_i$ and $X_{i+1}$ of the embedded time series will generally be close, and this will lead to artificial correlations in Eq. (2) (Grassberger, 1990). While it is not possible to make an unambiguous identification of such artificially correlated points (Rosenstein et al., 1994; Grassberger, 1990), a practical way to minimize this effect is to remove from Eq. (2) the contributions of all pairs of points $X_i$ and $X_j$ with |j - i| < t, where t is the index lag (Grassberger, 1990).
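The exclusion of temporally close pairs can be sketched as a small change to the pair loop: only pairs with j - i >= w are counted, and setting w = t reproduces the rule described above. Normalizing by the number of retained pairs is an assumption of this sketch, and the function name and the ramp data are hypothetical.

```python
def correlation_integral_excluding(x, m, t, r, w):
    """Correlation integral over pairs (i, j) with j - i >= w only."""
    M = len(x) - (m - 1) * t
    close = 0
    pairs = 0
    for i in range(M):
        for j in range(i + w, M):
            pairs += 1
            # sup-norm test, coordinate by coordinate
            if all(abs(x[i + k * t] - x[j + k * t]) <= r for k in range(m)):
                close += 1
    return close / pairs if pairs else float("nan")

# On a ramp, every close pair is a pair of temporal neighbours.
ramp = [0, 1, 2, 3, 4]
with_neighbours = correlation_integral_excluding(ramp, m=1, t=1, r=1, w=1)
without_neighbours = correlation_integral_excluding(ramp, m=1, t=1, r=1, w=2)
print(with_neighbours, without_neighbours)
```

In this toy case all of the apparent correlation (0.4) comes from neighbouring samples and disappears (0.0) once they are excluded, which is the artificial effect the exclusion is meant to suppress.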
Many researchers choose to use a fixed delay time $\tau_d$ as the embedding dimension m is increased. Some have suggested obtaining $\tau_d$ from the autocorrelation function (ACF), which is practically convenient, since it contains information about both periodic trends and information dissipation. This ACF method has the advantage of computational efficiency, but it has been found that the value obtained for t may be incorrect. Since the relationship between the spatial distribution of a reconstructed attractor and the temporal autocorrelation of a single-variable time series is not well-defined, there are inconsistencies inherent in this approach (Fraser and Swinney, 1986; Martinerie et al., 1992). Fraser and Swinney (1986) instead suggested choosing the index lag t as the first local minimum of the mutual information (MI). This is regarded as the most comprehensive method, but it has the drawbacks that it requires a large amount of data and is computationally cumbersome (Tsonis, 1992).
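For comparison, an ACF-based choice of the index lag can be sketched as follows. The text does not fix a criterion, so taking the first lag at which the sample ACF becomes non-positive is an assumption of this example, as are the function names and the sine-wave test series.

```python
import math

def autocorrelation(x, lag):
    """Biased sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

def first_zero_crossing(x, max_lag):
    """Smallest lag >= 1 at which the ACF is <= 0, or None if there is none."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(x, lag) <= 0.0:
            return lag
    return None

# Sine wave with a period of 22 samples: the ACF crosses zero just after a
# quarter period (5.5 samples), so the first non-positive lag is 6.
wave = [math.sin(2 * math.pi * i / 22) for i in range(220)]
lag = first_zero_crossing(wave, max_lag=50)
print(lag)
```

The calculation is cheap, which is the efficiency advantage noted above, but it uses only linear correlation and so says nothing about nonlinear dependence.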
An alternative is to fix the delay time window $\tau_w$, rather than the delay time $\tau_d$, but the estimation of $\tau_w$ is less well developed. Martinerie et al. (1992) examined the delay time window and compared it with the delay times estimated using the ACF and the MI. They concluded that $\tau_w$ could not be estimated using either of these two methods. Basically, $\tau_w$ is the optimal time for independence of the data, but these methods estimate the first locally optimal time, which is $\tau_d$. From this distinction between $\tau_d$ and $\tau_w$, we developed a technique, called the C-C method, that can estimate both $\tau_d$ and $\tau_w$ (Kim et al., 1999). This method is discussed in the following subsection. We also showed that, for small data sets, as the embedding dimension m is increased, the correlation dimension $D_2$ converges more rapidly if $\tau_w$ is held fixed than if $\tau_d$ is held fixed (Kim et al., 1998).
Brock et al. (1991, 1996) studied the BDS statistic, which is based on the correlation integral, to test the null hypothesis that the data are independently and identically distributed (iid). This test has been particularly useful for chaotic systems and nonlinear stochastic systems. Under the iid hypothesis, the BDS statistic for m > 1 is defined as
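$$\mathrm{BDS}(m, M, r) = \frac{\sqrt{M}}{\sigma(m, M, r)} \left[ C(m, M, r) - C^m(1, M, r) \right], \qquad (3)$$

and this converges to a standard normal distribution as $M \to \infty$. Note that the asymptotic variance $\sigma^2(m, M, r)$ can be estimated as

$$\sigma^2(m, M, r) = 4\left[ K^m + 2\sum_{j=1}^{m-1} K^{m-j} C^{2j} + (m-1)^2 C^{2m} - m^2 K C^{2m-2} \right], \qquad (4)$$

$$C = C(1, M, r), \qquad (5)$$

$$K = \frac{2}{M(M-1)(M-2)} \sum_{1 \le i < j < k \le M} \left[ \Theta_{ij}\Theta_{jk} + \Theta_{ik}\Theta_{kj} + \Theta_{ji}\Theta_{ik} \right], \quad \Theta_{ij} = \Theta\left(r - |x_i - x_j|\right). \qquad (6)$$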
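As a quick consistency check on the variance, and assuming the standard estimator of Brock et al. (1996), $\sigma^2 = 4[K^m + 2\sum_{j=1}^{m-1} K^{m-j}C^{2j} + (m-1)^2 C^{2m} - m^2 K C^{2m-2}]$ with $C = C(1, M, r)$, the case m = 2 collapses to a perfect square:

```latex
\sigma^2(2, M, r)
  = 4\left[ K^2 + 2KC^2 + C^4 - 4KC^2 \right]
  = 4\left[ K^2 - 2KC^2 + C^4 \right]
  = 4\left( K - C^2 \right)^2 ,
\qquad\text{so}\qquad
\sigma(2, M, r) = 2\,\lvert K - C^2 \rvert .
```

The normalization of the statistic for m = 2 therefore depends only on how far K departs from $C^2$.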
The present study is concerned with the properties of the quantity $S(m, N, r, t) = C(m, N, r, t) - C^m(1, N, r, t)$. We refer to a comment by Brock et al. (1991): “If the stochastic process $\{x_i\}$ is iid, it will be shown that $C(m, r) = C^m(1, r)$ for all m and r. That is to say, the correlation integral behaves much like the characteristic function of a serial string in that the correlation integral of a serial string of independent random variables is the product of the correlation integrals of component substrings.” This led us to interpret the statistic $S(m, N, r, t)$ as the serial correlation of a nonlinear time series. Therefore, it can be regarded as a dimensionless measure of nonlinear dependence. For fixed m, N, and r, the plot of $S(m, N, r, t)$ vs. t is a nonlinear analog of the plot of the autocorrelation function vs. t. In order to study the nonlinear dependence, Kim et al. (1999) derived the following equations:

$$S(m, r, t) = \frac{1}{t} \sum_{s=1}^{t} \left[ C_s(m, r, t) - C_s^m(1, r, t) \right], \quad m = 2, 3, \ldots \qquad (7)$$

$$\Delta S(m, t) = \max_j \{ S(m, r_j, t) \} - \min_j \{ S(m, r_j, t) \}. \qquad (8)$$

Here $C_s$ denotes the correlation integral of the s-th of the t disjoint subseries formed by taking every t-th value of the original series.
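Brock et al. (1991) suggested that m should be between 2 and 5, and r should be between $\sigma/2$ and $2\sigma$, where $\sigma$ is the standard deviation of the time series. In addition, the asymptotic distributions were well approximated by finite time series when $N \ge 500$. Thus, we select four values of r in the range $\sigma/2 \le r \le 2\sigma$, $r_1 = (0.5)\sigma$, $r_2 = (1.0)\sigma$, $r_3 = (1.5)\sigma$, and $r_4 = (2.0)\sigma$, as representative values.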
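We then define the following averages of the quantities given by Eqs. (7) and (8):

$$\bar{S}(t) = \frac{1}{16} \sum_{m=2}^{5} \sum_{j=1}^{4} S(m, r_j, t), \qquad (9)$$

$$\Delta\bar{S}(t) = \frac{1}{4} \sum_{m=2}^{5} \Delta S(m, t), \qquad (10)$$

and we look for the first zero crossing of $\bar{S}(t)$ or the first local minimum of $\Delta\bar{S}(t)$ to find the first locally optimal time for independence of the data, which gives the delay time $\tau_d = t\tau_s$. The optimal time is the index lag t for which $\bar{S}(t)$ and $\Delta\bar{S}(t)$ are both closest to zero. If we assign equal importance to these two quantities, then we may simply look for the minimum of the quantity

$$S_{cor}(t) = \Delta\bar{S}(t) + |\bar{S}(t)|, \qquad (11)$$

and this optimal time gives the delay time window $\tau_w = t\tau_s$.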
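The procedure of this subsection can be sketched in Python as follows. This is an illustrative reading rather than the authors' code: it assumes that, for index lag t, the series is split into t disjoint subseries, that S is the average over subseries of C(m) - C(1)^m, that m runs over 2..5, and that r takes the four values 0.5, 1.0, 1.5 and 2.0 times the standard deviation. All function names and the two test series are hypothetical.

```python
import math
import random
import statistics

def corr_integral(x, m, r):
    """Correlation integral of x for dimension m and unit lag (sup-norm)."""
    M = len(x) - (m - 1)
    close = 0
    for i in range(M - 1):
        for j in range(i + 1, M):
            if all(abs(x[i + k] - x[j + k]) <= r for k in range(m)):
                close += 1
    return 2.0 * close / (M * (M - 1))

def cc_statistics(x, t, dims=(2, 3, 4, 5), factors=(0.5, 1.0, 1.5, 2.0)):
    """Return (S_bar, dS_bar, S_cor) for index lag t."""
    sigma = statistics.pstdev(x)
    subs = [x[s::t] for s in range(t)]  # t disjoint subseries
    s_total, ds_total = 0.0, 0.0
    for m in dims:
        s_values = []
        for f in factors:
            r = f * sigma
            s_values.append(
                sum(corr_integral(u, m, r) - corr_integral(u, 1, r) ** m
                    for u in subs) / t)
        s_total += sum(s_values)
        ds_total += max(s_values) - min(s_values)
    s_bar = s_total / (len(dims) * len(factors))
    ds_bar = ds_total / len(dims)
    return s_bar, ds_bar, ds_bar + abs(s_bar)

def cc_scan(x, max_t):
    """Pick the lag for tau_d (first local minimum of dS_bar, or None)
    and the lag for tau_w (global minimum of S_cor) over t = 1..max_t."""
    curves = [cc_statistics(x, t) for t in range(1, max_t + 1)]
    ds_bar = [c[1] for c in curves]
    s_cor = [c[2] for c in curves]
    lag_d = next((k + 1 for k in range(1, max_t - 1)
                  if ds_bar[k] < ds_bar[k - 1] and ds_bar[k] <= ds_bar[k + 1]),
                 None)
    lag_w = min(range(max_t), key=lambda k: s_cor[k]) + 1
    return lag_d, lag_w

# Hypothetical test series: iid noise (S_bar near zero) and a sine wave
# (strong dependence, S_bar clearly positive).
rng = random.Random(1)
noise = [rng.random() for _ in range(120)]
wave = [math.sin(2 * math.pi * i / 40) for i in range(120)]
noise_stats = cc_statistics(noise, t=1)
wave_stats = cc_statistics(wave, t=1)
lag_d, lag_w = cc_scan(wave, max_t=8)
print(noise_stats, wave_stats, lag_d, lag_w)
```

The pair counting is quadratic in the series length, so for records as long as the daily streamflows considered here a faster neighbour search would be needed; the short series above are chosen only to keep the sketch quick to run.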
In this work, we search for evidence of chaotic behavior in the daily streamflows of two different stations: St. Marys river near Macclenny and Ocklawaha river near Conner, Florida, USA. The two streamflow records consist of 67 years (1927-1993) with 24,472 daily values for St. Marys river near Macclenny and 11 years (1978-1988) with 4,018 daily values for Ocklawaha river near Conner. The time series plots for these streamflows are shown in Fig. 1.
For the streamflows of St. Marys and Ocklawaha rivers, $\tau_d$ corresponds to the first local minimum of $\Delta\bar{S}(t)$, as indicated by the arrow in Fig. 2 for the case of St. Marys river; it is determined similarly for Ocklawaha river. The delay times for the two streamflows are estimated as $\tau_d = 39\tau_s = 39$ days and $\tau_d = 33\tau_s = 33$ days, respectively. Also, as shown in Fig. 2, the minimum of $S_{cor}(t)$ occurs at t = 152 for St. Marys river, which gives $\tau_w = 152$ days; the corresponding value for Ocklawaha river is $\tau_w = 187$ days.
The correlation integral analysis is applied to the two daily streamflows using the values of $\tau_d$ and $\tau_w$ given in the previous section. Fig. 3 shows the plots of log[C(r)] versus log(r) for the reconstructed attractors for St. Marys river for embedding dimensions m = 2, 4, ..., 20 using $\tau_d$ = 39 days. The correlation dimension slowly increases up to m = 20 and there is no evidence for chaotic behavior. In Fig. 4, we repeat this analysis using the delay time window $\tau_w$ = 152 days. Since the index lag is then determined by (m-1)t = 152, it is necessary to round off t to the nearest integer. The actual values used for (m, t) are as follows: (2, 152), (4, 51), (6, 30), (8, 22), (10, 15), (12, 14), (14, 12), (16, 10), (18, 9), and (20, 8). In this case, too, we cannot find a saturated correlation dimension that would indicate chaotic behavior. A similar analysis is performed for Ocklawaha river but, for both $\tau_d$ and $\tau_w$, the correlation dimension fails to saturate, as shown in Fig. 5, so there is also no evidence of nonlinear determinism.
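The rounding of (m-1)t = 152 to an integer index lag can be sketched as below; the helper name is hypothetical and the rule is plain nearest-integer rounding. Note that this rule yields t = 17 for m = 10, whereas the list above reports 15; the sketch does not try to account for that difference.

```python
def index_lag_for_window(window, m):
    """Index lag t closest to window / (m - 1), never below 1."""
    return max(1, round(window / (m - 1)))

# Delay time window of 152 days, embedding dimensions 2, 4, ..., 20.
lags = [(m, index_lag_for_window(152, m)) for m in range(2, 21, 2)]
print(lags)
```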
Sampling is an important consideration in the analysis of hydrologic time series. Hydrological processes such as
precipitation and streamflow are generally continuous in time. While continuous time series are
measured at some gaging stations, most recorded time series are discrete,
providing
instantaneous sampling at either regular or irregular time intervals (such as
instantaneous daily observations of water levels in streams). For a chaotic time series, such sampling
can eliminate the evidence of nonlinear determinism, as we show in this section.
As an example, we examine the effect of sampling
on a time series of the variable x
from the Lorenz equations (Abarbanel et al., 1993):
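$$\frac{dx}{dt} = -a(x - y), \qquad \frac{dy}{dt} = -xz + cx - y, \qquad \frac{dz}{dt} = xy - bz, \qquad (12)$$

which is generated using the parameter values a = 16.0, b = 4.0, and c = 45.92 and a time step of $\tau_s$ = 0.01. The sampled time series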
will consist of 15,000 values. We
generate sampled time series by keeping only the values at every nth time step, with n=2, 10, 50, and 100.
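The generation and sampling of the Lorenz series can be sketched as follows. The sketch assumes the Lorenz form dx/dt = -a(x - y), dy/dt = -xz + cx - y, dz/dt = xy - bz with the parameter values quoted above; the fourth-order Runge-Kutta integrator, the initial condition, the discarded transient, and the shortened series length are choices made only for this illustration.

```python
def lorenz_rhs(state, a=16.0, b=4.0, c=45.92):
    """Right-hand side of the assumed Lorenz equations."""
    x, y, z = state
    return (-a * (x - y), -x * z + c * x - y, x * y - b * z)

def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (p + 2 * q + 2 * u + v)
                 for s, p, q, u, v in zip(state, k1, k2, k3, k4))

def lorenz_x(n_steps, dt=0.01, state=(1.0, 1.0, 1.0), discard=1000):
    """Return n_steps values of x after discarding an initial transient."""
    xs = []
    for step in range(n_steps + discard):
        state = rk4_step(state, dt)
        if step >= discard:
            xs.append(state[0])
    return xs

def subsample(series, n):
    """Keep only the values at every nth time step."""
    return series[::n]

x = lorenz_x(5000)  # shorter than the series in the text, to keep the sketch fast
sampled = {n: subsample(x, n) for n in (2, 10, 50, 100)}
print({n: len(s) for n, s in sampled.items()})
```

To end up with sampled series of a fixed length, the original series would have to be generated n times longer before subsampling.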
For each of the four sampled time series, we use the C-C method to compute the delay time $\tau_d$, and the results are given in Table 1. We then construct embeddings of the attractors using these delay times, and the results are shown in Fig. 6. The attractors degrade quickly as n increases, and, by n = 10, they are completely unrecognizable. We also compute the correlation dimensions, and these results are shown in Fig. 7. In each case, we draw a horizontal line at the dimension $D_2$ = 2.05 obtained from the original time series. Note that, for the larger values of n, it is not always possible to find a linear region in the plot of log[C(r)] versus log(r) for large values of the embedding dimension m. The convergence of the correlation dimension degrades slowly as n increases, and, by n = 100, this convergence is lost. Thus, we see that sampling can eliminate the evidence of nonlinear determinism from a chaotic time series.
Table 1 The delay times and the delay time windows for sampled Lorenz series.

| Time Interval n | $\tau_d$ | $\tau_w$ | Time Interval n | $\tau_d$ | $\tau_w$ |
|---|---|---|---|---|---|
| 2 | 5 | 134 | 50 | 2 | 26 |
| 10 | 2 | 91 | 100 | 2 | 24 |
Since the time series is increasingly randomized by the sampling process as the time interval n is increased, it is difficult to find the optimal point of $S_{cor}(t)$ that gives $\tau_w$. Thus, we have not used the delay time window for the estimation of the correlation dimension of the sampled series.
The Lorenz equations were solved numerically using a time step of $\tau_s$ = 0.01, and the resulting values of the variable x were used as a scalar time series. Applying the C-C method to this time series yields the delay time $\tau_d = 10\tau_s = 0.1$ and the delay time window $\tau_w = 100\tau_s = 1.0$. We then investigated the effect of sampling on this time series by keeping only the values at every nth time step. This is equivalent to making sampled measurements with a regular time interval $\Delta t = n\tau_s$. As this time interval $\Delta t$ increases, the successive measurements will eventually become irrelevant, and any nonlinear determinism will be lost. When this happens, the data will appear to be stochastic, rather than chaotic.
Since the delay time $\tau_d$ is a basic correlation time, irrelevance should begin to become apparent when $\Delta t \sim \tau_d$. However, $\tau_w$ is the maximum time for correlations, so complete stochasticity should not occur until $\Delta t \sim \tau_w$. For the Lorenz system, the conditions $\Delta t \sim \tau_d$ and $\Delta t \sim \tau_w$ become n ~ 10 and n ~ 100, respectively. In Fig. 6, we do indeed see that the reconstructed attractor begins to lose its structure when n ~ 10, and that this structure is completely lost when n ~ 100.
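The two thresholds follow directly from the values quoted above for the Lorenz series (sampling time 0.01, delay time 0.1, delay time window 1.0), writing the sampling interval as n times the sampling time:

```latex
n\,\tau_s \sim \tau_d \;\Rightarrow\; n \sim \frac{\tau_d}{\tau_s} = \frac{0.1}{0.01} = 10,
\qquad
n\,\tau_s \sim \tau_w \;\Rightarrow\; n \sim \frac{\tau_w}{\tau_s} = \frac{1.0}{0.01} = 100 .
```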
Stochastic models are often used for the modeling of daily streamflows. However, if the streamflow data show evidence of nonlinear determinism, then a chaotic model should be more appropriate. Some previous studies have found evidence of nonlinear determinism, while others have not, possibly because of the sampling process. When sampling has removed that evidence, one should not expect to find nonlinear determinism in the data, and stochastic models should work well.
Abarbanel, H.D.I., Brown, R., Sidorowich, J.J., and Tsimring, L.S. (1993). “Analysis of observed chaotic data in physical systems.” Rev. Mod. Phys., Vol. 65, pp. 1331-1392.
Brock, W.A., Dechert, W.D., Scheinkman, J.A., and LeBaron, B. (1996). “A test for independence based on the correlation dimension.” Econ. Rev., Vol. 15, pp. 197-235.
Brock, W.A., Hsieh, D.A., and LeBaron, B. (1991). Nonlinear Dynamics, Chaos, and Instability: Statistical Theory and Economic Evidence, The MIT Press.
Casdagli, M., Eubank, S., Farmer, J.D., and Gibson, J. (1991). “State space reconstruction in the presence of noise.” Physica D, Vol. 51, pp. 52-98.
Fraser, A.M. and Swinney, H.L. (1986). “Independent coordinates for strange attractors from mutual information.” Phys. Rev. A, Vol. 33, pp. 1134-1140.
Ghilardi, P. and Rosso, R. (1990). “Comment on Chaos in rainfall by I. Rodriguez-Iturbe et al.” Water Resources Research, Vol. 26, No. 8, pp. 1837-1839.
Grassberger, P. (1990). “An optimized box-assisted algorithm for fractal dimensions.” Phys. Lett. A, Vol. 148, No. 1,2, pp. 63-68.
Grassberger, P. and Procaccia, I. (1983). “Measuring the strangeness of strange attractors.” Physica D, Vol. 9, pp. 189-208.
Jeong, G.D. and Rao, A.R. (1996). “Chaos characteristics of tree ring series.” J. Hydrology, Vol. 182, pp. 239-257.
Kim, H.S., Eykholt, R., and Salas, J.D. (1999). “Nonlinear dynamics, delay times, and embedding windows.” Physica D, Vol. 127, pp. 48-60.
Lettenmaier, D.L. and Wood, E.F. (1992). “Hydrologic forecasting.” in Handbook of Hydrology edited by Maidment, D.R.
Martinerie, J.M., Albano, A.M., Mees, A.I., and Rapp, P.E. (1992). “Mutual information, strange attractors, and the optimal estimation of dimension.” Phys. Rev. A, Vol. 45, pp. 7058-7064.
Packard, N.H., Crutchfield, J.P., Farmer, J.D., and Shaw, R.S. (1980). “Geometry from a time series.” Phys. Rev. Lett., Vol. 45, No. 9, pp. 712-716.
Puente, C.E. and Obregon, N. (1996). “A deterministic geometric representation of temporal rainfall: results for a storm in Boston.” Water Resources Research, Vol. 32, No. 9, pp. 2825-2839.
Rodriguez-Iturbe, I., Power, B.F.D., Sharifi, M.B., and Georgakakos, K.P. (1989). “Chaos in rainfall.” Water Resources Research, Vol. 25, No. 7, pp. 1667-1675.
Rosenstein, M.T., Collins, J.J., and De Luca, C.J. (1994). “Reconstruction expansion as a geometry-based framework for choosing proper delay times.” Physica D, Vol. 73, pp. 82-98.
Salas, J.D. (1992). “Analysis and modelling of hydrologic time series.” in Handbook of Hydrology edited by Maidment, D.R.
Sangoyomi, T.B., Lall, U., and Abarbanel, H.D.I. (1996). “Nonlinear dynamics of the Great Salt Lake: dimension estimation.” Water Resources Research, Vol. 32, No. 1, pp. 149-159.
Sharifi, M.B., Georgakakos, K.P., and Rodriguez-Iturbe, I. (1990). “Evidence of deterministic chaos in the pulse of storm rainfall.” J. Atmos. Sci., Vol. 47, No. 7, pp. 888-893.
Takens, F. (1981). “Detecting strange attractors in turbulence.” in Dynamical Systems and Turbulence edited by Rand, D.A. and Young, L.S., pp. 366-381, Springer-Verlag.
Tong, H. (1990). Non-Linear Time Series: A Dynamical System Approach, Clarendon Press.
Tsonis, A.A. (1992). Chaos: From Theory to Applications, Plenum Press.
Wilcox, B.P., Seyfried, M.S., and Matison, T.H. (1991). “Searching for chaotic dynamics in snowmelt runoff.” Water Resources Research, Vol. 27, No. 6, pp. 1005-1010.

(a) (b)
Fig. 1 Daily streamflows at (a) St. Marys and (b) Ocklawaha rivers, FL, USA

(a) $S(m, r, t)$ (b) $\bar{S}(t)$, $\Delta\bar{S}(t)$, $S_{cor}(t)$
Fig. 2 Estimations of $\tau_d$ and $\tau_w$ for daily streamflow at St. Marys river, FL, USA

(a) Correlation integral. (b) Correlation dimension.
Fig. 3 Estimation of correlation dimension using the delay time $\tau_d$ at St. Marys river near Macclenny, Florida, USA.

(a) Correlation integral. (b) Correlation dimension.
Fig. 4 Estimation of correlation dimension using the delay time window $\tau_w$ at St. Marys river near Macclenny, Florida, USA.

(a) Use of the delay time. (b) Use of the delay time window.
Fig. 5 Correlation dimension at Ocklawaha river near Conner, Florida, USA.

Fig. 6 Attractors of sampled time series from Lorenz system.


Fig. 7 Correlation integral and correlation dimension for sampled time series from Lorenz system.