CHAOS AND SAMPLED DAILY STREAMFLOWS

Kim, Hung Soo¹ and Yoon, Yong Nam²

¹ Assistant Prof., Dept. of Civil Engrg., Sun Moon Univ., Asan-Si, Chung-Nam, 336-840, Korea,
Tel: 82-41-530-2325, Fax: 82-41-530-2839, E-mail: sookim@omega.sunmoon.ac.kr

² Prof., Dept. of Civil and Environmental Engrg., Korea Univ., Seoul, Korea

Abstract: The variability of daily streamflows has received less modeling attention than that of monthly or annual streamflows.  Because daily streamflows are affected by individual rainstorms, characterizing their features is more complicated.  In this work, we analyze the complex behavior of daily streamflows and search for evidence of deterministic nonlinear dynamics.  We find no evidence of chaotic behavior in the investigated streamflows.  We also investigate the effect of the sampling process on chaotic time series, and we suggest that the lack of evidence for nonlinear determinism in daily streamflows may be due to this sampling process.

Keywords: sampling, chaos, C-C method, correlation dimension, attractor

1    INTRODUCTION

The modeling of daily streamflow time series has attracted the attention of hydrologists for a long time.  The main reason for this interest has been the need to conduct synthetic simulation studies (data generation) of water resources systems, and to forecast future flow events one or several days in advance.  For this purpose, several modeling approaches have been suggested in the literature (Salas, 1993; Lettenmaier and Wood, 1993).

In recent years, however, interest in alternatives to conventional stochastic models, such as nonlinear stochastic models (Tong, 1990) and chaos, has increased.  Rodriguez-Iturbe et al. (1989) and Puente and Obregon (1996) reported evidence of chaotic behavior in rainfall recorded at a time interval of 15 seconds.  Ghilardi and Rosso (1990), however, discussed some technical issues involving data size, and they pointed out that it is difficult to discriminate between chaos and noise when the largest positive Lyapunov exponent is small.  Sharifi et al. (1990) analyzed three rainfall data sets consisting of the times at which rain gages signalled the collection of 0.01 mm of rain at a given location, and they obtained correlation dimensions in the range 3.3 - 3.8.  Wilcox et al. (1991) analyzed daily snowmelt runoff data, but they could not find a saturated value of the correlation dimension.  Sangoyomi et al. (1996) used the nearest-neighbor method to obtain a dimension of 3.44 for biweekly data on the volume of the Great Salt Lake (GSL).  Jeong and Rao (1996) analyzed tree ring series for chaotic behavior, but they could not find a saturated correlation dimension.

2    IDENTIFYING CHAOTIC BEHAVIOR

The first step in the analysis of a chaotic time series is the embedding of the scalar time series into an m-dimensional space.  This is done using the method of delays introduced by Packard et al. (1980) and Takens (1981), which has the advantage of distributing the noise equally among the m components.  A scalar time series $\{x_i\}$, i = 1, 2, ..., N, is embedded into m-dimensional space by constructing the vectors

$$\vec{x}_i = (x_i, x_{i+t}, \ldots, x_{i+(m-1)t}),$$                                   (1)

where t is the index lag and m is the embedding dimension, both of which must be chosen appropriately.  If the sampling time is $\tau_s$, then the delay time is $\tau_d = t\tau_s$, and the delay time window is $\tau_w = (m-1)\tau_d$, which is the entire time spanned by the components of each vector.
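
The construction in Eq. (1) can be sketched in a few lines of Python.  This is only an illustration: the function name `delay_embed` and the toy series are assumptions made here, not part of the original study.

```python
import numpy as np

def delay_embed(x, m, t):
    """Return the M x m matrix whose i-th row is (x_i, x_{i+t}, ..., x_{i+(m-1)t})."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * t              # number of embedded points
    if M <= 0:
        raise ValueError("series too short for this (m, t)")
    # Column k holds the series shifted by k*t samples.
    return np.column_stack([x[k * t:k * t + M] for k in range(m)])

# Hypothetical toy series 0, 1, ..., 9 embedded with m = 3 and index lag t = 2.
x = np.arange(10.0)
vectors = delay_embed(x, m=3, t=2)
print(vectors.shape)   # (6, 3)
print(vectors[0])      # [0. 2. 4.]
```

With N = 10, m = 3, and t = 2, the number of embedded points is M = N - (m - 1)t = 6, and each vector spans (m - 1)t = 4 sampling intervals.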

2.1    Correlation integral and correlation dimension

After the attractor has been reconstructed using Eq. (1), quantitative properties of the chaotic system can be determined.  The correlation dimension introduced by Grassberger and Procaccia (1983) is widely used in many fields for the quantitative characterization of strange attractors.  The correlation integral for the embedded time series is the following function:

$$C(m, N, r, t) = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \Theta\left(r - \|\vec{x}_i - \vec{x}_j\|\right), \quad r > 0,$$                (2)

where   $\Theta(a) = 0$,   if $a < 0$,

        $\Theta(a) = 1$,   if $a \ge 0$,

N is the size of the data set, M = N - (m - 1)t is the number of embedded points in m-dimensional space, and $\|\cdot\|$ denotes the sup-norm.  $C(m, N, r, t)$ measures the fraction of the pairs of points $\vec{x}_i$, i = 1, 2, ..., M, whose sup-norm separation is no greater than r.  If the limit of $C(m, N, r, t)$ as $N \to \infty$ exists for each r, we write the fraction of all state vector points that are within r of each other as $C(m, r, t) = \lim_{N \to \infty} C(m, N, r, t)$, and the correlation dimension is defined as $D_2(m, t) = \lim_{r \to 0} [\log C(m, r, t) / \log r]$.  In practice, N remains finite, and, thus, r cannot go to zero; instead, we look for a linear region of slope $D_2(m, t)$ in the plot of $\log C(m, N, r, t)$ vs. log r.
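
As a rough illustration of the procedure just described, the following Python sketch counts the pairs of points within a sup-norm distance r and fits the slope of log C against log r.  The function name, the test data (evenly spaced points on a line segment), and the radii are assumptions chosen for the example; real data require a careful choice of the scaling region.

```python
import numpy as np

def correlation_integral(vectors, r):
    """Fraction of distinct pairs (i < j) whose sup-norm separation is <= r."""
    vectors = np.asarray(vectors, dtype=float)
    M = len(vectors)
    close = 0
    for i in range(M - 1):
        # Sup-norm distance from point i to every later point j > i.
        dist = np.max(np.abs(vectors[i + 1:] - vectors[i]), axis=1)
        close += np.count_nonzero(dist <= r)
    return 2.0 * close / (M * (M - 1))

# Toy data (hypothetical): 200 points evenly spaced on a line segment.
# A line has dimension 1, so the fitted slope should be close to 1.
points = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
radii = np.array([0.02, 0.04, 0.08, 0.16])
C = np.array([correlation_integral(points, r) for r in radii])
slope = np.polyfit(np.log(radii), np.log(C), 1)[0]
print(f"estimated slope: {slope:.2f}")
```

The least-squares slope stands in for the visual identification of a linear region; for an attractor, the same fit would be repeated for increasing m to check whether the slope saturates.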

2.2    The parameters M and T

The components of the reconstructed state variables $\vec{x}_i$ need to be independent, so the quality of the reconstructed attractor depends on the choice of the index lag t.  If the delay time $\tau_d$ is too small, the reconstructed attractor is compressed along the identity line, and this is called redundance.  If $\tau_d$ is too large, the attractor dynamics may become causally disconnected, which is called irrelevance, and which may cause the attractor to appear much more complex than it really is (Casdagli et al., 1991).

If the sampling time $\tau_s$ is considerably less than the delay time $\tau_d$, then successive points $\vec{x}_i$ and $\vec{x}_{i+1}$ of the embedded time series will generally be close, and this will lead to artificial correlations in Eq. (2) (Grassberger, 1990).  While it is not possible to make an unambiguous identification of such artificially correlated points (Rosenstein et al., 1994; Grassberger, 1990), a practical way to minimize this effect is to remove from Eq. (2) the contributions of all pairs of points $\vec{x}_i$ and $\vec{x}_j$ with |j - i| < t, where t is the index lag (Grassberger, 1990).
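
A minimal sketch of this exclusion rule is given below.  The function name is hypothetical, and normalizing by the number of retained pairs is an assumption made here; the text only specifies which pairs are removed.

```python
import numpy as np

def correlation_integral_windowed(vectors, r, t):
    """Correlation integral that keeps only pairs with j - i >= t."""
    vectors = np.asarray(vectors, dtype=float)
    M = len(vectors)
    if t < 1 or t >= M:
        raise ValueError("need 1 <= t < number of embedded points")
    close = 0
    pairs = 0
    for i in range(M - t):
        # Points i+t, i+t+1, ... are at least t steps later in time than point i.
        dist = np.max(np.abs(vectors[i + t:] - vectors[i]), axis=1)
        close += np.count_nonzero(dist <= r)
        pairs += dist.size
    return close / pairs   # assumed normalization: retained pairs only

# Hypothetical toy data: the only close pairs are also neighbours in time.
pts = np.array([[0.0], [0.1], [5.0], [5.1]])
all_pairs = correlation_integral_windowed(pts, r=0.5, t=1)
no_neighbours = correlation_integral_windowed(pts, r=0.5, t=2)
print(all_pairs, no_neighbours)
```

In the toy data the two close pairs are consecutive in time, so they count when t = 1 and vanish when t = 2, which is exactly the kind of artificial correlation the rule is meant to suppress.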

2.3    Choosing the delay time and delay time window

Many researchers choose to use a fixed delay time $\tau_d$ as the embedding dimension m is increased.  Some have suggested obtaining $\tau_d$ from the autocorrelation function (ACF), which is practically convenient, since it contains information about both periodic trends and information dissipation.  This ACF method has the advantage of computational efficiency, but it has been found that the value obtained for t may be incorrect.  Since the relationship between the spatial distribution of a reconstructed attractor and the temporal autocorrelation of a single-variable time series is not well-defined, there are inconsistencies inherent in this approach (Fraser and Swinney, 1986; Martinerie et al., 1992).  Fraser and Swinney (1986) instead suggested choosing the index lag t as the first local minimum of the mutual information (MI).  This is known to be the most comprehensive method, but it has the drawbacks that it requires a large amount of data and is computationally cumbersome (Tsonis, 1992).

An alternative is to fix the delay time window $\tau_w$, rather than the delay time $\tau_d$, but the estimation of $\tau_w$ is less well developed.  Martinerie et al. (1992) examined the delay time window and compared it with the delay times estimated using the ACF and the MI.  They concluded that $\tau_w$ could not be estimated using either of these two methods.  Basically, $\tau_w$ is the optimal time for independence of the data, but these methods estimate the first locally optimal time, which is $\tau_d$.  From this distinction between $\tau_w$ and $\tau_d$, we developed a technique, called the C-C method, that can estimate both $\tau_d$ and $\tau_w$ (Kim et al., 1999).  This method is discussed in the following subsection.  We also showed that, for small data sets, as the embedding dimension m is increased, the correlation dimension $D_2$ converges more rapidly if $\tau_w$ is held fixed than if $\tau_d$ is held fixed (Kim et al., 1998).
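
For orientation, an ACF-based choice of the index lag might look like the sketch below.  The text does not say which ACF criterion is meant; the first zero crossing used here is one common convention and is an assumption, as are the function names and the sinusoidal test signal.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def first_zero_crossing(rho):
    """Smallest lag k >= 1 at which the ACF is no longer positive (assumed criterion)."""
    for k in range(1, len(rho)):
        if rho[k] <= 0.0:
            return k
    return None

# Hypothetical test signal: a sine wave with a period of 100 samples.
i = np.arange(1000)
x = np.sin(2 * np.pi * i / 100)
lag = first_zero_crossing(acf(x, max_lag=60))
print(lag)   # close to a quarter period, i.e. about 25 samples
```

For a pure sine wave the ACF is approximately a cosine, so the selected lag lands near a quarter period.  This linear criterion says nothing about nonlinear dependence, which is the inconsistency noted above.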

2.4    The C-C method

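Brock et al. (1991, 1996) studied the BDS statistic, which is based on the correlation integral, to test the null hypothesis that the data are independently and identically distributed (iid).  This test has been particularly useful for chaotic systems and nonlinear stochastic systems.  Under the iid hypothesis, the BDS statistic for m > 1 is defined as

$$\mathrm{BDS}(m, M, r) = \frac{\sqrt{M}}{\sigma(m, M, r)} \left[ C(m, M, r) - C^m(1, M, r) \right],$$                 (3)

and this converges to a standard normal distribution as $M \to \infty$.  Note that the asymptotic variance $\sigma^2(m, M, r)$ can be estimated as

$$\sigma^2(m, M, r) = 4 \Big[ K^m + 2 \sum_{j=1}^{m-1} K^{m-j} C^{2j}$$

$$\qquad\qquad + (m-1)^2 C^{2m} - m^2 K C^{2m-2} \Big],$$                (4)

$$C = C(r) = \frac{1}{M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} \Theta\left(r - \|x_i - x_j\|\right),$$                           (5)

$$K = K(r) = \frac{1}{M^3} \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{k=1}^{M} \Theta\left(r - \|x_i - x_j\|\right) \Theta\left(r - \|x_j - x_k\|\right).$$                (6)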
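
To make the test concrete, the sketch below evaluates the BDS statistic for index lag 1, using the standard variance expression of Brock et al. (1996).  It is an illustration only: the function name is hypothetical, self-pairs are excluded from the sample estimates of C and K (an assumption that differs from the full double and triple sums by terms of order 1/M), and the two test series are synthetic.

```python
import numpy as np

def bds_statistic(x, m, r):
    """BDS statistic for embedding dimension m >= 2 and radius r (index lag 1)."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1)                       # number of m-histories
    # near[i, j] is True when |x_i - x_j| <= r.
    near = np.abs(x[:, None] - x[None, :]) <= r
    # Two m-histories are close in the sup-norm when all m components are close.
    near_m = np.ones((M, M), dtype=bool)
    for k in range(m):
        near_m &= near[k:k + M, k:k + M]
    off_diag = ~np.eye(M, dtype=bool)          # drop self-pairs i == j
    c_m = near_m[off_diag].mean()
    # h[j]: fraction of the other points lying within r of x_j.
    h = (near[:M, :M].sum(axis=1) - 1) / (M - 1)
    c_1 = h.mean()                             # estimate of C
    k_hat = np.mean(h ** 2)                    # estimate of K
    var = 4.0 * (k_hat ** m
                 + 2.0 * sum(k_hat ** (m - j) * c_1 ** (2 * j) for j in range(1, m))
                 + (m - 1) ** 2 * c_1 ** (2 * m)
                 - m ** 2 * k_hat * c_1 ** (2 * m - 2))
    return np.sqrt(M) * (c_m - c_1 ** m) / np.sqrt(var)

# Synthetic comparison: iid Gaussian noise versus a strongly dependent AR(1) series.
rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
ar1 = np.zeros(500)
for i in range(1, 500):
    ar1[i] = 0.9 * ar1[i - 1] + noise[i]
stat_iid = bds_statistic(noise, m=2, r=np.std(noise))
stat_ar1 = bds_statistic(ar1, m=2, r=np.std(ar1))
print(stat_iid, stat_ar1)
```

For the iid series the statistic should behave like a draw from a standard normal distribution, whereas the dependent series should give a value far out in the tail.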

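The present study is concerned with the properties of the quantity $S(m, N, r, t) = C(m, N, r, t) - C^m(1, N, r, t)$.  We refer to a comment by Brock et al. (1991): “If the stochastic process $\{x_i\}$ is iid, it will be shown that $C(m, r) = C^m(1, r)$ for all m and r.  That is to say, the correlation integral behaves much like the characteristic function of a serial string in that the correlation integral of a serial string of independent random variables is the product of the correlation integrals of component substrings.”  This led us to interpret the statistic $S(m, N, r, t)$ as the serial correlation of a nonlinear time series.  Therefore, it can be regarded as a dimensionless measure of nonlinear dependence.  For fixed m, N, and r, the plot of $S(m, N, r, t)$ vs. t is a nonlinear analog of the plot of the autocorrelation function vs. t.  In order to study the nonlinear dependence, Kim et al. (1999) derived the following equations:

$$S(m, r, t) = \frac{1}{t} \sum_{s=1}^{t} \left[ C_s(m, r, t) - C_s^m(1, r, t) \right], \quad m = 2, 3, \ldots$$          (7)

$$\Delta S(m, t) = \max_j \{ S(m, r_j, t) \} - \min_j \{ S(m, r_j, t) \}.$$                      (8)

Here $C_s$ denotes the correlation integral of the s-th of the t disjoint subseries $\{x_s, x_{s+t}, x_{s+2t}, \ldots\}$ of the time series, and the $r_j$ are the selected values of r.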

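Brock et al. (1991) suggested that m should be between 2 and 5, and r should be between $\sigma/2$ and $2\sigma$, where $\sigma$ here denotes the standard deviation of the time series.  In addition, the asymptotic distributions were well approximated by finite time series when $N \ge 500$.  Thus, we select four values of r in the range $\sigma/2 \le r \le 2\sigma$, namely $r_1 = (0.5)\sigma$, $r_2 = (1.0)\sigma$, $r_3 = (1.5)\sigma$, and $r_4 = (2.0)\sigma$, as representative values.  We then define the following averages of the quantities given by Eqs. (7) and (8):

$$\bar{S}(t) = \frac{1}{16} \sum_{m=2}^{5} \sum_{j=1}^{4} S(m, r_j, t),$$                                  (9)

$$\Delta\bar{S}(t) = \frac{1}{4} \sum_{m=2}^{5} \Delta S(m, t),$$                                   (10)

and we look for the first zero crossing of $\bar{S}(t)$ or the first local minimum of $\Delta\bar{S}(t)$ to find the first locally optimal time for independence of the data, which gives the delay time $\tau_d = t\tau_s$.  The optimal time is the index lag t for which $\bar{S}(t)$ and $\Delta\bar{S}(t)$ are both closest to zero.  If we assign equal importance to these two quantities, then we may simply look for the minimum of the quantity

$$S_{cor}(t) = \Delta\bar{S}(t) + |\bar{S}(t)|,$$                                (11)

and this optimal time gives the delay time window $\tau_w = t\tau_s$.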
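
The steps of the C-C method can be sketched as follows.  This is a simplified reading of the procedure of Kim et al. (1999), not the authors' code: the function names are hypothetical, the correlation integrals are computed by brute force, and the periodic test signal is chosen only to keep the example small.

```python
import numpy as np

def corr_integral(x, m, r):
    """Sup-norm correlation integral of the lag-1 delay vectors of x."""
    M = len(x) - (m - 1)
    v = np.column_stack([x[k:k + M] for k in range(m)])
    dist = np.max(np.abs(v[:, None, :] - v[None, :, :]), axis=2)
    i, j = np.triu_indices(M, k=1)             # distinct pairs i < j
    return np.mean(dist[i, j] <= r)

def cc_statistics(x, t_max, ms=(2, 3, 4, 5), r_factors=(0.5, 1.0, 1.5, 2.0)):
    """Return S_bar(t), dS_bar(t), S_cor(t) for index lags t = 1..t_max."""
    x = np.asarray(x, dtype=float)
    radii = [f * np.std(x) for f in r_factors]
    S_bar = np.zeros(t_max)
    dS_bar = np.zeros(t_max)
    for t in range(1, t_max + 1):
        S = np.zeros((len(ms), len(radii)))    # S(m, r_j, t)
        for b, r in enumerate(radii):
            for s in range(t):
                # A lag-1 embedding of the subseries x[s::t] is a lag-t embedding of x.
                sub = x[s::t]
                c1 = corr_integral(sub, 1, r)
                for a, m in enumerate(ms):
                    S[a, b] += (corr_integral(sub, m, r) - c1 ** m) / t
        S_bar[t - 1] = S.mean()                                   # average over m and r_j
        dS_bar[t - 1] = (S.max(axis=1) - S.min(axis=1)).mean()    # average over m
    S_cor = dS_bar + np.abs(S_bar)
    return S_bar, dS_bar, S_cor

def first_local_minimum(y):
    """1-based index lag of the first interior local minimum, or None."""
    for k in range(1, len(y) - 1):
        if y[k] < y[k - 1] and y[k] <= y[k + 1]:
            return k + 1
    return None

# Hypothetical test signal: a sine wave with a period of 40 samples.
x = np.sin(2 * np.pi * np.arange(400) / 40)
S_bar, dS_bar, S_cor = cc_statistics(x, t_max=20)
t_d = first_local_minimum(dS_bar)              # index lag for the delay time
t_w = int(np.argmin(S_cor)) + 1                # index lag for the delay time window
print(t_d, t_w)
```

Multiplying `t_d` and `t_w` by the sampling time gives the delay time and the delay time window.  The largest lag must leave each subseries long enough for an embedding with m = 5, which limits `t_max` for short records.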

3    APPLICATIONS TO DAILY STREAMFLOW DATA

In this work, we search for evidence of chaotic behavior in the daily streamflows at two stations in Florida, USA: the St. Marys River near Macclenny and the Ocklawaha River near Conner.  The St. Marys record covers 67 years (1927-1993) with 24,472 daily values, and the Ocklawaha record covers 11 years (1978-1988) with 4,018 daily values.  The time series plots for these streamflows are shown in Fig. 1.

3.1    Estimation of $\tau_d$ and $\tau_w$

For the streamflows of the St. Marys and Ocklawaha Rivers, $\tau_d$ corresponds to the first local minimum of $\Delta\bar{S}(t)$, as indicated by the arrow in Fig. 2 for the St. Marys River; it is determined similarly for the Ocklawaha River.  The delay times for the two streamflows are estimated as $\tau_d = 39\tau_s$ = 39 days and $\tau_d = 33\tau_s$ = 33 days, respectively.  Also, as shown in Fig. 2, the minimum of $S_{cor}(t)$ occurs at t = 152 for the St. Marys River, which gives $\tau_w$ = 152 days; the corresponding value for the Ocklawaha River is $\tau_w$ = 187 days.

3.2    Correlation dimensions for daily streamflows

The correlation integral analysis is applied to the two daily streamflows using the values of $\tau_d$ and $\tau_w$ estimated in Section 3.1.  Fig. 3 shows the plots of log[C(r)] versus log(r) for the reconstructed attractors of the St. Marys River for embedding dimensions m = 2, 4, ..., 20 using $\tau_d$ = 39 days.  The correlation dimension increases slowly up to m = 20 without saturating, so there is no evidence of chaotic behavior.  In Fig. 4, we repeat this analysis using the delay time window $\tau_w$ = 152 days.  Since the index lag is then determined by (m-1)t = 152, it is necessary to round off t to the nearest integer.  The actual values used for (m, t) are as follows: (2, 152), (4, 51), (6, 30), (8, 22), (10, 15), (12, 14), (14, 12), (16, 10), (18, 9), and (20, 8).  In this case also, we cannot find a correlation dimension that would indicate chaotic behavior.  A similar analysis is performed for the Ocklawaha River but, with both $\tau_d$ and $\tau_w$, the correlation dimension fails to saturate, as shown in Fig. 5, so there is again no evidence of nonlinear determinism.
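
The rounding step can be reproduced with a few lines of Python; the variable names are illustrative only.

```python
# Index lag t obtained by rounding tau_w / (m - 1) for a fixed window of 152 days.
tau_w = 152
lags = {m: round(tau_w / (m - 1)) for m in range(2, 21, 2)}
for m, t in lags.items():
    print(m, t)
```

Direct rounding gives t = 17 for m = 10, whereas the list above reports 15; the remaining pairs agree.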

4    SAMPLING OF CHAOTIC TIME SERIES

Sampling is an important consideration in the analysis of hydrologic time series.  Hydrological processes such as precipitation and streamflow are generally continuous in time.  While continuous time series are measured at some gaging stations, most recorded time series are discrete, providing instantaneous sampling at either regular or irregular time intervals (such as instantaneous daily observations of water levels in streams).  For a chaotic time series, such sampling can eliminate the evidence of nonlinear determinism, as we show in this section.

As an example, we examine the effect of sampling on a time series of the variable x from the Lorenz equations (Abarbanel et al., 1993):

$$\frac{dx}{dt} = -a(x - y),$$

$$\frac{dy}{dt} = -xz + cx - y,$$                                   (12)

$$\frac{dz}{dt} = xy - bz,$$

which is generated using the parameter values a = 16.0, b = 4.0, and c = 45.92 and a time step of $\tau_s$ = 0.01.  Each sampled time series consists of 15,000 values.  We generate the sampled time series by keeping only the values at every nth time step, with n = 2, 10, 50, and 100.
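
A minimal sketch of this experiment is shown below, assuming the standard form of the Lorenz equations, dx/dt = a(y - x), dy/dt = x(c - z) - y, dz/dt = xy - bz.  The fourth-order Runge-Kutta integrator, the initial condition, and the short series length are assumptions made for the example; the paper does not specify the integration scheme.

```python
import numpy as np

def lorenz_rhs(state, a=16.0, b=4.0, c=45.92):
    """Right-hand side of the Lorenz equations with the parameter values of the text."""
    x, y, z = state
    return np.array([a * (y - x), x * (c - z) - y, x * y - b * z])

def integrate_lorenz(n_steps, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Return x(t) at n_steps successive time steps (RK4 is an assumed scheme)."""
    state = np.array(start, dtype=float)
    xs = np.empty(n_steps)
    for i in range(n_steps):
        k1 = lorenz_rhs(state)
        k2 = lorenz_rhs(state + 0.5 * dt * k1)
        k3 = lorenz_rhs(state + 0.5 * dt * k2)
        k4 = lorenz_rhs(state + dt * k3)
        state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs[i] = state[0]
    return xs

def sample_every(x, n, n_values):
    """Keep every n-th value and truncate to n_values points."""
    return x[::n][:n_values]

# Short illustrative run: 500 retained values instead of the 15,000 used in the text.
n_values = 500
x_full = integrate_lorenz(n_steps=10 * n_values)
sampled = {n: sample_every(x_full, n, n_values) for n in (2, 10)}
for n, series in sampled.items():
    print(n, len(series))
```

In practice an initial transient would be discarded before sampling, and the full-length series would need n times 15,000 integration steps for each sampling interval n.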

For each of the four sampled time series, we use the C-C method to compute the delay time $\tau_d$, and the results are given in Table 1.  We then construct embeddings of the attractors using these delay times, and the results are shown in Fig. 6.  The attractors degrade quickly as n increases, and, by n = 10, they are completely unrecognizable.  We also compute the correlation dimensions, and these results are shown in Fig. 7.  In each case, we draw a horizontal line at the dimension $D_2$ = 2.05 obtained from the original time series.  Note that, for the larger values of n, it is not always possible to find a linear region in the plot of log[C(r)] versus log(r) for large values of the embedding dimension m.  The convergence of the correlation dimension degrades slowly as n increases, and, by n = 100, this convergence is lost.  Thus, we see that sampling can eliminate the evidence of nonlinear determinism from a chaotic time series.

Table 1  The delay times and the delay time windows for sampled Lorenz series.

Time interval n     Delay time $\tau_d$     Delay time window $\tau_w$
      2                    5                        134
     10                    2                         91
     50                    2                         26
    100                    2                         24

As the time interval n is increased, the time series is increasingly randomized by the sampling process, and it becomes difficult to locate the minimum of $S_{cor}(t)$ that defines $\tau_w$.  Thus, we have not used the delay time window to estimate the correlation dimension of the sampled series.

5    DISCUSSIONS AND CONCLUSIONS

The Lorenz equations were solved numerically using a time step of $\tau_s$ = 0.01, and the resulting values of the variable x were used as a scalar time series.  Applying the C-C method to this time series yields the delay time $\tau_d = 10\tau_s = 0.1$ and the delay time window $\tau_w = 100\tau_s = 1.0$.  We then investigated the effect of sampling on this time series by keeping only the values at every nth time step.  This is equivalent to making sampled measurements with a regular time interval $\Delta t = n\tau_s$.  As this time interval $\Delta t$ increases, successive measurements will eventually become irrelevant, and any nonlinear determinism will be lost.  When this happens, the data will appear to be stochastic, rather than chaotic.

Since the delay time $\tau_d$ is a basic correlation time, irrelevance should begin to become apparent when $\Delta t \sim \tau_d$.  However, $\tau_w$ is the maximum time for correlations, so complete stochasticity should not occur until $\Delta t \sim \tau_w$.  For the Lorenz system, the conditions $\Delta t \sim \tau_d$ and $\Delta t \sim \tau_w$ become n ~ 10 and n ~ 100, respectively.  In Fig. 6, we do indeed see that the reconstructed attractor begins to lose its structure when n ~ 10, and that this structure is completely lost when n ~ 100.

Stochastic models are often used for the modeling of daily streamflows.  However, if the streamflow data show evidence of nonlinear determinism, then a chaotic model should be more appropriate.  Some previous studies have found evidence of nonlinear determinism, while others have not, possibly because of the sampling process.  In such cases, one should not expect to find evidence of nonlinear determinism, and stochastic models should work well.

References

Abarbanel, H.D.I., Brown, R., Sidorowich, J.J., and Tsimring, L.S. (1993). “Analysis of observed chaotic data in physical systems.” Rev. Mod. Phys., Vol. 65, pp. 1331-1392.

Brock, W.A., Dechert, W.D., Scheinkman, J.A., and LeBaron, B. (1996). “A test for independence based on the correlation dimension.” Econometric Reviews, Vol. 15, pp. 197-235.

Brock, W.A., Hsieh, D.A., and LeBaron, B. (1991). Nonlinear Dynamics, Chaos, and Instability: Statistical Theory and Economic Evidence, The MIT Press.

Casdagli, M., Eubank, S., Farmer, J.D., and Gibson, J. (1991). “State space reconstruction in the presence of noise.” Physica D, Vol. 51, pp. 52-98.

Fraser, A.M. and Swinney, H.L. (1986). “Independent coordinates for strange attractors from mutual information.” Phys. Rev. A, Vol. 33, pp. 1134-1140.

Ghilardi, P. and Rosso, R. (1990). “Comment on ‘Chaos in rainfall’ by I. Rodriguez-Iturbe et al.” Water Resources Research, Vol. 26, No. 8, pp. 1837-1839.

Grassberger, P. (1990). “An optimized box-assisted algorithm for fractal dimensions.” Phys. Lett. A, Vol. 148, No. 1,2, pp. 63-68.

Grassberger, P. and Procaccia, I. (1983). “Measuring the strangeness of strange attractors.” Physica D, Vol. 9, pp. 189-208.

Jeong, G.D. and Rao, A.R. (1996). “Chaos characteristics of tree ring series.” J. Hydrology, Vol. 182, pp. 239-257.

Kim, H.S., Eykholt, R., and Salas, J.D. (1999). “Nonlinear dynamics, delay times, and embedding windows.” Physica D, Vol. 127, pp. 48-60.

Lettenmaier, D.P. and Wood, E.F. (1993). “Hydrologic forecasting.” in Handbook of Hydrology, edited by Maidment, D.R.

Martinerie, J.M., Albano, A.M., Mees, A.I., and Rapp, P.E. (1992). “Mutual information, strange attractors, and the optimal estimation of dimension.” Phys. Rev. A, Vol. 45, pp. 7058-7064.

Packard, N.H., Crutchfield, J.P., Farmer, J.D., and Shaw, R.S. (1980). “Geometry from a time series.” Phys. Rev. Lett., Vol. 45, No. 9, pp. 712-716.

Puente, C.E. and Obregon, N. (1996). “A deterministic geometric representation of temporal rainfall: results for a storm in Boston.” Water Resources Research, Vol. 32, No. 9, pp. 2825-2839.

Rodriguez-Iturbe, I., Power, B.F.D., Sharifi, M.B., and Georgakakos, K.P. (1989). “Chaos in rainfall.” Water Resources Research, Vol. 25, No. 7, pp. 1667-1675.

Rosenstein, M.T., Collins, J.J., and De Luca, C.J. (1994). “Reconstruction expansion as a geometry-based framework for choosing proper delay times.” Physica D, Vol. 73, pp. 82-98.

Salas, J.D. (1993). “Analysis and modelling of hydrologic time series.” in Handbook of Hydrology, edited by Maidment, D.R.

Sangoyomi, T.B., Lall, U., and Abarbanel, H.D.I. (1996). “Nonlinear dynamics of the Great Salt Lake: dimension estimation.” Water Resources Research, Vol. 32, No. 1, pp. 149-159.

Sharifi, M.B., Georgakakos, K.P., and Rodriguez-Iturbe, I. (1990). “Evidence of deterministic chaos in the pulse of storm rainfall.” J. Atmos. Sci., Vol. 47, No. 7, pp. 888-893.

Takens, F. (1981). “Detecting strange attractors in turbulence.” in Dynamical Systems and Turbulence, edited by Rand, D.A. and Young, L.S., pp. 366-381, Springer-Verlag.

Tong, H. (1990). Non-Linear Time Series: A Dynamical System Approach, Clarendon Press.

Tsonis, A.A. (1992). Chaos: From Theory to Applications, Plenum Press.

Wilcox, B.P., Seyfried, M.S., and Matison, T.H. (1991). “Searching for chaotic dynamics in snowmelt runoff.” Water Resources Research, Vol. 27, No. 6, pp. 1005-1010.

Fig. 1    Daily streamflows at (a) St. Marys and (b) Ocklawaha Rivers, FL, USA

Fig. 2    Estimation of $\tau_d$ and $\tau_w$ for daily streamflow at St. Marys River, FL, USA: (a) S(m,r,t); (b) $\bar{S}(t)$, $\Delta\bar{S}(t)$, and $S_{cor}(t)$

 

Fig. 3    Estimation of correlation dimension using the delay time $\tau_d$ at St. Marys River near Macclenny, Florida, USA: (a) correlation integral; (b) correlation dimension.

Fig. 4    Estimation of correlation dimension using the delay time window $\tau_w$ at St. Marys River near Macclenny, Florida, USA: (a) correlation integral; (b) correlation dimension.

Fig. 5    Correlation dimension at Ocklawaha River near Conner, Florida, USA: (a) use of the delay time; (b) use of the delay time window.

Fig. 6    Attractors of sampled time series from Lorenz system.

Fig. 7    Correlation integral and correlation dimension for sampled time series from Lorenz system.