A NONPARAMETRIC APPROACH FOR PRESERVING INTERANNUAL DEPENDENCE IN SYNTHETIC STREAMFLOW SEQUENCES

  

Ashish Sharma¹ and Robert O'Neill²

¹School of Civil and Environmental Engineering

The University of New South Wales, Sydney, Australia

Tel: +61 (2) 9385 5768; Fax: +61 (2) 9385 6139; E-mail: a.sharma@unsw.edu.au

²Department of Land and Water Conservation, Sydney, Australia

 

Abstract: Estimating the risks associated with water management plans requires the generation of synthetic streamflow sequences. The mathematical algorithms used to generate these sequences at monthly time scales are lacking in two main respects: an inability to preserve dependence attributes, particularly at large (seasonal to interannual) time lags; and a poor representation of observed distributional characteristics, in particular strong asymmetry or multimodality in the probability density function. Proposed here is an alternative that naturally incorporates both observed dependence and distributional attributes in the generated sequences. Use of a nonparametric framework provides an effective means of representing the observed probability distribution. A careful selection of prior flows imparts the appropriate short-term memory, while use of an “aggregate” flow variable allows representation of interannual dependence. The nonparametric simulation model is applied to the Burrendong dam inflows, New South Wales, Australia.

Keywords: stochastic, streamflow, water resources management

1    INTRODUCTION

An important goal of stochastic hydrology is to generate synthetic streamflow sequences that are statistically similar to the observed flow record. These sequences serve as inputs for Monte Carlo simulation of a reservoir system, to help identify plans and policies for efficient management of available water resources. Statistical similarity implies that the distributional and dependence attributes of observed flows should be accurately reproduced in the simulations. The representation of seasonal to interannual dependencies, commonly associated with sustained droughts or periods of large flows, is of particular importance for reservoir system management. Absence of such dependencies in simulations can result in an inaccurate representation of the flows that are likely to occur. This can in turn lead to biased reservoir operating policies, causing both loss of revenue in reservoir operation and a possible hazard for users downstream.

This paper presents an approach for stochastic simulation of seasonal streamflow sequences that attempts to reproduce such longer-term dependence characteristics and the observed distributional attributes in the generated flow sequences. The approach is developed within a nonparametric density estimation framework that ensures accurate representation of the distributional attributes present in the historical flow record. Use of an aggregate streamflow variable, details on which are presented in later sections, ensures an accurate characterisation of the seasonal to interannual dependencies in the model simulations.

Stochastic simulation of seasonal flows has traditionally been approached from two different perspectives. Autoregressive moving average (ARMA) models have been commonly used to model both seasonal and annual streamflow sequences. These models assume that the current flow is linearly related to previous observations. Often the actual flow values need to be transformed to an alternate variable that conforms to the assumptions of linearity (or a Gaussian probability density) implicit in the model structure. Use of such a framework offers an accurate representation of the dependence between the current and a few past flow values, but does not necessarily ensure that longer-term (seasonal to interannual) dependencies are accurately reproduced.

Stochastic disaggregation approaches are an alternative to the ARMA models discussed above. Here the stochastic simulation proceeds in two stages. First, an annual flow sequence is generated using an appropriately chosen model, using previous years' flows as the basis to prescribe the observed annual dependence structure. Next, the generated annual or aggregate flow for each year is disaggregated, or divided, into the various seasonal components. This ensures that if the annual flow corresponds to a low flow year, the associated seasonal flows will also be low. While this offers a reasonable alternative to ARMA models, and also ensures that some measure of interannual dependence is translated to the seasonal flow simulations, the resulting flow sequences offer only an approximate representation of the processes observed in the historical flow record.

Proposed here is a seasonal streamflow generation approach that is free from several inherent disadvantages of ARMA models. Generated sequences reproduce both the short-term and the interannual dependence present in the historical flows. Use of the nonparametric framework ensures that dependence and distributional attributes in generated flows are similar to those in the historical record. What follows is a brief background on nonparametric methods, their applications in hydrology and water resources, and how they can be used to formulate conditional streamflow simulation models. Next, methodological and algorithmic details on the nonparametric streamflow simulation model proposed here are presented. The model is then applied to 105 years (1890 to 1994) of Burrendong dam inflows on the Macquarie River in eastern NSW, Australia. We conclude with a discussion of the approach, its pros and cons, and mention some of the work that lies ahead.

2    NONPARAMETRIC APPLICATIONS FOR STOCHASTIC STREAMFLOW GENERATION

The past few years have seen a surge in applications of nonparametric methods for probability density and regression function estimation to a range of hydrologic problems. Interested readers may refer to [Lall, 1995] for a review. Applications related to the present work include: a synthetic streamflow resampling approach using nearest neighbour density estimation principles [Lall and Sharma, 1996]; a nonparametric alternative to the autoregressive order p model (the NPp, or nonparametric order p, streamflow simulation model) [Sharma et al., 1997]; and a nonparametric alternative to traditional disaggregation approaches (the NPD, or nonparametric disaggregation, model) [Tarboton et al., 1998].

Streamflow simulation is an exercise in conditional probability distributions [Bras and Rodriguez-Iturbe, 1985]. Simulation of flow $X_t$ conditional on p prior flows $(X_{t-1}, X_{t-2}, \ldots, X_{t-p})$ involves estimation of the conditional probability density function $f(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_{t-p})$. Similarly, disaggregation of an aggregate flow $Z = X_1 + X_2 + \ldots + X_d$ into the d seasonal components $(X_1, X_2, \ldots, X_d)$ requires estimation of the conditional multivariate probability density $f(X_1, X_2, \ldots, X_d \mid Z)$. Conventional approaches assume certain distributional forms for the joint and marginal probability densities of the flow variables, from which the above conditional probability density functions are derived. These conditional densities are then expressed using parameters such as the mean, variance and skewness, and measures of dependence such as correlation. As these methods rely solely on parameters (mean, variance, skewness, correlation) of the data to characterise the assumed probability density functions, they are termed parametric. Such methods are useful only if the assumptions about the underlying distributional forms are accurate. One often comes across streamflow records that are not easily characterised by the commonly used probability distributions (see the examples in [Lall and Sharma, 1996; Sharma et al., 1997; Tarboton et al., 1998]).

Nonparametric methods offer an efficient alternative to traditional parametric approaches. A nonparametric kernel probability density estimate is obtained by considering the cumulative effect of smooth functions, called kernels, placed over each sample data point. Using a Gaussian kernel function, the multivariate kernel probability density of a d-dimensional variable set $\mathbf{x}$ at coordinate location $\mathbf{x}$ is estimated as:

$$\hat{f}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{(2\pi)^{d/2} \lambda^{d} \det(S)^{1/2}} \exp\left(-\frac{(\mathbf{x} - \mathbf{x}_i)^{T} S^{-1} (\mathbf{x} - \mathbf{x}_i)}{2\lambda^{2}}\right) \qquad (1)$$

where:

$\mathbf{x}_i$ is the i'th multivariate data point, for a sample of size n,

$S$ is the sample covariance of the variable set $\mathbf{x}$, and,

$\lambda$ is a smoothing parameter, known as the "bandwidth" of the kernel density estimate.

The bandwidth, $\lambda$, is the key to an accurate estimate of the probability density. A large value of $\lambda$ results in an oversmoothed probability density, with subdued modes and over-enhanced tails. A low value, on the other hand, can lead to density estimates overly influenced by individual data points, with noticeable bumps in the tails of the probability density. Several operational rules for choosing optimal values of the bandwidth $\lambda$ are available in the literature. This study uses the Least Squares Cross Validation approach, details on which are given in [Sharma et al., 1998].
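As a minimal sketch of the Gaussian kernel density estimate in equation (1), the following Python code evaluates the estimate for the bivariate case (d = 2), where the inverse and determinant of the sample covariance have simple closed forms. The function names, the five sample points and the bandwidth values are illustrative assumptions and are not taken from this study; the bandwidth is supplied by hand rather than chosen by Least Squares Cross Validation.

```python
import math


def covariance_2d(points):
    """Sample covariance terms (s11, s12, s22) of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    s11 = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return s11, s12, s22


def kernel_density_2d(x, points, lam):
    """Gaussian kernel density at x: the mean of n kernels, each with covariance lam^2 * S."""
    s11, s12, s22 = covariance_2d(points)
    det = s11 * s22 - s12 * s12
    # Normalising constant (2 pi)^(d/2) * lam^d * det(S)^(1/2) with d = 2.
    norm = 2.0 * math.pi * lam ** 2 * math.sqrt(det)
    total = 0.0
    for xi in points:
        d0, d1 = x[0] - xi[0], x[1] - xi[1]
        # Quadratic form (x - xi)^T S^-1 (x - xi), using the closed-form 2x2 inverse.
        q = (s22 * d0 * d0 - 2.0 * s12 * d0 * d1 + s11 * d1 * d1) / det
        total += math.exp(-q / (2.0 * lam ** 2))
    return total / (len(points) * norm)


# Hypothetical sample of five bivariate points (not data from this study).
sample = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (2.5, 2.5)]
for lam in (0.2, 0.5, 2.0):
    print(lam, kernel_density_2d((2.5, 2.5), sample, lam))
```

Evaluated at a data point, the estimate is larger for the small bandwidth than for the large one, which is the sharpening and oversmoothing behaviour described above.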

3    PROPOSED APPROACH

This approach is aimed at accurately representing interannual dependence in simulated flows. Consider the flow at time t to be $X_t$, where t could represent annual, seasonal or monthly time steps. For example, for monthly flows, $X_1, X_2, \ldots, X_{12}$ would be the flows for the first 12 months, $X_{13}, \ldots, X_{24}$ the flows for the next 12 months, and so on. The aggregate flow variable $Z_t$ can then be defined as:

$$Z_t = \sum_{j=1}^{m} X_{t-j} \qquad (2)$$

where m is the number of prior flows included in the aggregate variable. This study uses monthly flows and an annual aggregate level (m = 12) to formulate the simulation model. The variable $Z_t$ thus represents the total flow over the 12 months preceding the month being simulated, and its use as a conditioning variable enables proper representation of interannual dependence features. Simulation proceeds from the following conditional probability density:

$$f(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_{t-p}, Z_t) = \frac{f(X_t, X_{t-1}, X_{t-2}, \ldots, X_{t-p}, Z_t)}{f_m(X_{t-1}, X_{t-2}, \ldots, X_{t-p}, Z_t)} \qquad (3)$$

where $f_m(\cdot)$ represents the marginal probability density of the conditioning variable set. Note that the above conditional probability density is conditioned on (p+1) variables: $Z_t$ and $(X_{t-1}, X_{t-2}, \ldots, X_{t-p})$. While use of the variables $(X_{t-1}, X_{t-2}, \ldots, X_{t-p})$ enforces a short-term (up to lag p) dependence structure in the simulated flow value, the aggregate variable $Z_t$ ensures that the annual dependence pattern is correctly represented. Also note that the conditional probability density in (3) has been specified as a function of p prior lags of $X_t$. In a real application, the appropriate value of p needs to be estimated using an order selection scheme such as the Akaike Information Criterion (AIC) [Akaike, 1974] or Generalised Cross Validation (GCV) [Craven and Wahba, 1979]. The authors recommend the use of GCV for estimation of the optimal model lag. The present application assumes p equal to 1 for the sake of simplicity. The conditional density used for simulation then becomes:

$$f(X_t \mid X_{t-1}, Z_t) = \frac{f(X_t, X_{t-1}, Z_t)}{f_m(X_{t-1}, Z_t)} \qquad (4)$$
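The aggregate variable of equation (2) is simply a moving sum of the m flows that precede each time step. The sketch below computes it with a running total; the function name and the toy flow values are illustrative assumptions, and no aggregate is returned for the first m time steps because they do not yet have m prior flows.

```python
def aggregate_flows(flows, m=12):
    """Return {t: Z_t}, where Z_t is the sum of the m flows preceding index t."""
    z = {}
    running = sum(flows[:m])  # flows[0] + ... + flows[m-1], i.e. Z at t = m
    for t in range(m, len(flows)):
        z[t] = running
        # Slide the window forward: add the newest flow, drop the oldest.
        running += flows[t] - flows[t - m]
    return z


# Toy monthly series 1, 2, ..., 14 (hypothetical values, for illustration only).
monthly = list(range(1, 15))
print(aggregate_flows(monthly, m=12))
```

Each simulated month is then conditioned on the pair made up of the previous month's flow and this aggregate.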

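Using the kernel density estimator in (1), with bandwidth $\lambda$, the conditional density in (4) is estimated as:

$$\hat{f}(X_t \mid X_{t-1}, Z_t) = \sum_{i=1}^{n} \frac{w_i}{(2\pi\lambda^{2} S')^{1/2}} \exp\left(-\frac{(X_t - b_i)^{2}}{2\lambda^{2} S'}\right) \qquad (5)$$

where:

$\hat{f}(X_t \mid X_{t-1}, Z_t)$ is the conditional probability density estimate;

$S'$ is a measure of spread of the conditional probability density, expressed as:

$$S' = S_{11} - S_{12} S_{22}^{-1} S_{12}^{T}$$

where the covariance matrix of the variable set $(X_t, X_{t-1}, Z_t)$ is written as:

$$S = \begin{bmatrix} S_{11} & S_{12} \\ S_{12}^{T} & S_{22} \end{bmatrix}$$

with $S_{11}$ the variance of $X_t$, $S_{12}$ the 1 x 2 vector of covariances between $X_t$ and $(X_{t-1}, Z_t)$, and $S_{22}$ the 2 x 2 covariance matrix of $(X_{t-1}, Z_t)$;

$w_i$ is the weight associated with each kernel that constitutes the conditional probability density:

$$w_i = \frac{\exp\left(-\frac{1}{2\lambda^{2}} \mathbf{e}_i^{T} S_{22}^{-1} \mathbf{e}_i\right)}{\sum_{j=1}^{n} \exp\left(-\frac{1}{2\lambda^{2}} \mathbf{e}_j^{T} S_{22}^{-1} \mathbf{e}_j\right)}, \qquad \mathbf{e}_i = (X_{t-1} - x_{i-1},\; Z_t - z_i)^{T}$$

$b_i$ is the conditional mean associated with each kernel:

$$b_i = x_i + S_{12} S_{22}^{-1} \mathbf{e}_i$$

$x_i$ and $z_i$ represent observations, $z_i$ being estimated using the 12 prior flows as expressed in equation (2).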

The conditional probability density estimate in (5) can be viewed as consisting of n kernels, each having a relative area equal to its weight $w_i$, centred at $b_i$, and having a spread proportional to $S'$. Each of these is a slice of one of the trivariate kernels that constitute the joint probability density of $(X_t, X_{t-1}, Z_t)$, taken along the conditioning plane specified by $(X_{t-1}, Z_t)$. The weight $w_i$ depends directly on how far kernel i is from the conditioning plane. A small weight implies that the kernel is far from the conditioning plane and does not make up a significant proportion of the conditional density estimate. A large $w_i$, on the other hand, implies that kernel i is close to the conditioning plane and constitutes a significant portion of the conditional density estimate.
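The description above suggests a two-step way to draw one flow value: pick a kernel with probability equal to its weight, then draw from the Gaussian slice of that kernel. The sketch below illustrates this under stated assumptions. It uses the standard result for conditioning a Gaussian kernel mixture with kernel covariance equal to the squared bandwidth times the sample covariance; the function names, the partition of the covariance matrix and the shift applied before exponentiating (for numerical stability only) are choices made for this illustration and are not taken from the authors' implementation. Negative draws are possible and are not treated here.

```python
import math
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def conditional_kernels(x, xprev, z, cond_prev, cond_z, lam):
    """Weights, centres and common standard deviation of the kernel slices.

    x, xprev, z hold the observed triples (flow, previous flow, aggregate flow);
    cond_prev and cond_z are the conditioning values; lam is the bandwidth.
    """
    s11 = cov(x, x)
    s12 = (cov(x, xprev), cov(x, z))
    a, b, c = cov(xprev, xprev), cov(xprev, z), cov(z, z)
    det = a * c - b * b
    inv = ((c / det, -b / det), (-b / det, a / det))  # inverse of S22
    # g = S22^-1 S12^T, the regression coefficients of X_t on (X_{t-1}, Z_t).
    g = (inv[0][0] * s12[0] + inv[0][1] * s12[1],
         inv[1][0] * s12[0] + inv[1][1] * s12[1])
    s_cond = s11 - (s12[0] * g[0] + s12[1] * g[1])  # S' = S11 - S12 S22^-1 S12^T
    quad, centres = [], []
    for xi, pi, zi in zip(x, xprev, z):
        d0, d1 = cond_prev - pi, cond_z - zi
        quad.append(inv[0][0] * d0 * d0 + 2.0 * inv[0][1] * d0 * d1
                    + inv[1][1] * d1 * d1)
        centres.append(xi + g[0] * d0 + g[1] * d1)  # conditional mean b_i
    qmin = min(quad)  # shifting by qmin leaves the normalised weights unchanged
    raw = [math.exp(-(q - qmin) / (2.0 * lam ** 2)) for q in quad]
    total = sum(raw)
    return [r / total for r in raw], centres, lam * math.sqrt(s_cond)


def simulate_next(x, xprev, z, cond_prev, cond_z, lam, rng):
    """Draw one flow: choose a kernel by weight, then sample its Gaussian slice."""
    weights, centres, sd = conditional_kernels(x, xprev, z, cond_prev, cond_z, lam)
    i = rng.choices(range(len(x)), weights=weights, k=1)[0]
    return rng.gauss(centres[i], sd)
```

In a full simulation the drawn value would become the next conditioning flow and the aggregate would be updated with it before the following draw.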

Readers should note that the model proposed here is similar to the NP1 model of [Sharma et al., 1997], except that it uses two predictors: the previous month’s flow and an aggregate flow variable. The aggregate flow variable imposes a longer-term dependence on the simulated flows. Such dependence is missing in the NP1 model and in any other Markov order 1 dependence model. In the discussions that follow, NP1 denotes the model of [Sharma et al., 1997], which has no long-term dependence, and NPL1 denotes the nonparametric model proposed here.

4    APPLICATION TO BURRENDONG DAM INFLOWS, NEW SOUTH WALES, AUSTRALIA

The nonparametric simulation model was next applied to 105 years (1890 to 1994) of reservoir inflows to the Burrendong dam in eastern NSW, Australia. The Burrendong dam is located on the Macquarie River and has an approximate catchment area of 7500 km². While flow data have been measured since the opening of the dam in 1967, the earlier periods of record have been estimated by the New South Wales Department of Land and Water Conservation using the observed rainfall record and a rainfall-runoff model. These streamflow data pose several problems for the stochastic modeller. Firstly, there are several instances where the flow has stayed at fairly low levels for 6 to 10 months at a stretch. Secondly, there are several zeroes in the flow record, which pose a challenge when prescribing a continuous probability density function. Lastly, this river is known to be susceptible to prolonged droughts, leading to long periods of very low flows (the minimum 11-month and 12-month average flows are 0.8% and 2.2% of the mean annual flow, respectively).

One hundred realisations, each 105 years long, were simulated using each of the two nonparametric stochastic streamflow generation models (NP1 and NPL1). Statistics such as the mean, standard deviation, lag correlations and the coefficient of skewness were computed for each month, and were found to be in good agreement with the historical values. This is to be expected given the structure of the nonparametric model. These monthly results are not presented for lack of space. A comparison of some of the statistics of the annual flow volumes (summation of monthly flows over the water year) is presented in Table 1. While both models reproduce the annual mean flow reasonably well, the NP1 simulations are unable to reproduce the observed annual flow standard deviation, coefficient of skewness, or lag-1 correlation. This is an important result that illustrates the ability of the NPL1 model to ensure that simulated flow values preserve distributional and dependence attributes at both monthly and annual scales.

Table 1  Comparison of observed and simulated Burrendong dam annual inflow statistics

Statistic                  Observed     Percentile   NP1          NPL1
Mean (ML)                  1,096,697    25th %ile    1,049,526    953,064
                                        Median       1,086,588    1,022,806
                                        75th %ile    1,159,303    1,102,851
Standard Deviation (ML)    1,130,626    25th %ile    830,305      881,642
                                        Median       928,528      995,120
                                        75th %ile    1,007,665    1,170,716
Skewness                   3.01         25th %ile    1.56         2.18
                                        Median       1.91         2.91
                                        75th %ile    2.13         3.31
Lag-1 Correlation          0.114        25th %ile    -0.040       0.049
                                        Median       0.015        0.090
                                        75th %ile    0.078        0.134
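The annual statistics compared in Table 1 can be computed along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the monthly series is assumed to start at the beginning of a water year, and the skewness and lag-1 correlation estimators shown are common textbook forms that may differ in detail from those used in this study.

```python
import statistics


def annual_totals(monthly):
    """Sum consecutive blocks of 12 monthly flows (series assumed to start a water year)."""
    years = len(monthly) // 12
    return [sum(monthly[12 * y:12 * (y + 1)]) for y in range(years)]


def annual_summary(annual):
    """Mean, standard deviation, skewness and lag-1 correlation of annual flows."""
    n = len(annual)
    mean = statistics.fmean(annual)
    dev = [a - mean for a in annual]
    m2 = sum(d * d for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    return {
        "mean": mean,
        "sd": statistics.stdev(annual),      # sample standard deviation
        "skew": m3 / m2 ** 1.5,              # moment-based coefficient of skewness
        "lag1": sum(dev[i] * dev[i + 1] for i in range(n - 1)) / sum(d * d for d in dev),
    }


def quartiles_across_realisations(realisations, key):
    """25th percentile, median and 75th percentile of one statistic over all realisations."""
    values = [annual_summary(r)[key] for r in realisations]
    q25, q50, q75 = statistics.quantiles(values, n=4)
    return q25, q50, q75
```

Applying `quartiles_across_realisations` to the annual totals of each simulated realisation gives one row group of the simulated columns, to be set against the same statistic computed from the historical record.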

Figure 1 illustrates the reservoir storages estimated based on NP1 and NPL1 model simulations using the sequent peak algorithm. The NPL1 model simulations lead to reservoir storage volumes that are close to those estimated based on the historical flow record. NP1 model simulations, however, lead to substantially smaller storage volumes than those suggested by the historical flows. These smaller storage volumes are due to the lack of a longer-term dependence structure in the NP1 model simulations, and have obvious implications for any water management applications they might be used for.
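For readers unfamiliar with the sequent peak algorithm, a minimal sketch of its textbook form for a constant demand is given below. The function name, the choice of passing through the record twice (so that a deficit carried past the end of the record is accounted for) and the example numbers are assumptions made for illustration; the exact variant used to produce Figure 1 is not stated in this paper.

```python
def sequent_peak_storage(inflows, demand, cycles=2):
    """Storage needed to supply a constant demand from the given inflow sequence.

    The cumulative deficit is tracked step by step and reset to zero whenever
    the reservoir would refill; the largest deficit reached is the required storage.
    """
    deficit = 0.0
    required = 0.0
    for _ in range(cycles):
        for q in inflows:
            deficit = max(0.0, deficit + demand - q)
            required = max(required, deficit)
    return required


# Hypothetical four-period example with demand equal to the mean inflow.
print(sequent_peak_storage([5.0, 1.0, 1.0, 5.0], demand=3.0))  # prints 4.0
```

Because the required storage is driven by runs of consecutive below-demand flows, a simulation model that breaks up such runs will tend to understate it.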

Fig. 1  Burrendong dam reservoir storage volume estimates for meeting seasonally non-varying demands

Fig. 2  Variation of observed and simulated low flow values as a function of duration for the Burrendong dam monthly inflow record

Both models were also tested for their ability to simulate the flows likely to occur during a sustained drought. Figure 2 presents the lowest monthly flow rate as a function of duration. Notably, both models properly simulate observed low-flow sequences for durations longer than 11 months. Neither model performs as well in simulating the worst 11-month drought on record. It is possible that more than one short-term dependence variable is necessary to represent such severe flow conditions.
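The low-flow measure plotted against duration can be sketched as the smallest average flow over any run of consecutive months of a given length. The function name and the toy flow values below are illustrative assumptions; the same calculation would be applied to the historical record and to each simulated realisation.

```python
def lowest_mean_flow(flows, duration):
    """Smallest average flow over any window of `duration` consecutive time steps."""
    if duration < 1 or duration > len(flows):
        raise ValueError("duration must be between 1 and the record length")
    window = sum(flows[:duration])
    lowest = window
    for t in range(duration, len(flows)):
        # Slide the window one step: add the newest flow, drop the oldest.
        window += flows[t] - flows[t - duration]
        lowest = min(lowest, window)
    return lowest / duration


# Toy record (hypothetical values): lowest average flow for durations of 1 to 3 steps.
record = [4, 1, 0, 2, 5, 3]
print([lowest_mean_flow(record, d) for d in (1, 2, 3)])  # prints [0.0, 0.5, 1.0]
```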

5    SUMMARY

A synthetic streamflow generation model was presented that is capable of modelling both short-term and interannual dependencies, as well as non-standard probability density functional forms. This model was tested on two monthly streamflow data sets representing very different climatological and topographical regimes. The results indicated that the proposed model was able to represent longer-term dependence better than conventional streamflow simulation approaches. The improved representation of the longer-term dependence led to significant improvements in the representation of reservoir storage volumes. It should be noted that the results presented here were based on arbitrary choices for the number and type of variables used to represent the short-term and long-term dependence. It is likely that results will improve further if these variables are chosen on the basis of a thorough sensitivity analysis. Efforts are currently underway to develop a method for selecting model variables that impart the short-term and long-term dependence in an optimal manner. Efforts are also underway to extend the proposed method for use with flow records at multiple locations.

References

Akaike, H., A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19 (6), 716-723, 1974.

Bras, R.L., and I. Rodriguez-Iturbe, Random functions and hydrology, 559 pp., Dover Publications, Inc., New York, 1985.

Craven, P., and G. Wahba, Smoothing noisy data with spline functions, Numerische Mathematik, 31, 377-403, 1979.

Lall, U., Recent advances in nonparametric function estimation, in Reviews of Geophysics, pp. 1093-1102, 1995.

Lall, U., and A. Sharma, A nearest neighbor bootstrap for time series resampling, Water Resources Research, 32 (3), 679-693, 1996.

Sharma, A., U. Lall, and D.G. Tarboton, Kernel bandwidth selection for a first order nonparametric streamflow simulation model, Stochastic Hydrology and Hydraulics, 12, 33-52, 1998.

Sharma, A., D.G. Tarboton, and U. Lall, Streamflow simulation: A nonparametric approach, Water Resources Research, 33 (2), 291-308, 1997.

Tarboton, D.G., A. Sharma, and U. Lall, Disaggregation procedures for stochastic hydrology based on nonparametric density estimation, Water Resources Research, 34 (1), 107-119, 1998.