CHARACTERIZATION OF GROUNDWATER QUALITY VARIABLES BY STATISTICAL METHODS

 

A. H. M. Faisal Anwar1 and M. Fazlul Bari2

1, 2 Department of Water Resources Engineering, Bangladesh University of Engineering and Technology

1Corresponding address:

Assistant Professor, WRE department, BUET, Dhaka-1000, Bangladesh.

Tel. No: 880-2-9665650 to 80 (ext 7239)

Fax No: 880-2-8613026; E-mail: fanwar@wre.buet.edu

 

Abstract: The growing problem of groundwater contamination has created a need for the information that properly designed groundwater monitoring programs can provide. The usefulness of groundwater quality data depends on an understanding of the statistical characteristics of groundwater quality variables. The primary objective of groundwater sampling is to provide early warning of pollution events, which involves statistical analysis of the data to detect changes in quality over time or space. To characterize groundwater quality variables, it is necessary to test whether the data of concern are normally distributed, show seasonal variation or are serially dependent. Groundwater quality data from about 40 wells located in northwest Bangladesh were used in this study. Nine parameters, namely chloride, nitrate, TDS, pH, SAR, sodium, iron, calcium and magnesium, were selected as the groundwater quality variables for analysis. The statistical analyses revealed that most of the groundwater quality variables were not normally distributed but had right-skewed distributions; most of the variables appeared to be lognormally distributed. Several statistical tests for seasonal variation were performed, and chloride, nitrate, TDS, SAR and magnesium were found to show seasonal patterns. pH, SAR, sodium, calcium and magnesium were found to be serially correlated. Spatial variability of these parameters was also investigated, and some of them, such as NO3, Ca and Mg, showed significant spatial variation.

Keywords: groundwater quality, statistical distribution, seasonal variation, autocorrelation

1    INTRODUCTION

The emerging problem of groundwater contamination due to agricultural practices and the increased use of fertilizers and pesticides has created a need for information on groundwater quality. Awareness of the importance of groundwater quality monitoring has grown correspondingly. In order to extract the information that regulatory agencies and other organizations need for effective management of groundwater resources, statistical analysis of monitoring data is imperative (Walpole and Myers, 1978). The design of effective monitoring (statistical sampling) programs and the selection of appropriate statistical methods for analyzing groundwater quality variables require an understanding of the behavior of the random variables of concern. Statistical analysis of data is performed in order to detect changes in groundwater quality over time or over space. The choice of statistical methods is dictated both by the information expectations of water quality regulators and by the statistical characteristics of the water quality variables. From a statistical viewpoint, all water quality constituents are considered random variables.

The effective design of monitoring programs and the subsequent utilization of the data obtained depend upon an understanding of the general statistical characteristics of groundwater quality variables. Moreover, if a suitable probability distribution function can be fitted to the observed data, then the values needed by hydrologists and engineers may be extrapolated beyond the range of the observational period. To select appropriate statistical tests for change, one must know whether the water quality variables of concern: (1) are normally distributed, (2) exhibit seasonal variations and (3) are correlated in time, that is, serially dependent (Harris et al., 1987; Montgomery et al., 1987; Anwar, 1995). In monitoring and evaluating the effects of a groundwater contamination source, information on the spatial variation in the background groundwater quality is also needed (Bjerg and Christensen, 1992). Characterization of variables in these terms is a necessary first step in regulatory data analysis, and hence it was performed in this study. This step ensures the validity of the assumptions on which the selected trend testing methods are based. This paper is an effort to provide guidance in the difficult task of analyzing limited background data sets for the purpose of characterizing groundwater quality populations. The methods may be applied generally to groundwater quality data collected at equal time intervals (for example, quarterly). An indication is also provided of how to characterize background data sets that are not sampled at equal time intervals.

2    STUDY AREA AND DATA COLLECTION

The northwestern zone of Bangladesh was selected as the study area; it includes the greater administrative districts of Dinajpur, Rangpur, Rajshahi, Bogra and Pabna. The area is situated between longitudes 88°10′ and 89°50′ E and between latitudes 23°30′ and 26°38′ N. The region is adjacent to the Indian states of Bihar and West Bengal and forms a hydrologically distinct area bounded on the south by the Ganges-Padma river and on the east by the Brahmaputra-Jamuna river. The major land types in the region include recent alluvial flood plains and Pleistocene terraces. Abundant groundwater supplies occur throughout the region in a shallow unconfined aquifer. The northwestern zone was selected for this study because the number of tubewells, the level of groundwater use and the cropping intensity were found to be significant in this area.

Groundwater quality data were collected from the Bangladesh Water Development Board (BWDB), a regulatory agency responsible for nationwide groundwater quality monitoring (BWDB, 1976-84). BWDB collects water samples from 117 tubewells and piezometric wells throughout the year across Bangladesh. Of these, approximately 40 designated observation wells are located in the northwestern zone. Nineteen different groundwater quality parameters are usually determined by BWDB. Among them, concentration data on nine parameters, namely chloride (Cl), nitrate (NO3), total dissolved solids (TDS), pH, sodium adsorption ratio (SAR), sodium (Na), iron (Fe), calcium (Ca) and magnesium (Mg), were selected for analysis. These parameters were considered because most of them are likely to show changes in quality due to variations in infiltration over time (Bjerg and Christensen, 1992). The data series for 1973-1983 were selected for analysis because the database for this period was complete.

3    DATA ANALYSIS AND DISCUSSION

3.1    Normality

Of all the commonly used probability distributions, the normal distribution is the most widely used because of its role as a base distribution for comparison and error analysis. Large departures from normality, particularly in the form of skewness, or lack of symmetry, can invalidate results. Transformations (for example, log transformation) of non-normal data are often used to remove skewness and produce normally distributed values. In this study, data were examined for normality by several alternative methods: fitting an empirical distribution, a skewness test and a chi-square goodness-of-fit test.

3.1.1    Empirical distribution

The empirical distribution of observations is usually represented by frequency histograms, which provide a visual indication of the symmetry of the probability distribution. Histograms were constructed for the original and log-transformed data sets for all the parameters. The histogram for chloride is presented in Fig. 1 as an example for visual inspection of normality.

Fig. 1    Frequency histogram for Chloride (Cl) data series

Relative frequency provides an estimate of the probability that a parameter concentration falls in the indicated range or class interval. The frequency distributions of all the parameters appeared to be skewed to the right. Qualitatively, the degree of skewness varied considerably among the parameters, some of which may contain only a few values that are significantly larger than the average. These values may arise from measurement errors or from groundwater contamination, in which case the high values may belong to a population different from that of the remaining sample values (Montgomery et al., 1987). Overall, the groundwater quality variables were found to be lognormally distributed, which is consistent with Chow et al. (1988).

3.1.2    Skewness test

Skewness is a measure of the asymmetry of a distribution and can be a conclusive indicator of non-normality. The skewness coefficients (Cs) for the original and log-transformed data were calculated following the procedure given by McCuen (1993). The results revealed that the skewness of all the parameters was positive except for pH. Negative skewness is less common than positive skewness in groundwater quality data; in the case of pH it is caused by a very few sample points that are much lower than the rest. Law and Kelton (1991) report theoretical skewness values of 0.00 and 6.18 for the normal and lognormal distributions, respectively. Comparing the computed values with these, all water quality parameters appeared to be lognormally distributed. However, most of the skewness values for the log-transformed data were negative, which is usually expected for non-normal data sets (Haan, 1977).
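The skewness comparison between original and log-transformed data can be illustrated with a short sketch. The formula below is the bias-adjusted sample skewness commonly used in hydrology; it is assumed here and may differ in detail from the procedure in McCuen (1993). The function name and the sample values are hypothetical and are not data from this study.

```python
import math


def skewness(values):
    """Bias-adjusted sample skewness coefficient Cs (assumed form)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Sample standard deviation with n - 1 in the denominator.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined for constant data")
    third = sum((x - mean) ** 3 for x in values)
    return n * third / ((n - 1) * (n - 2) * s ** 3)


# Hypothetical right-skewed concentrations, not data from the study.
sample = [4.0, 5.0, 5.5, 6.0, 7.0, 8.0, 9.5, 12.0, 18.0, 45.0]
cs_raw = skewness(sample)
cs_log = skewness([math.log(x) for x in sample])
print(f"Cs (original) = {cs_raw:.2f}, Cs (log-transformed) = {cs_log:.2f}")
```

For a right-skewed sample such as this one, the log transformation reduces Cs, which is the behavior the skewness test relies on when comparing the two versions of each data set.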

3.1.3    Chi-square (χ2) test

The chi-square goodness-of-fit test was used to test for a significant difference between the distribution suggested by a data sample and a selected probability distribution. Here the null hypothesis was that the data were drawn from a normal population, and the chi-square test checked the validity of this assumption. The test was performed at the 5% significance level, and the computed χ2 was compared with the critical χ2 value (McCuen, 1993). The computed χ2 values for the groundwater quality parameters were larger than the critical χ2 values, which indicates that the parameters were not normally distributed.
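A minimal sketch of such a test is given below. It assumes five equal-probability classes under the fitted normal distribution and degrees of freedom equal to the number of classes minus one minus the two estimated parameters; the class count, the function name and the synthetic sample are assumptions for illustration, not details taken from the study. The 5% critical values are standard tabulated figures.

```python
import math
from statistics import NormalDist, mean, stdev

# Upper 5% critical values of chi-square, indexed by degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_normality(values, n_classes=5):
    """Return (chi2, critical value, reject?) for H0: data are normal."""
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    # Interior class boundaries giving equal probability under H0.
    edges = [fitted.inv_cdf(i / n_classes) for i in range(1, n_classes)]
    observed = [0] * n_classes
    for x in values:
        # The class index is the number of boundaries lying below x.
        observed[sum(x > e for e in edges)] += 1
    expected = n / n_classes
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    dof = n_classes - 1 - 2  # two parameters estimated from the sample
    critical = CHI2_CRIT_5PCT[dof]  # table above covers 1 to 5 d.o.f.
    return chi2, critical, chi2 > critical


# Hypothetical right-skewed sample (lognormal quantiles), not study data.
raw = [math.exp(NormalDist().inv_cdf((i + 0.5) / 40)) for i in range(40)]
logged = [math.log(x) for x in raw]
for name, data in (("original", raw), ("log-transformed", logged)):
    chi2, critical, reject = chi_square_normality(data)
    print(f"{name}: chi2 = {chi2:.2f}, critical = {critical}, "
          f"reject normality: {reject}")
```

With this synthetic sample, normality is rejected for the original values and not rejected after the log transformation, mirroring the pattern reported for the groundwater quality parameters.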

3.2    Seasonality

Hydrogeologic conditions at the site may suggest specific forms of seasonality, which can be tested. For example, with quarterly data, a reasonable form of annual cycle is one in which one quarter differs from the other three. High recharge occurs in the monsoon season, so the monsoon quarter (the high-recharge period) may differ from the other seasons. If no prior determination of seasonality can be made, each season must be tested to determine whether its mean tends to differ from that of the other seasons. Prior to the seasonality test, it is necessary to make the data series stationary, which can be done by transforming the data into a zero-mean or trend-free series (Harris et al., 1987). For this purpose, the data were organized into quarterly values, with the first quarter of a year defined as January, February and March. The zero-mean or trend-free series was obtained in the following manner: a linear regression of concentration versus time was fitted and the slope of the regression was tested for significance (at the 5% level). For parameters that showed a significant linear trend, the trend (a+bt) was removed from the sample values; otherwise, the mean concentration was removed from the observations. Following this procedure, the mean was removed from the Cl, NO3, TDS, SAR and Fe data and the trend was removed from the pH, Na, Ca and Mg data, and thus the series were made stationary.
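The adjustment to a stationary series can be sketched as follows. The sketch assumes ordinary least squares with a two-sided t-test on the slope, with the critical t value supplied by the caller from a table; the function name and the short synthetic series are hypothetical and only illustrate the procedure.

```python
import math


def make_stationary(values, t_crit):
    """Remove a significant linear trend, else remove the mean.

    t_crit is the two-sided 5% critical value of Student's t with
    n - 2 degrees of freedom, taken from a table by the caller.
    """
    n = len(values)
    times = list(range(1, n + 1))  # equally spaced quarters
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    b = sxy / sxx              # slope of the fitted line a + b*t
    a = y_mean - b * t_mean    # intercept
    residuals = [y - (a + b * t) for t, y in zip(times, values)]
    sse = sum(e ** 2 for e in residuals)
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    if abs(b) > t_crit * se_b:
        return residuals, "trend removed"
    return [y - y_mean for y in values], "mean removed"


# Hypothetical 12-quarter series, not data from the study.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.2, -0.2, 0.0]
flat = [10.0 + e for e in noise]
trending = [10.0 + 0.5 * t + e for t, e in enumerate(noise, start=1)]
T_CRIT_10DF = 2.228  # tabulated two-sided 5% value for 10 d.o.f.

for name, series in (("flat", flat), ("trending", trending)):
    adjusted, action = make_stationary(series, T_CRIT_10DF)
    print(f"{name}: {action}")
```

Either branch returns a series that sums to zero, so the seasonality tests that follow compare deviations rather than raw concentrations.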

The stationary data were then checked for seasonality by different statistical tests: 2-sample tests (Student's t-test and Mann-Whitney U test) and 4-sample tests (ANOVA and Kruskal-Wallis test) (McCuen, 1993). In the 2-sample tests, quarters 1 and 2 were taken as group 1 and quarters 3 and 4 as group 2. In the 4-sample tests, quarters 1, 2, 3 and 4 were taken as groups 1, 2, 3 and 4, respectively. A summary of all the test results is given in Table 1. In the 2-sample tests, NO3, TDS and SAR show seasonality in the t-test, and NO3, TDS and Mg in the Mann-Whitney U test. The proportion of NO3 ions originating from fertilizers and other agricultural practices varies from season to season. SAR and Mg in group 1 (dry and pre-monsoon season) should be higher than in group 2 (monsoon and post-monsoon season) because of the difference in precipitation. TDS (ppm) is a measure of salinity that can be expected to increase through the combined effects of groundwater recycling, high evapotranspiration and low rainfall (Nightingale, 1970). During the dry period agriculture depends more on groundwater pumping; conceivably, the water recharged in this period is recycled and concentrated by evapotranspiration, resulting in an increase in groundwater salinity. Student's t-test shows seasonality at a higher level of significance (20%) because the test assumes that the data are independent and come from normally distributed populations with equal variances, which was not true in the present case. The 4-sample tests showed that Cl concentration exhibits seasonality; that is, the mean Cl concentration differs from season to season. The ANOVA further showed that the mean of group 3 differs from the means of groups 2 and 4 at the 5% significance level. Group 3 corresponds to the monsoon season of July, August and September, with high precipitation and recharge events, and thus Cl concentration may be assumed to be lower in this season.

Table 1    Summary of statistical tests to detect seasonality at different levels of significance (%)

| Variable | 2-sample: t-test (20%) | 2-sample: Mann-Whitney (5%) | 4-sample: ANOVA (5%) | 4-sample: Kruskal-Wallis (5%) |
|---|---|---|---|---|
| Cl | no | no | yes | yes |
| NO3 | yes | yes | no | no |
| TDS | yes | yes | no | no |
| pH | no | no | no | no |
| SAR | yes | no | no | no |
| Na | no | no | no | no |
| Fe | no | no | no | no |
| Ca | no | no | no | no |
| Mg | no | yes | no | no |
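As an illustration of the 4-sample comparison of quarterly groups, the Kruskal-Wallis statistic can be computed as sketched below. The sketch uses the standard H statistic with tie correction and compares it with the tabulated 5% chi-square critical value for three degrees of freedom; the function names and the quarterly values are hypothetical and are not data from the study. The other tests follow the same pattern of grouping the stationary series by quarter.

```python
CHI2_CRIT_3DF_5PCT = 7.815  # tabulated 5% critical value, 3 d.o.f.


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical zero-mean quarterly values (quarters 1 to 4), not study data.
quarters = [
    [1.2, 0.8, 1.5, 0.9, 1.1],
    [0.7, 1.0, 0.6, 1.3, 0.5],
    [-2.1, -1.8, -2.5, -1.6, -2.0],
    [0.2, 0.4, -0.1, 0.3, 0.1],
]
h_stat = kruskal_wallis_h(quarters)
print(f"H = {h_stat:.2f}, seasonal at 5%: {h_stat > CHI2_CRIT_3DF_5PCT}")
```

In this synthetic example the third quarter is markedly lower than the others, so H exceeds the critical value and the series would be flagged as seasonal.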

3.3    Serial dependence

Serial correlation can exist between an observation at one time period and an observation k time periods earlier, for k = 1, 2, .... In this discussion of serial correlation, it is assumed that observations are equally spaced in time (quarterly sampling frequency). The population serial correlation coefficient, frequently called the autocorrelation coefficient, is denoted by ρ(k), where k is the lag or number of time intervals between the observations being considered. For water quality processes, the function ρ(k) decays with increasing k (Harris et al., 1987). The sample lag-k autocorrelation coefficient for a sample of size n was calculated using the procedure given by Haan (1977). The sample serial correlation coefficient is denoted by r(k) and was tested for significance using a confidence interval approach. Autocorrelation coefficients range from -1.0 to +1.0, where positive values indicate a direct relationship. For example, a positive lag-one autocorrelation coefficient r(1) would mean that high values in one time period tend to be followed by high values in the next time period. Negative autocorrelation implies an inverse relationship. Autocorrelation coefficients (for k = 1, 2, ..., 8) for all the parameters were calculated and tested for significance using 95% confidence intervals. Among them, the autocorrelation functions of Cl and Na are shown in Fig. 2.

Fig. 2    Autocorrelation function exhibited by Cl and Na (sampling frequency of the data is quarterly). Dashed lines are 95% confidence intervals.

It was found that the autocorrelations of pH, SAR, Na, Ca and Mg were significant at the 95% confidence level at various lags k, and hence these parameters were considered serially dependent. On the other hand, the autocorrelations of Cl, NO3, TDS and Fe were not significant, and these parameters were considered serially independent. Parameters such as Cl, NO3 and TDS are mainly leached to the groundwater following the application of fertilizers and pesticides on agricultural land. These agrochemicals are usually applied in a particular season of the year. As a result, these parameters showed seasonality and appear serially independent.
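The test for serial dependence can be sketched in a few lines. The estimator below is a common form of the sample lag-k autocorrelation, and the limits of ±1.96/√n are a widely used large-sample approximation to the 95% confidence interval; both are assumptions here and may differ from the exact expressions in Haan (1977). The function names and the synthetic series are hypothetical.

```python
import math


def autocorrelation(series, k):
    """Sample lag-k autocorrelation coefficient r(k) (assumed estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for constant data")
    num = sum((series[t] - mean) * (series[t + k] - mean)
              for t in range(n - k))
    return num / denom


def significant_lags(series, max_lag=8):
    """Lags whose |r(k)| exceeds the approximate 95% limit 1.96/sqrt(n)."""
    limit = 1.96 / math.sqrt(len(series))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(series, k)) > limit]


# Hypothetical equally spaced series, not data from the study.
smooth = [math.sin(2 * math.pi * t / 16) for t in range(32)]
alternating = [1.0, -1.0] * 16

print("r(1), smooth series:", round(autocorrelation(smooth, 1), 3))
print("r(1), alternating series:", round(autocorrelation(alternating, 1), 3))
print("significant lags, smooth series:", significant_lags(smooth))
```

The slowly varying series gives a large positive r(1), meaning high values tend to follow high values, while the alternating series gives a strongly negative r(1), the inverse relationship described in this section.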

3.4    Spatial variation

Five wells, one located in each district of the northwest region, were selected for this analysis. Inspection of the means and standard errors of parameter concentrations at the five well locations, presented in Table 2, reveals spatial variation. Such variation is likely due to small-scale horizontal regional variations and local variations in the aerial application of fertilizer. NO3 may be denitrified in anaerobic stagnant water, which may cause large spatial variation (Bjerg and Christensen, 1992). Parameter concentrations were within acceptable limits except for Fe and Mg: Fe concentration exceeded the permissible limit at all locations, while Mg exceeded it in the flood plain areas of major rivers. Cl, TDS, pH, Na and Mg showed minimum values in upland areas and maximum values at locations near the Brahmaputra River, which indicates an increasing trend towards major rivers. As expected, SAR was lower near major rivers because of the higher concentrations of Ca and Mg at these locations. SAR values at all locations were well below the irrigation standard.

Table 2    Mean and standard error of water quality parameters for five selected wells

| Well No. | Cl | NO3 | TDS | pH | SAR | Na | Fe | Ca | Mg |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 16.3 ± 8.6 | 3.6 ± 1.9 | 120.6 ± 16.4 | 6.9 ± 0.7 | 2.4 ± 0.5 | 25.5 ± 4.5 | 3.7 ± 1.1 | 11.5 ± 2.3 | 9.1 ± 3.3 |
| 14 | 30.5 ± 5.2 | 7.2 ± 2.4 | 225.8 ± 24.5 | 7.2 ± 0.6 | 1.3 ± 0.4 | 31.3 ± 4.4 | 1.9 ± 0.7 | 33.4 ± 9.3 | 18.5 ± 7.6 |
| 23 | 23.5 ± 4.6 | 0.2 ± 0.1 | 181.6 ± 16.4 | 7.2 ± 0.7 | 1.6 ± 0.3 | 37.6 ± 5.3 | 2.6 ± 0.7 | 27.9 ± 2.2 | 18.7 ± 4.9 |
| 27 | 52.9 ± 14.7 | 0.5 ± 0.2 | 425.8 ± 59.4 | 7.7 ± 0.8 | 1.2 ± 0.2 | 37.5 ± 6.7 | 0.9 ± 0.3 | 55.7 ± 11.7 | 58.1 ± 16.6 |
| 39 | 31.6 ± 6.9 | 1.2 ± 0.4 | 380.6 ± 32.1 | 7.7 ± 0.5 | 1.5 ± 0.4 | 32.6 ± 4.4 | 10.9 ± 3.1 | 71.3 ± 12.6 | 56.7 ± 20.6 |

Well 4 is located at Thakurgaon, Dinajpur; 14 at Rajshahi Town; 23 at Gakul, Bogra; 27 at Nawabganj, Rajshahi; 39 at Nagarbarighat, Pabna.
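Entries of the form mean ± standard error, as in Table 2, can be computed as sketched below. The standard error is taken here as the sample standard deviation divided by √n, which is the usual definition but is an assumption about how the table was prepared; the function name and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for one well and parameter."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return mean(values), stdev(values) / math.sqrt(n)


# Hypothetical concentrations at one well, not data from the study.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m, se = mean_and_standard_error(observations)
print(f"{m:.1f} ± {se:.1f}")
```

Comparing these intervals across wells gives a quick visual check of spatial variation before any formal test is applied.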

4    CONCLUSIONS

In this study, the groundwater quality variables Cl, NO3, TDS, pH, SAR, Na, Fe, Ca and Mg from the northwest region of Bangladesh were analyzed statistically. Statistical analyses were performed to check whether the water quality data of concern are normally distributed, exhibit seasonal variation or are serially dependent. Spatial variation of the parameters was also examined at 5 selected wells located in 5 different districts of the region. Normality of the data was checked by empirical distribution, skewness test and chi-square goodness-of-fit test. The data were found to be non-normal and instead appeared to be lognormally distributed, which is usually expected for water quality data. Tests for seasonality were performed using 2-sample tests (Student's t-test and Mann-Whitney U test) and 4-sample tests (ANOVA and Kruskal-Wallis test). The results showed that Cl, NO3, TDS, SAR and Mg exhibit seasonal variation because of the variation of precipitation between seasons. pH, SAR, Na, Ca and Mg were found to be serially correlated. Spatial variability of these parameters was also investigated, and significant variation was found, especially for NO3, Ca and Mg, because of variations in the aerial application of agrochemicals at different locations.

References

Anwar, A.H.M. F., 1995, Statistical investigation of groundwater quality variables in north-western zone of Bangladesh, M.Sc. Eng. Thesis, Department of Water Resources Eng., BUET, Dhaka, Bangladesh.

Bangladesh Water Development Board (BWDB), 1976-84, Groundwater qualities of Bangladesh, BWDB water supply paper-394, 403, 424, 433, 444, and 447, Groundwater Data Processing and Research Circle, BWDB, Dhaka.

Bjerg, P. L. and Christensen, T. H., 1992, Spatial and temporal small-scale variation in groundwater quality of a shallow sandy aquifer, J. Hydrol., 131:131-149.

Chow, V. T., Maidment, D. R. and Mays, L. W., 1988, Applied hydrology, McGraw-Hill, New York.

Haan, C. T., 1977, Statistical methods in hydrology, The Iowa State University Press.

Harris, J., Loftis, J. C., and Montgomery, R. H., 1987, Statistical methods for characterizing groundwater quality, Groundwater, 25(2):185-193.

Law, A. M. and Kelton, W. D., 1991, Simulation modelling and analysis, McGraw Hill Inc., New York.

McCuen, R. H., 1993, Microcomputer applications in statistical hydrology, PTR, Prentice Hall, Englewood Cliffs, New Jersey.

Montgomery, R. H., Loftis, J. C., and Harris, J., 1987, Statistical characteristics of groundwater quality variables, Groundwater, 25(2):176-184.

Nightingale, H. I., 1970, Statistical evaluation of salinity and nitrate content and trends beneath urban and agricultural areas- Fresno, California, Groundwater, 18(1): 22-28.

Walpole, R. E., and Myers, R. H., 1978, Probability and statistics for engineers and scientists, 2nd Ed., Macmillan Pub. Co., Inc.