RIVER FLOW SIMULATIONS USING PARALLEL COMPUTING TECHNIQUES

 

 

Jiang Chunbo, Zhang Qinghai and An Xiaomi

Department of Hydraulic Eng., Tsinghua University, Beijing 100084, China

E-mail: jcb@mail.tsinghua.edu.cn

 

 

Abstract: To study the effect of the Three George Reservoir on the flow field of Yangtz river, we use the Shallow Water Equations and the Finite Element Method to simulate the change of the velocity field before and after the construction of the reservoir. Since the scale of this problem is quite large, parallel computing technique is engaged to implement the implicit solving of the large linear system.

 

Keywords: three george project, finite element method, shallow water equations, parallel computing technique, GMRES method

 

1    INTRODUCTION

Parallel Computing Technique attracts more and more attention because of its powerful computing ability. By breaking down a large task into many small ones and letting many PEs (processing elements) compute the tasks simultaneously, the parallel computing technique not only enlarges the maximum numbers of meshes engaged, but also reduce the time required the computation drastically.

Three George Project, by now the largest hydraulic engineering project in China, brings on many engineering problems, one of which is the pollution state of the river will be totally changed. It is desirous to predict the distribution of the pollutants in advance. But the velocity fields should be simulated first.

In this paper, we use parallel computing technique to simulate the two cases of the velocity fields before and after the construction of the Three George Reservoir. We put the emphasis on the issues of parallel implementation, such as the auto mesh partitioning, renumbering and parallel GMRES method.

2    GOVERNING EQUATIONS

The river flow can be modeled using the shallow water equations, which are obtained from the conservation of mass and momentum, vertically integrated, assuming a hydrostatic pressure distribution:

                            (1)

           (2)

Where:  is the water elevation,  is the water depth,  is the mean horizontal velocity, g is the gravitational acceleration, is the eddy viscosity,  is the density of water, is the surface shear stress, is the bottom shear stress.

The shear stress item can be given as:

;                (3)

Where:  is the x-directional velocity of a certain node, is the y-directional velocity, ;   is the averaged velocity in a triangular mesh element,   is the Manning coefficient.

3    PARALLEL IMPLEMENTATION

3.1    Parallel Environment

The parallel environment consists of three parts: the hardware, the operating system and the software environment.

A cluster of workstations is chosen as the parallel computing hardware. Clusters have many advantages among which are the scalability, low price and high performance. The workstations are all of IA32 architectures with 550 MHz Pentium III processors, 128 SDRAM and we choose Fast Ethernet (100M/s) as the network form.

Redhat Linux 6.1 is selected as the operating system because of its low memory pre-occupation, superb network performance and its excellent support to many libraries of message passing interfaces.

The software environment comprises the underlying message passing interface and the parallel programming library. The MPICH1.2 is chosen as the message passing interface for two reasons: on one hand the MPI has been an international standard of message passing; on the other hand there are already many libraries which support MPI. As for the parallel libraries, we use the famous BLAS and LAPACK library to facilitate the programming.   
Figure 1 shows the whole parallel environment.                   

3.2    Auto mesh partitioning

We use a Delaunay triangulation algorithm [1] to generate unstructured triangular mesh in the computation domain. After that, the mesh should be partitioned to equal parts and distributed to different processors. To achieve high efficiency, the number of cutting edges should be minimized as small as possible so as to minimize the overhead of communication. We studied several partition algorithm and selected an algorithm [2] based upon Graph Theory and proposed by G.L.Miller etc.

Algorithm 1. Mesh Partitioning

(1) Project up. Project the input points form IRd to the unit sphere centered at the origin in IRd+1 along the line through p and the “north pole” (0, …, 0, 1). See Figure 2-c.

(2) Find Centerpoint. Compute a centerpoint of the projected points in IRd+1. See Figure 2-c.

(3) Rotate and Dilate. Rotate the Projected points about the origin in IRd+1 so that the centerpoint becomes a point (0, …, 0, r) on the d+1st axis; then dilate the surface of the sphere so that the center point becomes the origin. See Figure 2-d.

(4) Find Great Circle. Choose a random great circle on the unit sphere in IRd+1. See Figure 2-d.

(5) Unmap and Project Down. Transform the great circle to a circle in IRd by undoing the dilation, rotation, and stereographic projection. See Figure 2-e.

(6) Convert Circle to Separator. Figure 2-f shows a edge separator.

 

                                             Fig. 2-a                                       Fig. 2-b                                       Fig. 2-c

      

                                                          Fig. 2-d                                        Fig. 2-e                            Fig. 2-f

3.3    Renumbering

Generally speaking, the linear systems will be preconditioned to reduce the bandwidth of the matrix, either by the left-preconditioning method or right-preconditioning method. Otherwise, the large bandwidth will spoil the speed of convergence and the overhead of communication thus induced will deteriorate the parallel efficiency.

However, all the preconditioning methods require a lot of floating point operations, so we use renumbering to reduce the bandwidth of the matrix instead of preconditioning. This is quite economical since the time required by renumbering is much shorter than that of preconditioning.

The figures below show an example mesh, the corresponding original matrix structure and renumbered matrix structure. We can see that the result is quite good and the required amount of communication is reduced significantly.

Fig. 3-a    An example mesh

      

                                           Fig. 3-b    Original mesh structure                       Fig. 3-c    Renumbered Mesh structure

      

                                        Fig. 4-a    Computed flow field before the       Fig. 4-b    Computed flow field after the

                                                           construction of the reservoir                                   construction of the reservoir


3.4    Parallel GMRES method

Consider the linear system with  and with a nonsingular nonsymmetrical matrix . The Krylov subspace  is defined by . The GMRES algorithm [3] uses the Arnoldi process to construct an orthonormal basis  for . The full GMRES method allows the Krylov subspace dimension to increase up to  and always terminates in at most  iterations. The restarted GMRES method restricts the Krylov subspace dimension to a fixed value  and restarts the Arnoldi process using the last iterated  as an initial guess. It may stall. Below is the algorithm for the restarted GMRES.

Algorithm 2. Sequential GMRES algorithm

is the tolerance for the residual norm;

Convergence = false;

Choose ;

UNTIL convergence DO

    

* Arnoldi Process: Construct a basis Vm of the Krylov subspace K

FOR j=1, …, m DO

FOR I=1, …, j DO

ENDFOR;

   

ENDFOR;

* Minimize  for

* Solve a least-square problem of size m

compute  solution of ;

IF , convergence = true;

ENDDO

Where the matrix  is a upper Hessenberg matrix of order , and we get the fundamental relation

The GMRES algorithm computes  where  solves the least-squares problem . Usually a QR factorization of  using Givens rotations is used to solve this least-square problem.

Several authors have studied the parallelization of GMRES. Some of them consider alter the sequential Arnoldi Process to achieve parallelization [4,5], some of them define a new basis of the Krylov subspace [6,7], other variants such as -GMRES [8], GMRES with a Chebychev basis [9] have also been brought forward.

Our implementation is that we do not alter the sequential algorithm itself, but break down the algorithm into basic operations and parallelize the basic operations. In the GMRES, the basic operation most frequently used is the matrix-vector multiplication and therefore it should be optimized (reduce the communication time as much as possible) so that the overall parallel performance would be enhanced drastically.

4    NUMERICAL EXPERIMENT

After the implementation of parallel solving of linear systems, we discretize the Shallow Water Equation to linear systems using standard Galerkin Method in space domain. In time domain we solve a large linear system using parallel GMRES method in every time step.

Then the flow fields of Yangtz River in JiaLin section, both before the construction of Three George Reservoir and after that, are simulated. The figures in the next page show the results of these two cases.

 

Acknowledgement

This research is supported by Spring Sunshine Project-1999 under the Ministry of Education and the National Science Foundation of China, the Grant No. is 59979013.

References

[1]    George, P.L., “Automatic Mesh GenerationApplication to Finite Element Methods”, WILEY,1991.

[2]    Miller G.L., Teng S.H., etc. “Automatic mesh partitioning”. In A.George, J.Gilber, and J.Liu, editors, Sparse Matrix Computations: Graph Theory Issues and Algorithms, IMA Volumes in Mathematics and its Applications. Springer-Verlag, 1993.

[3]    Saad, Y. and Schultz, M. H., “GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems”, SIAM, J.Sci.Statist.Comput., 7(1986), pp.856-869.

[4]    DI Brozolo, G..R. and Robert, Y., “Parallel conjugate gradient-like algorithms for solving sparse nonsymmetric linear systems on a vector multiprocessor”, Parallel Comput., 11(1989),pp.223-239.

[5]    Da Cunha, R.D. and Hopkins, T., “A parallel implementation of the restarted GMRES iterative algorithm for nonsymmetric systems of linear equations, Adv. Comp. Math., 2 (1994), pp261-277.

[6]    Kim, S.K. and Chronopoulos, A., “An efficient Arnoldi method implemented on parallel computers”, in 1991 International Conference on Parallel Processing, vol. III, 1991, pp.167-196.

[7]    DE Sturler E., “A parallel variant of GMRES(m)”, in Proc. Of the 13th IMACS world Confress on Computation and Applied Mathematics, ed. J.J.H. Miller and R.Vichnevetsky, IMACS, Criterion Press Dublin, 1991, pp.682-683.

[8]    Xu X. and Richards B., “Alpha-GMRES: a new parallelizable iterative solver for large sparse nonsymmetrical linear systems arising from CFD, Int. J. Numer. Meth. Fluids, 15(1992), pp.613-623.

[9]    Joubert W.D. and Carey G..F., “parallelizable restarted iterative methods for nonsymmetric linear systems I- theory, II-parallel implementation”, Int. J. Comput. Math., 44(1992), pp.243-290.

 


Supported by Spring Sunshine Project-1999 under the Ministry of Education and the National Science Foundation of China, the Grant No. is 59979013