Jiang Chunbo, Zhang Qinghai and An Xiaomi
Department of Hydraulic Eng., Tsinghua University, Beijing 100084, China
E-mail:
jcb@mail.tsinghua.edu.cn
Abstract: To study the effect of the Three George Reservoir on the flow field of Yangtz river, we use the Shallow Water Equations and the Finite Element Method to simulate the change of the velocity field before and after the construction of the reservoir. Since the scale of this problem is quite large, parallel computing technique is engaged to implement the implicit solving of the large linear system.
Keywords: three george project, finite element method, shallow water equations, parallel computing technique, GMRES method
Parallel Computing Technique attracts more and more attention because of its powerful computing ability. By breaking down a large task into many small ones and letting many PEs (processing elements) compute the tasks simultaneously, the parallel computing technique not only enlarges the maximum numbers of meshes engaged, but also reduce the time required the computation drastically.
Three George Project, by now the largest hydraulic engineering project in China, brings on many engineering problems, one of which is the pollution state of the river will be totally changed. It is desirous to predict the distribution of the pollutants in advance. But the velocity fields should be simulated first.
In this paper, we use parallel computing technique to simulate the two cases of the velocity fields before and after the construction of the Three George Reservoir. We put the emphasis on the issues of parallel implementation, such as the auto mesh partitioning, renumbering and parallel GMRES method.
The river flow can be modeled using the shallow water equations, which are obtained from the conservation of mass and momentum, vertically integrated, assuming a hydrostatic pressure distribution:
(1)
(2)
Where:
is the water elevation,
is the water depth,
is the mean horizontal velocity, g
is the gravitational acceleration,
is the eddy viscosity,
is the density of water,
is the surface shear stress,
is the bottom shear stress.
The shear stress item can be given as:
;
(3)
Where:
is the x-directional velocity of a
certain node,
is the y-directional velocity,
;
is the averaged velocity in
a triangular mesh element,
is the Manning coefficient.
The parallel environment consists of three parts: the hardware, the operating system and the software environment.
A cluster of workstations is chosen as the parallel computing hardware. Clusters have many advantages among which are the scalability, low price and high performance. The workstations are all of IA32 architectures with 550 MHz Pentium III processors, 128 SDRAM and we choose Fast Ethernet (100M/s) as the network form.
Redhat Linux 6.1 is selected as the operating system because of its low memory pre-occupation, superb network performance and its excellent support to many libraries of message passing interfaces.
The software environment comprises the
underlying message passing interface and the parallel programming library. The
MPICH1.2 is chosen as the message passing interface for two reasons: on one hand
the MPI has been an international standard of message passing; on the other hand
there are already many libraries which support MPI. As for the parallel
libraries, we use the famous BLAS and LAPACK library to facilitate the
programming.
Figure 1 shows the whole parallel environment.

We use a Delaunay triangulation algorithm [1] to generate unstructured triangular mesh in the computation domain. After that, the mesh should be partitioned to equal parts and distributed to different processors. To achieve high efficiency, the number of cutting edges should be minimized as small as possible so as to minimize the overhead of communication. We studied several partition algorithm and selected an algorithm [2] based upon Graph Theory and proposed by G.L.Miller etc.
Algorithm 1. Mesh Partitioning
(1) Project up. Project the input points form IRd to the unit sphere centered at the origin in IRd+1 along the line through p and the “north pole” (0, …, 0, 1). See Figure 2-c.
(2) Find Centerpoint. Compute a centerpoint of the projected points in IRd+1. See Figure 2-c.
(3) Rotate and Dilate. Rotate the Projected points about the origin in IRd+1 so that the centerpoint becomes a point (0, …, 0, r) on the d+1st axis; then dilate the surface of the sphere so that the center point becomes the origin. See Figure 2-d.
(4) Find Great Circle. Choose a random great circle on the unit sphere in IRd+1. See Figure 2-d.
(5) Unmap and Project Down. Transform the great circle to a circle in IRd by undoing the dilation, rotation, and stereographic projection. See Figure 2-e.
(6) Convert Circle to Separator. Figure 2-f shows a edge separator.

Fig. 2-a Fig. 2-b Fig. 2-c

Fig. 2-d Fig. 2-e Fig. 2-f
Generally speaking, the linear systems will be preconditioned to reduce the bandwidth of the matrix, either by the left-preconditioning method or right-preconditioning method. Otherwise, the large bandwidth will spoil the speed of convergence and the overhead of communication thus induced will deteriorate the parallel efficiency.
However, all the preconditioning methods require a lot of floating point operations, so we use renumbering to reduce the bandwidth of the matrix instead of preconditioning. This is quite economical since the time required by renumbering is much shorter than that of preconditioning.
The figures below show an example mesh, the corresponding original matrix structure and renumbered matrix structure. We can see that the result is quite good and the required amount of communication is reduced significantly.

Fig. 3-a An example mesh

Fig. 3-b Original mesh structure Fig. 3-c Renumbered Mesh structure
Fig. 4-a Computed flow field before the Fig. 4-b Computed flow field after the
construction of the reservoir construction of the reservoir
Consider the linear system
with
and with a nonsingular
nonsymmetrical matrix
. The Krylov subspace
is defined by
. The GMRES algorithm [3] uses the Arnoldi process to construct an
orthonormal basis
for
. The full GMRES method allows the Krylov subspace dimension to increase up to
and always terminates in at most
iterations. The restarted GMRES
method restricts the Krylov subspace dimension to a fixed value
and restarts the Arnoldi process
using the last iterated
as an initial guess. It may stall.
Below is the algorithm for the restarted GMRES.
Algorithm 2. Sequential GMRES algorithm
is the tolerance for the residual norm;
Convergence = false;
Choose
;
UNTIL convergence DO
* Arnoldi Process: Construct a basis Vm of the Krylov subspace K
FOR j=1, …, m DO
FOR I=1, …, j DO
ENDFOR;
ENDFOR;
* Minimize
for
* Solve a least-square problem of size m
compute
solution of
;
IF
, convergence = true;
ENDDO
Where the matrix
is a upper Hessenberg matrix of
order
, and we get the fundamental relation
The GMRES algorithm computes
where
solves the least-squares problem
. Usually a QR factorization of
using Givens rotations is used to
solve this least-square problem.
Several authors have studied the parallelization
of GMRES. Some of them consider alter the sequential Arnoldi Process to achieve
parallelization [4,5], some of them define a new basis of the Krylov
subspace [6,7], other variants such as
-GMRES [8], GMRES with a Chebychev basis [9] have also
been brought forward.
Our implementation is that we do not alter the sequential algorithm itself, but break down the algorithm into basic operations and parallelize the basic operations. In the GMRES, the basic operation most frequently used is the matrix-vector multiplication and therefore it should be optimized (reduce the communication time as much as possible) so that the overall parallel performance would be enhanced drastically.
After the implementation of parallel solving of linear systems, we discretize the Shallow Water Equation to linear systems using standard Galerkin Method in space domain. In time domain we solve a large linear system using parallel GMRES method in every time step.
Then the flow fields of Yangtz River in JiaLin section, both before the construction of Three George Reservoir and after that, are simulated. The figures in the next page show the results of these two cases.
Acknowledgement
This research is supported by Spring Sunshine Project-1999 under the Ministry of Education and the National Science Foundation of China, the Grant No. is 59979013.
[1] George, P.L., “Automatic Mesh Generation—Application to Finite Element Methods”, WILEY,1991.
[2] Miller G.L., Teng S.H., etc. “Automatic mesh partitioning”. In A.George, J.Gilber, and J.Liu, editors, Sparse Matrix Computations: Graph Theory Issues and Algorithms, IMA Volumes in Mathematics and its Applications. Springer-Verlag, 1993.
[3] Saad, Y. and Schultz, M. H., “GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems”, SIAM, J.Sci.Statist.Comput., 7(1986), pp.856-869.
[4] DI Brozolo, G..R. and Robert, Y., “Parallel conjugate gradient-like algorithms for solving sparse nonsymmetric linear systems on a vector multiprocessor”, Parallel Comput., 11(1989),pp.223-239.
[5] Da Cunha, R.D. and Hopkins, T., “A parallel implementation of the restarted GMRES iterative algorithm for nonsymmetric systems of linear equations, Adv. Comp. Math., 2 (1994), pp261-277.
[6] Kim, S.K. and Chronopoulos, A., “An efficient Arnoldi method implemented on parallel computers”, in 1991 International Conference on Parallel Processing, vol. III, 1991, pp.167-196.
[7] DE Sturler E., “A parallel variant of GMRES(m)”, in Proc. Of the 13th IMACS world Confress on Computation and Applied Mathematics, ed. J.J.H. Miller and R.Vichnevetsky, IMACS, Criterion Press Dublin, 1991, pp.682-683.
[8] Xu X. and Richards B., “Alpha-GMRES: a new parallelizable iterative solver for large sparse nonsymmetrical linear systems arising from CFD, Int. J. Numer. Meth. Fluids, 15(1992), pp.613-623.
[9] Joubert W.D. and Carey G..F., “parallelizable restarted iterative methods for nonsymmetric linear systems I- theory, II-parallel implementation”, Int. J. Comput. Math., 44(1992), pp.243-290.