3.12. GASFLOW Parallelization
3.12.1. Approach
GASFLOW-MPI is the parallel version of GASFLOW based on the Message Passing Interface (MPI) and domain decomposition paradigms. It employs the data structures, parallel linear solvers and preconditioners of the Portable, Extensible Toolkit for Scientific Computation (PETSc).
PETSc is one of the most widely used software libraries for high-performance computational science. It provides the numerical infrastructure for application codes that involve the implicit numerical solution of partial differential equations. PETSc offers distributed data structures, such as index sets, distributed vectors and distributed matrices in several sparse storage formats, as its fundamental objects. Krylov subspace methods, preconditioners and Newton-like nonlinear methods are implemented in a data-structure-neutral manner that provides a uniform interface for application programmers. The portability of PETSc is achieved through MPI, but the detailed message passing required to coordinate the computations is handled inside the PETSc library.
The GASFLOW serial version is written in FORTRAN 90; version 3.5 comprises more than 120,000 lines of code and 634 subroutines. The ICE’d-ALE solution methodology incorporated in GASFLOW requires the solution of an elliptic pressure equation for the efficient calculation of flows at all speeds. The discretization of this elliptic equation results in a large, sparse, symmetric linear equation system. The preconditioning algorithm of the serial version relies on a recursive numerical methodology with heavy use of indirect addressing, which may reduce computational efficiency and is not well suited to parallelization. Therefore, all routines related to the linear solver and preconditioner in the GASFLOW serial version were replaced by the parallel linear solvers and preconditioners of the PETSc library. In GASFLOW-MPI, the sparse symmetric system derived from the discretization of the elliptic pressure equation is solved with PETSc; the combination of the conjugate gradient (CG) linear solver and the Block Jacobi (BJACOBI) preconditioner was selected as the default for the elliptic pressure equation in the current version.
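As an illustration of this solver setup, the following minimal C sketch (not taken from the GASFLOW-MPI source) shows how a conjugate gradient solver is paired with a Block Jacobi preconditioner through PETSc's KSP interface. A one-dimensional Laplacian stands in for the discretized pressure matrix, and error checking is omitted for brevity.

/* Minimal sketch: CG + Block Jacobi via PETSc's KSP interface.
   The 1-D Laplacian below is only a stand-in for the symmetric
   sparse system arising from the elliptic pressure equation. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;            /* distributed sparse matrix */
  Vec      x, b;         /* solution and right-hand side */
  KSP      ksp;          /* Krylov solver context */
  PC       pc;           /* preconditioner context */
  PetscInt i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Assemble a 1-D Laplacian distributed across all processes */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  /* Conjugate gradient solver with Block Jacobi preconditioner */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetType(ksp, KSPCG);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCBJACOBI);
  KSPSetFromOptions(ksp);   /* allow run-time overrides */
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp);
  MatDestroy(&A);
  VecDestroy(&x);
  VecDestroy(&b);
  PetscFinalize();
  return 0;
}

Because KSPSetFromOptions is called, the solver and preconditioner chosen in the code can still be overridden at run time with standard PETSc command-line options such as -ksp_type and -pc_type; this data-structure-neutral interface is what makes swapping Krylov methods and preconditioners inexpensive.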
3.12.2. Domain decomposition
In general, the user does not need to manually decompose the computational domain in the ingf file: by default, the computational domain is automatically decomposed in an optimized way in GASFLOW-MPI. Nevertheless, an option is available for advanced users to control the domain decomposition manually. Most users can simply use the same input deck as for the GASFLOW serial version.
The relevant input variable is autodecomp. By default, autodecomp = 1, which means the domain decomposition is controlled automatically by GASFLOW-MPI. Only when necessary, for example for debugging purposes, should the user set autodecomp = 0 to control the domain decomposition manually. nxprocs, nyprocs and nzprocs are the numbers of processes along the x, y and z axes, respectively. Note that nxprocs × nyprocs × nzprocs must equal the total number of processes allocated to the parallel computation, as in the illustrative fragment below.
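For illustration only, a manual decomposition onto 32 processes might be specified in the ingf input deck with the following assignments (the values are hypothetical, and their placement within the deck follows the usual GASFLOW input conventions):

autodecomp = 0,
nxprocs = 4,
nyprocs = 2,
nzprocs = 4,

Here 4 × 2 × 4 = 32, which must match the process count passed with -np when launching the run (see Section 3.12.4).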
Warning: With autodecomp = 0 the user controls the domain decomposition manually, and the best performance is not guaranteed. It is highly recommended to keep the default value autodecomp = 1.
3.12.3. Obtaining good parallel efficiency
GASFLOW-MPI can run on any parallel system that supports MPI. To achieve the best parallel performance, users need:
A fast, low-latency interconnect between computational nodes;
High per-core memory performance: each core needs a memory bandwidth of roughly 2 GB/s or more. The speed of sparse matrix computations is determined almost entirely by the speed of memory access, not by the speed of the CPU, because the number of floating-point instructions submitted to the CPU is significantly smaller than the number of memory references that must be resolved to obtain data; the matrix-vector multiply kernel is therefore memory bound;
A domain decomposition in which each sub-domain contains no fewer than approximately 10,000 to 20,000 cells, so that the computational workload of each process outweighs the communication time. For example, a small problem with 640,000 cells usually obtains good speed-up on 64 processes (640,000 / 64 = 10,000 cells per sub-domain); using more processes may decrease performance because the communication effort increases.
3.12.4. Running GASFLOW-MPI
This section demonstrates how to run GASFLOW-MPI in parallel on distributed processors. Domain decomposition is used as the method of parallel computing: the geometry and the associated field variables are broken into sub-domains and allocated to separate processes for solution. Parallel runs use the open-source Open MPI implementation of the MPI standard. GASFLOW-MPI is designed to be compatible with the input and output of the GASFLOW serial version: it reads the same input file, ingf, and exports the same calculation results in NetCDF format. Users can therefore run GASFLOW-MPI in the same way as the serial version without needing to know the details of the parallelization.
GASFLOW-MPI can be run on a local multiprocessor machine very simply, but when running across machines on a network a file must be created that contains the host names of those machines. The file can be given any name and located at any path. In the following description we refer to such a file by the generic name, including its full path, <machines>.
An application is run in parallel using mpirun.
mpirun --hostfile <machines> -np <nProcs> xgfmpi
xgfmpi denotes the executable of GASFLOW-MPI, and -np specifies the number of processes to use for the parallel computation. For example, if you have the hostfile hostgf and you want to run xgfmpi with 32 processes:
mpirun --hostfile hostgf -np 32 xgfmpi
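For example, a <machines> file describing four nodes with eight cores each could look as follows; the host names are placeholders, and slots is the standard Open MPI hostfile keyword giving the number of processes to place on each node:

node01 slots=8
node02 slots=8
node03 slots=8
node04 slots=8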