CS 240A Progress Report of Final Project
Investigation of a Finite Volume Code Scaling Performance
Esteban Gonzalez
Feb 28, 2006

Contents

1  Summary
2  Density currents
3  Code description
4  Procedure
5  Parallel computers used
6  Time study results

1  Summary

A time study was performed on the main operations of a parallel code used to simulate density currents. We were interested in identifying where improvements could be made to speed up these operations. It was found that modifications to the multigrid solver and to the integration of the stresses along a specific surface could result in drastic speed improvements.
The project has four future objectives: 1) Add one more parallel computer to the time study. 2) Study how the partitioning of the two-dimensional domain affects the elapsed time of different operations in the code. 3) Implement a parallel integration of the stresses along a surface. 4) Implement a "completely" parallel multigrid solver (this will not be ready by March 22).

2  Density currents

The flow of one fluid within another, both of which are miscible, due to a horizontal density gradient is called a density (or gravity) current. A simple example of a density current occurs every time a refrigerator door is opened. The denser cold air coming from inside the refrigerator displaces the lighter warm air of the surroundings. The denser cold air is "put into motion" by the perturbation of the interface between the cold and warm air. This perturbation starts when the door is suddenly opened. Afterwards, because of the horizontal density gradient, the subsequent motion is driven by gravity forces.
Because density currents arise in many different environmental and industrial situations, their study has been of great importance in a wide range of scientific disciplines. Features of practical interest are the speed of the current front and the final run-out length.

3  Code description

The continuity, momentum conservation, and scalar conservation equations in Cartesian coordinates are discretized in the streamwise x and vertical y directions using the finite volume method. Fourier transforms are taken in the spanwise z direction. The problem is, essentially, two-dimensional. A semi-implicit iterative solution procedure is used: in the momentum and scalar conservation equations, the diffusion and advection terms are treated implicitly in the y direction only. Second-order accuracy is ensured in space and time. The main advantages of using this finite volume code in the study of density currents are its ability to easily introduce complicated rectangular geometries, e.g. a density current flowing over a square obstacle, and the possibility of simulating the high Reynolds number flows that are observed in nature. On the other hand, its main drawback is that it requires considerable computer power to accurately resolve the flowfield in comparison with, say, a code using spectral methods or higher-order finite differences. Efficient parallelization of the code is therefore imperative.
At each time-step, the following operations are performed:
The code is written in Fortran and parallelized with MPI.

4  Procedure

A set of system timer subroutines was incorporated into the code to measure the elapsed time of the following operations:
The code was run under exactly the same conditions for 10 timesteps, such that the elapsed time of each operation was averaged from at least 10 measurements. The timer subroutines were called on every processor, and the final measure of the elapsed time was taken as the maximum time among the processors. Errors were introduced into this time study primarily because other users were competing for the processors, network bandwidth, and I/O bandwidth of the machines used. Furthermore, since this is a simulation of a turbulent flow, the initial conditions required random number generators, adding further error to the present time study.
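The reduction of the timer measurements described above can be sketched in a few lines. This is only an illustration in Python, not the Fortran/MPI timer subroutines of the code: the function name aggregate_elapsed and the sample numbers are hypothetical, and it is assumed here that the maximum over processors is taken for each measurement first and that the maxima are then averaged over the timesteps (the text does not state the order).

```python
def aggregate_elapsed(times):
    """Reduce raw timer samples to a single elapsed time.

    times[step][rank] is the elapsed time (seconds) measured on one
    processor for one timestep. Assumption: take the slowest processor
    for each timestep, then average those maxima over the timesteps.
    """
    if not times:
        raise ValueError("no measurements given")
    # The slowest processor determines how long the step really took.
    per_step_max = [max(step) for step in times]
    # Average over the repeated measurements.
    return sum(per_step_max) / len(per_step_max)


# Hypothetical samples: 2 timesteps measured on 2 processors.
samples = [[1.0, 2.0],
           [3.0, 1.0]]
elapsed = aggregate_elapsed(samples)
```

In an MPI code the per-step maximum would typically be obtained with a reduction using the MPI_MAX operation; the plain lists above stand in for that step.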
The two-dimensional domain was split as shown in figure 1.


Figure 1: Splitting of the two-dimensional domain.

5  Parallel computers used

Three different parallel computers were used in this study. Their characteristics and compilers are described below.

6  Time study results

Figure 2 shows the elapsed time for different operations of the code on DATASTAR. tri denotes the elapsed time for the tridiagonal solver (Thomas algorithm), fft the time for the FFT algorithm, mg the time for the multigrid algorithm, stress the time for the stress integration along a surface, and sim the total time for the steps outlined in a previous section. The stress calculation was not parallelized, so its elapsed time was expected to be approximately constant as the number of processors varies. Since a gather call is invoked in the stress calculation subroutines at each time step, it was also expected that the time of this operation would increase slightly with the number of processors, due to an increase in communication. This is shown in figure 2.
The most surprising result of figure 2 is that the elapsed time for the multigrid solver is independent of the number of processors. It was found that this operation was performed by only two processors regardless of the number of processors chosen. Considering that this is where most of the time is spent at each time step of the simulation, an efficient parallelization of the multigrid solver could make the overall computation much cheaper.
The tridiagonal solver and the FFT algorithm seem to scale well with the number of processors. Keep in mind the importance of the tridiagonal solver in this code: it is invoked several times at each iteration. Similar trends were found when using a 1024x256 grid. Figure 3 shows the elapsed time for different operations of the code on SNOWWHITE. The trends are similar to those on DATASTAR, but the elapsed times are longer.
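For reference, the Thomas algorithm timed under tri can be sketched as follows. This is a minimal serial version in Python, written only to illustrate the technique; it is not the Fortran routine of the code, and the function name thomas_solve and the storage convention (three lists of equal length, with lower[0] and upper[-1] unused) are assumptions made for this example. No pivoting is done, so the matrix is assumed to be diagonally dominant.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower, diag, upper and rhs all have length n. lower[0] and
    upper[n-1] are ignored. No pivoting is performed.
    """
    n = len(diag)
    c_prime = [0.0] * n  # modified super-diagonal
    d_prime = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal.
    c_prime[0] = upper[0] / diag[0] if n > 1 else 0.0
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# Small check: the matrix with 2 on the diagonal and -1 off the diagonal,
# with a right-hand side chosen so that the exact solution is [1, 2, 3].
solution = thomas_solve([0.0, -1.0, -1.0],
                        [2.0, 2.0, 2.0],
                        [-1.0, -1.0, 0.0],
                        [0.0, 0.0, 4.0])
```

The cost is linear in the number of unknowns, which is one reason the solver is cheap enough to be invoked several times per iteration.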

Figure 2: Elapsed time for different operations of the code in DATASTAR using a 2048x512 grid.

Figure 3: Elapsed time for different operations of the code in SNOWWHITE using a 2048x512 grid.
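
The effect of an operation that does not speed up with the number of processors, such as the multigrid solver and the stress integration above, can be estimated with Amdahl's law. The sketch below is only an illustration: the function name amdahl_speedup is hypothetical, the fractions used are made-up values and not measurements from this study, and the non-scaling operations are idealized as purely serial work.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup from Amdahl's law.

    parallel_fraction is the share of the single-processor run time
    that scales perfectly with the number of processors; the rest is
    assumed to take the same time no matter how many are used.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical fractions, for illustration only (not measured values).
for fraction in (0.5, 0.9, 1.0):
    print(fraction, [round(amdahl_speedup(fraction, p), 2) for p in (2, 8, 32)])
```

With half of the run time not scaling, the speedup can never exceed 2 regardless of the number of processors, which is why parallelizing the dominant operation matters more than tuning the ones that already scale.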


