Parallel computers are programmed under two main classes of models, distinguished by their memory architecture: shared memory and distributed memory. In the first, a set of processors has access to a common memory (SMP machines), while in the latter each processor has its own private memory and the units communicate by exchanging messages (cluster computing devices).

Each model is programmed through a specific API: OpenMP is used for shared-memory (SMP) machines and MPI for distributed-memory machines.

OpenMP
MPI
Implementation and comparison of OpenMP/MPI
Multidimensional arrays in MPI
Parallelization with random numbers

OpenMP

Some OpenMP directives, clauses, and library references. To refresh the use of the directives, check this summary sheet.
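
As a quick refresher, the sketch below puts the three ingredients side by side: a directive (parallel do), two clauses (private and reduction), and a library routine (omp_get_max_threads). The program name, the variable names and the loop body (a sum of squares) are assumptions chosen only for illustration. The `!$` sentinel keeps the library calls out of the way when OpenMP is disabled, so the same file also builds as a serial program.

```fortran
program omp_refresher
  !$ use omp_lib                          ! library routines, compiled only when OpenMP is enabled
  implicit none
  integer, parameter :: n = 1000          ! illustrative loop length (assumption)
  integer :: i, sq, total, nthreads

  nthreads = 1                            ! serial fallback
  !$ nthreads = omp_get_max_threads()     ! library routine

  total = 0
  ! Directive: parallel do.
  ! Clauses:   private(sq) gives every thread its own scratch variable,
  !            reduction(+:total) merges the per-thread partial sums.
  !$omp parallel do private(sq) reduction(+:total)
  do i = 1, n
     sq = i * i
     total = total + sq
  end do
  !$omp end parallel do

  print '(a,i0)', 'threads available: ', nthreads
  print '(a,i0)', 'sum of squares:    ', total
end program omp_refresher
```

With gfortran, for instance, OpenMP is switched on with the -fopenmp flag; other compilers use their own option.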

MPI

Excellent resources with tutorials and examples can be found in Mey and Reichstein (2007) or Barney (2012).
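
To make the message-passing idea concrete, here is a minimal sketch in which every process other than rank 0 sends its own rank number to rank 0, which adds the values up. It is not taken from the references above; the program name, the variable names and the choice of message tag 0 are assumptions made for illustration.

```fortran
program mpi_sum_ranks
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, src, incoming, total
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)    ! who am I?
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)  ! how many processes?

  if (rank /= 0) then
     ! Memory is private to each process, so data must be sent explicitly.
     call MPI_Send(rank, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr)
  else
     total = 0
     do src = 1, nprocs - 1
        call MPI_Recv(incoming, 1, MPI_INTEGER, src, 0, MPI_COMM_WORLD, status, ierr)
        total = total + incoming
     end do
     print '(a,i0,a,i0)', 'sum of ranks over ', nprocs, ' processes: ', total
  end if

  call MPI_Finalize(ierr)
end program mpi_sum_ranks
```

The build and launch commands depend on the MPI installation; wrappers such as mpif90 and mpirun are common, but their names are not guaranteed. In practice a collective call such as MPI_Reduce would replace the explicit send/receive loop; the point-to-point version is shown here only because it exposes the individual messages.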

Implementation and comparison of OpenMP/MPI

Computation of \pi using:
\pi=4\int_{0}^{1}\frac{1}{x^{2}+1}dx

The parallelization strategy consists of approximating the above integral with an array of N elements, whose evaluation is divided among the threads or processes. The implementation under OpenMP can be found here: pi_omp.f90, pi_omp.out. The MPI implementation can be found here: teste_mpi.f90, teste_mpi.out
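
One possible discretization, given here as a sketch and not as a copy of pi_omp.f90, is the midpoint rule: \pi \approx \frac{4}{N}\sum_{i=1}^{N}\frac{1}{1+x_{i}^{2}} with x_{i}=(i-1/2)/N. The choice of the midpoint rule, the value of N and all names are assumptions. The terms are accumulated directly instead of being stored in an array, and the reduction clause combines the partial sums of the threads.

```fortran
program pi_midpoint_omp
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: n = 1000000                   ! number of subintervals (assumed value)
  real(real64), parameter :: pi_ref = 4.0_real64 * atan(1.0_real64)
  real(real64) :: h, x, total, pi_est
  integer :: i

  h = 1.0_real64 / real(n, real64)                    ! width of one subinterval
  total = 0.0_real64

  ! Iterations are independent, so the loop can be split among threads;
  ! x is per-thread scratch, total is combined at the end of the loop.
  !$omp parallel do private(x) reduction(+:total)
  do i = 1, n
     x = (real(i, real64) - 0.5_real64) * h           ! midpoint of subinterval i
     total = total + 1.0_real64 / (1.0_real64 + x * x)
  end do
  !$omp end parallel do

  pi_est = 4.0_real64 * h * total
  print '(a,f18.15)', 'estimate:  ', pi_est
  print '(a,es10.3)', 'abs error: ', abs(pi_est - pi_ref)
end program pi_midpoint_omp
```

An MPI version of the same idea would let each rank sum its own subset of the indices i and then combine the partial sums on one rank with MPI_Reduce. The arithmetic is identical; only the way the partial sums are brought together changes.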

Some notes on passing multidimensional arrays in MPI

Communicating multidimensional arrays in MPI isn't trivial, as message passing treats such arrays as contiguous vectors. For that reason, it is better to adopt that constraint directly and transform the array that requires work into a contiguous vector before communicating. Special care should be given to the leading indices of the loop that is being parallelized and to keeping the work independent across loop iterations. The following code gives an example of such an implementation: arraysmpi.f90, arraysmpi.out
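
The following sketch illustrates the idea under stated assumptions and is not the content of arraysmpi.f90: a 2D array is flattened with reshape, equal blocks of whole columns are scattered, each process works on its block, and the results are gathered and reshaped back. Because Fortran stores arrays column by column, a block of whole columns is already contiguous in the flattened vector. The array sizes, the "work" (doubling each element) and all names are illustrative, and the number of columns is chosen as a multiple of the process count so that every block has the same size.

```fortran
program flatten_scatter_gather
  use mpi
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nrows = 3, cols_per_rank = 2     ! illustrative sizes (assumption)
  integer :: ierr, rank, nprocs, ncols, chunk, i, j
  real(dp), allocatable :: a(:,:), flat(:), local(:,:), buf(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ncols = cols_per_rank * nprocs          ! divisible by nprocs by construction
  chunk = nrows * cols_per_rank           ! elements handled by each process
  allocate(a(nrows, ncols), flat(nrows * ncols))
  allocate(local(nrows, cols_per_rank), buf(chunk))

  if (rank == 0) then
     do j = 1, ncols
        do i = 1, nrows
           a(i, j) = real(i + 10 * j, dp)
        end do
     end do
     flat = reshape(a, [nrows * ncols])   ! column-major order: columns stay contiguous
  end if

  ! Each process receives cols_per_rank whole columns as one contiguous vector.
  call MPI_Scatter(flat, chunk, MPI_DOUBLE_PRECISION, &
                   buf,  chunk, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  local = reshape(buf, [nrows, cols_per_rank])   ! back to 2D for the local work
  local = 2.0_dp * local                         ! independent work on the local block
  buf = reshape(local, [chunk])

  call MPI_Gather(buf,  chunk, MPI_DOUBLE_PRECISION, &
                  flat, chunk, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  if (rank == 0) then
     print *, 'max deviation from 2*a: ', &
          maxval(abs(reshape(flat, [nrows, ncols]) - 2.0_dp * a))
  end if

  call MPI_Finalize(ierr)
end program flatten_scatter_gather
```

Splitting along the last index is what keeps each block contiguous here. Splitting along the first index would interleave the elements of different processes in memory and would require either an explicit reordering or an MPI derived datatype.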

Issues with parallelization with random numbers

It is possible that a program calling, for example, random_number runs MORE SLOWLY when parallel threads are added. This is due to contention among the threads for access to the internal saved variables that control the state of the random number generator. A potential solution (far from optimal) is to give each thread its own seed.
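
Whether seeding random_number inside a parallel region gives each thread an independent state depends on the compiler, so the sketch below swaps in a different technique to show the principle: a small hand-written Park-Miller generator whose state is a private variable seeded from the thread number. Nothing is shared between threads, so there is nothing to contend for. This is not the code in random_omp.f90; the seed formula, the sample count and all names are assumptions, and no claim is made about the statistical independence of the per-thread streams.

```fortran
program random_per_thread
  !$ use omp_lib
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  ! Park-Miller "minimal standard" constants; 64-bit arithmetic avoids overflow.
  integer(int64), parameter :: a = 16807_int64, m = 2147483647_int64
  integer, parameter :: n = 1000000        ! total number of samples (assumed value)
  integer(int64) :: state
  integer :: i, tid
  real(real64) :: total, mean

  total = 0.0_real64
  !$omp parallel private(state, tid) reduction(+:total)
  tid = 0                                  ! serial fallback
  !$ tid = omp_get_thread_num()
  ! Thread-specific seed (hypothetical formula): the state is private to the thread.
  state = 12345_int64 + 7919_int64 * int(tid, int64)
  !$omp do
  do i = 1, n
     state = mod(a * state, m)             ! advance this thread's generator
     total = total + real(state, real64) / real(m, real64)   ! uniform sample in (0,1)
  end do
  !$omp end do
  !$omp end parallel

  mean = total / real(n, real64)
  print '(a,f8.5)', 'sample mean (should be close to 0.5): ', mean
end program random_per_thread
```

The price of this approach is that the result depends on the number of threads, since each thread draws from a different stream. For production work a generator designed for parallel streams is preferable to ad hoc seeding.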

As an example we have: random_omp.f90, random_omp.out