Vector Add with MPI

Message passing is one way of distributing work to multiple processes that run independently and concurrently, either on a single computer or on a cluster of computers. The processes, which the software designates to start up and the operating system runs, serve as the processing units. This type of parallel programming has been used for quite some time, and the software libraries that make it available follow a standard called the Message Passing Interface, or MPI.
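
Before we look at the vector-add program, it may help to see message passing in its smallest form. The short program below is not part of the accompanying examples; it is a minimal sketch in which process 0 sends a single integer to process 1 using the point-to-point functions MPI_Send and MPI_Recv. Compile it with mpicc and run it with mpirun -np 2.

#include "mpi.h"
#include <stdio.h>

int main (int argc, char *argv[])
{
	int rank;
	int value;

	MPI_Init (&argc, &argv);
	MPI_Comm_rank (MPI_COMM_WORLD, &rank);

	if (rank == 0) {
		value = 42;
		// send one int to process 1, using message tag 0
		MPI_Send (&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
	} else if (rank == 1) {
		// receive one int from process 0, matching tag 0
		MPI_Recv (&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
		printf ("Process 1 received %d from process 0\n", value);
	}

	MPI_Finalize ();
	return 0;
}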

One feature of MPI programming is that a single program designates what all the various processes will do: the same program runs on every process. Each process has a number, or rank, and the value of the rank is used in the code to determine what each process will do. In the following code, the process numbered 0 does some additional work that the other processes do not do. This is very common in message passing solutions; process 0 is often referred to as the master, and the other processes as the workers. In the code below, look for three places where a block of code starts with this line:

if (rank == MASTER)  {

This is where the master does special work that only needs to be done once, by a single process. In this case, that work is the initialization of the arrays at the beginning of the computation, the check for accuracy after the main vector addition is completed, and the freeing of the arrays' memory at the end.
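
The gate for this special work is just an ordinary conditional on the rank. Here is a minimal standalone sketch (again, not from the example file) showing the shape of the pattern:

#include "mpi.h"
#include <stdio.h>

#define MASTER 0

int main (int argc, char *argv[])
{
	int rank;

	MPI_Init (&argc, &argv);
	MPI_Comm_rank (MPI_COMM_WORLD, &rank);

	if (rank == MASTER) {
		// one-time work: in the vector-add example below, this is where
		// the full arrays are allocated, verified, and freed
		printf ("Rank %d: master-only work\n", rank);
	}
	printf ("Rank %d: work done by every process\n", rank);

	MPI_Finalize ();
	return 0;
}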

The MPI syntax in this code takes some getting used to, but you should see the pattern of how the data decomposition occurs for a process running this code:

  1. First, initialize your set of processes (the number of processes is designated when you run the code).
  2. Determine how many processes are working on the problem.
  3. Determine which process rank I have.
  4. If I am the master, initialize the data arrays.
  5. Create smaller arrays for my portion of the work.
  6. Scatter the equal-sized chunks of the larger original arrays from the master out to each process to work on.
  7. Compute the vector addition on my chunk of data.
  8. Gather the completed chunks of my array C and those of each process back onto the larger array on the master.
  9. Terminate all the processes.

The following code contains comments with these numbered steps where they occur. This is the file VectorAdd/MPI/VA-MPI-simple.c in the compressed tar file of examples that accompanies this reading.

/*
 *  Prerequisites:
 *     This code runs using an MPI library, either OpenMPI or MPICH2.
 *     These libraries can be installed in either a cluster of computers
 *     or a multicore machine.
 *     
 *  How to compile:
 *     mpicc -o vec-add VA-MPI-simple.c
 *
 *  How to execute:
 *     mpirun -np 2 ./vec-add
 *
 *     Note that this executes the code on 2 processes, using the -np command line flag.
 *     See ideas for further exploration of MPI using this code at the end of this file.
 */


#include "mpi.h"      // must have a system with an MPI library
#include <stdio.h>    //printf
#include <stdlib.h>   //malloc

/*
 * Definitions
 */
#define MASTER 0         //One process will take care of initialization
#define ARRAY_SIZE 8     //Size of arrays that will be added together.

/*
 *  In MPI programs, the main function for the program is run on every
 *  process that gets initialized when you start up this code using mpirun.
 */
int main (int argc, char *argv[]) 
{
	// elements of arrays a and b will be added
	// and placed in array c
	int * a;
	int * b; 
	int * c;
	
	int total_proc;	 // total number of processes	
	int rank;        // rank of each process
	int n_per_proc;	// elements per process	
	int n = ARRAY_SIZE;   // number of array elements
	int i;       // loop index
		
	MPI_Status status;   // not used in this arguably poor example
	                     // that is devoid of error checking.

	// 1. Initialization of MPI environment
	MPI_Init (&argc, &argv);
	MPI_Comm_size (MPI_COMM_WORLD, &total_proc);
	// 2. Now you know the total number of processes running in parallel
	MPI_Comm_rank (MPI_COMM_WORLD,&rank);
	// 3. Now you know the rank of the current process
	
	// Smaller arrays that will be held on each separate process
	int * ap;
	int * bp;
	int * cp;
	
	// 4. We choose process rank 0 to be the root, or master,
	// which will be used to initialize the full arrays.
	if (rank == MASTER)  {
		a = (int *) malloc(sizeof(int)*n);
		b = (int *) malloc(sizeof(int)*n);
		c = (int *) malloc(sizeof(int)*n);
		
		// initialize arrays a and b with consecutive integer values
		// as a simple example
		for(i=0;i<n;i++)
			a[i] = i;
		for(i=0;i<n;i++)
			b[i] = i;
	}
	
	// All processes take part in the calculations concurrently
		
	// determine how many elements each process will work on
	n_per_proc = n/total_proc;
	/////// NOTE:
	// In this simple version, the number of processes needs to
	// divide evenly into the number of elements in the array
	///////////
	
	// 5. Initialize my smaller subsections of the larger array
	ap = (int *) malloc(sizeof(int)*n_per_proc);
	bp = (int *) malloc(sizeof(int)*n_per_proc);
	cp = (int *) malloc(sizeof(int)*n_per_proc);
	
	// 6.
	//scattering array a from MASTER node out to the other nodes
	MPI_Scatter(a, n_per_proc, MPI_INT, ap, n_per_proc, MPI_INT, MASTER, MPI_COMM_WORLD); 
	//scattering array b from MASTER node out to the other nodes
	MPI_Scatter(b, n_per_proc, MPI_INT, bp, n_per_proc, MPI_INT, MASTER, MPI_COMM_WORLD); 
	
	// 7. Compute the addition of elements in my subsection of the array
	for(i=0;i<n_per_proc;i++)
		cp[i] = ap[i]+bp[i];
	
	// 8. MASTER node gathering array c from the workers
	MPI_Gather(cp, n_per_proc, MPI_INT, c, n_per_proc, MPI_INT, MASTER, MPI_COMM_WORLD);

/////////////////////// all concurrent processes are finished once they all communicate
/////////////////////// data back to the master via the gather function.

	// Master process gets to here only when it has been able to gather from all processes
	if (rank == MASTER)  {			
		// sanity check the result  (a test we would eventually leave out)
		int good = 1;
		for(i=0;i<n;i++) {
			//printf ("%d ", c[i]);
			if (c[i] != a[i] + b[i]) {
				printf("problem at index %lld\n", i);
				good = 0;
				break;
			}
		}
		if (good) {
			printf ("Values correct!\n");
		}
		
	}

	// clean up memory
	if (rank == MASTER)  {
		free(a);  free(b); free(c);
	}
	free(ap);  free(bp); free(cp);
	
	// 9. Terminate MPI Environment and Processes
	MPI_Finalize();  
	
	return 0;
}
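
The NOTE in the middle of the code points out that this simple version requires the number of processes to divide evenly into ARRAY_SIZE. MPI also provides MPI_Scatterv and MPI_Gatherv, which take an array of per-process counts and an array of displacements, so the chunks do not all have to be the same size. The following variant is a sketch of that approach and is not part of the accompanying examples; the first n % total_proc processes simply take one extra element each.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

#define MASTER 0
#define ARRAY_SIZE 10    // deliberately not a multiple of a typical process count

int main (int argc, char *argv[])
{
	int * a = NULL;
	int * b = NULL;
	int * c = NULL;
	int total_proc, rank, i;
	int n = ARRAY_SIZE;

	MPI_Init (&argc, &argv);
	MPI_Comm_size (MPI_COMM_WORLD, &total_proc);
	MPI_Comm_rank (MPI_COMM_WORLD, &rank);

	// per-process counts and starting offsets; the first n % total_proc
	// ranks each take one extra element
	int * counts = (int *) malloc(sizeof(int)*total_proc);
	int * displs = (int *) malloc(sizeof(int)*total_proc);
	int offset = 0;
	for(i=0;i<total_proc;i++) {
		counts[i] = n/total_proc + (i < n%total_proc ? 1 : 0);
		displs[i] = offset;
		offset += counts[i];
	}

	if (rank == MASTER) {
		a = (int *) malloc(sizeof(int)*n);
		b = (int *) malloc(sizeof(int)*n);
		c = (int *) malloc(sizeof(int)*n);
		for(i=0;i<n;i++) { a[i] = i; b[i] = i; }
	}

	int my_count = counts[rank];
	int * ap = (int *) malloc(sizeof(int)*my_count);
	int * bp = (int *) malloc(sizeof(int)*my_count);
	int * cp = (int *) malloc(sizeof(int)*my_count);

	// scatter unequal chunks using the counts and displacements
	MPI_Scatterv(a, counts, displs, MPI_INT, ap, my_count, MPI_INT, MASTER, MPI_COMM_WORLD);
	MPI_Scatterv(b, counts, displs, MPI_INT, bp, my_count, MPI_INT, MASTER, MPI_COMM_WORLD);

	for(i=0;i<my_count;i++)
		cp[i] = ap[i] + bp[i];

	// gather the unequal chunks back onto the master
	MPI_Gatherv(cp, my_count, MPI_INT, c, counts, displs, MPI_INT, MASTER, MPI_COMM_WORLD);

	if (rank == MASTER) {
		for(i=0;i<n;i++)
			printf ("%d ", c[i]);
		printf ("\n");
		free(a); free(b); free(c);
	}
	free(ap); free(bp); free(cp);
	free(counts); free(displs);

	MPI_Finalize();
	return 0;
}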