Patterns used when threads share data values

11. Shared Data Algorithm Strategy: Parallel-for-loop pattern needs non-shared, private variables

file: openMP/11.private/private.c

Build inside 11.private directory:

make private

Execute on the command line inside 11.private directory:

./private

In this example, you will try a parallel for loop in which additional variables (i and j in the code) cannot be shared by all of the threads, but must instead be private to each thread, meaning that each thread has its own copy of that variable. Here the outer loop is split into chunks that are given to each thread, but the inner loop is executed in full by each thread for every element in its chunk, so the loop counter variables must be maintained separately by each thread. Because they are declared outside the loops at the beginning of the program, by default these variables are shared by all the threads.

/* private.c
 * ... illustrates why private variables are needed with OpenMP's parallel for loop
 *
 * Joel Adams, Calvin College, November 2009.
 *
 * Usage: ./private 
 *
 * Exercise: 
 * - Run, noting that the sequential program produces correct results
 * - Uncomment line A, recompile/run and compare
 * - Recomment line A, uncomment line B, recompile/run and compare
 */

#include <stdio.h>
#include <omp.h>
#include <stdlib.h>

#define SIZE 100

int main(int argc, char** argv) {
    int i, j, ok = 1;
    int m[SIZE][SIZE];

    printf("\n");
    // set all array entries to 1
//    #pragma omp parallel for                     // A
//    #pragma omp parallel for private(i,j)        // B
    for (i = 0; i < SIZE; i++) {
        for (j = 0; j < SIZE; j++) {
            m[i][j] = 1;
        }
    }

    // test (without using threads)
    for (i = 0; i < SIZE; i++) {
        for (j = 0; j < SIZE; j++) {
            if ( m[i][j] != 1 ) {
                printf("Element [%d,%d] not set... \n", i, j);
                ok = 0;
            }
        }
    }

    if ( ok ) {
        printf("\nAll elements correctly set to 1\n\n");
    }

    return 0;
}
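
When you uncomment line B, the private(i,j) clause gives each thread its own copies of i and j. A common alternative, sketched below (a minimal sketch, not part of the original exercise), is to declare the loop variables inside the loops themselves, since a variable declared inside a parallel loop is automatically private to each thread:

#include <stdio.h>
#include <omp.h>

#define SIZE 100

int main(void) {
    int m[SIZE][SIZE];

    // i and j are declared inside the loops (C99 style), so each thread
    // automatically gets its own copies; no private() clause is needed.
    #pragma omp parallel for
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            m[i][j] = 1;
        }
    }

    printf("m[0][0] = %d, m[%d][%d] = %d\n",
           m[0][0], SIZE-1, SIZE-1, m[SIZE-1][SIZE-1]);
    return 0;
}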

12. Race Condition: missing the mutual exclusion coordination pattern

file: openMP/12.mutualExclusion-atomic/atomic.c

Build inside 12.mutualExclusion-atomic directory:

make atomic

Execute on the command line inside 12.mutualExclusion-atomic directory:

./atomic

When a variable must be shared by all the threads, as in the example below, a problem called a race condition can occur when the threads update that variable concurrently. The update of a memory location is carried out by multiple underlying machine instructions, and the instructions from different threads can interleave, corrupting the result. To prevent this, each thread must execute the complete update atomically, before another thread begins its own update; this guarantees mutual exclusion between the threads when updating the shared variable. OpenMP provides the atomic pragma, shown commented out in this code, for exactly this purpose.
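
To see why the unprotected version fails, consider that a statement like balance += 1.0 is carried out as three separate steps. The sketch below is illustrative C, not the actual machine code:

double temp = balance;   // 1. load the shared value
temp = temp + 1.0;       // 2. add one to the private copy
balance = temp;          // 3. store the result back

// If a second thread performs its load (step 1) after this thread's
// load but before this thread's store (step 3), both threads store
// the same result, and one of the two deposits is lost.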

/* atomic.c
 * ... illustrates a race condition when multiple threads read from / 
 *  write to a shared variable (and explores OpenMP atomic operations).
 *
 * Joel Adams, Calvin College, November 2009.
 *
 * Usage: ./atomic
 *
 * Exercise:
 *  - Compile and run 10 times; note that it always produces the correct balance: $1,000,000.00
 *  - To parallelize, uncomment A, recompile and rerun multiple times, compare results
 *  - To fix: uncomment B, recompile and rerun, compare
 */

#include <stdio.h>  // printf()
#include <omp.h>    // OpenMP

int main() {
    const int REPS = 1000000;
    int i;
    double balance = 0.0;
  
    printf("\nYour starting bank account balance is %0.2f\n", 
               balance);

    // simulate many deposits
    // #pragma omp parallel for                      // A
    for (i = 0; i < REPS; i++) {
        // #pragma omp atomic                        // B
        balance += 1.0;
    }

    printf("\nAfter %d $1 deposits, your balance is $%0.2f\n\n", 
		REPS, balance);

    return 0;
}
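
For this particular pattern, repeatedly summing into one shared variable, OpenMP also offers a reduction clause that avoids per-iteration synchronization altogether: each thread accumulates into its own private copy of balance, and OpenMP combines the copies when the loop ends. A minimal sketch, not part of the original exercise:

#include <stdio.h>
#include <omp.h>

int main(void) {
    const int REPS = 1000000;
    double balance = 0.0;

    // each thread sums into a private copy of balance;
    // OpenMP adds the copies together at the end of the loop
    #pragma omp parallel for reduction(+:balance)
    for (int i = 0; i < REPS; i++) {
        balance += 1.0;
    }

    printf("balance = %0.2f\n", balance);
    return 0;
}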

13. The Mutual Exclusion Coordination Pattern: two ways to ensure it

file: openMP/13.mutualExclusion-critical/critical.c

Build inside 13.mutualExclusion-critical directory:

make critical

Execute on the command line inside 13.mutualExclusion-critical directory:

./critical

Here is another way to ensure mutual exclusion in OpenMP: the critical directive. The code below starts with the atomic version from the previous example and lets you swap in critical to compare the two.

/* critical.c
 * ... fixes a race condition when multiple threads read from / 
 *  write to a shared variable	using the OpenMP critical directive.
 *
 * Joel Adams, Calvin College, November 2009.
 *
 * Usage: ./critical
 *
 * Exercise:
 *  - Compile and run several times; note that it always produces the correct balance $1,000,000.00 
 *  - Comment out A; recompile/run, and note incorrect result
 *  - To fix: uncomment B1+B2+B3, recompile and rerun, compare
 */

#include<stdio.h>
#include<omp.h>

int main() {
    const int REPS = 1000000;
    int i;
    double balance = 0.0;
  
    printf("\nYour starting bank account balance is %0.2f\n", balance);

    // simulate many deposits
    #pragma omp parallel for
    for (i = 0; i < REPS; i++) {
        #pragma omp atomic                          // A
//        #pragma omp critical                      // B1
//        {                                         // B2
        balance += 1.0;
//        }                                         // B3
    }

    printf("\nAfter %d $1 deposits, your balance is %0.2f\n", 
		REPS, balance);

    return 0;
}
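
Beyond the unnamed form used above, the critical directive also accepts a name: critical sections with the same name exclude one another, while sections with different names can run concurrently. A minimal sketch, with hypothetical names depositLock and withdrawLock protecting two independent shared sums:

#include <stdio.h>
#include <omp.h>

int main(void) {
    double deposits = 0.0, withdrawals = 0.0;  // two independent shared sums

    #pragma omp parallel for
    for (int i = 0; i < 1000000; i++) {
        #pragma omp critical(depositLock)      // serializes only these updates
        deposits += 1.0;

        #pragma omp critical(withdrawLock)     // a separate lock from the one above
        withdrawals += 1.0;
    }

    printf("deposits = %0.2f, withdrawals = %0.2f\n", deposits, withdrawals);
    return 0;
}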

14. Mutual Exclusion Coordination Pattern: compare performance

file: openMP/14.mutualExclusion-critical2/critical2.c

Build inside 14.mutualExclusion-critical2 directory:

make critical2

Execute on the command line inside 14.mutualExclusion-critical2 directory:

./critical2

Here is an example of how to compare the performance of the atomic directive with that of the critical directive. Note that OpenMP provides a function, omp_get_wtime(), that returns the current wall-clock time in seconds; calling it before and after a section of code lets us determine how long that section took to run.
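
The timing idiom used in the listing below, isolated as a minimal sketch:

#include <stdio.h>
#include <omp.h>

int main(void) {
    double startTime = omp_get_wtime();   // wall-clock time, in seconds

    // ... the code to be timed goes here ...

    double elapsed = omp_get_wtime() - startTime;
    printf("elapsed time: %0.12f seconds\n", elapsed);
    return 0;
}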

/* critical2.c
 * ... compares the performance of OpenMP's critical and atomic directives
 *
 * Joel Adams, Calvin College, November 2009.
 *
 * Usage: ./critical2
 *
 * Exercise:
 *  - Compile, run, compare times for critical vs. atomic
 *  - Note how much more costly critical is than atomic
 *  - Research: Create an expression that, when assigned to balance,
 *     critical can handle but atomic cannot 
 */

#include<stdio.h>
#include<omp.h>

void print(char* label, int reps, double balance, double total, double average) {
    printf("\nAfter %d $1 deposits using '%s': \
            \n\t- balance = %0.2f, \
            \n\t- total time = %0.12f, \
            \n\t- average time per deposit = %0.12f\n\n", 
               reps, label, balance, total, average);
}

int main() {
    const int REPS = 1000000;
    int i;
    double balance = 0.0,
           startTime = 0.0, 
           stopTime = 0.0,
           atomicTime = 0.0,
           criticalTime = 0.0;
  
    printf("\nYour starting bank account balance is %0.2f\n", balance);

    // simulate many deposits using atomic
    startTime = omp_get_wtime();
    #pragma omp parallel for 
    for (i = 0; i < REPS; i++) {
        #pragma omp atomic
        balance += 1.0;
    }
    stopTime = omp_get_wtime();
    atomicTime = stopTime - startTime;
    print("atomic", REPS, balance, atomicTime, atomicTime/REPS);


    // simulate the same number of deposits using critical
    balance = 0.0;
    startTime = omp_get_wtime();
    #pragma omp parallel for 
    for (i = 0; i < REPS; i++) {
         #pragma omp critical
         {
             balance += 1.0;
         }
    }
    stopTime = omp_get_wtime();
    criticalTime = stopTime - startTime;
    print("critical", REPS, balance, criticalTime, criticalTime/REPS);

    printf("criticalTime / atomicTime ratio: %0.12f\n\n", 
             criticalTime / atomicTime);

    return 0;
}

15. Mutual Exclusion Coordination Pattern: language difference

file: openMP/15.mutualExclusion-critical3/critical3.c

Build inside 15.mutualExclusion-critical3 directory:

make critical3

Execute on the command line inside 15.mutualExclusion-critical3 directory:

./critical3

The following example is written in C++ to illustrate a difference between the two languages: it provides a simple case where OpenMP's critical directive works but atomic does not. Try the exercises described in the code below.

/* critical3.c
 * ... a simple case where OpenMP's critical works but atomic does not.
 *
 * Joel Adams, Calvin College, November 2009.
 *
 * Usage: ./critical3
 *
 * Exercise:
 *  - Compile, run, note resulting output is correct.
 *  - Uncomment line A, recompile, rerun, note results.
 *  - Uncomment line B, recompile, note results.
 *  - Recomment line B, uncomment line C, recompile, 
 *     rerun, note change in results. 
 */

#include<iostream>   // cout
#include<cstdlib>    // atoi
#include<omp.h>      // openmp
using namespace std;

int main(int argc, char** argv) {
    cout << "\n";

    if (argc > 1) {
        omp_set_num_threads( atoi(argv[1]) );
    }

//    #pragma omp parallel                          // A
    {
        int id = omp_get_thread_num();
        int numThreads = omp_get_num_threads();
//        #pragma omp atomic                        // B
//        #pragma omp critical                      // C
        cout << "Hello from thread #" << id
             << " out of " << numThreads
             << " threads.\n";
    }

    cout << "\n";
}

Some Explanation

A C line like this:

printf("Hello from thread #%d of %d\n", id, numThreads);

is a single function call that is performed more or less atomically in practice, so you get well-formed output like this:

Hello from thread #0 of 4
Hello from thread #2 of 4
Hello from thread #3 of 4
Hello from thread #1 of 4

By contrast, the C++ line:

cout << "Hello from thread #" << id << " of " << numThreads << endl;

involves five separate function calls (one per << operator), so the outputs from these calls get interleaved within the shared stream cout as the threads 'race' to write to it. You may have observed output similar to this:

Hello from thread #Hello from thread#Hello from thread#0 of 4Hello from thread#

2 of 43 of 4

1 of 4

The other facet that this particular patternlet shows is that OpenMP's atomic directive will not fix this problem: the statement is too complex for atomic, so the compiler flags it as an error. To make this statement execute indivisibly, you must use the critical directive instead, which makes this a simple case where critical works and atomic does not.
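
One common workaround, sketched below (not part of the original exercise), is to have each thread build its whole message in a private ostringstream and then write it to cout with a single insertion; this relies on the same practical observation made about printf above, namely that a single output call is unlikely to be interleaved:

#include <iostream>   // cout
#include <sstream>    // ostringstream
#include <omp.h>      // OpenMP

using namespace std;

int main() {
    #pragma omp parallel
    {
        // each thread composes its message privately...
        ostringstream message;
        message << "Hello from thread #" << omp_get_thread_num()
                << " out of " << omp_get_num_threads() << " threads.\n";
        // ...then writes it to cout with a single insertion, so other
        // threads' output is unlikely to be interleaved mid-message
        cout << message.str();
    }
    return 0;
}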