Shared Memory Program Structure and Coordination Patterns

0. Program Structure Implementation Strategy: The basic fork-join pattern

file: Vath_pth/00.forkJoin/forkJoin.C

Build inside 00.forkJoin directory:

make forkJoin

Execute on the command line inside 00.forkJoin directory:

./forkJoin

The SPool TH(2) declaration on line 20 constructs a pool of two worker threads when the program starts. The Dispatch() function activates the threads in the pool to execute the thread function passed as its first argument. The WaitForIdle() function blocks the main thread until all worker threads have completed their task, which is the join. Notice that, unlike in OpenMP, the join is explicit. You can conceptualize how this works using the following diagram, where time moves from left to right:

../_images/ForkJoin1.png

Observe what happens on the machine where you are running this code.

/*
 * forkJoin.C
 *
 * Using Victor Alessandrini's vath_pth library
 * ... illustrates the fork-join pattern
 *
 * Modeled from code provided by Joel Adams, Calvin College, November 2009.
 * Hannah Sonsalla, Macalester College, 2017.
 *
 * Usage: ./forkJoin
 *
 * Exercise:
 * - Compile & run
 */

#include <stdlib.h>
#include <stdio.h>
#include <SPool.h>

SPool TH(2);    // initialize threads, global

void thread_fct(void *P)  {
    printf("\n During... \n");
}

int main(int argc, char **argv)  {

    printf("\n Before... \n");
    // -------------------------------------------------------
    TH.Dispatch(thread_fct, NULL); // activates worker threads
    TH.WaitForIdle();   // joins all worker threads
    // -------------------------------------------------------
    printf("\n After... \n\n");

    return 0;
}
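
For comparison, the same fork-join structure can be sketched with standard C++ threads instead of the vath library. This is only an illustration under stated assumptions: forkJoin() is a hypothetical helper name, and it creates new threads on each call rather than activating an existing pool the way SPool does.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper, not part of vath_pth: run `task` once on each of
// `numThreads` new threads (the fork), then wait for all of them (the join).
void forkJoin(int numThreads, const std::function<void(int)>& task) {
    assert(numThreads > 0);
    std::vector<std::thread> team;
    team.reserve(numThreads);
    for (int rank = 0; rank < numThreads; ++rank) {
        team.emplace_back(task, rank);   // fork: start one worker
    }
    for (std::thread& t : team) {
        t.join();                        // explicit join, like WaitForIdle()
    }
}
```

Calling forkJoin(2, ...) with a task that prints "During..." between a "Before..." and an "After..." printf in main() would mirror the structure of forkJoin.C.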

1. Program Structure Implementation Strategy: Fork-join with setting the number of threads

file: Vath_pth/01.forkJoin2/forkJoin2.C

Build inside 01.forkJoin2 directory:

make forkJoin2

Execute on the command line inside 01.forkJoin2 directory:

./forkJoin2

This code illustrates that one program can fork and join more than once. Programmers can set the number of threads to use when creating the team of worker threads.

Note that on line 22 the SPool constructor, SPool TH(), takes the number of threads as its argument. Follow the instructions in the header of the code file to see how constructing SPool objects and repeatedly forking and joining threads works.

/*
 * forkJoin2.C
 *
 * Using Victor Alessandrini's vath_pth library.
 * ... illustrates the fork-join pattern and setting number of threads.
 *
 * Modeled from code provided by Joel Adams, Calvin College, November 2009.
 * Hannah Sonsalla, Macalester College, 2017.
 *
 * Usage: ./forkJoin2
 *
 * Exercise:
 * - Compile & run
 * - Rebuild and rerun using 2, 3, 4 threads
 * - What do you notice about the number of times statements are printed?
 */

#include <stdlib.h>
#include <stdio.h>
#include <SPool.h>

SPool TH(1);    // initialize threads, global

void thread_fct(void *P)  {
    printf("\n  Part Completed");
}

void runWorkerThreads(int n)  {
    for (int i = 0; i < n; i++){
      TH.Dispatch(thread_fct, NULL);
      TH.WaitForIdle();
    }
}

int main(int argc, char **argv)  {

    printf("\n Beginning");
    // -------------------------------------------------------
    runWorkerThreads(2);
    // -------------------------------------------------------
    printf("\n Between I and II... ");
    runWorkerThreads(3);
    // -------------------------------------------------------
    printf("\n Between II and III... ");
    runWorkerThreads(1);
    // -------------------------------------------------------
    printf("\n End \n");

    return 0;
}
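
If you want to check your answer to the exercise without the vath library, the repeated fork-join can be approximated with standard C++ threads. The sketch below is an assumption-laden stand-in: runRounds() is a hypothetical name, and it creates fresh threads in every round, which may differ from how SPool manages its workers between dispatches.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: perform `rounds` consecutive fork-join phases,
// each with a team of `numThreads` threads, and return how many times
// the task body ran in total.
int runRounds(int rounds, int numThreads) {
    assert(rounds >= 0 && numThreads > 0);
    std::atomic<int> partsCompleted{0};
    for (int r = 0; r < rounds; ++r) {
        std::vector<std::thread> team;
        for (int rank = 0; rank < numThreads; ++rank) {
            // Each worker stands in for one "Part Completed" message.
            team.emplace_back([&partsCompleted] { ++partsCompleted; });
        }
        for (std::thread& t : team) {
            t.join();   // join before the next round forks
        }
    }
    return partsCompleted.load();
}
```

Comparing the returned count for different values of rounds and numThreads is one way to confirm what you observed in the printed output.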

2. Program Structure Implementation Strategy: Single Program, multiple data

file: Vath_pth/02.spmd/spmd.C

Build inside 02.spmd directory:

make spmd

Execute on the command line inside 02.spmd directory:

./spmd

Note the SPool member function GetRank(), which returns the calling thread's number (its rank). We have one program, but multiple threads execute the thread function, each with its own copy of the rank variable. Programmers write one program, but write it in such a way that each thread has its own data values for particular variables. This is why this is called the single program, multiple data (SPMD) pattern.

Most parallel programs use this SPMD pattern, because writing a single program is usually the most practical approach for programmers. It does require you as a programmer to understand how this works, however. Each thread executing in parallel has its own copies of the variables declared inside the thread function. Conceptually, it looks like this, where each thread has its own memory for the variable rank:

../_images/SPMD.png

When you execute the code, what do you observe about the order of the printed lines? Run the program multiple times. Does the ordering change? This illustrates an important point about threaded programs: the ordering of execution of statements between threads is not guaranteed. This is also illustrated in the diagram above.

/*
 * spmd.C
 *
 * Using Victor Alessandrini's vath_pth library.
 * ... illustrates the single-program-multiple-data (SPMD) pattern
 *
 * Modeled from code provided by Joel Adams, Calvin College, November 2009.
 * Hannah Sonsalla, Macalester College, 2017.
 *
 * Usage: ./spmd
 *
 * Exercise:
 * - Compile & run multiple times - what do you observe about the
 *   order of the printed lines?
 */

#include <stdlib.h>
#include <stdio.h>
#include <SPool.h>

SPool TH(8);

void thread_fct(void *P)  {

    int rank = TH.GetRank();
    printf("Hello from thread %d \n", rank);

}

int main(int argc, char **argv)  {

    TH.Dispatch(thread_fct, NULL);
    TH.WaitForIdle();

    return 0;
}
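
The two ideas in this example, a private rank per thread and an unpredictable ordering between threads, can also be sketched with standard C++ threads. The following is a minimal illustration, not the vath implementation: spmdArrivalOrder() is a hypothetical helper, and numbering ranks from 0 is a choice made for this sketch, not necessarily the numbering GetRank() uses.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: every thread runs the same function body, but each
// has its own `rank`. Returns the ranks in the order the threads reported.
std::vector<int> spmdArrivalOrder(int numThreads) {
    assert(numThreads > 0);
    std::vector<int> arrivals;       // shared by all threads
    std::mutex arrivalsMutex;        // protects `arrivals`
    std::vector<std::thread> team;
    for (int id = 0; id < numThreads; ++id) {
        team.emplace_back([id, &arrivals, &arrivalsMutex] {
            int rank = id;           // private to this thread
            std::lock_guard<std::mutex> lock(arrivalsMutex);
            arrivals.push_back(rank);
        });
    }
    for (std::thread& t : team) {
        t.join();
    }
    return arrivals;
}
```

Every rank appears exactly once in the result, but the order in which the ranks appear can change from run to run, just as the order of the "Hello from thread" lines can.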

3. Program Structure Implementation Strategy: Single Program, multiple data with user-defined number of threads

file: Vath_pth/03.spmd2/spmd2.C

Build inside 03.spmd2 directory:

make spmd2

Execute on the command line inside 03.spmd2 directory:

./spmd2 4
Replace 4 with other values for the number of threads.

Here we enter the number of threads to use on the command line. This makes your code versatile, since you can run it with as many threads as you like without rebuilding. In this case, a global pointer to an SPool object is declared, and main() later initializes it.

/*
 * spmd2.C
 *
 * Using Victor Alessandrini's vath_pth library.
 * ... illustrates the single-program-multiple-data (SPMD)
 *     using command line arguments to control the
 *     number of threads.
 *
 * Modeled from code provided by Joel Adams, Calvin College, November 2009.
 * Hannah Sonsalla, Macalester College, 2017.
 *
 * Usage: ./spmd2 [numThreads]
 *
 * Exercise:
 * - Compile & run with no commandline args
 * - Rerun with different commandline arg 4, 10, 20, etc.
 *
 */

#include <stdlib.h>
#include <stdio.h>
#include <SPool.h>

SPool *TH;

void thread_fct(void *P)  {

    int rank = TH->GetRank();
    printf("Hello from thread %d \n", rank);

}

int main(int argc, char **argv)  {
    int numThreads;

    if(argc==2) numThreads = atoi(argv[1]);
    else numThreads = 4;     // default number of threads

    // Create worker threads
    // -----------------------------
    TH = new SPool(numThreads);

    // Launch worker threads
    // -----------------------------
    TH->Dispatch(thread_fct, NULL);
    TH->WaitForIdle();

    delete TH;
    return 0;
}
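
One thing to keep in mind is that atoi() returns 0 when its argument is not a number, so running ./spmd2 abc would request a pool of zero threads. A more defensive way to read the argument might look like the sketch below. The helper name parseNumThreads() and the policy of falling back to the default on bad input are assumptions made for illustration; they are not part of spmd2.C.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: return the thread count given as argv[1], or
// `fallback` when the argument is missing, not a number, or not positive.
int parseNumThreads(int argc, char** argv, int fallback) {
    assert(fallback >= 1);                 // the default itself must be usable
    if (argc != 2) {
        return fallback;
    }
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(argv[1], &end, 10);
    // The whole argument must have been consumed as digits.
    bool parsedAll = (end != argv[1]) && (*end == '\0');
    if (!parsedAll || errno == ERANGE || value < 1 || value > INT_MAX) {
        return fallback;
    }
    return static_cast<int>(value);
}
```

In main(), the two lines that set numThreads could then be replaced by a single call such as numThreads = parseNumThreads(argc, argv, 4).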

4. Coordination: Synchronization with a Barrier

file: Vath_pth/04.barrier/barrier.C

Build inside 04.barrier directory:

make barrier

Execute on the command line inside 04.barrier directory:

./barrier 4
Replace 4 with other values for the number of threads.

The barrier pattern is used in parallel programs to ensure that all threads complete a parallel section of code before execution continues. This can be necessary when threads are generating computed data (in an array, for example) that needs to be completed for use in another computation.

Conceptually, the running code is executing like this:

../_images/Barrier1.png

Compare the output with and without the BarrierWait() call, which is commented out in the code below (the line marked // A).

/*
 * barrier.C
 *
 * AUTHOR: Victor Alessandrini, 2015
 * Example in book "Shared Memory Application Programming"
 * Edited by Hannah Sonsalla, Macalester College, 2017.
 *
 * ... illustrates the use of the barrier command,
 *     using the command line to control the number of threads...
 *
 * Shows how to construct Barrier synchronization using the Pthreads
 * idle wait protocol. All threads write "before" message, wait
 * on the barrier, write "after" message, and exit.
 *
 * Usage: ./barrier [numThreads]
 *
 * Exercise:
 * - Compile & run several times, noting interleaving of outputs.
 * - Uncomment the BarrierWait call on the line marked A, recompile, rerun,
 *    and note the change in the outputs.
 */

#include <stdlib.h>
#include <stdio.h>
#include <SPool.h>
#include <pthread.h>

#include "pthreadBarrier.h"

int numThreads;
SPool *TH;

// -------------------
// Worker threads code
// -------------------

void thread_fct(void *idp)  {

    int rank = TH->GetRank();
    printf("Thread %d of %d is BEFORE barrier\n", rank, numThreads);
    //BarrierWait(rank);                                // A
    printf("Thread %d of %d is AFTER barrier\n", rank, numThreads);

}

int main(int argc, char **argv)  {

    if(argc==2) numThreads = atoi(argv[1]);
    else numThreads = 2;
    count = numThreads;   // not declared in this file; presumably the barrier's counter from pthreadBarrier.h

    // Create worker threads
    // ------------------------------
    TH = new SPool(numThreads);

    // Launch worker threads
    // -----------------------------
    TH->Dispatch(thread_fct, NULL);
    TH->WaitForIdle();
    return 0;

}
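
The contents of pthreadBarrier.h are not shown here, so the following is only a sketch of how an idle-wait barrier can be built, written with the standard C++ mutex and condition variable rather than raw Pthreads calls. The names SimpleBarrier and neighborValues() are hypothetical. The example also shows why a barrier matters when threads produce data that other threads consume: each thread fills its own array slot, waits, and only then reads its neighbor's slot.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier; not the implementation in pthreadBarrier.h.
class SimpleBarrier {
public:
    explicit SimpleBarrier(int parties)
        : parties_(parties), waiting_(0), generation_(0) {
        assert(parties > 0);
    }

    // Block until `parties` threads have called wait().
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned long myGeneration = generation_;
        if (++waiting_ == parties_) {
            waiting_ = 0;          // reset so the barrier can be reused
            ++generation_;         // marks this round as complete
            cv_.notify_all();      // wake the sleeping threads
        } else {
            // Idle wait: sleep until the last thread advances the generation.
            cv_.wait(lock, [&] { return generation_ != myGeneration; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int parties_;
    int waiting_;
    unsigned long generation_;
};

// Phase 1: each thread writes its own slot. Phase 2: each thread reads its
// right-hand neighbor's slot, which is only safe after the barrier.
std::vector<int> neighborValues(int numThreads) {
    std::vector<int> data(numThreads, -1);
    std::vector<int> result(numThreads, -1);
    SimpleBarrier barrier(numThreads);
    std::vector<std::thread> team;
    for (int rank = 0; rank < numThreads; ++rank) {
        team.emplace_back([rank, numThreads, &data, &result, &barrier] {
            data[rank] = rank * 10;                        // before the barrier
            barrier.wait();
            result[rank] = data[(rank + 1) % numThreads];  // after the barrier
        });
    }
    for (std::thread& t : team) {
        t.join();
    }
    return result;
}
```

Without the call to barrier.wait(), a thread could read its neighbor's slot before the neighbor has written it, which is the same kind of interleaving you see in the output of barrier.C while BarrierWait() is commented out.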