Coin-flipping in Parallel

Now that we know a bit about how OpenMP works to provide threads that run code in parallel, let’s look at how we can update our coin-flipping example. The places in this code where you see this comment:

    /***  OMP ***/

indicate where OpenMP was used to enable running the original coin-flipping code example on multiple threads, or where the code needed changes to enable running on multiple threads. Examine these places in the following code:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
/*
  Original code provided by Dave Valentine, Slippery Rock University.
  Edited by Libby Shoop, Macalester College.
*/
//
// Simulate many coin flips with rand_r() on multiple threads
// to determine how random the values are that are returned
// from each call.
//

#include <stdio.h>        // printf()
#include <stdlib.h>       // srand() and rand()
#include <time.h>         // time()

#include <omp.h>          // OpenMP functions and pragmas


//Standard chi sqaure test
double chiSq(int heads, int tails) {
    double sum = 0;                //chi square sum
    double tot = heads+tails;      //total flips
    double expected = 0.5 * tot;   //expected heads (or tails)
    
    sum = ((heads - expected)*(heads-expected)/expected) + \
        ((tails - expected)*(tails-expected)/expected);
    return sum;
}



int main() {
    int numFlips,             //loop control
        numHeads, numTails;   //counters
    
    /***  OMP ***/
    int nThreads;           // number of threads to use
    double ompStartTime, ompStopTime;  // holds wall clock time
    /***  OMP ***/


/***** Initialization *****/
    
    printf("Threaded Simulation of Coin Flip using rand_r()\n");
    /***  OMP ***/
    nThreads = 4;     // try increasing this if you have more cores
    
    //print our heading text
    printf("\n\n%15s%15s%15s%15s%15s%15s",
           "Trials","numHeads","numTails","total",
           "Chi Squared", "Time (sec)\n");
    
    
    //create seed using current time
    unsigned int seed = (unsigned) time(NULL);
    
    //create the pseudorandom number generator
    srand(seed);


// Try several trials of different numbers of flips doubling how many each round.
// 
// Use a unsigned int because we will try a great deal of flips for some trials.
    unsigned int trialFlips = 256;          // start with a smal number of flips
    unsigned int maxFlips = 1073741824;     // end with a very large number of flips
    
    // below we will double the number of trial flips and come back here
    // and run another trial, until we have reached > maxFlips.
    // This will be a total of 23 trials
    while (trialFlips <= maxFlips) {  
        
        numHeads = 0;               //reset counters
        numTails = 0;
        
        /***  OMP ***/
        ompStartTime = omp_get_wtime();   //get start time for this trial
    
    /***** Flip a coin trialFlips times, on each thread in parallel,
     *     with each thread getting its 1/4 share of the total flips.
     *****/

/***  OMP ***/
#pragma omp parallel for num_threads(nThreads) default(none) \
        private(numFlips, seed) \
        shared(trialFlips) \
        reduction(+:numHeads, numTails)
        for (numFlips=0; numFlips<trialFlips; numFlips++) {
            // rand() is not thread safe in linux
            // rand_r() is available in linux and thread safe,
            // to be run on separate threads concurrently.
            // On windows in visual studio, use rand(), which is thread safe.
            if (rand_r(&seed)%2 == 0) // if random number is even, call it heads
                numHeads++;       
            else
                numTails++;
        }
                
        /***  OMP ***/
        ompStopTime = omp_get_wtime();  //get time this trial finished
        
        // Finish this trial by printing out results

        printf("%15d%15d%15d%15d%15.6f%15.6f\n", trialFlips, numHeads, numTails,
               (numHeads+numTails), chiSq(numHeads, numTails),
               (double)(ompStopTime-ompStartTime));    /***  OMP ***/

        trialFlips *= 2;   // double the number of flips for the next trial
    }
    
    /***** Finish Up *****/
    printf("\n\n\t<<< Normal Termination >>>\n\n");
    return 0;
}

Some notes about this code

  1. On line 15 we include the OpenMP library.

  2. On lines 75 and 98 we use the OpenMP function to return a wall clock time in seconds. The difference between these provides the total amount of time to run the section of code enclosed by these lines. Note that this OpenMP function called omp_get_wtime specifically provides the overall time for the threaded code to run. We need to use this function because the original method using the clock() function does not work properly with threaded code.

  3. Lines 82 - 85 indicate the setup for running the for loop of coin flips in equal numbers of iterations per thread. There are several directives needed to be added to the parallel for pragma:

    • num_threads(nThreads) designates how many threads to fork for this loop.
    • default(none) designates that all variables in the loop will be defined as either private within each thread or shared between the threads by the next three directives.
    • the \ designates that the pragma declaration is continuing onto another line
    • private(numFlips, seed) designates that each thread will keep its own private copy of the variables numFlips and seed and update them independently.
    • shared(trialFlips) designates that the variable trialFlips is shared by all of the threads (this is safe because no thread will ever update it.)
    • reduction(+:numHeads, numTails) is a special indicator for the the two values numHeads and numTails, which need to get updated by all the threads simultaneously. Since this will cause errors when the threads are executing, typically the OpenMP threaded code will have each thread keep a private copy of these variables while they execute their portion of the loop. Then when they join back after they have finished , each thread’s private numHeads and numTails sum is added to an overall sum and stored in thread 0’s copy of numHeads and numTails.
  4. You can download the file coinFlip_omp.cpp and try this code yourself. If you have 4 cores available on your computer, you should see the longer trials with many coin flips run almost four times faster than our earlier sequential version that did not use threads.