
Timing Programs

For timing parallel programs, MPI provides the routine MPI_Wtime(), which returns the elapsed wall-clock time in seconds. The timer has no defined starting point, so to time a section of code you must call it twice and take the difference between the returned values. As a simple example, we can time each of the processes in the hello world program as follows:

#include <stdio.h>
#include <mpi.h>

/* NOTE: The MPI_Wtime calls can be placed anywhere between the MPI_Init
   and MPI_Finalize calls. */

int main(int argc, char **argv)
{
   int node;
   double mytime;   /* variable to hold the elapsed time */

   MPI_Init(&argc, &argv);
   mytime = MPI_Wtime();   /* get the time just before the work to be timed */
   MPI_Comm_rank(MPI_COMM_WORLD, &node);

   printf("Hello World from Node %d\n", node);

   mytime = MPI_Wtime() - mytime;   /* get the time just after the work is done
                                       and take the difference */
   printf("Timing from node %d is %f seconds.\n", node, mytime);
   MPI_Finalize();
   return 0;
}

It is often useful to know the minimum and maximum execution time spent by any individual process, as well as the average time across all of the processes. This gives a rough idea of how evenly the work is distributed among the processes. (A better idea can be gained by calculating the standard deviation of the run times, as sketched after the next example.) To do this, in addition to the calls that read the clock, you need to add a call to synchronize the processes and a few more calls to collect the results. For example, to time a function called work() which is executed by all of the processes, one would do the following:

    int myrank,
        numprocs;
    double mytime,   /* variables used for gathering timing statistics */
           maxtime,
           mintime,
           avgtime;

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Barrier(MPI_COMM_WORLD);    /* synchronize all processes */
    mytime = MPI_Wtime();           /* get time just before work section */
    work();
    mytime = MPI_Wtime() - mytime;  /* get time just after work section */

    /* compute max, min, and average timing statistics */
    MPI_Reduce(&mytime, &maxtime, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&mytime, &mintime, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&mytime, &avgtime, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myrank == 0) {
      avgtime /= numprocs;
      printf("Min: %f  Max: %f  Avg: %f\n", mintime, maxtime, avgtime);
    }
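
To obtain the standard deviation mentioned above, one approach is to have each process also contribute the square of its elapsed time; the root can then compute the variance as the mean of the squares minus the square of the mean. The following is a minimal sketch of that approach, reusing mytime, myrank, numprocs, and avgtime from the snippet above (it also requires #include <math.h>, and linking with -lm, for sqrt()):

    double mysq = mytime * mytime;   /* square of this process's time */
    double sumsq, stddev;

    MPI_Reduce(&mysq, &sumsq, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myrank == 0) {
      /* variance = mean of the squares minus the square of the mean */
      stddev = sqrt(sumsq / numprocs - avgtime * avgtime);
      printf("Std dev: %f\n", stddev);
    }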

Be sure to execute spyall to ensure that no one else is using the cluster when you want to make a timing run; other processes on the system will affect your measurements. On a network of workstations, the reported times may be off by a second or two due to network latencies as well as interference from the operating system and other users' jobs. On a tightly coupled machine like the Paragon, these timings should be accurate to within a second. Thus, it is generally best to time sections of code that run long enough for the system noise to be insignificant, and to time them several times (see the sketch below). If you get an anomalous timing, don't hesitate to run the code a few more times to see whether it can be reproduced.
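
One simple way to repeat a measurement and discount noisy runs is to time the work section several times and keep the minimum, which is typically the least-perturbed run. A minimal sketch, reusing work(), mytime, and myrank from the example above (the trial count of 5 is an arbitrary choice):

    #define NTRIALS 5
    double besttime = 1.0e30;   /* larger than any plausible timing */
    int trial;

    for (trial = 0; trial < NTRIALS; trial++) {
      MPI_Barrier(MPI_COMM_WORLD);   /* resynchronize before each trial */
      mytime = MPI_Wtime();
      work();
      mytime = MPI_Wtime() - mytime;
      if (mytime < besttime)
        besttime = mytime;           /* keep the fastest (least-disturbed) run */
    }
    printf("Best of %d trials on node %d: %f seconds.\n",
           NTRIALS, myrank, besttime);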

If you are comparing the execution time of a sequential program with that of a parallel program written in MPI, be sure to compile both programs with the mpicc compiler using the same switches, so that the same optimizations are performed. Then run the sequential version via mpirun -np 1.
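
For example, the comparison might look like the following (the source file names seq.c and par.c are hypothetical, and -O2 stands in for whatever optimization switches you use; mpicc passes such switches through to the underlying compiler):

    mpicc -O2 -o seq seq.c      (compile the sequential version)
    mpicc -O2 -o par par.c      (compile the MPI version)
    mpirun -np 1 seq            (time the sequential program)
    mpirun -np 4 par            (time the parallel program on 4 nodes)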






Lori Pollock
Wed Feb 4 14:18:58 EST 1998