CISC 372: INTRODUCTION TO PARALLEL PROGRAMMING - Spring 2000
Group Project 1 Deliverable 3 Specifications and Assessment
Deliverable 2: due start of class, Thursday, April 6, 2000 (extension date)
Deliverable 3: due start of class, Thursday, April 13, 2000 (extension date)

Deliverable 3

A final parallel implementation showing your best effort toward efficient, effective parallelization. This version should be able to run correctly on 1, 4 and 8 processors. You should show test runs, using a script file, for 1, 4, and 8 processors. The final draft of the documentation should also be included with this deliverable.

The details of the code:

1.

Your software will run on an 8 processor cluster of machines in a mobile command unit. The machines will run MPICH v1.0.12 (coincidentally the same configuration available at University of Delaware!). Your code will use the MPI library with the C programming language. It should written general enough to be able to run on 1, 4, and 8 processors though.

2.

Images are stored in ascii PGM format. Given a filename at runtime, your software must read the PGM file, process the image, and then save the resulting version of the image to a PGM file with the file name also specified at runtime.

3.

Your program should accept any arbitrary size image, for which the matrix to store the image is evenly divisible by 8.

4.

NEW CONDITION: Your code should be efficient in both time and space. For space, efficiency means that process 0 is the only one that should allocate space for the whole data set, and all other processes should only allocate space adequate enough to hold the data they need to hold and any data involved in communications. They should NOT all allocate a fixed size for the whole data set if they are not working on that size data. You need to be careful with two-dimensional arrays in C, especially when dynamically allocating them. See the notes at the end of this handout.

For time, efficiency means using parallelism and performing communication efficiently. So, you should think about how best to distribute the data to minimize the amount of communication needed between processes during computation as well as minimizing the amount of duplicate data distribution work. In addition, when communication is needed, you should try to do it in an efficient manner, that is, avoid wait's if possible due to sequentialized communication schemes, and package up as much as possible in a message before sending. You are welcome to use any MPI command you find to achieve these goals, but it is perfectly fine to use only those we have discussed in class so far.

5.

Your software must perform all computations as quickly as possible.

6.

Your software must perform all computations correctly.

7.

You should be prepared to give a demo of your software by the third deliverable deadline.

8.

You must clearly and thoroughly document your code. You must also provide a well written user manual that covers all aspects of use, explains algorithms used, describes any problems or shortcomings and why they exist, describes possible approaches to eliminating any problems or bugs, and clearly delineates all specifications of your software. You should include a section on how you tested your software.

Criteria for Evaluation

Your project will be evaluated according to the following criteria:

1.: (30 pts) Deliverable 1.
2.: (30 pts) Deliverable 2.
3.: (30 pts) Deliverable 3. - (2) arbitrary size image, for which matrix is evenly divisible by 8
- (8) perform correct edge detection according to selected algorithm for any number of processors
- (5) a space-efficient parallel version
- (6) a time-efficient parallel version in data distribution, load balance, and communication efficiency
- (2) internal documentation of the code
- (3) external documentation
- (2) two execution runs with timings taken demonstrated
- (2) script instructions followed to demonstrate compile and execute
4.: (10 pts) Peer Review. (more to come)

Extra Credit

(3 pts) Generalize your code to take in matrices that are not necessarily evenly divisible by the number of processes, and demonstrate that it correctly handles these cases. (2 more pts = 5 pts) Give a solution that minimizes load imbalance, and explain how you deal with minimizing load imbalance in this situation.

Notes on Dynamic Array Allocation in C

For a single dimension array:

I first declare the array as a pointer, because I don't want to have to give it a fixed size like int vector[100]; So, instead, I declare it as:

int *vector;

This create space for a single pointer called vector, in each process's address space. Now, if I want to allocate space, say for size number of int's, I write:

vector = (int *) calloc(size, sizeof(int));

vector = (int *) malloc(size * sizeof(int));

You can do a man on malloc or calloc and find out more.

Now, to access vector, I can either use pointer notation and pointer arithmetic, or I can use array notation.

So, I can do:

for(i=0; i<size;i++)
   printf("vector[%d]:%d\n", i, vector[i]);

to print out the values in the array. They should be all zero at this point, as malloc initializes the locations allocated to 0.

For two-dimensional arrays, it is much trickier: Here are some portions of my code:

In C, two-dimensional arrays are really an array of arrays, such that a[i][j] accesses the jth element of the ith array in the array of arrays. So, to declare a two-dimensional array that does not have space allocated yet, I need to declare a pointer to a pointer as:

int **grid;

Then, to allocate space for the whole grid of size, I can do it in two steps:

I first allocate the array of pointers which will point to the arrays. This can be done by:

grid = (int **) malloc(size * sizeof(int *));

This create space for size number of addresses; an address can be stored in the space of a pointer to an int.

Now, I have space as if I had size number of vector declarations. So, now I need to allocate the space for the vectors (single dimension arrays themselves). So, I do the same as I did for allocating a single vector, except I need size number of them, so I put it in a loop:

for (i=0;i<size;i++)
  grid[i] = (int *) malloc(size * sizeof(int));

Note that I can now access grid by the grid[i][j] notation, because it is fully allocated now.

About this document ...

This document was generated using the LaTeX2HTML translator Version 98.1p1 release (March 2nd, 1998)

The command line arguments were:
latex2html -split 0 group1c.tex.

The translation was initiated by Lori Pollock on 2000-04-06

Lori Pollock
2000-04-06