Please choose a project in the very near future. Send me an email giving a
thumbnail sketch of the project topic and telling who will work on it (if a
team effort).

Due next week: a serial code whose parallel equivalent will be some part of
your project. A working serial version is an essential first step in a
parallel programming effort.

====================================================================================================
Parallel programming:

--------------------
Started project: protein folding

--------------------
Started project: pattern match search in sound samples

--------------------
Possible project: sparse-matrix times vector product

Assume: repeated multiplications with the same matrix but different vectors
will be needed.
Issue: a scheme to divide the work evenly among the processors.
Variant: block method. Sparse matrix times a block of several vectors at a
time.
Full solution: make a blackbox class for LinBox which represents the sparse
matrix and provides the parallel matrix-vector product as its "apply"
function.
Basis: serial impl in LinBox

--------------------
Possible project: parallelization of integer multiplication by the 3-primes
FFT algorithm

Embarrassingly parallel part: the three primes in parallel.
Further parallelism within the FFT, also within the CRA?
Application area: cryptology, number theory
Basis: serial impl in NTL

--------------------
Possible project: polynomial multiplication by parallel Karatsuba algorithm

This is divide and conquer with 3 concurrent parts per stage.
Issue: how many stages deep to go?
Analysis: what does this do to the threshold between Karatsuba poly mult
and FFT poly mult?
Full solution: make a blackbox class for LinBox which represents a Toeplitz
matrix and provides the parallel matrix-vector product (defined in terms of
polynomial mult) as its "apply" function.
Basis: serial impl in NTL

--------------------
Possible project: Villard characteristic polynomial algorithm
parallelization
Basics: given matrix A, call minpoly on A + UV for several random low-rank
updates UV in parallel.
Issue: decide how much parallelism to use, when to quit, and whether to use
divide and conquer or linear stepping.

--------------------
Possible project: Parallel FFT

Basis: serial impl in NTL

====================================================================================================
Parallel computation tool design:

--------------------
C++ interface to MPI

For instance, make a communicator class in which init is done by a
constructor and the communication functions are template member functions.

  // send/recv one item of arbitrary type T
  template<class T> void Communicator::Send(const T& x, int dest);  // tag optional
  template<class T> void Communicator::Recv(T& x, int src);
  // fold the status object into the communicator, for use only when needed

and

  // send/recv items of arbitrary type T, e-b of them, which are adjacent
  // in a vector, array, deque, list, etc.
  template<class Iter> void Communicator::Send(Iter b, Iter e, int dest);
  template<class Iter> void Communicator::Recv(Iter b, Iter e, int src);

Similar for collective comms.

The project is to implement a nice C++ interface that is considerably
simpler for the user than the standard C interface, yet maps quite directly
onto an underlying C implementation such as mpich.

Illustrate the implementation with some simple parallel programs.

--------------------
Racing tool

Set up a simple, clean, easy-to-use mechanism allowing the user to race
several function calls. All processes should be killed (at any rate, the
raced activity terminated) within a reasonable time of the first one to
complete.
Variant, more than one winner: the master program indicates when to
terminate after several have completed.
The system should provide for a user wanting to race n things when p
processors are available and n >> p. Then fewer than n processes should be
created...
A possible interface:

  R = race(f, arg_f, g, arg_g);
  wait(R);  // get first result, kill others

or

  for (i = 0; i < k; ++i)
      add_to_race(R, f, arg_i);
  wait(R);  // get first result, kill others

or

  for (i = 0; i < k; ++i)
      add_to_race(R, f, arg_i);
  while (not enough racers have finished) {
      get another result from the racers;
      decide if that makes enough results;
  }
  then kill the remaining racers

or similar.

Illustrate the implementation with some simple parallel programs.

--------------------
Implementation of Linda (slight variant) as a C or C++ library

eval() is an issue: either fork a process, ssh a process, or activate an
existing MPI process...
Focus on out(), in(), rd(). Skip the full associative memory model, but
allow greater optimizations than JavaSpaces seems to allow.

Illustrate the implementation with some simple parallel programs.

--------------------
Condor at UDel

Make a Condor installation successful on a substantial subset of eecis.
This is a social/technical problem-solving exercise.

Illustrate the implementation with some simple parallel programs.

--------------------
User-level tool for exploration of parameter space

Often the available parallelism is extremely "embarrassingly parallel": the
user wants to run several independent programs, or several independent runs
of the same program with different parameters. However, experience with
early runs influences what further runs the experimenter will decide to
try. Thus the goal is a management tool to effectively display and record
the status of the runs: for each, the program, the parameters, the status
(done or not, where running, how long running, etc.), and the results if
done.

Possible as a database application. Possible as a CVS "extension".

Unfortunately, experimenters often work by modifying code and/or input data
files from run to run. This can lead to confusion later as to what the
state of the program and data was for a particular run. Possibly CVS could
be used for this kind of user.
The experimenter would use a script to launch runs that would automatically
check in (as belonging to the same version) the code, input, and output of
the run.

Illustrate the implementation with some simple parallel programs.