Nonetheless, there are several common elements that I would like all projects to include.
An elegant way to do this is to write a program that uses P processes, each running T threads. It could combine MPI with Cilk, OpenMP, or pthreads, whichever is most readily available. Experiments can then include timings of the cases (P = 1, T > 1) and (P > 1, T = 1); intermediate values are also of interest, for instance (P = 8, T = 2) on porsche's nodes or (P <= 20, T = 4) on the nsfri cluster...
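As a rough illustration of how such a timing sweep might be organized, the Python sketch below enumerates (P, T) pairs and labels each one as threads-only, processes-only, or hybrid. Everything in it is an assumption made for illustration: the helper names `sweep` and `launch_command`, the executable name `./project`, the powers-of-two stepping, and the core budget of 16. The printed command line assumes an MPI + OpenMP build started with `mpirun` and the `OMP_NUM_THREADS` environment variable; the launcher actually available on a given cluster may differ.

```python
from itertools import product


def sweep(max_procs, max_threads, core_budget):
    """Return (P, T, label) tuples with P * T <= core_budget.

    P and T are stepped through powers of two; this is an illustrative
    choice, not a requirement of the project.
    """
    def powers_of_two(limit):
        n = 1
        while n <= limit:
            yield n
            n *= 2

    configs = []
    for p, t in product(powers_of_two(max_procs), powers_of_two(max_threads)):
        if p * t > core_budget:
            continue  # skip runs that would oversubscribe the assumed budget
        if p == 1 and t == 1:
            label = "serial baseline"
        elif p == 1:
            label = "threads only"      # the (P = 1, T > 1) case
        elif t == 1:
            label = "processes only"    # the (P > 1, T = 1) case
        else:
            label = "hybrid"            # intermediate values
        configs.append((p, t, label))
    return configs


def launch_command(p, t, executable="./project"):
    """Build a launch line for a hypothetical MPI + OpenMP executable."""
    return f"OMP_NUM_THREADS={t} mpirun -np {p} {executable}"


if __name__ == "__main__":
    # core_budget=16 is a placeholder; substitute the real core count.
    for p, t, label in sweep(max_procs=8, max_threads=4, core_budget=16):
        print(f"{label:16s} {launch_command(p, t)}")
```

Timing each listed configuration on the same problem size, including the serial baseline, gives the numbers needed to compare the threads-only, processes-only, and hybrid cases directly.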