| mstack.c mstack.f | The Mstack reference implementations |
| mstack2.c mstack2.f | Identical to reference implementation except for swap implementation |
| mstack3.c mstack3.f | Identical to reference implementation except for swap implementation |
| mstack4.f | Identical to reference implementation except for swap implementation |
| mstack5.f | Identical to reference implementation except for swap implementation |
| mstackv.c mstackv.f | Mstack vectorized implementation (substantially different from reference implementation) |
| mstackv2.c | Second vectorized implementation |
| mstacko.c mstacko.f | Optimized version of the reference implementation |
| mstack2o.c mstack2o.f | Optimized C and Fortran versions of mstack2 |
| mstack3o.c mstack3o.f | Optimized C and Fortran versions of mstack3 |
| mstack4o.c mstack4o.f | Optimized C and Fortran versions of mstack4 |
| mstack5o.f | Optimized Fortran version of mstack5 |
| mstackvo.c mstackvo.f | Optimized C and Fortran versions of the mstackv vectorized implementation |
| mstackv2o.c | Optimized version of second vectorized implementation |
| mstackomta1.c | One extra level of dimensionality added to the scratch array (scratch now has 2 dimensions) |
| mstackomta2.c | Two extra levels of dimensionality added to the scratch array (scratch now has 3 dimensions) |
| mstackomta3.c | Scratch array eliminated entirely. All data access and manipulation now done directly on traces (which has 3 dimensions) |
| mstackomta5.c | An attempt to reduce the number of memory operations by using the recurrence. (Note: in practice, this did not produce the expected speedup.) |
| mstackpomta.c | Replaces the bubblesort with an odd-even transposition sort (effectively a parallelized bubblesort). This introduces an additional factor of n/2 parallelism. |
| mstackp.c | Same as C reference implementation except uses an odd-even transposition sort instead of bubblesort. |
| mstackpo.c | Same as mstackp but using Ioannis's optimization |
| mstackmo.c | A somewhat crude effort to use a merge-sort in combination with odd-even transposition sort. |
| mstacko-cyclops.c | Similar to mstackomta2. It does not assume automatic parallelization but instead relies on nested OpenMP parallelization. Theoretically exposes 1 million units of parallelism. |
| dime-c_8sorts.c | DIME-C source code implementing 8 independent sorting units in parallel |
| example.c | Host C file containing FPGA-specific API functions for design execution |
| dimetalk_48units.dt3 | DIMEtalk file for the network of 48 sort processing elements |
| host5.c mitrion_numchn5.mitc | Mitrion host C file and FPGA .mitc file - Implements optimized inner loop bubble sort for 5 channels |
| host50.c mitrion_numchn50.mitc | Mitrion host C file and FPGA .mitc file - Implements optimized inner loop bubble sort for 50 channels |
| host75.c mitrion_numchn75.mitc | Mitrion host C file and FPGA .mitc file - Implements optimized inner loop bubble sort for 75 channels |
| host128.c mitrion_numchn128.mitc | Mitrion host C file and FPGA .mitc file - Implements optimized inner loop bubble sort for 128 channels |
| Mstack_Cell.tar | Mstack-on-Cell source code |
| c64_mstack.tar | Mstack reference version |
| c64_mstacko.tar | Optimized Mstack reference version |
| c64_mstack2.tar | Identical to reference implementation except for swap implementation |
| c64_mstack2o.tar | Optimized version of c64_mstack2 |
| c64_mstack3.tar | Identical to reference implementation except for swap implementation |
| c64_mstack3o.tar | Optimized version of c64_mstack3 |
| c64_mstackv.tar | Vectorized mstack implementation |
| c64_mstackvo.tar | Optimized vectorized Mstack implementation |
| c64_mstackv2.tar | Second vectorized version |
| c64_mstackv2o.tar | Optimized second vectorized version |
| mstack_classes.cpp | C++ class-based implementation (precursor to ParalleX) |
| parallex_main.cpp | ParalleX main |
| mstack.hpp | Header file containing Mstack class and return_set struct definitions |
| mstack.cpp | Class definitions |
| mstack_local_mem.cu | CUDA mstack implementation |
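The mstackp, mstackpomta, and mstackmo entries describe replacing bubblesort with an odd-even transposition sort and, in mstackmo, combining that sort with a merge sort. The C sketch below illustrates both techniques on a plain `float` array. It is not taken from the mstack sources: the function names (`oet_sort`, `hybrid_sort`, `merge_runs`, `compare_exchange`), the `float` element type, and the `block` size parameter are hypothetical. Within one transposition phase the compared pairs are disjoint, which is where the extra factor of n/2 parallelism quoted for mstackpomta comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compare-exchange so that afterwards a[i] <= a[j]. */
static void compare_exchange(float *a, size_t i, size_t j)
{
    if (a[i] > a[j]) {
        float tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Odd-even transposition sort (hypothetical name oet_sort).
 * n phases alternate between even pairs (0,1),(2,3),... and odd pairs
 * (1,2),(3,4),...  Pairs within one phase are disjoint, so the up to n/2
 * compare-exchanges of a phase are independent and could run in parallel;
 * the phases themselves must run in order. */
void oet_sort(float *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t phase = 0; phase < n; ++phase) {
        for (size_t i = phase % 2; i + 1 < n; i += 2)
            compare_exchange(a, i, i + 1);
    }
}

/* Merge the sorted runs a[lo..mid) and a[mid..hi) through scratch buffer tmp. */
static void merge_runs(float *a, float *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* Hypothetical hybrid: OET-sort independent blocks of `block` elements,
 * then combine them with a bottom-up merge sort.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int hybrid_sort(float *a, size_t n, size_t block)
{
    assert(block > 0);
    /* Each block is sorted independently: another source of parallelism. */
    for (size_t lo = 0; lo < n; lo += block)
        oet_sort(a + lo, (n - lo < block) ? n - lo : block);
    if (n <= block)
        return 0;

    float *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    /* Double the sorted-run width each pass until one run covers the array. */
    for (size_t width = block; width < n; width *= 2) {
        for (size_t lo = 0; lo + width < n; lo += 2 * width) {
            size_t hi = (n - lo < 2 * width) ? n : lo + 2 * width;
            merge_runs(a, tmp, lo, lo + width, hi);
        }
    }
    free(tmp);
    return 0;
}
```

For example, calling `oet_sort` on `{5, 1, 4, 2, 3}` with `n = 5` leaves the array as `{1, 2, 3, 4, 5}`. In `hybrid_sort`, the block size is a trade-off: smaller blocks keep the quadratic transposition-sort work low, while larger blocks expose more independent compare-exchanges per phase before the merge passes take over.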