Current Research
Summer 2011 Research
| Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 |
| Week 7 | Week 8 | Week 9 |
| Week 10 | Week 11 |
Week 1
Accomplishments
- Creation of Profile.
- Creation of website.
- Be able to compile and run CUDA and OpenCL code.
- Fully understand the provided CUDA and OpenCL code.
Week 2
Accomplishments
- Can understand and compile the provided CUDA kernels.
- Can modify the provided code to provide extra functionality and fix problems.
- Basic understanding of the principles behind parallel programming.
- Be able to understand and compile OpenCL code.
- Fully understand the goals of this research.
Week 3
Accomplishments
- Can understand and compile OpenCL code.
- Can modify and extend OpenCL code to provide additional functionality.
- Began coding CUDA production kernels and conducted performance analysis.
- Continue coding production kernels for matrix-matrix multiplication in CUDA and OpenCL.
Week 4
Accomplishments
- Created working CUDA and OpenCL kernels.
- Demonstrated accuracy of the kernels for arbitrary size non-square matrices.
- Improve the organization of the kernel wrapper functions.
- Conduct extensive performance testing on the CUDA and OpenCL implementations, and compare the two against each other and against a CPU reference implementation.
- Investigate different methods of performing modulus to keep computational cost low yet still allow for large primes to be used.
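As an illustration of the kind of CPU reference implementation and modulus handling mentioned above, here is a minimal sketch of modular matrix multiplication for non-square matrices. It is not the project's code: the function name `matmul_mod`, the row-major layout, and the choice to call `fmod()` after every multiply-add are all assumptions made for the example. Reducing at every step is only one of the possible methods; it keeps intermediate values exact as long as p*p does not exceed 2^53.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the project or from LinBox).
// Computes C = (A * B) mod p for row-major matrices stored as doubles.
// A is m x k, B is k x n, C is m x n. Entries are assumed to lie in [0, p).
std::vector<double> matmul_mod(const std::vector<double>& A,
                               const std::vector<double>& B,
                               std::size_t m, std::size_t k, std::size_t n,
                               double p) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<double> C(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t l = 0; l < k; ++l) {
                // acc < p and the product < p*p, so the sum stays below
                // p*p and is exact in a double when p*p <= 2^53.
                acc = std::fmod(acc + A[i * k + l] * B[l * n + j], p);
            }
            C[i * n + j] = acc;
        }
    }
    return C;
}
```

Calling `matmul_mod(A, B, 2, 3, 2, 13.0)` multiplies a 2 x 3 matrix by a 3 x 2 matrix modulo 13. A GPU kernel would typically assign one output element (or one tile) per work-item rather than loop over `i` and `j`.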
Week 5
Accomplishments
- Extensive performance testing of CUDA and OpenCL implementations conducted.
- Determined that OpenCL is slightly slower than CUDA but comparable, and that no alternative to fmod() is required to achieve large performance gains over sequential CPU code.
- Port the OpenCL code over to C++ and reorganize to lower initialization cost.
- Compare performance against current LinBox implementation for double precision.
Week 6
Accomplishments
- Began porting over to C++ and creating a generic API for future expansion.
- Successfully tested double precision in the OpenCL code on GTX280 and GTX580 for matrix sizes from 16 to 4K.
- Continue porting over to C++.
Week 7
Accomplishments
- Continued porting over to C++ and creating a generic base API.
- Experimented with retaining buffers for use in other calculations.
- Finish porting over to C++.
- Conduct performance testing on the C++ code and compare it to the C code and to LinBox's current implementations.
Week 8
Accomplishments
- Completed porting to C++ and finished the design of a generic base API and matrix multiplication specific API.
- Successfully tested the OpenCL code on a multicore CPU and implemented automatic device selection in the API. The selection uses a weighted scoring system that prefers GPUs.
- Conduct timing tests on random matrices on both GPU and CPU with the new API.
- Create abstract of project for presentation.
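The weighted device scoring described above could look roughly like the following sketch. The struct `DeviceInfo`, the functions `score_device` and `select_device`, and every weight are hypothetical; the log does not state which properties or weights the API actually uses. In a real OpenCL program the fields would be filled from `clGetDeviceInfo` queries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical device description, filled by hand here.
struct DeviceInfo {
    std::string name;
    bool is_gpu;
    unsigned compute_units;
    unsigned clock_mhz;
    bool supports_double;
};

// Illustrative weights only; the real scoring system is not documented here.
double score_device(const DeviceInfo& d) {
    if (!d.supports_double) return -1.0;  // unusable for double precision
    double score = static_cast<double>(d.compute_units) * d.clock_mhz;
    if (d.is_gpu) score *= 4.0;  // assumed preference factor for GPUs
    return score;
}

// Returns the index of the highest-scoring usable device, or -1 if none.
int select_device(const std::vector<DeviceInfo>& devices) {
    int best = -1;
    double best_score = -1.0;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        const double s = score_device(devices[i]);
        if (s > best_score) {
            best_score = s;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Because the GPU preference is a multiplier and not a hard rule, a sufficiently large CPU can still win over a very small GPU, which is one reasonable reading of "preference with a weighted scoring system".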
Week 9
Accomplishments
- Abstract for project written and approved by adviser.
- API timing tests completed.
- Conducted timing tests of highly optimized sequential CPU code for comparison against API.
- Analyze results of timing tests and determine speedup on tested hardware over sequential code.
- Write CPU optimized versions of kernels.
- Create poster for presentation of work.
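For the timing comparison, speedup is simply the sequential time divided by the accelerated time. The sketch below shows one way to measure and compute it with `std::chrono`; the helper names `time_seconds` and `speedup` are hypothetical and are not part of the API described in this log.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock time of `work` over `repeats` runs.
double time_seconds(const std::function<void()>& work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / repeats;
}

// Speedup of the accelerated run relative to the sequential baseline.
double speedup(double sequential_seconds, double accelerated_seconds) {
    assert(accelerated_seconds > 0.0);
    return sequential_seconds / accelerated_seconds;
}
```

When timing GPU code this way, the work passed in should include whatever synchronization is needed so that the kernel has finished before the clock is read, and it should be stated whether buffer transfers are counted.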
Week 10
Accomplishments
- Created poster for presentation.
- Presented work to other members of the lab and sponsor.
- Ran timing tests using the GPU kernels on a CPU and achieved impressive speedup.
- Worked on creating CPU-optimized kernels. Results were slower than running the GPU-optimized kernels on the CPU.
- Enjoy my two week vacation.
- Spend part of vacation reading up on optimizations for CPU.
- Present at the Undergraduate Research and Service Symposium.
Week 11
Accomplishments
- Implemented manual platform and device selection.
- Created CPU-optimized kernels. They are slightly faster for certain sizes.
- Presented work at the Undergraduate Research and Service Symposium.
- Create infrastructure for configure-time setup, including device query, evaluation, and selection, and an offline OpenCL compiler.
- Determine if different hardware introduces errors.
- Improve integration with LinBox.
- Improve CPU optimized kernels.
....

Docking@Home
GCLab at UDel