Current Research

Summer 2011 Research

Full Research Description

Week 1 Week 2 Week 3
Week 4 Week 5 Week 6
Week 7 Week 8 Week 9
Week 10 Week 11

Week 1
Accomplishments

  • Creation of Profile.
  • Creation of website.
Goals for Next Week
  • Be able to compile and run CUDA and OpenCL code.
  • Fully understand the provided CUDA and OpenCL code.

Week 2
Accomplishments

  • Can understand and compile the provided CUDA kernels.
  • Can modify the provided code to provide extra functionality and fix problems.
  • Basic understanding of the principles behind parallel programming.
Goals for Next Week
  • Be able to understand and compile OpenCL code.
  • Fully understand the goals of this research.

Week 3
Accomplishments

  • Can understand and compile OpenCL code.
  • Can modify and extend OpenCL code to provide additional functionality.
  • Began coding CUDA production kernels and conducted performance analysis.
Goals for Next Week
  • Continue coding production kernels for matrix-matrix multiplication in CUDA and OpenCL.

Week 4
Accomplishments

  • Created working CUDA and OpenCL kernels.
  • Demonstrated accuracy of the kernels for arbitrarily sized, non-square matrices.
Goals for Next Week
  • Improve the organization of the kernel wrapper functions.
  • Conduct extensive performance testing on the CUDA and OpenCL implementations, and construct a comparison between the two implementations and a CPU reference implementation.
  • Investigate different methods of performing the modulus operation that keep computational cost low yet still allow large primes to be used.

Week 5
Accomplishments

  • Extensive performance testing of CUDA and OpenCL implementations conducted.
  • Determined that OpenCL is slightly slower than CUDA but comparable, and that no alternative to fmod() is required to achieve large performance gains over sequential CPU code.
Goals for Next Week
  • Port the OpenCL code over to C++ and reorganize to lower initialization cost.
  • Compare performance against current LinBox implementation for double precision.
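
The modular matrix multiplication discussed in Weeks 4 and 5 can be illustrated with a minimal sequential sketch. This is not the project's kernel or LinBox code. The function name modMatMul, the row-major layout, and the choice to reduce with fmod() after every accumulation step are assumptions made for illustration. With entries in [0, p) and a prime p of at most 2^26, every intermediate value stays below 2^53, so the double-precision arithmetic is exact.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: C = (A * B) mod p for row-major matrices of doubles.
// A is m x k, B is k x n, and the result C is m x n (non-square shapes allowed).
// Assumes every entry lies in [0, p) and p <= 2^26, so each partial sum
// acc + a*b stays below 2^53 and is represented exactly in a double.
std::vector<double> modMatMul(const std::vector<double>& A,
                              const std::vector<double>& B,
                              std::size_t m, std::size_t k, std::size_t n,
                              double p) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<double> C(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t l = 0; l < k; ++l) {
                // Reduce after every step; fmod() is exact for these operands.
                acc = std::fmod(acc + A[i * k + l] * B[l * n + j], p);
            }
            C[i * n + j] = acc;
        }
    }
    return C;
}
```

Reducing after every step is the simplest safe choice. Delaying the reduction across several products would lower the number of fmod() calls but tightens the bound on how large p can be.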

Week 6
Accomplishments

  • Began porting over to C++ and creating a generic API for future expansion.
  • Successfully tested double precision in the OpenCL code on GTX280 and GTX580 for matrix sizes from 16 to 4K.
Goals for Next Week
  • Continue porting over to C++.

Week 7
Accomplishments

  • Continued porting over to C++ and creating a generic base API.
  • Experimented with retaining buffers for use in other calculations.
Goals for Next Week
  • Finish porting over to C++.
  • Conduct performance testing on the C++ code and compare it to the C code and to LinBox's current implementations.

Week 8
Accomplishments

  • Completed porting to C++ and finished the design of a generic base API and matrix multiplication specific API.
  • Successfully tested the OpenCL code on a multicore CPU and implemented automatic device selection in the API. The selection uses a weighted scoring system with a preference for GPUs.
Goals for Next Week
  • Conduct timing tests on random matrices on both GPU and CPU with the new API.
  • Create abstract of project for presentation.
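
The weighted device scoring mentioned above can be sketched as follows. This is only an illustration of the idea, not the API's actual scoring. The DeviceInfo struct, its fields, the functions scoreDevice and selectDevice, and the GPU weight of 4.0 are all hypothetical. A real implementation would fill the struct from OpenCL device queries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical device description (not taken from the project's API).
struct DeviceInfo {
    std::string name;
    bool isGpu;            // true for GPU devices, false for CPU devices
    unsigned computeUnits; // number of parallel compute units
    unsigned clockMhz;     // maximum clock frequency in MHz
    bool supportsDouble;   // whether double precision is available
};

// Assumed scoring rule: raw throughput estimate, weighted in favor of GPUs.
// Devices without double-precision support score 0 and are never selected.
double scoreDevice(const DeviceInfo& d) {
    if (!d.supportsDouble) return 0.0;
    double score = static_cast<double>(d.computeUnits) * d.clockMhz;
    if (d.isGpu) score *= 4.0; // assumed GPU preference weight
    return score;
}

// Returns the index of the highest-scoring device, or -1 if no device
// scores above zero.
int selectDevice(const std::vector<DeviceInfo>& devices) {
    int best = -1;
    double bestScore = 0.0;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        double s = scoreDevice(devices[i]);
        if (s > bestScore) {
            bestScore = s;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With this rule a CPU can still win when no usable GPU is present, which matches the goal of running the same API on both kinds of device.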

Week 9
Accomplishments

  • Abstract for project written and approved by adviser.
  • API timing tests completed.
  • Conducted timing tests of highly optimized sequential CPU code for comparison against API.
Goals for Next Week
  • Analyze results of timing tests and determine speedup on tested hardware over sequential code.
  • Write CPU-optimized versions of kernels.
  • Create poster for presentation of work.

Week 10
Accomplishments

  • Created poster for presentation.
  • Presented work to other members of the lab and sponsor.
  • Ran timing tests using GPU kernels on CPU and achieved impressive speedup.
  • Worked on creating CPU-optimized kernels. Results were slower than using GPU-optimized kernels on CPU.
Goals for Next Week
  • Enjoy my two-week vacation.
  • Spend part of vacation reading up on optimizations for CPU.
  • Present at the Undergraduate Research and Service Symposium.

Week 11
Accomplishments

  • Implemented manual platform and device selection.
  • Created CPU-optimized kernels. They are slightly faster for certain sizes.
  • Presented work at the Undergraduate Research and Service Symposium.
Goals for the Fall
  • Create infrastructure for configure-time setup, including device query, evaluation, and selection, and an offline OpenCL compiler.
  • Determine if different hardware introduces errors.
  • Improve integration with LinBox.
  • Improve CPU-optimized kernels.
