Selected Publications

Architectures are rapidly evolving, and exascale machines are expected to offer billion-way concurrency. We need to rethink algorithms, languages and programming models among other components in order to migrate large scale applications and explore parallelism on these machines. Although directive-based programming models allow programmers to worry less about programming and more about science, expressing complex parallel patterns in these models can be a daunting task especially when the goal is to match the performance that the hardware platforms can offer. One such pattern is wavefront. This paper extensively studies a wavefront-based miniapplication for Denovo, a production code for nuclear reactor modeling. We parallelize the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm in the main kernel of Minisweep (the miniapplication) using CUDA, OpenMP and OpenACC. Our OpenACC implementation running on NVIDIA’s next-generation Volta GPU boasts an 85.06x speedup over serial code, which is larger than CUDA’s 83.72x speedup over the same serial implementation. Our experimental platform includes SummitDev, an ORNL representative architecture of the upcoming Summit supercomputer. Our parallelization effort across platforms also motivated us to define an abstract parallelism model that is architecture independent, with a goal of creating software abstractions that can be used by applications employing the wavefront sweep motif.
In ACM Digital Library, https://doi.org/10.1145/3218176.3218228, 2018.

This work uses the MapReduce framework Apache Spark in conjunction with CUDA to simultaneously take advantage of automatic data distribution and the specialized hardware present on each node of our HPC systems. We demonstrate scalability with regard to compute intensive portions of the code that are parallelizable, as well as an exploration of the parameter space for each application. Using our paradigm, we accelerate three real-world, compute and data intensive, graph analytics applications: a function call graph similarity application, a triangle enumeration subroutine, and a graph assaying application.
In Vol. 10. DOI: 10.1504/IJHPCN.2017.10007922, IJHPCN, 2017.

This work builds a testsuite to validate and verify implementations of the OpenACC programming model in compilers and their conformance to the OpenACC specification. This testsuite has been integrated into the harness infrastructure of the TITAN and Summitdev systems at Oak Ridge National Lab and is being used for production.
In LNCS, volume 10524, pp 557-575, 2017.

MIT developed a sparse FFT (sFFT) algorithm in 2012 to compute the FFT in sub-linear time by efficiently locating the most significant outputs (usually very few “large” coefficients are present in the frequency domain). In this paper, we use CUDA to parallelize and accelerate sFFT on massively parallel processors, such as GPUs. Our CUDA-based sFFT, cusFFT, performs over 10x faster than the state-of-the-art cuFFT library on GPUs and over 28x faster than the parallel FFTW on multicore CPUs. Please refer to our publication on Parallel sFFT to learn how we used OpenMP to parallelize sparse FFT on multicore platforms. An online data layout runtime transformation algorithm for locality-aware sFFT was also developed.
In IEEE, pp. 963-972, DOI: 10.1109/IPDPS.2016.95, 2016.

Upcoming & Recent Talks

HPC-as-a-Service to Domain Scientists
Jul 3, 2018 5:00 PM
Opportunities and Challenges Migrating Scientific Code to Accelerators
Jun 17, 2018 1:00 PM
Path forward for softwarization to tackle evolving hardware
Apr 18, 2018 1:00 PM
Adapting Minisweep, a Proxy Application, on Heterogeneous Systems Using OpenACC Directives
Mar 27, 2018 5:00 PM
Achieving Performance While Preserving Portability for NGS Application
Mar 7, 2018 1:00 PM
Talks at SC17
Nov 10, 2017 10:00 AM
Using OpenACC for NGS Techniques to create a portable and easy-to-use code base
May 9, 2017 1:00 PM
Programming Models and Libraries for Modernizing Legacy Applications for Exascale
Feb 27, 2017 4:35 PM

Professional Activities

Please refer to the CV for a complete list of roles and contributions.

Recent News

Vertically Integrated Project (VIP) undergraduate students won the Research Poster prize for accelerating the prediction of chemical shift of protein structures using state-of-the-art NVIDIA V100 GPUs at the annual VIP Mid-Atlantic poster competition held at UD. The acceleration brought down the time taken from 10+ hours to 2 minutes on a dataset of approximately 11K atoms. Stay tuned for more details.

Sunita Chandrasekaran and her doctoral student Robert Searles have accelerated a nuclear reactor-based miniapp, Minisweep, using OpenACC on GPUs. The paper has been accepted for publication at PASC 2018. For more, see UDaily and OLCF news.

Sunita Chandrasekaran and Guido Juckeland published an edited book, OpenACC for Programmers: Concepts and Strategies, in November 2017. The book provides a comprehensive and practical overview of OpenACC for massively parallel programming. Related articles: Eurekalert and insideHPC.

In this video from SC17, Sunita Chandrasekaran from OpenACC.org and Stan Posey from NVIDIA describe how OpenACC eases GPU programming for HPC.

In Summer 2016, scientists from NASA, NCI, BNL and 3 UDEL teams gathered at the University of Delaware for a GPU Programming Hackathon. Watch the recap video shared by NVIDIA news center and photos. Similarly, in Summer 2017, a Brookathon was held at Brookhaven National Lab in collaboration with Meifeng Lin, a CSI computational scientist, Fernanda Foertter, an HPC user support specialist and programmer, and several others. The hackathon stories and training experiences were captured in this paper presented at the 2017 EduHPC workshop co-located with SC17.

Teaching

CISC 360, CISC 662, ELEG 467: Vertically Integrated Project (VIP) Program, CISC 849

Research

Accelerating Chemical Shift Prediction of Protein Structures using GPUs

Accurate prediction of the chemical shift of a protein is essential in certain areas of molecular dynamics research such as drug discovery. This is a compute-intensive problem. Currently, no available application can predict the chemical shift of large protein structures in a realistic amount of time. We took a chemical shift prediction application called PPM_One and accelerated it using OpenACC to reduce the time taken from 10+ hours on a single core to 2 minutes on NVIDIA V100 GPUs.

Parallelization and Acceleration of Nuclear Reactor Miniapp: Minisweep

Denovo is a production code for nuclear reactor neutronics modeling and is in use by a current DOE INCITE project to model the ITER fusion reactor. Our project investigates the sweep kernel within Denovo, which accounts for approximately 80-99% of Denovo’s overall computational expense. We use OpenACC, a high-level, directive-based programming model, running on NVIDIA’s next-generation Volta GPU for this work, and our preliminary results show promising speedup comparable to CUDA.
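
As a rough illustration of the wavefront dependency pattern behind a sweep kernel of this kind, the sketch below uses a toy 2D grid in plain Python. It is not Minisweep or Denovo code: the function names and the cell update rule (`upstream_sum`) are hypothetical, and the only assumption is that each cell depends on its left and upper neighbours. Under that assumption, all cells on the same anti-diagonal are independent of each other and form one wavefront.

```python
def wavefronts(nx, ny):
    """Group the cells of an nx-by-ny grid into anti-diagonals (wavefronts)."""
    return [
        [(i, d - i) for i in range(max(0, d - ny + 1), min(nx, d + 1))]
        for d in range(nx + ny - 1)
    ]


def upstream_sum(grid, i, j):
    """Hypothetical cell update: combine the left and upper neighbours.

    Cells outside the grid are treated as a boundary value of 1.
    """
    left = grid[i][j - 1] if j > 0 else 1
    up = grid[i - 1][j] if i > 0 else 1
    return left + up


def sweep_serial(nx, ny):
    """Reference sweep in plain row-major order."""
    grid = [[0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            grid[i][j] = upstream_sum(grid, i, j)
    return grid


def sweep_wavefront(nx, ny):
    """Same sweep, visiting the grid one wavefront at a time."""
    grid = [[0] * ny for _ in range(nx)]
    for front in wavefronts(nx, ny):
        # Cells within one front do not depend on each other, so this
        # inner loop is the part a parallel implementation could distribute.
        for i, j in front:
            grid[i][j] = upstream_sum(grid, i, j)
    return grid
```

Both sweeps produce the same grid; the wavefront version only reorders the work so that the independent cells are grouped together, which is the property a directive-based or CUDA implementation would exploit.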

AccSeq: A High-Level Parallel NGS tool using Novel Time-Memory Efficient Algorithm

This project explores building a fast whole genome sequence alignment algorithm for both long and short reads. The tool is being evaluated with Saccharomyces Genome Database (SGD) (yeast) and human genome sequences. Stay tuned to learn more.

OpenMP Validation and Verification Testsuite

Sunita’s group builds validation and verification (V&V) testsuites for OpenMP directive-based parallel programming models and focusses on the offloading features. Work on the OpenMP testsuite is part of the Exascale Computing Project (ECP) SOLLVE project. The testsuite project is funded by Oak Ridge National Laboratory (ORNL). The goal of this project is to validate and verify implementations of the programming model features in various compilers.

OpenACC Validation and Verification Testsuites

Sunita’s group builds validation and verification (V&V) testsuites for OpenACC directive-based parallel programming models. Work on the OpenACC testsuite is supported by OpenACC/NVIDIA. The goal of this project is to validate and verify implementations of the programming model features in various compilers. This testsuite has been integrated into the harness infrastructure of the TITAN and Summitdev systems at Oak Ridge National Lab and is being used for production.

Other Research Projects from CRPL Lab

Please refer to my Computational Research and Programming Lab CRPL website for more projects.

Other Publications

Book & Book Chapter(s)

A Textbook on Parallel Programming Released in 2017

  • Sunita Chandrasekaran and Guido Juckeland. Editors. OpenACC for Programmers: Concepts and Strategies, Addison-Wesley Professional, 1st edition. ISBN: 978-0-13-469428-3. 2017

Book Chapter(s)

  • Sunita Chandrasekaran, Rengan Xu, Barbara Chapman. Chapter on “Using OpenACC for stencil and Feldkamp algorithms”.
  • Barbara Chapman, Deepak Eachempati, Sunita Chandrasekaran. Chapter on “OpenMP”.

Journals

  • Robert Searles, Stephen Herbein, Travis Johnston, Michela Taufer, Sunita Chandrasekaran, “Creating a portable, high-level graph analytics paradigm for compute and data-intensive applications”, In Proceedings of International Journal of High Performance Computing and Networking. Special Issue of High Level Programming Approaches for Accelerators. DOI: 10.1504/IJHPCN.2017.10007922. 2017
  • Xiaonan Tian, Rengan Xu, Yonghong Yan, Sunita Chandrasekaran, Deepak Eachempati, and Barbara Chapman, “Compiler Transformation of Nested Loops for GPGPUs”, Concurrency and Computation: Practice and Experience, Special Issue on Programming Models and Applications for Multicores and Manycore, http://dx.doi.org/10.1002/cpe.3648, ISSN: 1532-0634, 2015
  • Rengan Xu, Sunita Chandrasekaran, Barbara Chapman, “Multi-GPU Support on Shared Memory System using Directive-based Programming Model”, Scientific Programming, Special Issue on Programming Models, Languages and Compilers for Manycore and Heterogeneous Architectures, http://dx.doi.org/10.1155/2015/621730, Vol. 2015, Article ID 621730, 2015
  • Sunita Chandrasekaran, Shilpa Shanbagh, Ramkumar Jayaraman, HuiYan Cheah and Douglas Maskell, “C2FPGA: A Dependency-Timing Graph Design Methodology”, Journal of Parallel and Distributed Computing (JPDC), Special Issue on Novel architectures for high-performance computing, Elsevier, http://dx.doi.org/10.1016/j.jpdc.2012.09.001, 2012.

Conferences/Workshops & Journals

  • Robert Searles, Sunita Chandrasekaran, Wayne Joubert, Oscar Hernandez. 2018. Abstractions and Directives for Adapting Wavefront Algorithms to Future Architectures. 5th Proceedings of the Platform for Advanced Scientific Computing Conference (PASC) DOI: 10.1145/3218176.3218228, 2018
  • Millad Ghane, Sunita Chandrasekaran, Robert Searles, Margaret Cheung, Oscar Hernandez, “Path forward for softwarization to tackle evolving hardware”, Proc. SPIE 10652, Disruptive Technologies in Information Sciences, 106520O, 2018, DOI: 10.1117/12.2304813; https://doi.org/10.1117/12.2304813
  • Robert Searles, Stephen Herbein, and Sunita Chandrasekaran. 2016. A Portable, High-Level Graph Analytics Framework Targeting Distributed, Heterogeneous Systems. In Proceedings of the Third International Workshop on Accelerator Programming Using Directives (WACCPD ’16). IEEE Press, Piscataway, NJ, USA, 79-88. DOI: 10.1109/WACCPD.2016.012
  • Kyle Friedline, Sunita Chandrasekaran, Graham Lopez, Oscar Hernandez. 2017. OpenACC 2.5 Validation Testsuite targeting multiple architectures. In LNCS Proceedings of 2nd International Workshop on Performance Portable Programming Models for Accelerators, LNCS, volume 10524, pp 557-575, 2017
  • Sergio Pino, Lori Pollock, and Sunita Chandrasekaran. Exploring translation of OpenMP to OpenACC 2.5: Lessons learned. Proceedings of the Seventh International Workshop on Accelerators and Hybrid Exascale Systems (AsHES). IEEE Press, 2017. DOI: 10.1109/IPDPSW.2017.84.
  • Michael Wolfe, Seyong Lee, Jungwon Kim, Xiaonan Tian, Rengan Xu, Sunita Chandrasekaran and Barbara Chapman. Implementing the OpenACC Data Model. Proceedings of the Seventh International Workshop on Accelerators and Hybrid Exascale Systems (AsHES). IEEE Press, 2017. DOI: 10.1109/IPDPSW.2017.85.

2016

  • Rengan Xu, Sunita Chandrasekaran, Xiaonan Tian, Barbara Chapman, “An Analytical Model-based Auto-tuning Framework for Locality-aware Loop Scheduling”, pp. 3-20, In the International Supercomputing Conference (ISC), Frankfurt, Germany, 2016
  • Cheng Wang, Sunita Chandrasekaran, Barbara Chapman, “cusFFT: A High-Performance Sparse Fast Fourier Transform Algorithm on GPUs”, pp. 963-972, DOI: 10.1109/IPDPS.2016.95, In the 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, Illinois, USA, 2016
  • Suyang Zhu, Sunita Chandrasekaran, Peng Sun, Barbara Chapman, Tobias Schuele, Marcus Winter, “Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API”, TO APPEAR In Proceedings of 4th Workshop on Runtime and Operating Systems for the Many-core Era (ROME) co-located with EUROPAR, Grenoble, France, 2016
  • Cheng Wang, Sunita Chandrasekaran and Barbara Chapman, “Towards Exploiting Data Locality for Irregular Applications on Shared-Memory Multicore Architectures”, TO APPEAR In Proceedings of DOD’s The First Workshop of Mission-Critical Big Data Analytics (MCBDA 2016), Houston, TX, 2016

2015

  • Peng Sun, Sunita Chandrasekaran, and Barbara Chapman, “Deploying OpenMP Task Parallelism on Multicore Embedded Systems with MCA Task APIs”, In Proceedings of IEEE HPCC 2015, pp. 843 - 847, 2015
  • Peng Sun, Sunita Chandrasekaran, and Barbara Chapman, “OpenMP-MCA: Leveraging Multiprocessor Embedded Systems using industry standards”, In Proceedings of the 2015 IEEE International Parallel & Distributed Processing Symposium Workshops, PLC2015, DOI: 10.1109/IPDPSW.2015.13, pp. 679-688, 2015

2014

  • Rengan Xu, Maxime Hugues, Henri Calandra, Sunita Chandrasekaran and Barbara Chapman. “Accelerating Kirchhoff Migration on GPU using Directives”, In Proceedings of ACM SIGHPC, The first Workshop on Accelerator Programming using Directives (WACCPD 2014) co-located with SC14, IEEE Press, pp. 37-46, 2014
  • Guido Juckeland, William Brantley, Sunita Chandrasekaran, et al., “SPEC ACCEL - A Standard Application Suite for Measuring Hardware Accelerator Performance”, In International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS14) co-located with SC14, Springer Verlag, pp 46-67, 2014
  • Peng Sun, Sunita Chandrasekaran, and Barbara Chapman, “Targeting Heterogeneous SoCs using MCAPI”, In TECHCON 2014, in the GRC Research Category Section 29.1, SRC, 2014
  • Rengan Xu, Cheng Wang, Sunita Chandrasekaran, Barbara Chapman, “An OpenACC 1.0 Validation Suite”, In Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW), MTAAP 2014, pp 1407-1416, 2014
  • Rengan Xu, Xiaonan Tian, Yonghong Yan, Sunita Chandrasekaran, Barbara M. Chapman, “Reduction Operations in Parallel Loops for GPGPUs”, In Proceedings of ACM, Programming Models and Applications on Multicores and Manycores (PMAM), pp 10:10–10:20, 2014
  • Rengan Xu, Xiaonan Tian, Sunita Chandrasekaran, Yonghong Yan and Barbara Chapman. “NAS Parallel Benchmarks on GPGPUs using a Directive-based Programming Model”. In Proceedings of Springer Verlag, The 27th International Workshop on Languages and Compilers for Parallel Computing (LCPC), pp. 67-81, 2014

2013

  • Cheng Wang, Mauricio Araya, Sunita Chandrasekaran, Barbara Chapman, Detlef Hohl, “Parallel Sparse FFT”, In Proceedings of ACM, The 3rd Workshop on Irregular Applications: Architectures and Algorithms (IA^3), co-located with SC 2013, pp 10:1–10:8, 2013
  • Xiaonan Tian, Rengan Xu, Yonghong Yan, Zhifeng Yun, Sunita Chandrasekaran, and Barbara Chapman. “Compiling A High-Level Directive-based Programming Model for Accelerators”. In Proceedings of Springer Verlag, 26th International Workshop on Languages and Compilers for High Performance Computing (LCPC 2013), pp. 105-120, 2013
  • Sayan Ghosh, Sunita Chandrasekaran, Barbara Chapman, “Statistical Modeling of Power/Energy of Scientific Kernels on a Multi-GPU system”, In Proceedings of IEEE, Third International Workshop on Power Measurement and Profiling (PMP) co-located with IGCC, 1-6, 2013
  • Cheng Wang, Sunita Chandrasekaran, Barbara Chapman, Jim Holt, “Portable Mapping of OpenMP to Multicore Embedded Systems Using MCA APIs”, In Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems, 153-162, 2013
  • Rengan Xu, Sunita Chandrasekaran, Barbara Chapman, “Exploring Programming Multi-GPUs using OpenMP & OpenACC-based Hybrid Model”, In Proceedings of 2012 IEEE Workshop on Multicore and GPU Programming Models, Languages and Compiler (PLC) co-located with IPDPS, pp 1169-1176, 2013
  • Cheng Wang, Sunita Chandrasekaran, Barbara Chapman, Jim Holt, “libEOMP: a portable OpenMP runtime library based on MCA APIs for embedded systems”, In Proceedings of ACM, International Workshop on Programming Models and Applications for Multicores and Manycore co-located with PPoPP, pp 83-92, 2013

2012

  • Rengan Xu, Sunita Chandrasekaran, Barbara Chapman, Christoph F. Eick, “Directive-based Programming Models for Scientific Applications - A Comparison”, In Proceedings of IEEE, Second International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (Wolfhpc) co-located with Supercomputing (SC), pp 1-9, 2012
  • Lei Huang, Eric Stotzer, Hangjun Yi, Barbara Chapman, Sunita Chandrasekaran, “Parallelizing Ultrasound Image Processing using OpenMP on Multicore Embedded Systems”, In Proceedings of 2012 IEEE Global High Tech Congress on Electronics (GHTCE), 131-138, DOI 10.1109/GHTCE.2012.6490139, 2012.
  • Cheng Wang, Sunita Chandrasekaran, Barbara Chapman, “An OpenMP 3.1 Validation testsuite”, In Proceedings of IWOMP 2012, LNCS, Volume 7312, 2012, p. 237-249, Rome, Italy, June 2012
  • Sayan Ghosh, Sunita Chandrasekaran, Barbara Chapman, “Energy Analysis of Parallel Scientific Kernels on Multiple GPUs”, In Proceedings of IEEE, SAAHPC, ISSN: 2166-5133, p.54-63, Chicago, July 2012

2011 AND EARLIER

  • Eric Stahlberg, Thomas Steinke, Melissa C. Smith, Sunita Chandrasekaran, Barbara Chapman, “Heterogeneous Accelerated Bioinformatics, Perspectives for Cancer Research”, In Proceedings of ERSA 2011, co-located with WorldComp 2011, Nevada, USA, July 2011
  • Ayodunni Aribuki, Sunita Chandrasekaran, and Barbara Chapman, “Extending OpenMP for Heterogeneous Multicore Systems”, TECHCON, 2011, Semiconductor Research Corporation (SRC) Project, Austin, USA, September 2011
  • Ayodunni Aribuki, Sunita Chandrasekaran, Jae-Chang Hong, and Barbara Chapman, “Improving Heterogeneous Multicore Programming using OpenMP”, In Proceedings of Multicore Expo 2011, CA, USA
  • Sunita Chandrasekaran, HuiYan Cheah, Douglas L. Maskell, “Automatic Scheduling and Mapping Techniques for mapping HLL applications on FPGA”, Many-Core and Reconfigurable Supercomputing Conference, MRSC 2011, Bristol, UK, April 2011.
  • Sunita Chandrasekaran, Shilpa Shanbagh, Douglas L. Maskell, “A Dependency Graph based Methodology for Parallelizing HLL Applications on FPGA”, In Proceedings of the 18th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, (abstract only), FPGA 2010, Monterey, California, USA, February 2010.
  • Sunita Chandrasekaran, Avanthi B. Reddy, Shilpa Shanbagh, Douglas L. Maskell, “HLL mapping to FPGA using a dependency analysis based graphical methodology”, Many-Core and Reconfigurable Supercomputing Conference - MRSC 2010, Rome, Italy, March 2010.
  • Kevin A. Huck, Oscar Hernandez, Van Bui, Sunita Chandrasekaran, Barbara Chapman, Allen D. Malony, Lois Curfman McInnes, Boyana Norris, “Capturing Performance Knowledge for Automated Analysis”, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis -SC08, Austin, Texas, November 2008.
  • Sunita Chandrasekaran, Oscar Hernandez, Douglas Maskell, Barbara Chapman, Van Bui, “Compilation and Parallelization Techniques with Tool Support to realize Sequence Alignment Algorithm on FPGA and Multicore”, Workshop on New Horizons in Compilers, IEEE Int. Conf. on High Performance Computing (HiPC), Goa, India, 2007.

Contact