
William Killian

Parallel Runtimes — Performance Portability — Machine Learning

Ph.D. student at the University of Delaware

Advised by John Cavazos

Collaborator at LLNL

Contributor to RAJA

Mentor at OLCF GPU Hackathons

Download CV




Existing methods of exploring a search space are primarily limited to a form of iterative compilation, which requires exploring the entire space. This optimization space is often too large to search exhaustively. With an autotuning framework, the optimization search space can be explored automatically and intelligently, with no need for user interaction.

Directive-Based Languages

Many applications cannot be rewritten, yet maintainers are still expected to improve their performance. Directive-based languages allow a developer to annotate an existing program to guide specific optimizations, auto-parallelize code regions, and even target accelerators (e.g., NVIDIA GPUs and Intel coprocessors) automatically.

Programming Models

Directive-based languages often cannot accommodate multiple programming models without resorting to (ab)use of the preprocessor. Programming models such as RAJA instead place that burden on efficient tag dispatching and SFINAE in modern C++. I leverage RAJA as a tuning framework, which enables autotuning over directive-based backends by changing a single C++ type.


  • [PAPER] R. Searles, L. Xu, W. Killian, T. Vanderbruggen, T. Forren, J. Howe, Z. Pearson, C. Shannon, J. Simmons, and J. Cavazos, “Parallelization of Machine Learning Applied to Call Graphs of Binaries for Malware Detection,” 25th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2017) – St. Petersburg, Russia, 2017. [DOWNLOAD]
  • [POSTER] W. Killian, A. Kunen, I. Karlin, J. Cavazos, “Discovering Optimal Execution Policies in KRIPKE using RAJA,” ACM Student Poster Competition, 29th International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16) – Salt Lake City UT, 2016. [DOWNLOAD]
  • [POSTER] W. Killian, G. Zagaris, B. Ryujin, B. Pudliner, J. Cavazos, “Portable Performance of Large-Scale Physics Applications: Toward Targeting Heterogeneous Exascale Architectures Through Application Fitting,” ACM Student Poster Competition, 28th International Conference for High Performance Computing, Networking, Storage and Analysis (SC'15) – Austin TX, 2015. [DOWNLOAD]
  • [PHD PRELIM] W. Killian, J. Cavazos, “Using Graph-Based Characterization for Predictive Modeling of Vectorizable Loop Nests,” University of Delaware – Newark DE, 2015. [DOWNLOAD]
  • [WHITEPAPER] W. Killian, R. Miceli, E. Park, M. Alvarez Vega, J. Cavazos, “Performance Improvement in Kernels by Guiding Compiler Auto-Vectorization Heuristics,” Partnership for Advanced Computing in Europe (PRACE) Performance Prediction, 2014. [DOWNLOAD]
  • [POSTER] W. Killian, W. Wang, E. Park, J. Cavazos, “Energy Tuning of Polyhedral Kernels on Multicore and Many-Core Architectures,” at SEAK: DAC Workshop on Suite of Embedded Applications and Kernels (SEAK 2014), San Francisco, CA, USA, 2014. [DOWNLOAD]
  • [WORKSHOP] S. Grauer-Gray, W. Killian, R. Searles, and J. Cavazos, “Accelerating Financial Applications on the GPU,” in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6), New York, NY, USA, pp. 127–136, ACM, 2013. [DOWNLOAD]



I have an awesome GitHub page.



This project contains codes for the Black-Scholes, Monte-Carlo, Bonds, and Repo financial applications, which can be run on the CPU and GPU. All original algorithms were ported from QuantLib to CUDA, OpenCL, HMPP, and OpenACC. We showed that certain algorithms achieved speedups of several hundred times over sequential CPU execution.


PolyBench is a collection of benchmarks containing static control parts. Its purpose is to standardize the execution and monitoring of kernels typically used in past and current publications. PolyBench/ACC originated from Pouchet's original PolyBench/C suite. We added CUDA, OpenCL, OpenACC, HMPP, and OpenMP versions of the original code.

RAJA Portability Layer

RAJA is a collection of C++ software abstractions, developed at Lawrence Livermore National Laboratory (LLNL), that enable architecture portability for HPC applications.


PolyBench/RAJA originated from Pouchet's original PolyBench/C suite. All PolyBench kernels have been converted to use the RAJA portability layer.


University of Delaware

Ph.D. Computer and Information Science — 2017 (expected)

M.S. Computer and Information Science — 2013

  • Courses for MS Degree
      • Logic
      • Algorithms
      • Computer Architecture
      • Computer Networks II
      • Advanced Compiler Construction
      • Advanced Software Engineering
      • Advanced Parallel Programming
      • Text Analysis in Software Engineering
      • Wireless Networks and Mobile Computing
  • Additional Coursework
      • Computer Graphics
      • Machine Learning
      • Databases
      • Applications of Financial Technology
      • HPC Data Analytics
      • Formal Methods of HPC

Millersville University

B.S. Computer Science — 2011

  • Research Projects
      • 3-D physics simulation
      • Investigating parallel programming
      • Efficient rendering of BSP maps
      • Image classification using machine learning
  • List of Advanced Courses
      • Computer Graphics
      • Computer Networks
      • Topics: Mobile Programming
      • Software Engineering
      • Artificial Intelligence
      • Database Management Systems
      • Parallel Programming


I welcome you to contact me through one of the methods below.

If you need to reach me via traditional mail, please consult my CV.