Hi. My name is Robbie, and I am a PhD student at the University of Delaware.

My research area focuses on high performance computing, specifically leveraging high-level programming models to target and optimize machine learning code running on parallel architectures and next-generation HPC systems.

My PhD advisor is Dr. Sunita Chandrasekaran.



Existing methods of exploring a search space are primarily limited to a form of iterative compilation, which requires exploring the entire space. This optimization space is often far too large to search exhaustively. An autotuning framework can explore the space of optimizations automatically and intelligently, without any need for user interaction.

Directive-Based Languages

Many applications cannot be rewritten, yet their maintainers are still expected to improve performance. Directive-based languages allow a developer to annotate an existing program to guide the compiler: specifying optimizations to apply, auto-parallelizing code regions, and even targeting accelerators (e.g., NVIDIA GPUs, AMD GPUs, and Intel coprocessors) automatically.


On the path to exascale computing, performance-per-watt (PPW) needs to increase by a couple of orders of magnitude. Architectures built from many simple cores currently deliver higher PPW than most conventional compute systems, and accelerators such as GPUs and coprocessors feature exactly these simple cores. To exploit accelerators well, we need to transition from existing programming models to a host–device execution model.

Graph Kernels

Graphs are commonly used to represent large amounts of data in a form that is both meaningful and easy to visualize. However, there is often still too much information for a human to make sense of unaided. Graph kernels are functions that compute the similarity between graphs, and we can leverage these powerful algorithms to draw conclusions about large datasets.


  • [JOURNAL] R. Searles, S. Herbein, T. Johnston, M. Taufer, and S. Chandrasekaran, “Creating a Portable, High-Level Graph Analytics Paradigm For Compute and Data-Intensive Applications,” in International Journal of High Performance Computing and Networking (IJHPCN), vol. 10, 2017.
  • [CONFERENCE] R. Searles, L. Xu, W. Killian, T. Vanderbruggen, T. Forren, J. Howe, Z. Pearson, C. Shannon, J. Simmons, and J. Cavazos, “Parallelization of Machine Learning Applied to Call Graphs of Binaries for Malware Detection,” in Proceedings of the 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2017), St. Petersburg, Russia, 2017.
  • [WORKSHOP] R. Searles*, S. Herbein*, and S. Chandrasekaran, “A Portable, High-Level Graph Analytics Framework Targeting Distributed, Heterogeneous Systems,” in Proceedings of the Third Workshop on Accelerator Programming Using Directives (WACCPD '16), IEEE Press, Piscataway, NJ, USA, pp. 79–88.
  • [WORKSHOP] S. Grauer-Gray, W. Killian, R. Searles, and J. Cavazos, “Accelerating Financial Applications on the GPU,” in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6), New York, NY, USA, pp. 127–136, ACM, 2013.
  • [CONFERENCE] S. Grauer-Gray, L. Xu, R. Searles, S. Ayalasomayajula, and J. Cavazos, “Auto-tuning a High-Level Language Targeted to GPU Codes,” in Innovative Parallel Computing (InPar 2012), San Jose, CA, USA, IEEE, 2012.


Energy Tuning

How can applications be optimized for minimal energy consumption? Is there a correlation between performance and total energy consumption? How do SIMD and multicore parallelism affect total energy usage?

Performance Tuning

How can applications be optimized for minimal execution time? How can optimizations be applied without rewriting code? Are these optimizations applied directly to the source code, or are they compiler-driven?


How can we optimize our applications to run across multiple nodes in an HPC system? How can we adapt existing applications to run on these systems with minimal overhead to the programmer?


I have an awesome GitHub page.


Lab Projects


This project contains our implementations of FSK, Triangle Enumeration, and Graph Assaying mentioned in the publications above. These applications are accelerated with CUDA, and they have been adapted for multi-node scaling in HPC systems using Apache Spark. We showed that using Spark in conjunction with a GPU framework can yield excellent performance on modern and future HPC systems.


This project contains codes for the Black-Scholes, Monte-Carlo, Bonds, and Repo financial applications, which can be run on the CPU and GPU. All original algorithms were ported from QuantLib to CUDA, OpenCL, HMPP, and OpenACC. We showed that certain algorithms achieved speedups of several hundred times over a sequential CPU implementation.


PolyBench is a collection of benchmarks containing static control parts. Its purpose is to standardize the execution and monitoring of kernels typically used in past and current publications. PolyBench/ACC originated from Pouchet's original PolyBench/C suite; we added CUDA, OpenCL, OpenACC, HMPP, and OpenMP versions of the original code.


University of Delaware

B.S. Computer Science — 2012

  • Introduction to Software Engineering
  • Computer Architecture
  • Operating Systems
  • Introduction to Algorithms
  • Computer Graphics
  • Artificial Intelligence
  • Computer Networks I
  • Compiler Design
  • Advanced Software Engineering
  • Digital Intellectual Property
  • System Security
  • Mobile Application Development

University of Delaware

M.S. Computer Science — 2016

  • Algorithm Design and Analysis
  • Computer Networks II
  • Advanced Parallel Programming
  • Computer Systems: Architecture
  • Wireless Networks and Mobile Computing
  • Database Systems
  • Advanced Computer Graphics
  • Compiler Construction
  • HPC and Data Analytics
  • Financial Services Analytics
  • Programming Heterogeneous Systems


I welcome you to contact me through one of the methods below.

If you need to reach me via traditional mail, please consult my CV.