Hi. My name is Robbie, and I am a former PhD student at the University of Delaware. I now work as a Solutions Architect at NVIDIA.

My graduate research focused on high-performance computing: leveraging high-level programming models to target and optimize computational science applications running on parallel architectures and next-generation HPC systems.

My PhD advisor is Dr. Sunita Chandrasekaran.



On the path to exascale computing, performance per watt needs to increase by a couple of orders of magnitude. Architectures built from many simple cores deliver far better performance per watt than conventional CPU designs, and accelerators such as GPUs contain thousands of such cores. To exploit these accelerators well, we need to transition from existing programming models to a host–device execution model, in which a CPU host offloads compute-intensive work to an attached device.

Directive-Based Languages

Many applications cannot realistically be rewritten, yet programmers are still expected to improve their performance. Directive-based languages let a developer annotate an existing program to guide the compiler: applying specific optimizations, auto-parallelizing code regions, and even targeting accelerators (e.g., NVIDIA GPUs, AMD GPUs, and Intel coprocessors) automatically.


Autotuning

Existing methods of exploring an optimization space are largely limited to some form of iterative compilation, which requires evaluating every point in the space. That space is often far too large to search exhaustively. With an autotuning framework, the search space can be explored automatically and intelligently, with no need for user interaction.

Graph Kernels

Graphs are commonly used to represent large amounts of data in a way that is both meaningful and easy to reason about. However, there is often still too much information for a human to make sense of unaided. Graph kernels are functions that compute the similarity between graphs, and we can leverage these powerful algorithms to draw conclusions about large datasets.


  • [JOURNAL] E. Wright, M. Ferrato, A. Bryer, R. Searles, J. Perilla, S. Chandrasekaran, “Accelerating prediction of chemical shift of protein structures on GPUs: Using OpenACC,” in PLOS Computational Biology 2020.
  • [DISSERTATION] R. Searles, “Creating a Portable Programming Abstraction for Wavefront Patterns Targeting HPC Systems”
  • [JOURNAL] R. Searles, S. Chandrasekaran, W. Joubert, O. Hernandez, “MPI + OpenACC: Accelerating Radiation Transport Mini-Application, Minisweep, on Heterogeneous Systems,” in Computer Physics Communications, CPC 2018.
    DOI: 10.1016/j.cpc.2018.10.007
  • [CONFERENCE] R. Searles, S. Chandrasekaran, W. Joubert, O. Hernandez, “Abstractions and Directives for Adapting Wavefront Algorithms to Future Architectures,” at The Platform for Advanced Scientific Computing, PASC 2018.
    DOI: 10.1145/3218176.3218228
  • [CONFERENCE] M. Ghane, S. Chandrasekaran, R. Searles, M. Cheung, O. Hernandez, “Path Forward for Softwarization to Tackle Evolving Hardware,” in Proc. SPIE 10652, Disruptive Technologies in Information Sciences, 2018.
    DOI: 10.1117/12.2304813
  • [JOURNAL] R. Searles, S. Herbein, T. Johnston, M. Taufer, S. Chandrasekaran, “Creating a Portable, High-Level Graph Analytics Paradigm For Compute and Data-Intensive Applications,” in International Journal of High Performance Computing and Networking, IJHPCN 2017, Vol. 10.
    DOI: 10.1504/IJHPCN.2017.10007922
  • [CONFERENCE] R. Searles, L. Xu, W. Killian, T. Vanderbruggen, T. Forren, J. Howe, Z. Pearson, C. Shannon, J. Simmons, and J. Cavazos, “Parallelization of Machine Learning Applied to Call Graphs of Binaries for Malware Detection,” at 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, St. Petersburg, Russia, 2017.
  • [WORKSHOP] R. Searles*, S. Herbein*, and S. Chandrasekaran, “A Portable, High-Level Graph Analytics Framework Targeting Distributed, Heterogeneous Systems,” in Proceedings of the Third Workshop on Accelerator Programming Using Directives (WACCPD '16). IEEE Press, Piscataway, NJ, USA, 79-88.
  • [WORKSHOP] S. Grauer-Gray, W. Killian, R. Searles, and J. Cavazos, “Accelerating Financial Applications on the GPU,” in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU-6, (New York, NY, USA), pp. 127–136, ACM, 2013.
  • [CONFERENCE] S. Grauer-Gray, L. Xu, R. Searles, S. Ayalasomayajula, and J. Cavazos, “Auto-tuning a High-Level Language Targeted to GPU Codes,” at Innovative Parallel Computing 2012, InPar2012, (San Jose, CA, USA), IEEE, 2012.




Energy Tuning

How can applications be optimized for minimal energy consumption? Is there a correlation between performance and total energy consumption? How do SIMD and multicore parallelism affect total energy usage?

Performance Tuning

How can applications be optimized for minimal execution time? How can optimizations be applied without rewriting code? Are these optimizations applied directly to source code, or are they compiler-driven?


Distributed Computing

How can we optimize our applications to run across multiple nodes of an HPC system? How can we adapt existing applications to run on these systems with minimal overhead for the programmer?


Lab Projects


Minisweep

The Minisweep proxy application is part of the Profugus radiation transport miniapp project and reproduces the computational pattern of the sweep kernel of the Denovo Sn radiation transport code. Denovo was one of six applications selected for early application readiness on ORNL’s Titan system, and it was used for acceptance testing of ORNL's Summit supercomputer.


Graph Analytics Framework

This project contains our implementations of FSK, Triangle Enumeration, and Graph Assaying mentioned in the publications above. These applications are accelerated with CUDA, and they have been adapted for multi-node scaling in HPC systems using Apache Spark. We showed that using Spark in conjunction with a GPU framework can yield excellent performance on modern and future HPC systems.


Accelerated Financial Applications

This project contains codes for the Black-Scholes, Monte Carlo, Bonds, and Repo financial applications, which can be run on both CPU and GPU. All original algorithms were ported from QuantLib to CUDA, OpenCL, HMPP, and OpenACC. We showed that certain algorithms achieve several-hundred-fold speedups over sequential CPU implementations.


PolyBench/ACC

PolyBench is a collection of benchmarks containing static control parts. Its purpose is to make the execution and monitoring of kernels, typically drawn from past and current publications, uniform. PolyBench/ACC originated from Pouchet's original PolyBench/C suite; we added CUDA, OpenCL, OpenACC, HMPP, and OpenMP versions of the original codes.


University of Delaware

B.S. Computer Science — 2012

  • Introduction to Software Engineering
  • Computer Architecture
  • Operating Systems
  • Introduction to Algorithms
  • Computer Graphics
  • Artificial Intelligence
  • Computer Networks I
  • Compiler Design
  • Advanced Software Engineering
  • Digital Intellectual Property
  • System Security
  • Mobile Application Development

University of Delaware

M.S. Computer Science — 2016

  • Algorithm Design and Analysis
  • Computer Networks II
  • Advanced Parallel Programming
  • Computer Systems: Architecture
  • Wireless Networks and Mobile Computing
  • Database Systems
  • Advanced Computer Graphics
  • Compiler Construction
  • HPC and Data Analytics
  • Financial Services Analytics
  • Programming Heterogeneous Systems


I welcome you to contact me through one of the methods below.

If you need to reach me via traditional mail, please consult my CV.