I am currently working on the DAPLDS project, led by Dr. Michela Taufer. The goal of this project is to explore the multi-scale nature of algorithmic adaptations in protein-ligand docking by using distributed volunteer computing systems. In particular, I am in charge of Docking@Home, a volunteer computing system for protein docking. My research in this area includes detecting native-like docking structures under uncertainty, as well as finding opportunities to improve the performance of distributed systems by making them adaptive.
My research interests include:
- Exploring machine learning techniques applied to scientific problems in fields such as astrophysics and bioinformatics.
- Introducing automatic decision-making processes into globally distributed computing environments.
- Applying intelligent approaches to scheduling in volunteer computing systems.
Selected projects:
2009 EmBOINC. The BOINC Emulator is a trace-driven emulator that statistically models thousands of clients in a client/server volunteer computing paradigm. I designed and implemented EmBOINC as an open-source research tool to investigate the impact of scheduling, generation, and validation policies on the performance of BOINC projects. EmBOINC is implemented in C and C++ and can be conditionally compiled in the current BOINC distribution. Read more at http://gcl.cis.udel.edu/projects/emboinc/
2008 – 2009 Docking@Home. Docking@Home (D@H) is a volunteer computing project comprising more than 25,000 volunteered computers. D@H performs high-throughput virtual screening of protein-ligand docking. My achievements in this project include identifying statistical flaws in the scoring function of the docking algorithm and developing a method to accurately identify native-like docked structures under uncertainty. The middleware is BOINC (C, C++, and MySQL). The analysis is done in Matlab and Perl. http://docking.cis.udel.edu/
2007 Automatic generation of scheduling policies for volunteer computing (VC) projects. I designed and implemented a distributed genetic algorithm to automatically generate scheduling policies in a VC environment. Unlike human-designed policies, the policies produced by this system were able to sustain high throughput across different VC projects. In addition, those policies were robust to various levels of volatility and heterogeneity in the environment. This project was implemented in C, MPI, and Perl.
2004 Identification of stellar populations in galactic spectra. Using the widths and shapes of certain lines in a galactic spectrum, I was able to determine the ages of different stellar populations in a galaxy. To do so, I designed and implemented a ‘hierarchical ensemble’ of classifiers. The results showed improved accuracy compared to traditional template-based search methods for both synthetic and real galactic spectra. The preprocessing of spectra was written in Matlab, and the hierarchical ensemble in C.
