Biomedical Figure and Text Mining



Proteins are complex biological polymers commonly regarded as the workhorses of the cell: they mediate virtually all cellular functions. Correctly identifying and characterizing protein-protein interactions (PPIs) is essential for understanding the molecular mechanisms within cells and the roles individual proteins play in them. Although life science researchers have made great efforts to identify PPIs experimentally and document them in publications, there is still no effective means of retrieving PPI data from the literature.

We investigate the potential of combining information extracted from figures and their associated captions to discover experimental evidence of protein-protein interactions in articles stored in publicly available databases. Our work is motivated by the observation that figures in biomedical articles often constitute direct evidence of experimental results. Image analysis methods can therefore be coupled with text-based methods to improve knowledge discovery.


Architecture of ePPI: Experimental Protein-Protein Interaction Explorer


Publications

[ An Automatic System for Extracting Figures and Captions in Biomedical PDF Documents ] Luis Lopez, Jingyi Yu, Cecilia Arighi, Hagit Shatkay, Hongzhan Huang, Cathy Wu. IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2011.

[ Robust Segmentation of Biomedical Figures Toward an Image-based Document Retrieval ] Luis Lopez, Jingyi Yu, Catalina O. Tudor, Cecilia Arighi, Hongzhan Huang, K. Vijay-Shanker, Cathy Wu. IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2012.

[ An Image-Text Approach for Extracting Experimental Evidence of Protein-Protein Interactions in the Biomedical Literature ] Luis Lopez, Jingyi Yu, Cecilia Arighi, Manabu Torii, Hongzhan Huang, K. Vijay-Shanker, Cathy Wu.