AMAP: Automatically Mining Abbreviation Expansions in Programs to Enhance Software Maintenance and Program Comprehension Tools
Emily Hill, Zachary P. Fry, Haley Boyd, Giriprasad Sridhara, Yana Novikova, Lori Pollock, K. Vijay-Shanker
Natural Language Program Analysis (NLPA) Group
Department of Computer & Information Sciences
University of Delaware
When writing software, developers often use abbreviations in identifier names. In fact, some abbreviations may never occur alongside their expanded forms in the code, or may occur more often than their expansions do. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and may therefore miss meaningful pieces of code or relationships between software artifacts. We developed an automated approach for mining abbreviation expansions from source code to enhance software maintenance and program comprehension tools that use natural language information. Our scoped approach uses contextual information at the method, program, and general software levels to automatically select the most appropriate expansion for a given abbreviation. We evaluated our approach on a set of 250 potential abbreviations and found that it provides a 57% improvement in accuracy over the current state of the art [LFB 2007].
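The idea of scoped expansion (look in the narrowest context first and widen only when nothing fits) can be illustrated with a minimal Python sketch. This is not the AMAP implementation: the matcher (same first letter, abbreviation letters appearing in order), the frequency-based choice among candidates, and the `COMMON_EXPANSIONS` table are hypothetical simplifications, and `split_words`, `could_expand`, `best_candidate`, and `expand` are illustrative names only.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical stand-in for a general software-level list of common
# expansions; AMAP's actual word lists are not reproduced here.
COMMON_EXPANSIONS = {"str": "string", "num": "number", "msg": "message"}


def split_words(text: str) -> List[str]:
    """Split camelCase, snake_case, and comment prose into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)]


def could_expand(abbr: str, word: str) -> bool:
    """Simplified matcher (an assumption, not AMAP's patterns): the word is
    longer, shares the first letter, and contains the abbreviation's letters
    in order, e.g. 'val' -> 'value' or 'msg' -> 'message'."""
    if len(word) <= len(abbr) or word[0] != abbr[0]:
        return False
    letters = iter(word)
    # Each membership test consumes the iterator, giving a subsequence check.
    return all(ch in letters for ch in abbr)


def best_candidate(abbr: str, words: List[str]) -> Optional[str]:
    """Return the most frequent matching word within one scope, if any."""
    counts = Counter(w for w in words if could_expand(abbr, w))
    return counts.most_common(1)[0][0] if counts else None


def expand(abbr: str, method_text: str, program_text: str) -> Optional[str]:
    """Search the method, then the whole program, then the general list."""
    abbr = abbr.lower()
    for scope in (method_text, program_text):
        candidate = best_candidate(abbr, split_words(scope))
        if candidate is not None:
            return candidate
    return COMMON_EXPANSIONS.get(abbr)
```

For example, `expand("msg", "void printMsg(String msg) { // print the message", "")` finds `message` in the method itself, while `expand("str", "int x = 0;", "int y;")` finds no candidate in either code scope and falls back to `string` from the hypothetical table. Preferring the narrowest scope reflects the intuition that nearby context is the strongest evidence of what the developer meant.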
[LFB 2007] Dawn Lawrie, Henry Feild, and David Binkley. Extracting Meaning from Abbreviated Identifiers. In Seventh IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2007), pages 213-222, 2007.
"AMAP: Automatically Mining Abbreviation Expansions in Programs to Enhance Software Maintenance Tools." Emily Hill, Zachary P. Fry, Haley Boyd, Giriprasad Sridhara, Yana Novikova, Lori Pollock, and K. Vijay-Shanker. MSR 2008: 5th Working Conference on Mining Software Repositories. May 2008. Best Paper Award.
Downloads
Our approach uses a number of word lists: