AMAP: Automatically Mining Abbreviation Expansions in Programs
To enhance software maintenance and program comprehension tools

 

Overview

When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations never occur alongside their expanded word, or occur more often in the code than the expanded word does. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and may therefore miss meaningful pieces of code or relationships between software artifacts. We developed an automated approach to mining abbreviation expansions from source code, with the goal of enhancing software maintenance and program comprehension tools that use natural language information. Our scoped approach uses contextual information at the method, program, and general software level to automatically select the most appropriate expansion for a given abbreviation. We evaluated our approach on a set of 250 potential abbreviations and found that it provides a 57% improvement in accuracy over the current state of the art [LFB 2007].
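The idea of scoped lookup can be illustrated with a minimal Python sketch. It is not the AMAP implementation: the matching rule in `is_candidate`, the function names, and the sample word lists are all assumptions made for illustration. The sketch searches the method scope first, then the program, then general software vocabulary, and returns the most frequent candidate from the narrowest scope that contains one.

```python
from collections import Counter


def is_candidate(short, word):
    """Hypothetical matching rule (an assumption, not AMAP's patterns):
    `word` is longer than `short`, starts with the same letter, and
    contains the letters of `short` in order."""
    if len(word) <= len(short) or word[0] != short[0]:
        return False
    letters = iter(word)
    # Each `ch in letters` consumes the iterator up to the match,
    # so this checks that `short` is a subsequence of `word`.
    return all(ch in letters for ch in short)


def expand(short, scopes):
    """Search scopes from narrowest to broadest and return
    (scope_name, expansion), or (None, None) if nothing matches."""
    for name, words in scopes:
        counts = Counter(w for w in words if is_candidate(short, w))
        if counts:
            return name, counts.most_common(1)[0][0]
    return None, None


# Illustrative word lists only; real scopes would be mined from code.
scopes = [
    ("method", ["button", "index", "button", "listener"]),
    ("program", ["buffer", "button", "string", "string"]),
    ("software", ["string", "source", "button", "number"]),
]

for abbreviation in ["btn", "str", "num", "xyz"]:
    print(abbreviation, expand(abbreviation, scopes))
```

With these sample lists, `btn` is resolved in the method scope, `str` falls through to the program scope, `num` is found only in the general software scope, and `xyz` has no expansion.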

References

[LFB 2007] Dawn Lawrie, Henry Feild, and David Binkley. Extracting Meaning from Abbreviated Identifiers. In Seventh IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2007), pages 213-222, 2007.

 

Publications

"AMAP: Automatically Mining Abbreviation Expansions in Programs to Enhance Software Maintenance Tools." Emily Hill, Zachary P. Fry, Haley Boyd, Giriprasad Sridhara, Yana Novikova, Lori Pollock, and K. Vijay-Shanker. MSR 2008: 5th Working Conference on Mining Software Repositories. May 2008. Best Paper Award.

 

Downloads

Our approach uses a number of word lists:
  • A dictionary derived from the Ispell English Word Lists with contractions, possessives, and proper nouns removed.
  • A list of common contractions. This list was derived from words with apostrophes in the Ispell English Word Lists.
  • A list of proper nouns of length 4 or longer, derived from the Ispell English Word Lists. We restricted proper nouns to length 4 or more to avoid finding long forms such as 'Io' for 'io'. This means our proper nouns do not include abbreviations for the days of the week or months of the year. The proper noun list is used together with the dictionary to restrict our short form patterns so that they match only long forms that are actual dictionary words.
  • A modified stop word list derived from a freely available stop word list for English. We removed from the stop list any words that could be considered content words in software, such as face, member, or case.
  • The Java stop list of Java reserved words used in the LFB technique. The LFB technique, which we compare against in our evaluation, is currently the only other known approach for automatically expanding abbreviations.
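To show how such word lists might fit together, here is a small Python sketch. It is an illustration under stated assumptions, not the AMAP code: the function names and the tiny inline word lists are hypothetical, and rejecting stop words as long forms is our guess at how the stop list is applied. The length-4 cutoff for proper nouns and the removal of software content words from the stop list follow the descriptions above.

```python
def build_long_form_vocabulary(dictionary, proper_nouns, min_proper_len=4):
    """Combine dictionary words with proper nouns of length >= 4.
    Shorter proper nouns (e.g. 'Io') are dropped so they cannot be
    proposed as long forms for short identifiers such as 'io'."""
    vocab = {w.lower() for w in dictionary}
    vocab |= {w.lower() for w in proper_nouns if len(w) >= min_proper_len}
    return vocab


def build_stop_list(english_stop_words, software_content_words):
    """Remove words that carry meaning in software from a general
    English stop list."""
    return set(english_stop_words) - set(software_content_words)


def is_valid_long_form(word, vocab, stop_words):
    """Assumption: a long form must be in the vocabulary and must not
    be a stop word."""
    w = word.lower()
    return w in vocab and w not in stop_words


# Tiny hypothetical samples standing in for the real word lists.
dictionary = ["string", "button", "case", "the"]
proper_nouns = ["Io", "Java", "Unicode"]
stop_words = build_stop_list(["the", "of", "case", "member"],
                             ["face", "member", "case"])
vocab = build_long_form_vocabulary(dictionary, proper_nouns)

for word in ["string", "Java", "Io", "the", "case"]:
    print(word, is_valid_long_form(word, vocab, stop_words))
```

In this sample, 'Io' is rejected because of the length restriction, 'the' is rejected as a stop word, and 'case' stays valid because it was removed from the stop list as a software content word.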
We are still actively developing AMAP. The AMAP research prototype is currently implemented as an Eclipse plug-in and a set of Perl scripts. Please e-mail us for the code.