Extracting Social Networks from News Articles
As a class project for CISC889: Information Extraction in the Spring of 2004, I designed an application that mined CNN.com news articles for social relationships. The method favored precision over recall, so for many articles it found very few relations. In one case, however, it found a number of useful and meaningful social relations.
Summary Generation using Marcu's RST summarizer
I gave a presentation on a project about generating summaries. The rough idea is that prior work has mostly looked at content selection, not at generating the summary text itself. The project, which I implemented for CISC883: Natural Language Generation, worked toward addressing the generation side.
Email Classification using Naive Bayes
As a teaching assistant, I get a lot of email from students, and keeping it sorted is harder when I am assisting with several courses at the same time. Although I did this project to fulfill the requirements of CISC889: Machine Learning in the Spring of 2004, I came up with the idea before then. I implemented Naive Bayes to predict the folder that each new email should go in. While I didn't exhaust all of my ideas on the matter, I found that I could complete the artificial task I arranged with 80-90% accuracy. Subsequently, I derived a measure of confidence in the predictions so that an email is moved to a folder only if the confidence is high enough. This method achieved roughly 95% precision (emails filed in the correct folder) and 90-95% recall (percent of emails classified). The primary features used in classifying email were the sender's email address, keywords in the subject, and keywords in the body of the email. When I have the opportunity, I'll place the presentation I gave on this here. I have decided, however, not to continue work on this project, because an existing tool, POPFile, already implements Naive Bayes email sorting.
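The idea of a confidence-gated Naive Bayes sorter can be illustrated with a minimal sketch. This is not the original project code: it assumes a multinomial model with add-one smoothing, uses the normalized posterior of the best folder as the confidence measure, and applies an arbitrary 0.9 threshold. The class name, folder names, and example.edu addresses are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(sender, subject, body):
    """Turn one email into a bag of tokens, prefixed by where they came from."""
    tokens = ["from:" + sender.lower()]
    tokens += ["subj:" + w for w in subject.lower().split()]
    tokens += ["body:" + w for w in body.lower().split()]
    return tokens


class NaiveBayesSorter:
    """Hypothetical multinomial Naive Bayes folder predictor."""

    def __init__(self):
        self.folder_counts = Counter()            # emails seen per folder
        self.token_counts = defaultdict(Counter)  # token counts per folder
        self.vocab = set()                        # all tokens seen in training

    def train(self, folder, tokens):
        self.folder_counts[folder] += 1
        self.token_counts[folder].update(tokens)
        self.vocab.update(tokens)

    def posteriors(self, tokens):
        """Return P(folder | tokens) for every known folder."""
        total = sum(self.folder_counts.values())
        log_scores = {}
        for folder, n in self.folder_counts.items():
            counts = self.token_counts[folder]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n / total)                     # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # log likelihood
            log_scores[folder] = score
        # Normalize in log space to avoid underflow on long emails.
        top = max(log_scores.values())
        exp = {f: math.exp(s - top) for f, s in log_scores.items()}
        z = sum(exp.values())
        return {f: v / z for f, v in exp.items()}

    def classify(self, tokens, threshold=0.9):
        """Return the best folder, or None if confidence is below threshold."""
        post = self.posteriors(tokens)
        folder = max(post, key=post.get)
        return folder if post[folder] >= threshold else None


# Tiny made-up training set: two folders, two emails each.
sorter = NaiveBayesSorter()
sorter.train("labs", features("alice@example.edu", "lab 3 question", "my loop does not compile"))
sorter.train("labs", features("alice@example.edu", "lab 4 question", "compile error in loop"))
sorter.train("homework", features("bob@example.edu", "homework 2 grade", "regrade request for linked list"))
sorter.train("homework", features("bob@example.edu", "homework 3", "linked list question"))

confident = sorter.classify(features("alice@example.edu", "lab 5 question", "loop compile error"))
unsure = sorter.classify(features("carol@example.edu", "hello", "see you"))
```

Returning None for low-confidence emails is what trades recall for precision: those messages simply stay in the inbox rather than being filed in a possibly wrong folder.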
Lexical Chains
I gave a presentation on Silber and McCoy's work on summarization with lexical chains in the Fall of 2003 for CISC882: NLP. Here is the presentation I gave, with minor corrections:
The Word Scramble Problem
Latent Semantic Indexing (LSI) and Latent Semantic Analysis (LSA)
I gave a presentation on LSI and LSA in the Spring of 2003 for CISC889: Statistical NLP. Here it is, with minor corrections: