CISC181 Project 2
The Linguist's Assistant
Due: April 26, in class
You have been hired by a linguistics researcher to write a program
that performs some interesting word processing. In particular, this means
you are to write a program that reads a text file and produces a concordance: an alphabetical list of the words found in the text, along with each word's frequency of occurrence. The output words should be in lower case, and case should not matter in counting. For example,
both "The" and "the" should be counted as the word "the". The table should print the words in alphabetical order. For example, if the input file contained
The very large man took the small piece of very good candy from the small child.
then the word frequencies would be
candy 1 child 1 from 1 good 1
large 1 man 1 of 1 piece 1
small 2 the 3 took 1 very 2
Implementation details:
Your program should, of course, be well-organized and modular, and take the following into account. (Suggested functions follow; however, you are free to structure the program however you'd like.)
Program Execution:
The program should execute approximately as follows:
- Prompt the user for the input file name, open the file, and initialize the concordance to the empty list.
- Process the input file:
  - get a word
  - process the word - look for the word in the concordance. If it is there, update the counter; otherwise add the word to the list and set the counter to 1. Be sure to add words so that the list remains alphabetical.
- When the end of file is reached, print out the entire list.
Starter Files:
Here are some starter files - you may use these, or use something you create. These may help you break up the problem into its constituent parts.
// Here is the header file for the Word Concordance
// Program
#ifndef Words_H
#define Words_H
#include <fstream>
using std::ifstream; // the prototypes below name ifstream unqualified
// Constants:
const int MAX_LINE_LENGTH = 100;
// WordNode type:
struct WordNode{
char *word;
int freq;
WordNode *next;
};
// prototypes for
// Related User-defined functions:
void InitFile( ifstream &input );
char* GetWord( ifstream &input );
void ProcessWord( WordNode *&list, char *word );
WordNode* LookupWord( WordNode *list, char *word );
WordNode* MakeNewNode(char *word);
void AddAtBeginning( WordNode *&first, char *word);
void AddInMiddle(WordNode *here, char *word);
void DisplayList( WordNode *list );
#endif
#include <iostream>
#include <fstream>
#include <iomanip>
#include <cstdlib>
#include <string>
using namespace std;
// your header file gets included here, too!
#include"words.h"
// Function Definitions:
WordNode* LookupWord( WordNode *list, char *word ) {
// return position (address) of word in the list
// return NULL if not there
return NULL; // placeholder so the stub compiles; replace with your search
}
void ProcessWord( WordNode *&list, char *word ) {
// if word is there - increment the frequency count
// if word is before the first word in the list
// AddAtBeginning
// otherwise, ...
//
//
return;
}
void DisplayList( WordNode *list ) {
// display the concordance
return;
}
void InitFile( ifstream &input ) {
// replace this line with code that interactively
// gets a filename, and tries to open the file.
// eg: input.open("test.data");
return;
}
WordNode *MakeNewNode(char *word) {
// TWO calls to operator new are needed
// one for the node, one for the word
return NULL; // placeholder so the stub compiles; replace with your code
}
void AddAtBeginning( WordNode*& first, char *word) {
// make a node
// link it in
}
void AddInMiddle(WordNode *here, char *word) {
}
// File: words.cc
//
// General outline for program 2 - Word
// Concordance.
// libraries needed:
#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>
using namespace std;
#include"words.h"
int main() {
WordNode *WordList = NULL;
char *word;
ifstream input;
InitFile( input );
while( (word = GetWord( input ) ) != NULL )
ProcessWord( WordList, word );
DisplayList( WordList );
input.close();
return 0;
}
Hints:
- Before you start writing code, sit down and decide how you want to tackle the problem. Break the program up into parts.
- After you write your functions, create small test programs to make sure they work. For instance, after you write GetWord, write a test program that simply calls GetWord repeatedly, and make sure it keeps returning words from a file, etc.
- Start on this early! I will give you a lecture period off in order to work on this, so take advantage of it, and ask questions EARLY.
Grading:
- Correct Program Operation: 65%
- Test runs: 10%
- Following Good Coding Standards / Program structure: 25%
- Extra Credit: 5% - Have your program eliminate all "trivial" words from the list.
We will define trivial words as words with three or fewer characters. First, have your program print out the whole list, then print the list with all the trivial words removed.
What to hand in:
Other Important Information:
- I hate to waste time even writing this, but all your work on the projects MUST BE YOUR OWN. Specifically, you are NOT allowed to work with another student, share or discuss solutions, or copy code from another student. Failure to adhere to this rule will be dealt with per University plagiarism policies. Further, if you cheat on this project, the odds of you doing well on the test, and in future CIS courses, are slim. Please do your own work.