Project 1

CISC 320 Algorithms and Advanced Programming -- Spring, 1999

Project Goal.

To study sorting algorithms in some detail.
To gain experience in the process of selecting/developing a good algorithm for a given purpose and then producing a high performing implementation of that algorithm.
Specifically, to provide a sorting algorithm that is faster than the STL's sort on the kind of data described below. There will be a prize for the team with the fastest sorter on a specific data set of this kind.
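During development, one way to compare a candidate against the STL baseline is a small harness like the following. It is a sketch, not part of the course files: the Record type, its key and id fields, and the functions isStableSortOf and timeSort are all hypothetical, and the real record layout is fixed by sortMain.cc. Because std::sort is not guaranteed to be stable, the harness checks the output against std::stable_sort instead of only testing that the keys are in order.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical record type; the real layout comes from sortMain.cc.
struct Record {
    int key;  // sort key
    int id;   // original input position, kept so stability can be checked
};

// True iff `out` equals the stable sort of `in` by key, record for record.
inline bool isStableSortOf(const std::vector<Record>& in,
                           const std::vector<Record>& out) {
    std::vector<Record> ref = in;
    std::stable_sort(ref.begin(), ref.end(),
                     [](const Record& a, const Record& b) { return a.key < b.key; });
    if (ref.size() != out.size()) return false;
    for (std::size_t i = 0; i < ref.size(); ++i)
        if (ref[i].key != out[i].key || ref[i].id != out[i].id) return false;
    return true;
}

// Runs sortFn on a copy of `in`, returns the elapsed wall-clock time in
// milliseconds, and sets `ok` to whether the result is a correct stable sort.
template <class SortFn>
double timeSort(const std::vector<Record>& in, SortFn sortFn, bool& ok) {
    std::vector<Record> work = in;
    const auto t0 = std::chrono::steady_clock::now();
    sortFn(work);
    const auto t1 = std::chrono::steady_clock::now();
    ok = isStableSortOf(in, work);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For a baseline, pass a lambda that calls std::sort with a key-only comparator. On inputs with many duplicate keys, ok may well come back false, which is why the STL's sort by itself does not meet this project's stability requirement.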

Problem statement. Many sorting situations have an initial data set that is partially sorted. For example, the input may consist of sorted data sets from each of several sources. It is natural to take advantage of this fact when designing an algorithm to produce a sorted combination of these data sets; this is precisely the situation Foster McGeary dealt with in the SSN processing example described in class. It is also often the case that several records in the input data have the same value for the sorting key. This will be true of our data; moreover, it will be important that the sort method be stable. That means that in the sorted result, the records with the same key value will appear in the same relative order they had in the input data. For example, suppose that in the input data set there are 3 records with the same key value, in positions data[2], data[101], and data[23459]. Suppose further that there are 200 other data items with smaller key values. Then these three items should end up in positions 201, 202, and 203, with original data[2] --> data[201], data[101] --> data[202], etc.
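The standard library already provides a stable sort, std::stable_sort, which makes a convenient reference for the behavior described above (std::sort makes no such guarantee). In this sketch the Record type and its fields are hypothetical; the id field simply records each item's input position so that the relative order of equal keys can be observed.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record type; the real layout comes from sortMain.cc.
struct Record {
    int key;  // sort key
    int id;   // original input position
};

// Orders by key only. Records with equal keys keep their input order,
// because std::stable_sort guarantees that; std::sort does not.
inline void stableSortByKey(std::vector<Record>& data) {
    std::stable_sort(data.begin(), data.end(),
                     [](const Record& a, const Record& b) { return a.key < b.key; });
}
```

For example, keys 5, 1, 5, 0, 5 at input positions 0 through 4 come out in the order of original positions 3, 1, 0, 2, 4: the three records with key 5 stay in the order 0, 2, 4.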

In a nutshell, then, the problem is to produce a stable sort routine which is very efficient when the input data set consists of a few sorted sets concatenated together. The individual sorted sets are rather large. Example: there may be 10 sorted sets of about 10,000 items strung together to form a set of 100,000 items to sort.
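One way to exploit this structure is a natural merge sort: locate the boundaries of the runs that are already sorted, then merge neighboring runs until one remains. With about 10 runs this needs about four merge passes (ceil(log2 10) = 4), versus the roughly 17 levels (ceil(log2 100,000)) of a merge sort that starts from single items. This is only a sketch of one possible approach, not a required method; Record, keyLess, and naturalMergeSort are hypothetical names, and std::inplace_merge is used because it is stable.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record type; the real layout comes from sortMain.cc.
struct Record {
    int key;  // sort key
    int id;   // original input position
};

inline bool keyLess(const Record& a, const Record& b) { return a.key < b.key; }

// Natural merge sort: find the nondecreasing runs already in `data`, then
// merge adjacent pairs of runs, pass after pass, until one run remains.
inline void naturalMergeSort(std::vector<Record>& data) {
    const std::size_t n = data.size();
    // bounds holds each run's start index, followed by n as a sentinel.
    std::vector<std::size_t> bounds{0};
    for (std::size_t i = 1; i < n; ++i)
        if (keyLess(data[i], data[i - 1])) bounds.push_back(i);  // key drops: new run
    bounds.push_back(n);

    while (bounds.size() > 2) {  // more than one run left
        std::vector<std::size_t> next{0};
        for (std::size_t r = 0; r + 2 < bounds.size(); r += 2) {
            // std::inplace_merge is stable: on equal keys the left run's
            // records (earlier in the input) stay first.
            std::inplace_merge(data.begin() + bounds[r], data.begin() + bounds[r + 1],
                               data.begin() + bounds[r + 2], keyLess);
            next.push_back(bounds[r + 2]);
        }
        // With an odd number of runs, the last one carries over unmerged.
        if (bounds.size() % 2 == 0) next.push_back(n);
        bounds.swap(next);
    }
}
```

For 10 runs the passes perform 5, 2, 1, and then 1 merges, and a run carried over in a pass is not moved during that pass.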

Details

Run this command once to create the illusion that Saunders' 320 course directory is a subdirectory of your working directory:
--> ln -s ~saunders/320 320dir
Henceforward we'll assume that's been done.
Take a look at 320dir/sortMain.cc. The gist of it is that the specified data set gets read into an array, your sort routine is called, the result is checked for correctness, and timing data is given. There is no need to copy this file unless you want to make modified versions of it during development. In the end, however, your code must work when linked to the compiled version sortMain.o, exactly as it is.
Copy the incomplete file 320dir/sort.cc to your working directory. Complete the sort routine given there. You can compile it as follows:
--> /opt/gnu2.8.1/bin/g++ -O5 sort.cc 320dir/sortMain.o -o sort
To test your code, run it on the data sets 320dir/sortData.small and 320dir/sortData.large. For example:
--> ./sort < 320dir/sortData.small
Prepare for sort competition day. On this day in class, each sort routine will be run against a new data set. This data set will also be an array consisting of about 10 sorted segments. Each team must also be prepared to present a summary of the methods they used and the reasons for their choices.
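An alternative worth timing against pairwise merging is a single k-way merge: keep the current head of each sorted segment in a min-heap and repeatedly move the smallest one to an output buffer. Breaking key ties by segment index keeps the merge stable, since earlier segments hold earlier input positions. This is a hypothetical sketch (Record and kWayMergeSort are illustrative names, not part of sort.cc); which approach wins on the competition data is something to measure, not assume.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical record type; the real layout comes from sortMain.cc.
struct Record {
    int key;  // sort key
    int id;   // original input position
};

// Stable k-way merge of the nondecreasing segments already present in data.
inline void kWayMergeSort(std::vector<Record>& data) {
    const std::size_t n = data.size();
    if (n < 2) return;

    // A segment starts at index 0 and wherever the key decreases.
    std::vector<std::size_t> start{0};
    for (std::size_t i = 1; i < n; ++i)
        if (data[i].key < data[i - 1].key) start.push_back(i);
    const std::size_t k = start.size();
    if (k == 1) return;  // already sorted

    std::vector<std::size_t> pos(start);  // next unread index in each segment
    std::vector<std::size_t> segEnd(k);   // one past each segment's last index
    for (std::size_t s = 0; s < k; ++s) segEnd[s] = (s + 1 < k) ? start[s + 1] : n;

    // Min-heap of (head key, segment index): pairs compare key first, then
    // segment, so on equal keys the earlier segment is taken first.
    using Entry = std::pair<int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t s = 0; s < k; ++s) heap.push(Entry(data[pos[s]].key, s));

    std::vector<Record> out;
    out.reserve(n);
    while (!heap.empty()) {
        const std::size_t s = heap.top().second;
        heap.pop();
        out.push_back(data[pos[s]]);
        // Advance within segment s and re-insert its new head, if any.
        if (++pos[s] < segEnd[s]) heap.push(Entry(data[pos[s]].key, s));
    }
    data.swap(out);
}
```

Each record costs O(log k) heap work for k segments, and the merge makes a single pass over the data using one auxiliary buffer of n records.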

Deliverables and due date. Sort competition day is Thursday, Mar 18. Due on that day are printouts of the code and a written summary explaining your approach and reporting your time on 320dir/sortData.large.