Project 1
CISC 320 Algorithms and Advanced Programming
-- Spring, 1999
Project Goal.
-
To study sorting algorithms in some detail.
-
To gain experience in the process of selecting/developing a good algorithm
for a given purpose and then producing a high-performing implementation of
that algorithm.
-
Specifically, to provide a sorting algorithm that is faster than
STL's sort on the specific kind of data described below. There will
be a prize for the team with the fastest sorter on a specific data set
of this kind.
Problem statement.
Many sorting situations have an initial data set that is partially
sorted. For example, the input may consist of sorted data sets from
each of several sources. It is natural to suppose that it would be wise
to take advantage of this fact when designing an algorithm to produce
a sorted combination of these data sets. For example, this is precisely the
situation Foster McGeary dealt with in the SSN processing example described in
class. It is also often the case that there will be several records in the
input data which have the same value for the sorting key. This will be
true of our data. Moreover, it will be important that the sort method be
stable. That means that in the sorted result, the records
with the same key value will be given in the same relative order they
had in the input data. For example, suppose in the input data set
there are 3 records with the same key value and they are in positions
data[2], data[101], and data[23459]. Suppose further that there are
200 other data items with smaller key values. Then, counting positions from
zero, these three items should end up in positions 200, 201, and 202, with
original data[2] --> data[200], data[101] --> data[201], and
data[23459] --> data[202].
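The stability requirement can be illustrated with a small sketch. It uses std::stable_sort from the standard library only to show the expected behaviour, not as the routine to submit. The Record type, its fields, and the helper names are hypothetical; the real record layout is whatever 320dir/sort.cc defines. The code assumes a C++11 or later compiler, which is newer than the g++ named in this handout.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type, for illustration only.
struct Record {
    int key;          // sorting key
    std::string tag;  // payload used here to reveal the original order
};

// Orders records by key alone, so records with equal keys compare equivalent.
bool keyLess(const Record& a, const Record& b) { return a.key < b.key; }

// Returns a stably sorted copy: equal-key records keep their input order.
std::vector<Record> stableSortedCopy(std::vector<Record> data) {
    std::stable_sort(data.begin(), data.end(), keyLess);
    // Debug-only postcondition; compiled out when NDEBUG is defined.
    assert(std::is_sorted(data.begin(), data.end(), keyLess));
    return data;
}

// Returns the tags in sorted order, joined by spaces, for easy inspection.
std::string sortedTags(const std::vector<Record>& data) {
    std::string out;
    for (const Record& r : stableSortedCopy(data)) {
        if (!out.empty()) out += ' ';
        out += r.tag;
    }
    return out;
}
```

For an input with keys 5, 1, 5, 0, 5 and tags a, x, b, y, c, sortedTags returns "y x a b c": the three records with key 5 come out in the order a, b, c, which is the order they had in the input.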
In a nutshell, then, the problem is to produce a stable sort
routine which is very efficient when the input data set consists of
a few sorted sets concatenated together. The individual sorted sets
are rather large. Example: there may be 10 sorted sets of about 10,000
items strung together to form a set of 100,000 items to sort.
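One possible starting point, given purely as a sketch and not as the expected solution, is a "natural" merge sort: locate the sorted runs already present in the input, then merge neighbouring runs pairwise. The function name naturalMergeSort and its interface are assumptions made for this illustration (the required interface is the one in 320dir/sort.cc), and std::inplace_merge is used for brevity where a hand-written merge would likely be faster. It assumes a C++11 or later compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a stable natural merge sort: find the already-sorted runs,
// then merge neighbouring runs pairwise until a single run remains.
// `less` must be a strict weak ordering on T.
template <typename T, typename Less>
void naturalMergeSort(std::vector<T>& data, Less less) {
    const std::size_t n = data.size();
    if (n < 2) return;

    // bounds holds the start index of each run, plus n as a sentinel.
    std::vector<std::size_t> bounds;
    bounds.push_back(0);
    for (std::size_t i = 1; i < n; ++i) {
        // A run ends only on a strict descent; equal keys stay in one run.
        if (less(data[i], data[i - 1])) bounds.push_back(i);
    }
    bounds.push_back(n);

    // Each pass merges runs (0,1), (2,3), ... and roughly halves the run count.
    while (bounds.size() > 2) {
        std::vector<std::size_t> next;
        std::size_t r = 0;
        for (; r + 2 < bounds.size(); r += 2) {
            // std::inplace_merge is stable: on ties the left run's items come first.
            std::inplace_merge(data.begin() + bounds[r],
                               data.begin() + bounds[r + 1],
                               data.begin() + bounds[r + 2], less);
            next.push_back(bounds[r]);
        }
        // An unpaired final run is carried over to the next pass unchanged.
        if (r + 1 < bounds.size()) next.push_back(bounds[r]);
        next.push_back(n);
        bounds.swap(next);
    }

    // Debug-only postcondition; compiled out when NDEBUG is defined.
    assert(std::is_sorted(data.begin(), data.end(), less));
}
```

With about 10 runs, this makes roughly four merge passes over the data after a single scan to find the run boundaries. Because only adjacent runs are merged and ties always favour the left run, records with equal keys keep their input order.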
Details
-
Run this command once to make saunders' 320 course directory appear as
a subdirectory of your working directory:
--> ln -s ~saunders/320 320dir
Henceforward we'll assume that's been done.
-
Take a look at 320dir/sortMain.cc.
The gist of it is that the specified data set gets read into an array,
your sort routine is called, the result is checked for correctness,
and timing data is given.
There is no need to copy this file
unless you want to make modified versions of it during development.
In the end, however, your code must work when linked to the compiled version,
sortMain.o, exactly as it is.
-
Copy the incomplete file 320dir/sort.cc to your working directory.
Complete the sort routine given there.
You can compile it as follows:
--> /opt/gnu2.8.1/bin/g++ -O5 sort.cc 320dir/sortMain.o -o sort
-
To test your code, run it on the data sets
320dir/sortData.small and 320dir/sortData.large.
For example:
--> ./sort <320dir/sortData.small
-
Prepare for sort competition day. On this day in class, each sort
routine will be run against a new data set. This data set will also
be an array consisting of about 10 sorted segments.
Each team must also be prepared to present a summary of the methods
they used and the reasons for their choices.
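When preparing for competition day, it may help to generate extra inputs of the same shape and to check stability directly. The sketch below is one way to do that under stated assumptions: the Item type, the function names, and the use of a recorded input position (seq) are all hypothetical and are not taken from sortMain.cc, whose own correctness check remains the one that counts. It assumes a C++11 or later compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical record: key is what gets sorted on; seq records the input
// position so that stability can be checked afterwards.
struct Item {
    int key;
    std::size_t seq;
};

// Builds `segments` sorted runs of `segLen` items each, concatenated.
// A small key range (0..maxKey) forces many duplicate keys.
std::vector<Item> makeSegmentedData(std::size_t segments, std::size_t segLen,
                                    int maxKey, unsigned seed) {
    assert(maxKey >= 0);  // the distribution below requires 0 <= maxKey
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, maxKey);
    std::vector<Item> data;
    data.reserve(segments * segLen);
    for (std::size_t s = 0; s < segments; ++s) {
        std::vector<int> keys(segLen);
        for (int& k : keys) k = dist(gen);
        std::sort(keys.begin(), keys.end());
        // seq is the item's index in the concatenated input.
        for (int k : keys) data.push_back(Item{k, data.size()});
    }
    return data;
}

// True if keys are non-decreasing and equal keys appear in input order.
// Note: this does not check that every input record is still present.
bool isStablySorted(const std::vector<Item>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i].key < v[i - 1].key) return false;
        if (v[i].key == v[i - 1].key && v[i].seq < v[i - 1].seq) return false;
    }
    return true;
}
```

A call such as makeSegmentedData(10, 10000, 1000, 320) would produce 100,000 items in 10 sorted segments, matching the example sizes given in the problem statement; the key range and seed are arbitrary choices. After running a sort routine on that data by key only, isStablySorted reports whether the result is both ordered and stable.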
Deliverables and Due date
Sort competition day is Thursday, Mar 18.
Due on that day are printouts of the code and a written
summary explaining your approach and reporting your time on
320dir/sortData.large.