A printable PDF is available.

# Assignment 3 - Due Tuesday, October 25

Objectives: There are four objectives for this assignment:

• Learn how to create and use a Java `Comparator` class
• To explore different data structures and algorithms for finding the smallest elements in a dataset
• Compare experimental to theoretical running time results
• Write a brief experimental results report comparing algorithms

Background: Distributed with the initial code for this assignment is precinct-level voting data from the 2012 presidential election, covering 37 states and a total of 86,476 precincts, based on data from the Harvard Election Data Archive (available under a Creative Commons license) that was cleaned up slightly for this assignment. The `vote2012.dat` file in the project's root directory contains this data, with each line listing a state, county, and precinct followed by the total number of votes for the Democratic candidate (Barack Obama) followed by the number of votes for the Republican candidate (Mitt Romney), with all data fields separated by semicolons. The provided code reads in this file and creates an `ArrayList` of `PrecinctVotes` objects for you to work with. Note that not all lines have all fields, so some of the strings fields (like the "county" information) may be empty -- this shouldn't affect your solution to this assignment.

Using n to represent the total number of precincts (so for this particular data, n=86,476), your goal is to identify the m precincts (for some m <= n that is provided) with the narrowest margin of victory. You will test the following four algorithms for this problem, carefully comparing the performance of the algorithms -- Algorithm 1 is implemented for you (it's just two lines after you create the necessary comparator!), but you'll need to implement the others:

Algorithm 1: Sorting all the data first.
This algorithm sorts all n items, and then copies the smallest m items into a new list. This algorithm takes Theta(nlogn) time.

Algorithm 2: Keeping the smallest m items in an array.
This option processes the data one element at a time, keeping track of the m smallest items in an array (you decide whether the data in the array should be kept in sorted order or not). This algorithm takes Theta(nm) time.

Algorithm 3: Keeping the smallest m items in a red-black tree.
This algorithm uses a Java `TreeSet` to keep all items in a red-black tree. The first m items are simply added to the tree, and then once there are m items in the tree the algorithm can check in O(logm) time whether a new item should be inserted into the tree (removing the largest item so the size remains m) -- to find the largest item in the tree (for comparison) you should use the `last()` method, and to remove the largest use `pollLast()`. This algorithm takes Theta(nlogm) time.

Algorithm 4: Keeping the smallest m items in a binary heap.
For this algorithm, keep the data in a max heap so that you can quickly locate the maximum item out of the m values you are tracking. Use the standard Java `PriorityQueue` class, but the `Comparator` should make a max heap! Once m items are put into a max heap, you can quickly locate the largest item (using method `peek()`) and remove it if necessary (using method `poll()`. This algorithm takes Theta(nlogm) time.
Every algorithm should return the m smallest values in a `List<PrecinctVotes>`, but the order in the list isn't important. Note that the asymptotic time used by the last two algorithms is the same, but the constants hidden by the asymptotic notation will be different. Which do you think will be the more efficient implementation?

The first thing you should do is to create a `Comparator` class for `PrecinctVotes` objects. In order to sort `PrecinctVotes` objects or put them into any kind of ordered data structure, you will need to create a class that implements the `Comparator<PrecinctVotes>` interface. Implement this as a nested class inside the `PrecinctVotes` class so that the comparator can directly access the `PrecinctVotes` instance variables. You should implement this comparator in such a way that `PrecinctVotes` object `a` is less than `b` if the absolute value of the vote difference is smaller, or if the vote difference is the same and the state, county, and precinct (in that order) is smaller. Note that for one of the implementations, you'll actually need the reverse of this comparator -- see the `reversed()` method in the `Comparator` class for an easy solution!
Next, implement the four algorithms described above, and test them thoroughly to make sure each one works correctly. Other than selecting the comparator this should be completely generic and independent of the `PrecinctVotes` class. Run your program with the profiler to time how long each implementation takes to process the precinct data, when the value of m is set to 100, 1000, 5000, 10000, and 20000. For each value of m, run it once in the profiler without recording times, and then run it three more times and average the times for each algorithm. Record the times in a table, like this: