Assignment 5: Basketball Hashing (60 pts)
Chris Tralie
Overview / Logistics / OOP Design
Learning Objectives
- Use object references to perform efficient lookups yielding constant time operations in a collection
- Leverage polymorphism to quickly swap out different implementations of abstract data types (LinkedMap and HashMap)
- Create tests to verify the correctness of a code implementation and to empirically explore its efficiency.
Purpose / Getting Started
The purpose of this assignment is to give you practice designing collections in C++, and to explore how the right implementation of an abstract data type can lead to much faster queries in an unordered collection. In particular, you will create a hash table implementation for the map data structure, and you will compare this to the linked map data structure experimentally to verify that it is superior.
Click here to review notes on hash tables. Then, you can obtain the code for this assignment by typing
git clone https://github.com/ursinus-cs174-s2022/HW5_BasketballHashing.git
You will be editing the files Makefile
, person.cpp
, player.cpp
playerlookup.cpp
, hashable.cpp
You will be creating the files hashmap.h
, hashmap.cpp
, stringtest.cpp
, and mapcheck.cpp
Overview of Code Design
In this assignment, you will implement your own HashMap
(hashmap.cpp
, hashmap.h
) data structure and compare it to a LinkedMap
(linkedmap.cpp
, linkedmap.h
) that has been provided for you. The follow design choices have been made for you:
-
Both
LinkedMap
andHashMap
will inherit from an abstract class known asMap
(map.h
), which has a number of methods you will need to implement for your hash map. -
The keys of a map are specified as pointers to
Hashable
(hashable.h
,hashable.cpp
) objects, which implement a methodgetHash()
to compute a hash code on that object -
The values of this map are specified as pointers to
Cloneable
(cloneable.h) objects, which are objects for which aclone()
method is provided to make deep copies. -
Since
Hashable
objects are alsoCloneable
, deep copies can be made of both the key and the value, and runtime polymorphism is leveraged so that calling theclone()
method on aCloneable*
orHashable*
pointer makes a clone of the correct type. It is up to the implemented map classes to clean up dynamic memory from cloned objects.
Overview of Data
To test LinkedMap
and HashMap
, you will load in information about NBA players born between 1913-1997 (data obtained at this link), including
- Name
- The college/school they attended
- Their height in centimeters
- Their weight in kilograms
- Their birth year
players.txt
. For example, the first 10 lines of the file store information about Curly Armstrong and Cliff Baker as follows:
To manage all of this, you have been provided with a class Player
(player.cpp
, player.h
) that encapsulates all this information in one place. You will need to load in the players in from a file and put them into a LinkedMap. Then, once you have implemented your HashMap structure, you should also load the players into a HashMap and compare the efficiency of get()
queries between the two data structures. The tasks below break this process down further.
Programming Tasks
Below are the tasks you should work through in sequence. A suggested timeline is as follows:
- By Wednesday 4/6, finish up player loading and database queries
- By Wednesday 4/13, finish up and debug your hash table implementation
- By Friday 4/15, complete all tasks (the final deadline)
Player Loading (8 Points)
Your first task should be to load in all of the players in the text file into a map by filling in the loadPlayers(Map* m)
method in the player.cpp
file. To warm up for this, familiarize yourself with the Map
abstract class and the LinkedMap
class. Then, look at Person.cpp
for an example of how to use the map. In particular, it accepts object references for both the key and value.
Hint: You should have a variable that keeps track of what line you're on as you're loading players.txt so that you know what field you're loading. Once you've loaded all 5 pieces of information, you can create a new object and add it to the map, and then move onto the next one.
Database Queries (7 Points)
Once you have loaded in the players, you should fill in the file playerlookup.cpp
to lookup a player in the map specified on the command line. For instance,
If a player is not in the database, you should output "not found"
HINT: The names of the players are stored as "Firstname Lastname", but you will need two command line arguments for the first name and last name. So you will have to use string concatenation to add them together with a space in order to match what's in the database.
String Hashing (5 pts)
We're now going to move towards making a hash map, but we first need to define how to hash objects of interest. I've provided an abstract class called Hashable
which has the abstract method getHash()
which objects extending this class will implement. Since we want to hash basketball players by their names, we'll define a hash function for a string:
Your Task:
Complete the wrapper class around strings called HashableString
by implementing the abstract method getHash()
in hashable.cpp
. You will be implementing the same hash function that Java uses for it's String
type. In particular, here is pseudocode for the algorithm
assuming that each character c
has been cast to a size_t
. Once you are finished this, you should create a file stringtest.cpp
to test out the hash codes on a few examples. You should add an entry to your makefile to automatically compile this based on its dependencies. Then, you should use it to output and verify the following hash codes by initializing objects of type HashableString
:
Hash Map Implementation (30 Pts)
You are now ready to create a hash table implementation of a map. Create a pair of files hashmap.h
and hashmap.cpp
, and add entries for them in your makefile. In these files, create a class HashMap
that implements the abstract class Map
. This should have the exact same public methods as LinkedMap
, but it will have different private variables. The main private variable should be an array of HashNode*
references, each representing the head of a linked list for a different bin. All references should start off as NULL
.
Then, you should implement all of the methods below. You can borrow heavily from the code in linkedmap.cpp
, but you will have to have an additional step of computing a hash to look up which bin's head you should start with. Below are the methods you need to implement. Be sure to develop incrementally!!. Be especially sure to compile every few lines of code you write to quickly flush out syntax errors.
-
(4 pts)
void put(Hashable* key, Cloneable* value)
: Put an item into the hash map, using the key's hash function. To keep this simple, you do not need to check if the key already exists, so it is okay to end up with duplicates of the same key. -
(4 pts)
Cloneable* get(Hashable* key)
: Return a reference to the value associated to this key, orNULL
if the key is not in the collection. -
(4 pts)
void remove(Hashable* key)
: Remove a key/value pair from the hash map, if the key exists in the table. -
(4 pts)
bool containsKey(Hashable* key)
: Return true if the key is in the hash map or false if it is not. - (4 pts)
Hashable** getKeyList(size_t* N)
: Dynamically create an array to hold references to all of the keys in this map in an arbitrary order, and return the length of this array by reference. - (4 pts) Create a destructor that properly cleans up dynamic memory. Be sure you also clean up dynamic memory outside of the destructor when necessary. You will lose points if there are any memory leaks!
(6 pts) To put this all together and test it, change the person.cpp
file to use a HashMap
instead of a LinkedMap
and make sure it still works. You should only have to change one line of code to make this swap! Then, do the same thing in playerlookup.cpp
and verify that still works as well. This is probably where you will discover a lot of bugs in your code, so be patient!
HashMap / LinkedMap Comparison (10 pts)
You will now compare your HashMap implementation to the LinkedMap implementation both to check the former's correctness, and also to compare their efficiency. First, you should increment the numOps
variable in each of your methods in HashMap
every time a hash function is computed and every time a node is visited. This will allow you to keep track of how many operations are happening under different queries (refer to how this was done in LinkedMap
if you are lost).
Next, you should create an executable file called mapcheck.cpp
that performs the following steps:
-
Create a map
m1
of typeLinkedMap
and a mapm2
of typeHashMap
, and fill them each in with the NBA players. - Reset the operation counts of each map.
-
Loop through all of the keys in
m1
and make sure they're inm2
by calling thecontainsKey()
method. If they're not, print out the ones that are missing to help you debug. -
Loop through all of the keys in
m2
and make sure they're inm1
by calling thecontainsKey()
method. If they're not, print out the ones that are missing to help you debug. - Report the number of operations in steps 3 and 4, and the average number of operations per player.
Assuming everything is working correctly, if you use 100 bins, for example, you should see the following stats: Try to see what happens when you increase or decrease the number of bins. Does the average number of operations per player change in a way that makes sense?