Hash Tables
Chris Tralie
In class, we showed how it is possible to implement a map data structure by using a linked list (the so-called LinkedMap), but that this leads to slow (linear time) average and worst case times for the get()
, containsKey()
, and remove()
operations.
Motivated by this, is possible to use something called a hash table to reduce the time of queries in a map data structure, which you will implement in homework 5. The idea is to have a deterministic function called a hash function that gives a particular number, or a hash code for each object. One can then jump right to a bucket corresponding to this hash code, which contains a linked list of all objects in the collection that have this hash code (objects that have the same hash code are said to be in "collision"). For instance, if we do hashing on a select bunch of Harry Potter characters, and their hash code is the month during which they were born, then we have the following hash table:
Then, if we are looking for Neville Longbottom (i.e. Matthew Lewis), for example, we compute his birth month of 6 and jump straight to bucket 6 and follow two pointers to get to him.
Number of Buckets
We can choose a fixed number of buckets, regardless of the range of our hash function, by simply taking the modulus of our hash code by the number of buckets, giving us the bucket indices
[0, 1, 2, ..., NBuckets-1]
For example, if we want to use only 5 buckets, then Percy Weasley in bucket 11 above would get bucket index
11 % 5 = 1
There is a classic space vs time efficiency trade-off here, however. Assuming we have a hash function which evenly distributes objects over the available buckets, then the more buckets we use, the faster queries and additions/removals will be, but the more memory is needed to store the buckets. Conversely, if we use fewer buckets, we need less memory, but these operations take longer. The most extreme case is a single bucket, in which case a HashTable reduces to a LinkedMap.