Double Hashing in Data Structure
Introduction
Hashing is a common technique used in data structures to map keys to indices in an array. The idea behind hashing is to use a hash function to convert a key into an array index. However, a hash function can produce collisions, where two different keys are mapped to the same index. Double hashing is a technique used to resolve collisions in hash tables.
What is Double Hashing?
Double hashing is a collision resolution technique in which the hash function generates a secondary hash function that is used to compute the next index to check. The secondary hash function is designed to produce a different value for each key, reducing the probability of collisions. Double hashing is a variation of open addressing, which means that the hash table is an array that can hold multiple items.
The basic idea of double hashing is to use two hash functions to generate a sequence of indices that are probed in order to find an empty slot in the hash table. The first hash function is used to calculate the initial index of the key, while the second hash function is used to calculate the offset from the initial index. The offset is added to the initial index, and the resulting index is probed. If the slot is empty, the key is inserted into the slot. If the slot is occupied, the next index in the sequence is probed until an empty slot is found.
Double hashing provides good performance because it produces a sequence of indices that are guaranteed to be unique for each key, reducing the probability of collisions. In addition, double hashing is easy to implement and does not require additional memory to store pointers to linked lists, which is a common technique used in separate chaining.
Double Hashing Algorithm
The double hashing algorithm can be described as follows:
- Initialize an empty hash table with a fixed size.
- Define two hash functions, h1(key) and h2(key).
- Calculate the initial index i = h1(key) % table_size.
- Calculate the offset j = h2(key) % table_size.
- Probe the hash table starting at index i.
- If the slot at index i is empty, insert the key into the slot and exit.
- If the slot at index i is occupied, increment i by j and repeat step 6 until an empty slot is found.
- If the entire hash table has been probed without finding an empty slot, the hash table is full.
The size of the hash table is an important factor in the performance of double hashing. If the hash table is too small, there will be a higher probability of collisions, which will result in longer search times. If the hash table is too large, there will be a lot of empty slots, which will waste memory.
Choosing the hash functions is also important in double hashing. The hash functions should be designed to produce a wide range of values to reduce the probability of collisions. In addition, the hash functions should be fast to compute to avoid slowing down the hash table operations.
Double Hashing Example
Let us consider an example to understand how double hashing works. Suppose we have a hash table of size 10 and the following keys need to be inserted into the hash table: 23, 34, 45, 56, 67, 78, 89, 90.
We can define the two hash functions as follows:
h1(key) = key % 10
h2(key) = 7 - (key % 7)
Using these hash functions, we can calculate the initial index and offset for each key as follows:
key = 23, i = 3, j = 5
key = 34, i = 4, j = 4
key = 45, i = 5, j = 3
key = 56, i = 6, j = 2
key = 67, i = 7, j = 1
key = 78, i = 8, j = 7
key = 89, i = 9, j = 6
key = 90, i = 0, j = 4
Now, we can start probing the hash table starting at the initial index for each key. For example, for key 23, we start at index 3 and check if the slot is empty. Since it is empty, we insert the key into the slot. For key 34, we start at index 4 and find that the slot is already occupied, so we add the offset 4 to the initial index and check the slot at index 8. This slot is also occupied, so we add the offset 4 again and check the slot at index 2. This slot is empty, so we insert the key into the slot.
For the remaining keys, we follow a similar process until all keys are inserted into the hash table.
The final hash table looks like this:
0: 90
1:
2: 23
3:
4: 34
5: 45
6: 56
7: 67
8: 78
9: 89
As we can see, all keys have been successfully inserted into the hash table using double hashing.
Conclusion
Double hashing is a collision resolution technique used in hash tables to reduce the probability of collisions. Double hashing generates a sequence of indices based on two hash functions that are used to probe the hash table until an empty slot is found. Double hashing is fast, memory-efficient, and easy to implement. However, it requires good hash functions to generate a unique sequence of indices for each key and can suffer from clustered data. Overall, double hashing is a useful technique in data structures for implementing hash tables.