Python Hash Table

An Introduction to Hashing

A technique which used to identify a particular object uniquely from a set of similar objects is known as Hashing. Some real-life examples of hashing implementations include:

  • A unique number is assigned to each book available in libraries that helps to determine the data related to the book, including its last issue details or its exact position in the library and a lot.
  • A unique roll number is assigned to each student studying in schools, or universities can help retrieve data about them.

In both the above examples, the books and students are hashed to a unique number.

Let’s assume that we have an object, and all we need is to assign a key to the object so that the searching part becomes easier. Usually, we use a basic array-like data structure for storing the key/value pair, where keys are represented by integers and can be used directly as an index for storing the values. But what if the keys are pretty larger and we can’t use them directly as an index. In such cases, the hashing comes into action.

Hashing allows the users to convert the large keys into small keys with its hash functions. A data structure, also known as the Hash Table, is used for the values assigned to the keys. The concept of Hashing is based on distributing the entries (key/value pairs) consistently across an array. A key (converted key) is assigned to each data element. We can easily access the data element with the help of that key in O(1) time. The hash function processes an index using the key and suggests where an item can be inserted or found.

Hashing Implementation can be done in two steps:

  • We can convert a data element into an integer using a Hash Function. We can use the element as an index for storing the original element falling into the hash table.
  • Then, these elements are stored in the hash table; so that it can be easily accessible with hashed keys.
hash = hash_func(key)
index = hash % arraysize

The hash is the array size-independent in the above method and can be reduced to an index (a number between 0 and arraysize – 1) using the modulo operator (%).

Understanding the Hash Table

A data structure used for storing values with a pair of keys and values is known as the Hash Table. A hash function helps generate a unique key that is then assigned to each value.

We can access the value associated with the key using a key name. This process makes the values searching in a hash table pretty faster, irrespective of the total items available in the hast table.

Understanding the Hash Function

 A function used for mapping an arbitrary size data set to a fixed size data set, which falls into the hash table, is known as the Hash Function. The hash function returns some values called hash codes, hash sums, hash values or hashes.

A good hash function plays a significant role in processing a sound hashing mechanism. Some of the basic characteristics of a good hash function are as follows:

  1. A good hash function should be easily computable and, most importantly, not become an algorithm in itself.
  2. The values of a good hash function must be distributed uniformly and consistently across the hash table and not clustering.
  3. A good hash function should minimize the collision numbers. Usually, the collision occurs when the same value is generated for the pairs of elements.

Note: Collisions are bound to happen regardless of how good a hash function is. Thus, to manage the hash table performance, it becomes necessary to handle collisions using numerous techniques available for collision resolution.

Apart from the characteristics stated above, the hash functions are often one-way functions. This statement implies that we can get a hash from a string but cannot get the original string from a hash due to the voluntary data loss implemented in the function. This is not a feature that required for every hash function; however, it becomes significant while securing cryptographically.

Some of the popular hash algorithms include NTLM, SHA-1, SHA-2, MD5 and many more.

What is Python hash() function?

Python provides a built-in function called the hash() function that produces an object’s hash. The hash() function generates the hash as an integer once it takes an object as input.

The hash() function helps invoke the input object’s __hash__() method internally. Thus, we can implement the __hash__() method for creating a custom hashable class. It returns an integer relied on the object’s internal state.

Let’s see some basic interpretation of the hash() function in Python.

hash(2)
 2
 hash(20)
 20
 hash(20.00)
 20
 hash(20.02)
 46116860184272916
 hash(-20.02)
 -46116860184272916 

From the above example, some might wonder why these hashes seem to have a different length? So, the Python hash() function provides integers objects. On a standard 64-bit Python 3 interpreter, these objects are often represented with 24 bytes.

Moreover, we can also observe that an integer's hash value is the same as its value by default. We can also note that this function works irrespectively of the value type we are hashing, which implies that the integer '2' and the float '2.0' have the same hash: 2.

How is it significant?

As we have discussed earlier, the hash functions are often one-way functions. In cases where two distinct objects have an identical hash, it is impracticable to reverse the process, starting from a hash and returning to the original object. In such cases, the detail regarding the type of the original hashed object is gone lost.

In addition, we can also observe that the decimal numbers have some different value from their value in hashes while hashing numbers. Even negative numbers have their negative hashes. But what if we try hashing the same number we got after hashing the decimal value? It is quite interesting that we will get the same hash. Let's see the following example to understand the same:

hash(0.2)
 461168601842738816
 >>> hash(461168601842738816)
 461168601842738816
 hash(0.2) == hash(461168601842738816)
 True 

In the above example, we can see that the integer number 461168601842738816 has the same hash as the number 0.2.  It is normal, as we have discussed so far, about the hash functions. Since we cannot have a fixed-length value representing infinite values when we can hash any string or number to get a fixed-length value, this statement implies that there have to be duplicated values. And that’s what we have already discussed; they are known as collisions. The hash values are said to collide when two objects share identical hashes.

Hashing a string is quite similar to hashing a numeric value. Let’s see an example given below to understand the same.

hash("Tutorial and Example")
6183100623509011035

From the above example, we can observe that a string can be hashed, producing a numeric value.

However, it is not necessary that the Python Interpreter would return the same output while processing the commands from the above example. This problem arises because, in Python 3.3 and above, the bytes objects and string values are salted with a random value before the hash processing.

This statement means that the string values are modified with some random values that change every time we start an interpreter before hashing.

However, we can set the PYTHONHASHSEED environment variable to an integer value for overriding this behavior. Note that this integer value should be greater than zero before we start the interpreter.

Understanding the Python hashable Types

Let’s discuss if any of the Python types are hashable. Well, we would rather say no as only immutable types are hashable in Python by default. In such cases, the user uses an immutable container (such as a tuple), and the content should also be immutable for being hashable.

Note that Python Interpreter will return a TypeError when we try to get the hash from an unhashable type. Let’s see the following example:

hash(["T", "h", "e", "T", "u", "t", "o", "r", "i", "a", "l"])
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 TypeError: unhashable type: 'list' 

However, the custom-defined objects in Python are hashable, and the hashes of these objects are derived from their IDs. This statement means that two distinct instances of the same class have different hashes by default. This can be observed in the following example:

class Vehicle():
      speed = 0
      damage = 0
      direction = 0
 first_vehicle = Vehicle()
 second_vehicle = Vehicle()
 print(hash(first_vehicle))
 print(hash(second_vehicle))
 176239617762 

Output:

176239607158
176239617762

In the above example, we have created two distinct instances in one custom object. Both instances have different hash values. However, we can implement the __hash__() method for modifying the behavior inside the custom class.

Implementing Hash Tables using Python Dictionaries

As we have discussed earlier, the hash table is one of the data structure types which generates the data element index value or the address using the hash function. This data structure provides faster access to the data because the index value acts as a key for the data value. In simpler terms, we can also say that the Hash table helps store the key-value pairs where the keys are generated using a hashing function.

So, the functions of a data element or item, including the search and insertion functions, becomes pretty much faster because the key values turn themselves into the array index, which helps store the data.

Let’s discuss how the data structure, such as Dictionary is used for implementing the hash tables. The Keys in the Dictionary meet the requirements stated below:

  • In the Dictionary, the keys are hashable; that is, the keys are generated using the hashing function by generating unique output for every unique value provided to the hash function.
  • The data elements have no fixed order in the Dictionary.

Thus, the data structure, such as a dictionary, can help to implement the hash table in the following ways:

  • To Access Values in Dictionary
  • To Update Dictionary
  • To Delete elements in Dictionary

To Access Values in Dictionary

We can access the elements in the dictionary using the standard square brackets along with the key for obtaining the value of elements. Here is an example as follows that illustrates the same:

# Declaring a dictionary 
 my_dict = {
     'First_Name': 'Carol',
     'Last_Name': 'Cloud',
     'Gender': 'Female',
     'Age': 23,
     'Profession': 'Software Engineer'    
     }
 # Accessing the elements of dictionary using its key
 print("my_dict['First_Name']: ", my_dict['First_Name'])
 print("my_dict['Last_Name']: ", my_dict['Last_Name'])
 print("my_dict['Gender']: ", my_dict['Gender'])
 print("my_dict['Age']: ", my_dict['Age'])
 print("my_dict['Profession']: ", my_dict['Profession'])
 The above snippet of code should produce an Output, as shown below:
 my_dict['First_Name']:  Carol
 my_dict['Last_Name']:  Cloud
 my_dict['Gender']:  Female
 my_dict['Age']:  23
 my_dict['Profession']:  Software Engineer 

Update Elements in Dictionary

We can add a new entry or a key-value pair for updating a dictionary. In addition to this, we can also modify an existing entry and even delete an existing entry to update a dictionary. Here is an example as follows that illustrates the same:

# Declaring a dictionary 
 my_dict = {
     'First_Name': 'Carol',
     'Last_Name': 'Cloud',
     'Gender': 'Female',
     'Age': 23,
     'Profession': 'Software Engineer'    
     }
 # Updating Existing entry
 my_dict['Age'] = 24
 # Adding New entry
 my_dict['Year_of_Experience'] = 3
 print("my_dict['Age']: ", my_dict['Age'])
 print("my_dict['Year_of_Experience']: ", my_dict['Year_of_Experience'])
 The above snippet of code should produce an Output, as shown below:
 my_dict['Age']:  24
 my_dict['Year_of_Experience']:  3 

Delete elements in Dictionary

We can either delete an individual data element of the dictionary or clear the dictionary's entire content. In addition to this, we can also remove the entire dictionary using a single operation. This single operation can remove an entire dictionary explicitly using the del command. Here are some examples as follows that illustrates the same:

  • Deleting an existing entry using a key
# Declaring a dictionary 
 my_dict = {
     'First_Name': 'Carol',
     'Last_Name': 'Cloud',
     'Gender': 'Female',
     'Age': 4,
     'Profession': 'Software Engineer',    
     'Year_of_Experience': 3
     }
 # Deleting an existing entry using a key
 del my_dict['Last_Name']
 print("my_dict['Last_Name']: ", my_dict['Last_Name']) 

The above snippet of code should produce an Output, as shown below:

Traceback (most recent call last):
   File "D:\Python\dict.py", line 14, in <module>
     print("my_dict['Last_Name']: ", my_dict['Last_Name'])
 KeyError: 'Last_Name' 
  • Clearing all the entries in the dictionary
# Declaring a dictionary 
 my_dict = {
     'First_Name': 'Carol',
     'Last_Name': 'Cloud',
     'Gender': 'Female',
     'Age': 4,
     'Profession': 'Software Engineer',    
     'Year_of_Experience': 3
     }
 # Clearing all the entries in the Dictionary
 my_dict.clear()
 print("my_dict['Gender']: ", my_dict['Gender'])
 print("my_dict['Age']: ", my_dict['Age']) 

The above snippet of code should produce an Output, as shown below:

Traceback (most recent call last):
   File "D:\Python\dict.py", line 14, in <module>
     print("my_dict['Gender']: ", my_dict['Gender'])
 KeyError: 'Gender' 
  • Deleting an Entire dictionary
# Declaring a dictionary 
 my_dict = {
     'First_Name': 'Carol',
     'Last_Name': 'Cloud',
     'Gender': 'Female',
     'Age': 4,
     'Profession': 'Software Engineer',    
     'Year_of_Experience': 3
     }
 # Printing an entire dictionary
 print(my_dict)
 # Deleting an entire dictionary
 del my_dict
 print("my_dict['First_Name']: ", my_dict['First_Name']) 

The above snippet of code should produce an Output, as shown below:

{'First_Name': 'Carol', 'Last_Name': 'Cloud', 'Gender': 'Female', 'Age': 4, 'Profession': 'Software Engineer', 'Year_of_Experience': 3}
 Traceback (most recent call last):
   File "D:\Python\dict.py", line 17, in <module>
     print("my_dict['First_Name']: ", my_dict['First_Name'])
 NameError: name 'my_dict' is not defined