Hashing and its Applications
Hashing
Hashing refers to transforming data of any size into a fixed-size value, called a hash value or digest, using a hash function. With a cryptographic hash function the transformation is one-way, so even if the hashed data is leaked for some reason, no one should be able to recover the original from it. Hash values are used in various data structures and wherever information must be stored safely without ever needing to be decoded.
Hashing is widely used both to secure data and to speed up programs. Unlike encryption, hashing does not depend on a secret key and has no decryption step. The hash value is usually a mixed, reduced-length form of the original value. In a hash table, this reduced-length value is computed from a record's key and used to locate that particular record directly, making the program much faster and simpler.
Hashing can also be used for database indexing, where records are located through the hash of their keys, which makes lookups much faster. In general, a key can be anything that identifies the information: a combination of symbols, digits, or letters, or even something as unique as a fingerprint. There are various applications of hashing; let us look at them in more detail.
Various applications of hashing
1. Password verification
2. Integrity Checks/Message digests
3. Data Structures
4. Compiler Operation
5. Rabin-Karp algorithm
Password Verification
Ever wondered what happens if the database of a social media website gets hacked? Would the hacker have access to your password and be able to retrieve all your information? If the website stores passwords properly, the answer is no.
Today, almost every good developer uses hash functions to protect such data. The passwords of a website's users are not stored in the database as plain text. They are not simply encrypted either, because encrypted data can be recovered by anyone who obtains the decryption key. The secure way to store passwords is to hash them.
For example:
Suppose a one-way hash function such as MD5 is applied so that only the hashed value of your password is stored. Once the password is hashed, there is no practical way to retrieve the original value from the hash. To verify a login, the website hashes the password you enter and compares the result with the stored hash value. An attacker who steals the database, on the other hand, would have to try every possible password until the hash value of an attempted password matches a stored hash value. MD5 is used here only for illustration: it is very fast to compute, which makes such guessing easy, so it is usually not used for real passwords today. You can find MD5 and other hash functions in Python's hashlib module.
The significant properties of a hash function such as MD5 that you can see here are:
- Fixed size: whatever the length or format of the input, the hashed value always has the same size (128 bits for MD5).
- Near uniqueness: different inputs almost always map to different hash values. Collisions can still occur, and for MD5 they can be constructed deliberately, which is another reason to prefer newer functions.
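The verification flow described above can be sketched in Python as follows. This is a minimal illustration, not a production recipe: the function names hash_password and verify_password and the ITERATIONS value are assumptions made for this example, and PBKDF2 with a random salt (available in hashlib) is swapped in for MD5 because MD5 is not suitable for real passwords. MD5 appears only at the end to show the fixed-size property.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor (an assumption, not a recommendation)


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these two values would be stored."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(attempt: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the attempt with the stored salt and compare the two digests."""
    _, attempt_digest = hash_password(attempt, salt)
    return hmac.compare_digest(attempt_digest, stored_digest)


salt, stored = hash_password("s3cret")
login_ok = verify_password("s3cret", salt, stored)
login_bad = verify_password("guess", salt, stored)

# Fixed-size property: the MD5 digest has the same length for any input.
short_md5 = hashlib.md5(b"a").hexdigest()
long_md5 = hashlib.md5(b"a" * 10_000).hexdigest()
```

Note that the original password is never needed for verification; only the salt and the digest are kept.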
Integrity checks or message digest
- A hash collision occurs when two different inputs produce the same hash value. Collisions in a well-designed hash algorithm are extremely rare; hence, hash functions can be used for integrity checks.
- Sometimes a user needs to check whether two received files have the same content. Checking each file, or every subdirectory of a folder, byte by byte using a traditional approach would be messy and time-consuming, but by comparing hash values this can be done quickly.
- Another example is storing data online. When uploading your data to one of the cloud services available today, integrity checks are essential to find out whether someone has altered your file content.
- This can be done by calculating the hash value of the file before uploading it to a particular cloud storage platform.
- After uploading, download your file again and calculate its hash value. If someone has tampered with the file, that change will change the hash value of the downloaded file.
- Tampering with a file without changing its hash value is practically impossible. This is a simple way to check the integrity of such platforms.
- One of many cryptographic hash algorithms used for integrity checks or message digests is SHA256.
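The upload-and-compare check described in this list might look like the sketch below. The helper names sha256_of_file and is_unchanged are hypothetical, and temporary files stand in for the original file and the copy downloaded from a cloud service.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(original_path: str, downloaded_path: str) -> bool:
    """Two files with equal digests are treated as having equal content."""
    return sha256_of_file(original_path) == sha256_of_file(downloaded_path)


# Demo: temporary files play the role of the uploaded and downloaded copies.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    clean_copy = os.path.join(tmp, "clean_copy.bin")
    tampered_copy = os.path.join(tmp, "tampered_copy.bin")
    with open(original, "wb") as f:
        f.write(b"important report, version 1")
    with open(clean_copy, "wb") as f:
        f.write(b"important report, version 1")
    with open(tampered_copy, "wb") as f:
        f.write(b"important report, version 2")  # a single changed byte

    digest_length = len(sha256_of_file(original))
    clean_matches = is_unchanged(original, clean_copy)
    tampered_matches = is_unchanged(original, tampered_copy)
```

Reading in chunks keeps memory use constant, so the same function works for very large files.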
Data structures
Various data structures in different programming languages use hash tables. The main motive behind the approach is to reduce the time complexity of searching for data in such structures. Key-value pairs are stored in a table in such a way that each key is unique and maps to exactly one value, while two different keys may map to the same value. The hash of the key determines where the pair is stored, so a lookup takes constant time on average. Some examples of data structures which use this approach are:
- unordered_set and unordered_map in C++.
- Dictionary in Python.
- HashSet and HashMap in Java.
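To make the idea concrete, here is a minimal sketch of a hash table that resolves collisions by chaining. The class name ChainedHashTable and its methods are invented for this illustration; the built-in structures listed above are far more sophisticated.

```python
class ChainedHashTable:
    """A tiny hash table using separate chaining (illustrative only)."""

    def __init__(self, bucket_count: int = 8) -> None:
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # The hash of the key decides which bucket holds the pair.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket_for(key)
        for index, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[index] = (key, value)  # keys are unique: overwrite
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only one bucket is scanned, not the whole table.
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashTable()
table.put("alice", 30)
table.put("bob", 30)    # two different keys may share the same value
table.put("alice", 31)  # the same key again replaces the old value
```

Because a lookup scans a single bucket, its cost stays roughly constant as long as the keys are spread evenly across the buckets.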
Linking file name and path together
Searching for a particular file stored in your system is a common operation, and hash values have proven to be a good choice for search operations. Linking a file's name to the path of its directory therefore helps locate the file in the system quickly.
Today, various modern file systems allow you to store many files in the same folder or directory. How quickly operations such as following paths, listing files, or finding a file in a directory complete says a lot about the file system's performance.
Now the common question is how one would design a file system that allows a high file capacity while keeping each directory small enough to perform well. One answer to this problem is File Name Hashing. Simply put, File Name Hashing is the process of deriving a reproducible, known path from the name of your file. For example, a file named JavaTpoint.GIF would be stored in the file system as:
/J/Ja/JavaTpoint.GIF
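This approach provides a lot of subdirectories. Assuming file names start with two of the 26 letters and case is ignored, there are 26 first-level directories, each containing 26 second-level directories. Even if we limit the number of files in each second-level directory to 1000, this gives a capacity of
(26 * 26) * 1000 = 676,000 files.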
This technique can be used to solve various problems in traditional file systems.
However, using the name of your file directly to create the path can make the distribution of files in your file system unbalanced. For instance, if almost 90% of files start with the letters 'WA', this approach will put all of them in the same directory:
/W/WA/
This is where hashing comes in. Deriving the path from the hash code of the file name string, instead of from the name itself, is an appropriate way to solve this problem, because a good hash function spreads even similar names across different directories.
Hashing file names is an effective technique for building a file system with a wide, evenly used range of directories and convenient access to data, since the path of a file can be computed from its name in constant time.
Hence, to find a particular file, calculate the hash code of its name, derive the path from it, and use that path to access the file. This technique is used in various professional file systems today.
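A minimal sketch of this idea is shown below. The function name hashed_path is hypothetical, and SHA-256 from hashlib is used as a stand-in for the "hash code of the string", because Python's built-in hash() of a string changes between program runs and would not give a reproducible path. With hexadecimal digits, this layout yields 16 * 16 = 256 second-level directories.

```python
import hashlib
import posixpath


def hashed_path(file_name: str, root: str = "/") -> str:
    """Derive a reproducible two-level directory path from a file name."""
    digest = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    # The first one and two hex digits of the digest name the directories.
    return posixpath.join(root, digest[:1], digest[:2], file_name)


example = hashed_path("JavaTpoint.GIF")

# Names that all start with 'WA' no longer land in a single directory.
wa_directories = {
    posixpath.dirname(hashed_path("WA%d.txt" % i)) for i in range(1000)
}
```

The same name always produces the same path, so a file can be located later by recomputing the hash of its name.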
Compiler operation
Everyone uses compilers. But you may not know that behind the scenes of compiler design, hashing is one of the critical techniques used to make compilers efficient. Did the question ever cross your mind: how does your compiler recognize every keyword in a constant amount of time? However many keywords the language defines, how is each one identified in a matter of nanoseconds?
By now, we can expect you to know the answer to this.
YES, you are absolutely correct!
Hashing is used here too.
A compiler has to process keywords differently from other identifiers and literals in your program. An efficient way to distinguish keywords from everything else is therefore required to compile the program quickly. For this purpose, all the keywords are stored in a set when the compiler is designed. The set is implemented using a hash table, which computes a hash code for every keyword and provides constant-time search on average.
Compilers hence work efficiently and provide you with a great experience.
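The keyword lookup can be sketched as follows. The KEYWORDS set and the classify function are made up for this illustration and do not reflect any particular compiler; Python's frozenset is implemented with a hash table, so the membership test takes constant time on average.

```python
# A hypothetical keyword set for a small C-like language.
KEYWORDS = frozenset({"if", "else", "while", "for", "return", "int", "float"})


def classify(token: str) -> str:
    """Label a token as keyword, literal, identifier, or other."""
    if token in KEYWORDS:       # hash-based lookup, no scan over all keywords
        return "keyword"
    if token.isdigit():         # only unsigned integer literals, for brevity
        return "literal"
    if token.isidentifier():
        return "identifier"
    return "other"


tokens = "if count return 42 total".split()
classified = [(token, classify(token)) for token in tokens]
```

The keyword test comes first because every keyword would otherwise also pass the identifier check.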
Rabin-Karp algorithm
One of the several vital applications of hashing is the Rabin-Karp algorithm. This algorithm is a well-known technique for string searching, used to find one or more patterns in a text. It computes a hash of the pattern and a rolling hash of each window of the text with the same length, and compares the actual characters only when the two hash values match. Plagiarism checkers use this kind of algorithm to find similarities between different texts efficiently.
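A compact single-pattern version of the algorithm is sketched below. The base and modulus values are illustrative choices rather than part of the algorithm's definition, and the function name rabin_karp is chosen for this example.

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               modulus: int = 1_000_000_007) -> list:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high_order = pow(base, m - 1, modulus)  # weight of the window's first character
    pattern_hash = 0
    window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus

    matches = []
    for start in range(n - m + 1):
        # Compare characters only when the hashes agree, to rule out collisions.
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the window: drop text[start], then append text[start + m].
            window_hash = (window_hash - ord(text[start]) * high_order) % modulus
            window_hash = (window_hash * base + ord(text[start + m])) % modulus
    return matches


positions = rabin_karp("abracadabra", "abra")
```

Updating the hash from one window to the next takes constant time, which is what makes the search efficient on long texts.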