
Hashing and its Applications

Hashing

Hashing refers to transforming input data of any size into a fixed-size value, called a hash value or digest, using a hash function. A good cryptographic hash function is one-way: even if the hash values are leaked for some reason, it is impractical to recover the original data from them. Hashing is not the same as encryption, because a hash value is not meant to be decoded back into the original input. Hash values are used in various data structures and wherever information has to be looked up quickly or stored without exposing the original.

Hashing is widely used to protect data in different programs. Unlike encryption, plain hashing does not need a secret key and has no decryption step. In hash tables, the word 'key' refers to the value that is hashed, and its hash is usually a mixed, reduced-length form of the original value. This reduced-length value is used to locate individual records, which makes searching much faster and simpler.

Hashing can also be used for database indexing, where records are located through the hash of a key, which makes lookups much faster. In general, a key can be anything that identifies the information: a combination of symbols, digits or letters, or even something as unique as a fingerprint. There are various applications of hashing; let us look at them in more detail.

Various applications of hashing

1. Password verification

2. Integrity Checks/Message digests

3. Data Structures

4. Linking file name and path together

5. Compiler Operation

6. Rabin-Karp algorithm

Password Verification

Ever wondered what happens if the database of a social media website gets hacked? Would the hacker have access to your passwords and retrieve all your information? If the passwords were stored properly, the answer is: NO, not directly!

Today, well-designed applications use hash functions to protect the passwords they store. The passwords of the users of such websites are not stored in the database as plain text. They are not encrypted either, because encrypted data can be recovered by anyone who obtains the decryption key. The safer way to keep passwords is to store only their hashes.

For example:

Suppose a one-way hash function such as MD5 is applied, so that only the hash of your password is stored. Once the password is hashed, there is no practical way to compute the original password back from the stored value. To verify a login, the system hashes the password that was entered and compares the result with the stored hash. An attacker who steals the hashes would have to try candidate passwords one by one until the hash of a guess matches a stored hash. Note that MD5 itself is no longer recommended for passwords, because it is very fast to compute and has known collision weaknesses; salted, deliberately slow functions such as PBKDF2, bcrypt, scrypt or Argon2 are preferred. You can find MD5 and various other hash functions in Python's hashlib module.

Two significant properties of a hash function like MD5 are:

  1. Fixed size: whatever the length or format of the input, the hash value has a fixed size (128 bits for MD5).
  2. Near-uniqueness: the same input always gives the same hash value, and different inputs give different hash values in almost all cases. Collisions cannot be ruled out entirely, since there are more possible inputs than hash values.
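The verification step can be sketched with Python's standard library. This is a minimal illustration, not a production design: the function names hash_password and verify_password and the iteration count are assumptions made for this example, and PBKDF2 is used instead of MD5 because it is salted and deliberately slow.

```python
import hashlib
import hmac
import os

# Assumed work factor for this sketch; real systems tune this value.
ITERATIONS = 100_000


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(attempt, salt, stored_digest):
    """Hash the attempt with the stored salt and compare the two digests."""
    _, attempt_digest = hash_password(attempt, salt)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(attempt_digest, stored_digest)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Only the salt and the digest would be stored in the database; the password itself is never saved.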
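The fixed-size property is easy to observe with hashlib. The helper name md5_hex below is an assumption for this sketch; MD5 is used only because it is the function discussed above, not as a recommendation.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Inputs of very different lengths all produce digests of the same length.
for text in ["a", "hello world", "x" * 10_000]:
    digest = md5_hex(text)
    print(len(text), len(digest), digest)
```

Each line printed shows a different input length but the same digest length of 32 hexadecimal characters, which is 128 bits.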

Integrity checks or message digest

  • A hash collision occurs when two different inputs produce the same hash value. With a well-designed hash algorithm, collisions are extremely rare; hence, hash functions are well suited to integrity checks.
  • Sometimes a user needs to check whether two files have the same content. Comparing every part of the files directly would be messy and time-consuming, especially when the two copies are on different machines; comparing their hash values is quick.
  • Another example is storing data online. When you upload your data to one of the cloud services available today, an integrity check tells you whether someone has altered the content of your file.
  • This can be done by calculating the hash value of the file before uploading it to a particular cloud storage platform.
  • After uploading, download your file again and calculate its hash value. If someone had tampered with the file, that change would change the hash value of the downloaded file.
  • Tampering with a file without changing its hash value is practically impossible with a strong hash function. This is a simple way to check the integrity of files stored on such platforms.
  • One of many cryptographic hash algorithms used for integrity checks or message digests is SHA256.
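The before-and-after comparison described in the list above could look like the following sketch. The function name sha256_of_file, the file name report.txt, and the temporary directory standing in for cloud storage are all assumptions for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "report.txt")  # hypothetical file
    with open(path, "wb") as f:
        f.write(b"quarterly numbers")
    before = sha256_of_file(path)  # digest recorded before "uploading"

    with open(path, "ab") as f:
        f.write(b"!")  # simulate tampering with a single extra byte
    after = sha256_of_file(path)   # digest of the "downloaded" copy

    print(before == after)  # False
```

Reading in chunks keeps memory use small even for large files.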

Data structures

Various data structures in different programming languages use hash tables. The main motive behind the approach is to reduce the time complexity of searching for data in such data structures. Key-value pairs are stored in a table in such a way that each key is unique, although two different keys may map to the same value. Some examples of data structures which use this approach are:

  1. Unordered set and unordered map in C++.
  2. Dictionary in Python.
  3. HashSet and HashMap in Java.
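To show the idea behind these structures, here is a minimal hash table sketch that uses separate chaining. The class name ChainedHashTable and the fixed bucket count are assumptions for illustration; real implementations such as Python's dict are considerably more sophisticated and resize themselves.

```python
class ChainedHashTable:
    """A tiny hash table: each bucket holds a list of (key, value) pairs."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash of the key decides which bucket the pair lives in.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # keys are unique: overwrite
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashTable()
table.put("apple", 3)
table.put("pear", 3)    # two different keys may share the same value
table.put("apple", 5)   # the same key again replaces the old value
print(table.get("apple"), table.get("pear"), table.get("plum"))  # 5 3 None
```

A lookup only scans one bucket, which is why the average search time stays close to constant when the keys are spread evenly.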

Linking file name and path together

Searching for a particular file stored in your system is one of the most common operations, and hash values have proven to be a good choice for search operations. Linking a file's name to its directory path therefore helps locate it in the system quickly.

Modern file systems allow you to store many files in the same directory, and operations such as following paths, listing files, or finding a file in a directory say a lot about the file system's performance.

The common question is how to design a file system that allows a high file capacity for each directory while keeping performance good. One answer is File Name Hashing. Simply put, File Name Hashing is the process of deriving a reproducible, known path from the name of your file. For example, a file named JavaTpoint.GIF would be stored in the file system as:

 /J/Ja/JavaTpoint.GIF

This approach provides a large number of subdirectories. If we count 26 single-letter directory names and 26 * 26 two-letter directory names, and limit the number of files in each directory to 1000, it gives a capacity of

[26 * (26 * 26)] * 1000 = 17,576,000 files (17576K files).

This technique can be used to solve various problems in traditional file systems.

But using the name of the file directly to create the path can spoil the distribution in your file system and make it unbalanced. For instance, if almost 90% of the files start with the letters 'WA', this approach will put all of them in the same directory:

/W/WA/

Here comes the application of hashing. Building the path from the hash code of the file name string, instead of from its first letters, is an appropriate way to solve this problem.

Hashing file names is an effective technique for creating a well-distributed file system layout with a wide range of directories, and it gives convenient access to data because a hash-based lookup takes constant time on average.

Hence, while searching for a particular file, calculate the hash code of its name, derive the path from it, and use that path to access the file. This approach is used in various professional file systems today.
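Both ways of deriving a path can be sketched as follows. The function names naive_path and hashed_path, the choice of SHA-256, and the use of two hexadecimal characters per directory level are assumptions for this example, not a description of any particular file system.

```python
import hashlib


def naive_path(file_name):
    """Build the path from the first letters of the name itself."""
    return "/{}/{}/{}".format(file_name[0], file_name[:2], file_name)


def hashed_path(file_name):
    """Build the path from a hash of the name to spread files evenly."""
    digest = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    # First two hex characters pick the top directory, the next two the
    # subdirectory: 256 * 256 possible directories in this sketch.
    return "/{}/{}/{}".format(digest[:2], digest[2:4], file_name)


print(naive_path("JavaTpoint.GIF"))  # /J/Ja/JavaTpoint.GIF

# Names with the same prefix share one directory in the naive scheme,
# but are very likely to be spread out in the hashed scheme.
for name in ["WAVE1.GIF", "WAVE2.GIF", "WATER.GIF"]:
    print(naive_path(name), hashed_path(name))
```

Because the hash is deterministic, the same file name always yields the same path, so a file can be found again without searching.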

Compiler operation

Everyone uses compilers. But you may not know that behind the scenes of compiler design, hashing is one of the critical techniques used to make compilers efficient and fast. Did the question ever cross your mind: how does your compiler identify every keyword in a roughly constant amount of time? Whatever the number of keywords in the language, how is each one recognized so quickly?

By now, you can probably guess the answer.

YES, you are absolutely correct!

Hashing is used here too.

A compiler has to process keywords differently from the identifiers and literals in your program, so an efficient way to distinguish between them is required to compile the program successfully. For this purpose, all the keywords are stored in a set when the compiler is designed. The set is implemented using a hash table, so each keyword is found through its hash code and a lookup takes constant time on average.

This is one of the reasons compilers can scan source code efficiently.
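The keyword lookup can be illustrated with a hash-based set. The keyword list and the function name classify are assumptions for this toy sketch; a real compiler's lexer handles far more cases.

```python
# A small, hypothetical keyword list; frozenset is backed by a hash table.
KEYWORDS = frozenset({"if", "else", "while", "for", "return", "int", "float"})


def classify(token):
    """Label a token as a keyword, an identifier, or something else."""
    if token in KEYWORDS:       # average constant-time hash lookup
        return "keyword"
    if token.isidentifier():    # letters, digits, underscore; no leading digit
        return "identifier"
    return "literal or other"


for token in ["while", "count", "42", "return", "total_sum"]:
    print(token, "->", classify(token))
```

The membership test does not get slower as more keywords are added, which is the property the text above relies on.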

Rabin-Karp algorithm

One of the several vital applications of hashing is the Rabin-Karp algorithm. This well-known string-searching technique finds occurrences of one or more patterns in a text. It compares the hash of the pattern with the hash of each window of the text that has the same length, and compares the actual characters only when the two hashes match. A rolling hash lets it update the window's hash in constant time as the window slides by one character. Plagiarism checkers can use this algorithm to find similarities between different texts efficiently.
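A compact sketch of the algorithm for a single pattern is shown below. The function name rabin_karp and the base and modulus values are assumptions chosen for illustration; other choices work as well.

```python
def rabin_karp(text, pattern, base=256, modulus=1_000_000_007):
    """Return the start indices of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, modulus)

    # Hash the pattern and the first window of the text.
    pattern_hash = window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus

    matches = []
    for start in range(n - m + 1):
        # Compare characters only when the hashes agree, which rules out
        # false positives caused by hash collisions.
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Rolling update: drop text[start], append text[start + m].
            window_hash = (window_hash - ord(text[start]) * high) % modulus
            window_hash = (window_hash * base + ord(text[start + m])) % modulus
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

The rolling update is what keeps the work per window constant, instead of rehashing all m characters each time.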


