DAA: Rabin Karp Algorithm

Rabin Karp Algorithm

The Rabin Karp or Karp Rabin algorithm is used to matching a specific pattern in the string.

It uses the technique of hashing to match a specific text.

There also exists a naive algorithm for pattern matching. The naive algorithm iterates in the whole string one by one matching each character of our pattern.

Rabin Karp works in a different manner. It uses the hash function, a tool in which we match the larger input value to a smaller output value.

The algorithm calculates the hash value of the pattern and the current substring in the given string. If both hash and pattern values match, it then matches the individual letter in a string with the pattern.

So the main task here is to find the correct hash value of the length of the pattern then match it. 

The hash function in this algorithm calculates an integer value, and this integer value of the string is basically the numeric value of the string.

As there can be larger values of pattern, the numeric values cannot be stored in the form of integers. This optimization is done using modular arithmetic.

We need to rehash again to delete the last integer and add the new one in an optimized manner.

It can be done by following formula:

Let us consider string  length n pattern length m.

hash( text[s+1 .. s+m] ) = ( d ( hash( text[s .. s+m-1]) – text[s]*h ) + text[s + m] ) mod q

hash( text[s .. s+m-1] ) : Hash value at shift s.

hash( text[s+1 .. s+m] ) : Hash value at next shift (or shift s+1)

d: Number of characters in the alphabet

q: A prime number

h: d^(m-1)

Explanation of the above expression:

Using the idea of simple mathematics, compute the decimal value of the current window from the previous window.

For instance, Consider the pattern length is 3 and string is “23456”

After computing the value of the first window (which is “234”), the length comes as 234.

Moving ahead for the new window “345”,

We will apply the formula by eliminating 200 and add 5 (234 – 2*100)*10 + 5 and get 345.

C++ code:

 #include <bits/stdc++.h>
 using namespace std;
 #define d 256 // There can be maximum 256 characters
 void searchRabinKarp(char pattern[], char text[], int prime_no)
 {
     int M = strlen(pattern), i; // M holds the pattern size initially
     int N = strlen(text), j; // N holds the input string size initially
     int hash_pattern = 0, hash_text = 0; // To find the hash value for pattern  and hash text for text
     int h = 1;
     for (i = 0; i < M - 1; i++)
         h = (h * d) % prime_no; // Find h
     // We find the first window of text and hash value of pattern at begining
     for (i = 0; i < M; i++) {
         hash_pattern = (d * hash_pattern + pattern[i]) % prime_no;
         hash_text = (d * hash_text + text[i]) % prime_no;
     }
     // One by one slide pattern over text
     for (i = 0; i <= N - M; i++) {
         if (hash_text == hash_pattern) // If the values of hash match check each character
         {
             for (j = 0; j < M; j++) /* Check for characters one by one */
             {
                 if (text[i + j] != pattern[j])
                     break;
             }
             if (j == M) // if hash_pattern == hash_text that means  pattern[0...M-1] = text[i, i+1, ...i+M-1]
                 cout << "Pattern found at index " << i << endl;
         }
         if (i < N - M) // Find the hash value for the next window by removing the lead digit and add a trailing digit
         {
             hash_text = (d * (hash_text - text[i] * h) + text[i + M]) % prime_no;
             if (hash_text < 0) // If hash_text becomes a negative number convert to positive
                 hash_text = (hash_text + prime_no);
         }
     }
 }
 int main() // Main function for calling Rabin Karp
 {
     char text[] = "A TEST TO TEST THE RABIN KARP ALGORITHM";
     char pattern[] = "TEST";
     int prime_no = 101; // Selecting a prime number
     // Function Call
     searchRabinKarp(pattern, text, prime_no);
     return 0;
 } 

C code –

#include <stdio.h>
 #include<string.h>
 #define d 256 // There can be maximum 256 characters
 void searchRabinKarp(char pattern[], char text[], int prime_no)
 {
     int M = strlen(pattern), i; // M holds the pattern size initially
     int N = strlen(text), j; // N holds the input string size initially
     int hash_pattern = 0, hash_text = 0; // To find the hash value for pattern  and hash text for text
     int h = 1;
     for (i = 0; i < M - 1; i++)
         h = (h * d) % prime_no; // Find h
     // We find the first window of text and hash value of pattern at begining
     for (i = 0; i < M; i++) {
         hash_pattern = (d * hash_pattern + pattern[i]) % prime_no;
         hash_text = (d * hash_text + text[i]) % prime_no;
     }
     // One by one slide pattern over text
     for (i = 0; i <= N - M; i++) {
         if (hash_text == hash_pattern) // If the values of hash match check each character
         {
             for (j = 0; j < M; j++) /* Check for characters one by one */
             {
                 if (text[i + j] != pattern[j])
                     break;
             }
             if (j == M) // if hash_pattern == hash_text that means  pattern[0...M-1] = text[i, i+1, ...i+M-1]
                 printf( "Pattern found at index %d\n",i);
         }
         if (i < N - M) // Find the hash value for the next window by removing the lead digit and add the trailing digit
         {
             hash_text = (d * (hash_text - text[i] * h) + text[i + M]) % prime_no;
             if (hash_text < 0) // If hash_text becomes a negative number convert to positive
                 hash_text = (hash_text + prime_no);
         }
     }
 }
 int main() // Main function for calling Rabin Karp
 {
     char text[] = "A TEST TO TEST THE RABIN KARP ALGORITHM";
     char pattern[] = "TEST";
     int prime_no = 101; // Selecting a prime number
     // Function Call
     searchRabinKarp(pattern, text, prime_no);
     return 0;
 } 

Time complexity:

Best case and average case: O(n+m)

Worst Case: O(nm); When all characters are the same in the string as a pattern