Edit Distance Problem in Java
The Edit Distance Problem is a classic algorithmic problem that involves finding the minimum number of operations required to transform one string into another. The operations allowed are insertion, deletion, and substitution of a single character. This problem is also known as Levenshtein distance or the minimum edit distance.
In this article, we will discuss how to solve the Edit Distance Problem in Java using dynamic programming.
Dynamic Programming Solution
The Edit Distance Problem can be solved using dynamic programming. We can define a two-dimensional array dp[][] where dp[i][j] represents the minimum number of operations required to convert the first i characters of string1 to the first j characters of string2.
To fill up the dp[][] table, we can use the following recurrence relation:
if string1.charAt(i-1) == string2.charAt(j-1)
dp[i][j] = dp[i-1][j-1]
else
dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
Explanation of the Recurrence Relation
The recurrence relation above can be broken down into the following cases:
- If the i-th character of string1 is the same as the j-th character of string2, then no operation is needed. Hence, we can just copy the value of dp[i-1][j-1] to dp[i][j].
- If the i-th character of string1 is different from the j-th character of string2, then we have three options:
- We can insert a character into string1. In this case, dp[i][j] is equal to 1 + dp[i][j-1].
- We can delete a character from string1. In this case, dp[i][j] is equal to 1 + dp[i-1][j].
- We can substitute a character in string1. In this case, dp[i][j] is equal to 1 + dp[i-1][j-1].
We take the minimum of the three options and store it in dp[i][j].
Here is the Java code that implements the dynamic programming solution:
Filename: EditDistance.java
public class EditDistance {
public static int editDistance(String string1, String string2) {
int m = string1.length();
int n = string2.length();
int[][] dp = new int[m+1][n+1];
// fill up the first row and column
for(int i=0; i<=m; i++)
dp[i][0] = i;
for(int j=0; j<=n; j++)
dp[0][j] = j;
// fill up the dp[][] table using the recurrence relation
for(int i=1; i<=m; i++) {
for(int j=1; j<=n; j++) {
if(string1.charAt(i-1) == string2.charAt(j-1))
dp[i][j] = dp[i-1][j-1];
else
dp[i][j] = 1 + Math.min(dp[i][j-1], Math.min(dp[i-1][j], dp[i-1][j-1]));
}
}
return dp[m][n];
}
public static void main(String[] args) {
String string1 = "kitten";
String string2 = "sitting";
System.out.println(editDistance(string1, string2)); // output: 3
}
}
Output:
3
In this code, we first initialize the dp[][] table with the base cases. We set dp[i][0] to i and dp[0][j] to j because it takes i insertions to convert an empty string to string1 of length i and j deletions to convert string2 of length j to an empty string.
Next, we fill up the dp[][] table using the recurrence relation described earlier.
Finally, we return dp[m][n], which represents the minimum number of operations required to convert string1 to string2.
Here are some additional tips and tricks to keep in mind when working with the Edit Distance Problem in Java:
- The Edit Distance Problem can also be solved using recursion with memoization. However, the dynamic programming solution is more efficient as it avoids recomputing overlapping subproblems.
- When implementing the dynamic programming solution, it is important to initialize the dp[][] table with the base cases before filling it up using the recurrence relation. This ensures that the algorithm works correctly for the smallest input sizes.
- The Edit Distance Problem can be extended to handle more complex operations such as transpositions, which involve swapping adjacent characters. This can be done by modifying the recurrence relation to take into account transpositions.
- When dealing with large input sizes, it is important to optimize the space complexity of the algorithm. This can be done by using a two-row or one-row dp[][] table instead of a two-dimensional table.
- The Edit Distance Problem can also be solved using other algorithms such as the Wagner-Fischer algorithm and the Hirschberg algorithm. These algorithms are more efficient than the dynamic programming solution for certain types of inputs.
Filename: EditDistance.java
public class EditDistance {
public static int editDistance(String string1, String string2) {
int m = string1.length();
int n = string2.length();
int[][] dp = new int[m+1][n+1];
for (int i = 0; i <= m; i++) {
dp[i][0] = i;
}
for (int j = 0; j <= n; j++) {
dp[0][j] = j;
}
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (string1.charAt(i-1) == string2.charAt(j-1)) {
dp[i][j] = dp[i-1][j-1];
} else {
dp[i][j] = 1 + Math.min(dp[i][j-1], Math.min(dp[i-1][j], dp[i-1][j-1]));
}
}
}
return dp[m][n];
}
public static void main(String[] args) {
String string1 = "kitten";
String string2 = "sitting";
int editDistance = editDistance(string1, string2);
System.out.println("Edit Distance between " + string1 + " and " + string2 + " is " + editDistance);
}
}
Output:
Edit Distance between kitten and sitting is 3
Time complexity: O(mn)
Space complexity: O(mn)
In this example, we calculate the Edit Distance between the strings "kitten" and "sitting". The output shows that the Edit Distance between these two strings is 3. This means that it takes a minimum of 3 operations (2 substitutions and 1 insertion) to convert the string "kitten" to the string "sitting".
In Conclusion, The Edit Distance Problem is a classic algorithmic problem that can be solved using dynamic programming. In this article, we discussed how to implement the dynamic programming solution in Java. This problem has many practical applications such as spell correction, DNA sequencing, and natural language processing. Understanding how to solve this problem is an essential skill for any programmer who deals with string manipulation.