Duplicate Characters Problem in Java
The Duplicate Characters Problem in Java is a common programming challenge that involves identifying or removing duplicate characters from a given string and this problem is relevant in various contexts, such as data manipulation, text processing, and algorithmic tasks. The goal is to analyze a string and determine which characters appear more than once.
Example 1:
Input
S = “programming”
Output
Duplicate characters: r g m
Explanation
In the above input string - programming, the characters r, g and m occur twice. So, r, g and m are the duplicate characters in the string.
Example 2:
Input
S = “amazing”
Output
Duplicate characters: a
Explanation
In the above input string - amazing, only character a had occurred twice. So, a is the duplicate character in the string.
Approach 1: Brute Force
The brute force approach to finding duplicate characters in a string is a straightforward and simple method that involves iterating through each character of the string and comparing it with every other character in the string and this process continues until all characters in the string have been examined, this method does not involve any advanced data structures or algorithms, making it easy to implement.
Algorithm
Step 1: Take a string as input and convert it to a character array (charArray). Iterate through each character in charArray using an outer loop (index i).
Step 2: If the current character (charArray[i]) is not marked as visited (not equal to '\0'): Use an inner loop (index j) to compare the current character with every other character in the string (starting from the next character after the current one).
Step 3: If a match is found (i.e., charArray[i] == charArray[j]), mark the duplicate character as visited by setting charArray[j] to '\0'. Print the duplicate character (charArray[i]).
Step 4: Break out of the inner loop to avoid printing the same character multiple times. The duplicate characters in the input string are printed.
Filename: BruteForceDuplicateCharacters.java
public class BruteForceDuplicateCharacters {
public static void main(String[] args) {
String input = "programming";
System.out.print("Duplicate characters: ");
findAndPrintDuplicates(input);
}
private static void findAndPrintDuplicates(String input) {
// Convert the string to a char array
char[] charArray = input.toCharArray();
// Iterate through each character
for (int i = 0; i < charArray.length; i++) {
char currentChar = charArray[i];
// Check if the character is not marked as visited
if (currentChar != '\0') {
// Compare it with every other character in the string
for (int j = i + 1; j < charArray.length; j++) {
if (currentChar == charArray[j]) {
// Mark the duplicate character as visited (using '\0')
charArray[j] = '\0';
// Print the duplicate character
System.out.print(currentChar + " ");
break; // Break to avoid printing the same character multiple times
}
}
}
}
}
}
Output:
Duplicate characters: r g m
Time Complexity
The time complexity of the given brute force algorithm for finding duplicate characters in a string is O(n^2), where 'n' is the length of the input string. This is because, for each character in the string, the algorithm uses a nested loop to compare it with every other character in the string. As a result, the number of comparisons grows quadratically with the length of the input string.
Space Complexity
The space complexity of the algorithm is O(1) because it uses only a constant amount of additional space. The primary data structure used is the character array (charArray), which is created based on the input string. The algorithm does not use any additional data structures that scale with the input size.
Approach 2: Using a HashMap
The approach of using a HashMap to find duplicate characters in a string is a more optimized and efficient method compared to the brute force approach. In this algorithm we leverage a data structure, the HashMap, to store and manage the frequency of each character in the input string. The primary advantage of this approach is that it significantly reduces the time complexity compared to the quadratic time complexity of the brute force method.
Algorithm
Step 1: Take a string as input and create an empty HashMap charFrequency to store the frequency of each character.
Step 2: Iterate through the characters in the string, for each character ch in the input string: If ch is not present in the charFrequency HashMap, add an entry to charFrequency with ch as the key and 1 as the value.
Step 3: If ch is already present in the charFrequency HashMap, increment the frequency value for ch by 1.
Step 4: Identify and Print Duplicates: Iterate through the entries in the charFrequency HashMap, for each entry (entry), if the frequency value of the character (entry.getValue()) is greater than 1: print the character (entry.getKey()) as a duplicate. The duplicate characters in the input string are printed.
Filename: HashMapDuplicateCharacters.java
import java.util.HashMap;
import java.util.Map;
public class HashMapDuplicateCharacters {
public static void main(String[] args) {
String input = "programming";
System.out.print("Duplicate characters: ");
findAndPrintDuplicates(input);
}
private static void findAndPrintDuplicates(String input) {
// Create a HashMap to store character frequencies
Map<Character, Integer> charFrequency = new HashMap<>();
// Iterate through each character in the string
for (char ch : input.toCharArray()) {
// Update the frequency in the HashMap
charFrequency.put(ch, charFrequency.getOrDefault(ch, 0) + 1);
}
// Iterate through the HashMap to find duplicates
for (Map.Entry<Character, Integer> entry : charFrequency.entrySet()) {
if (entry.getValue() > 1) {
// Print the duplicate character
System.out.print(entry.getKey() + " ");
}
}
}
}
Output:
Duplicate characters: r g m
Time Complexity
The time complexity of the algorithm using a HashMap to find duplicate characters in a string is O(n), where 'n' is the length of the input string. The key operations involving HashMap, such as put and getOrDefault, have an average time complexity of O(1). Therefore, the overall time complexity is linear with respect to the length of the input string.
Space Complexity
The space complexity of the algorithm is O(c), where 'c' is the number of unique characters in the input string. In the worst case, where all characters are unique, 'c' would be equal to 'n' (length of the input string). In the average case, 'c' would be less than or equal to 'n'.
Approach 3: Using Sorting
The approach using sorting to find duplicate characters in a string is a simple yet effective method that leverages the property of sorted sequences to identify duplicates efficiently. In this approach we involve in converting the string into an array of characters, sorting the array, and then examining consecutive elements to identify duplicates.
Algorithm
Step 1: Take a string as input containing characters and convert the string to a character array (charArray). Sort the character array.
Step 2: Iterate through the sorted array (charArray): For each index i from 1 to charArray.length - 1: If charArray[i] is equal to charArray[i - 1], print charArray[i] as a duplicate. The duplicate characters in the input string are printed.
Filename: SortingDuplicateCharacters.java
import java.util.Arrays;
public class SortingDuplicateCharacters {
public static void main(String[] args) {
// Step 1: Input
String input = "programming";
// Step 2: Convert the string to a character array and sort
char[] charArray = input.toCharArray();
Arrays.sort(charArray);
// Step 3: Identify and Print Duplicates
System.out.print("Duplicate characters: ");
for (int i = 1; i < charArray.length; i++) {
if (charArray[i] == charArray[i - 1]) {
// Print the duplicate character
System.out.print(charArray[i] + " ");
}
}
// Step 4: Output
// Duplicate characters in the input string are printed.
}
}
Output:
Duplicate characters: g m r
Time Complexity
The time complexity of the algorithm using sorting to find duplicate characters in a string is O(n log n), where 'n' is the length of the input string. This is because sorting an array of 'n' elements has an average time complexity of O(n log n).
Space Complexity
The space complexity is O(1), indicating constant space usage. The sorting is typically performed in-place, meaning that it doesn't require additional memory proportional to the size of the input.
Approach 4: Using HashSet
The approach using HashSet to find duplicate characters in a string involves utilizing the unique property of sets to efficiently identify duplicates. HashSet is a collection that does not allow duplicate elements, making it suitable for this purpose.
Algorithm
Step 1: Take a string as input and create an empty HashSet (uniqueCharacters) to store unique characters.
Step 2: Iterate through each character in the input string: Check if the character is already present in the HashSet (uniqueCharacters).
Step 3: If it is present, print the character as a duplicate. If it is not present, add it to the HashSet. The duplicate characters in the input string are printed.
Filename: HashSetDuplicateCharacters.java
import java.util.HashSet;
import java.util.Set;
public class HashSetDuplicateCharacters {
public static void main(String[] args) {
String input = "programming";
System.out.print("Duplicate characters: ");
findAndPrintDuplicates(input);
}
private static void findAndPrintDuplicates(String input) {
// Step 1: Initialize an empty HashSet
Set<Character> uniqueCharacters = new HashSet<>();
// Step 2: Iterate through each character in the input string
for (char ch : input.toCharArray()) {
// Step 3: Check if the character is already present in the HashSet
if (!uniqueCharacters.add(ch)) {
// If present, print the character as a duplicate
System.out.print(ch + " ");
}
}
}
}
Output:
Duplicate characters: r m g
Time Complexity
The time complexity of the algorithm using HashSet to find duplicate characters in a string is O(n), where 'n' is the length of the input string.
Space Complexity
The space complexity is O(c), where 'c' is the number of unique characters in the input string. In the worst case, where all characters are unique, 'c' would be equal to 'n' (length of the input string).