Find Duplicates Using Bit Array in Java
Finding duplicates in an array is a common programming problem that can be solved using various approaches. One efficient method is to use a bit array to keep track of the presence or absence of elements. In Java, you can use a BitSet to implement a bit array. A bit array, also known as a bitset or bit vector, is a data structure that represents an array of bits, where each bit can be either 0 or 1. It is a compact way to store information about the presence or absence of elements in a set.
Approach 1: Simple Bit Array Approach
The Bit Array approach for finding duplicates in an array is a technique that combines the principles of set-based operations and bitwise manipulation. The approach involves initializing a data structure, often a BitSet or boolean array, to represent the presence or absence of elements within a specified range. By iterating through the array and manipulating the bits or boolean values at corresponding positions, the algorithm efficiently identifies duplicate elements.
Algorithm
Step 1: Create a BitSet or boolean array named bitSet to represent the presence or absence of elements. Create an empty list named duplicates to store the duplicate elements.
Step 2: For each element value in the input array arr, calculate the adjusted index (index = value - 1) to map the array values to the zero-based index of the data structure.
Step 3: Identify and Store Duplicates: For each element value, if bitSet at index position is set then add value to the duplicates list.
Step 4: Else, set the bit at index position in bitSet. Return the duplicates list containing the duplicate elements.
Filename: FindDuplicatesUsingBitArray.java
import java.util.BitSet;
import java.util.ArrayList;
import java.util.List;
public class FindDuplicatesUsingBitArray {
public static List<Integer> findAndStoreDuplicates(int[] arr) {
BitSet bitSet = new BitSet();
List<Integer> duplicates = new ArrayList<>();
for (int value : arr) {
int position = value - 1; // Adjust to 0-based index
if (bitSet.get(position)) {
// Duplicate found, add to the output list
duplicates.add(value);
} else {
// Set the bit corresponding to the current element
bitSet.set(position);
}
}
return duplicates;
}
public static void main(String[] args) {
int[] arr = {1, 2, 3, 4, 5, 2, 7, 8, 3, 9, 10, 1};
List<Integer> duplicates = findAndStoreDuplicates(arr);
System.out.println("Duplicates in the array: " + duplicates);
}
}
Output:
Duplicates in the array: [2, 3, 1]
Time Complexity
The time complexity of the Set or Bit Array approach for finding duplicates in an array is O(N), where N is the number of elements in the array. This linear time complexity arises from iterating through the array once and performing constant-time operations for each element.
Space Complexity
The space complexity of the Set or Bit Array approach is O(R), where R is the range of non-negative integers in the array. This complexity arises from the use of a BitSet or boolean array to represent the presence or absence of elements.
Approach 2: Optimized Bit Array Approach
The Optimized Bit Array approach for finding duplicates in an array introduces a more memory-efficient strategy by using an array, BIT_ARRAY, to represent the presence or absence of elements. Each index of BIT_ARRAY corresponds to 32 elements, allowing for the efficient addressing of a larger range of values.
Algorithm
Step 1: Create an integer array named BIT_ARRAY with a size of 1000, initialized to 0. Create an empty list named duplicates to store duplicate elements.
Step 2: Iterate through the array, for each element ARR[i] in the input array arr, Calculate the index in BIT_ARRAY corresponding to the address of ARR[i] (index = ARR[i] / 32).
Step 3: Find the bit within the index (BIT_NO) representing the address of ARR[i] (BIT_NO = ARR[i] & 31).
Step 4: Check for duplicates, retrieve the value at BIT_ARRAY[2^BIT_NO] and check if it is equal to 1. If true, the current element is a duplicate, so append ARR[i] to duplicates.
Step 5: If false, set BIT_ARRAY[2^BIT_NO] to 1. Return the list duplicates containing the duplicate elements.
Filename: FindDuplicatesUsingOptimizedBitArray.java
import java.util.ArrayList;
import java.util.List;
public class FindDuplicatesUsingOptimizedBitArray {
public static List<Integer> findAndStoreDuplicates(int[] arr) {
int[] BIT_ARRAY = new int[1000];
List<Integer> duplicates = new ArrayList<>();
for (int i = 0; i < arr.length; i++) {
int index = arr[i] / 32;
int bitNo = arr[i] & 31;
int val = BIT_ARRAY[index] & (1 << bitNo);
if (val != 0) {
// Duplicate found, add to the output list
duplicates.add(arr[i]);
} else {
// Set the bit corresponding to the current element
BIT_ARRAY[index] |= (1 << bitNo);
}
}
return duplicates;
}
public static void main(String[] args) {
int[] arr = {1, 2, 3, 4, 5, 2, 7, 8, 3, 9, 10, 1};
List<Integer> duplicates = findAndStoreDuplicates(arr);
System.out.println("Duplicates in the array: " + duplicates);
}
}
Output:
Duplicates in the array: [2, 3, 1]
Time Complexity
The time complexity of the Optimized Set or Bit Array approach for finding duplicates in an array is O(N), where N is the number of elements in the array. This linear time complexity arises from iterating through the array once, and the operations performed for each element are constant time.
Space Complexity
The space complexity is O(1), which means it is constant. The primary data structure used is the BIT_ARRAY, which has a fixed size of 1000. The size of this array is independent of the size of the input array arr. As a result, the space required remains constant, making the algorithm space-efficient.
Approach 3: BitSet and HashSet
This approach involves maintaining a BitSet (or Bit Array) to mark the presence or absence of elements in the array. The BitSet is used to identify and store unique duplicates efficiently. The algorithm iterates through the array, checks for duplicates using the BitSet, and stores unique duplicates in a HashSet. Finally, the unique duplicates are printed.
Algorithm
Step 1: Initialize Bit Array: Create an instance of the DuplicateFinder class, which initializes a bit array (bitArray) with a size based on the given array's maximum possible value.
Step 2: Create a HashSet named Duplicates to store unique duplicates found during the iteration.
Step 3: Iterate Through the Input Array: For each element inputArray[i] in the given input array: Calculate currentNumber as inputArray[i] - 1 to adjust for 1-based indexing.
Step 4: Check if the corresponding bit for currentNumber is already set in the bit array using isBitSet method. If set, add currentNumber + 1 to the Duplicates HashSet. If not set, set the corresponding bit using the setBit method.
Step 5: Print the contents of the Duplicates HashSet, which contains unique duplicate values. In the main method, create an example input array (inputArray). Call the findAndPrintDuplicates method with the input array.
Filename: DuplicateFinder.java
import java.util.HashSet;
public class DuplicateFinder {
private int[] bitArray;
public DuplicateFinder(int arraySize) {
bitArray = new int[(arraySize >> 5) + 1];
}
public boolean isBitSet(int position) {
int index = (position >> 5);
int bitNumber = (position & 0x1F);
return (bitArray[index] & (1 << bitNumber)) != 0;
}
public void setBit(int position) {
int index = (position >> 5);
int bitNumber = (position & 0x1F);
bitArray[index] |= (1 << bitNumber);
}
public static void findAndPrintDuplicates(int[] inputArray) {
DuplicateFinder duplicateFinder = new DuplicateFinder(320000);
HashSet<Integer> Duplicates = new HashSet<>();
for (int i = 0; i < inputArray.length; i++) {
int currentNumber = inputArray[i] - 1;
if (duplicateFinder.isBitSet(currentNumber)) {
// Duplicate found, add to the set
Duplicates.add(currentNumber + 1);
} else {
duplicateFinder.setBit(currentNumber);
}
}
// Print the duplicates
System.out.print("Duplicates in the array: ");
for (int duplicate : Duplicates) {
System.out.print(duplicate + " ");
}
}
public static void main(String[] args) {
int[] inputArray = {1, 6, 4, 10, 9, 5, 4, 5, 6, 4};
findAndPrintDuplicates(inputArray);
}
}
Output:
Duplicates in the array: 4 5 6
Time Complexity
The time complexity of the provided algorithm is O(N), where N is the size of the input array. The for loop iterates through each element in the input array once, performing constant-time operations within the loop. The operations inside the loop involve bitwise operations (isBitSet, setBit), which have constant time complexity.
Space Complexity
The space complexity is primarily determined by the size of the bit array. The space complexity is O(M), where M is the maximum possible value in the input array. The HashSet Duplicates also contributes to the space complexity, but its size is determined by the number of unique duplicates, which is usually smaller than the size of the input array.
Conclusion
In conclusion, the Bit Array approach offers an elegant solution to the problem of finding duplicates, particularly when memory efficiency is crucial. It strikes a balance between time complexity and space complexity, making it well-suited for scenarios where a compact representation of element presence is required.