Bucket Sort in Java
Bucket sort is also called bin sort. Bucket sort first puts the elements of the array or list into different buckets. The first bucket contains elements of the smallest value. The second bucket contains elements that are greater than elements that are present in the first bucket. Similarly, the third bucket contains elements that are greater than the second bucket’s elements, and so on. Thus, buckets are sorted. After that, one by one, each bucket is taken and sorted. Bucket sort is specifically used when input numbers are distributed in a uniform manner over a given range.
In this section, we will implement bucket sort approach in a Java program.
Algorithm
Step I: Create an empty bucket.
Step II: Distribute elements of the input array in such a way that buckets are sorted automatically.
Step III: Iterative over each bucket and check whether it is empty or not. If the bucket is empty, move to the next bucket. If the bucket is not empty, sort the elements present in the array.
Step IV: Concatenate the elements of buckets one by one. The array or list thus formed is the sorted array or list.
Pseudo Code
Java Program
The following code implements Bucket sort using the pseudo-code defined above.
FileName: BucketSortExample.java
Output:
Explanation: The size of the input array is 8. Therefore, the first element, 0.697, goes in the 5th bucket [0.697 * 8 = 5.576 => 5 (taking only the integral part)]. The second element goes in the bucket whose index is the integer part of 3.72 (0.465 * 8 = 3.72), which is 3. In the same way, the other elements can also be put into different buckets. The following diagram illustrates the same.
After putting every number in the buckets, the static method sort() is called on each bucket to sort the elements fetched by the buckets.
As per the pseudo-code, the last part is concatenation, which is nothing but re-writing the input array a[] from starting to ending using the values stored in the buckets. The last for-loop of the method bucketSorting() does the same. The final sorted array is shown below.
Sorting numbers having non-zero integer part
So far, we have only sorted numbers that had integer parts as 0 (see the input of the above example). However, the above program fails when numbers like 1.234 are given. To handle such numbers some modification has to be made in the above program. But first of all, observe the algorithm to sort numbers that have a non-zero integer part.
Algorithm
Step I: Calculate the minimum and maximum elements of the input array.
Step II: Find the range of each bucket.
range = (maximum – minimum) / N
Here, N represents the total number of buckets.
Step III: Make N buckets with the help range calculate above.
Step IV: Spread the elements of the array into these buckets, i.e.,
bucketIdx = ( array[j] – minimum ) / range
Step V: Now, do the sorting of each of the bucket one by one
Step VI: Collect all the sorted elements from the buckets one by one and do the concatenation to form the sorted array.
Now observe the program that is based on the above algorithm.
FileName: BucketSortExample.java
Output:
Explanation: The maximum value element is 8.0934, and the minimum value element is 0.865. Therefore, the range of each of the bucket is:
(integer part of (8.0934) – integer part of (0.865)) / 8 => (8 – 0) / 8 = 1.
Thus, the first element of the input array, 2.697 goes to the integer part of ((2.697 – 0.865) / 1) => 1. Thus, the bucket sitting at index 1 stores the number 2.697. Similarly, the index of the buckets can be calculated for other numbers too.
After determining the index of the bucket for every number, the rest of the code behaves in the same way it is behaving in the previous example.
Analysis of the bucket sort
Bucket Sort works on the scattering and gathering approach. Therefore, it is also known as the distribution sort. The bucket sort is sometimes also called the cousin of radix sort, as this sorting algorithm support moving from the most to least significant digits of a number to do the sorting work.
Time Complexity
The time complexity of bucket sort depends on the distribution of the elements. The uniform distribution leads to a better time-complexity when compared with the non-uniform distribution of elements.
The worst time complexity occurs when all the elements are stored in a single container. The time complexity in this scenario is O(n^2), where n is the total number of elements present in the array.
The best time complexity occurs when there is a uniform distribution of elements in the buckets, which means every bucket contains almost an equal number of elements. The time complexity, in this case, is O(n + p), where O(p) is the time complexity of sorting the elements present in the bucket. In this case, we have to use an algorithm that has the linear time complexity to sort the elements present in the bucket. O(n) is the time complexity for the creation of the buckets.
The average time complexity occurs when elements are randomly distributed in the input array. Thus, every bucket does not contain an equal number of elements. The bucket sort algorithm in such a scenario gives the time complexity as O(n + n^2/p + p), where n is the total number of elements and p is the total number of buckets.
Note that conventionally, insertion sort is used to sort the elements present in a bucket.
Space Complexity
Suppose one has created k buckets to do the sorting of a given array. Therefore, till this point, the space complexity is O(k). Now, the maximum number of elements these buckets can hold is the total number of elements present in the input array. If n represents the total number of elements present in the array, then the space complexity of the bucket sort algorithm is O(n + k).
Conclusion
The bucket sort algorithm cannot be used on any given data set. One has to analyze whether the data is uniformly distributed or not. If the data is not uniformly distributed, this is not the algorithm that should be used to accomplish the sorting process.
Note that counting sort gives a glimpse of counting sort. Think of a scenario when each bucket is holding only one element.