Wednesday, June 14, 2006

Just my explaination on Bucket Sort!

Hey! thought of posting somethin right here! Posting on bucket sort, which i read long time back on net! Its random thought! Might not be 100% perfect but gives you a fair idea on bucket sort! Do ignore ( better post a comment) if you find a mistake!

Here it goes!

Bucket sort is an O(2n) algorithm. Its usually applied for large amount of data. Also it requires an index to the data to be sorted! i.e., it can't be used to sort strings( unless you've got a structure holding the string and an integer index associated with it!). Better say it that a structure having an index, if you have a big list of this structure tosort, better you go for bucket sort. Here you traverse the array only twice and sort the data. The explaination goes as below!

Assumptions for the explaination.
This document, considers only integers as the elements to be sorted! i.e., a simple list of numbers are sorted. If you have an indexed structure, you can easily modify the pseudo code given!

You have an unsorted array of integers. You know the biggest number in the list. Say for our explaination lets assume that the largest number in the list is 10. Then in the algorithm, we maintain an array( a bucket ) of size 10( why? will be cleared in the next section.) Lets take a sample data!

4,3,9,1,8,6,4,10,9,6,7,4,3,8 ( Lets call this array array A)

so we maintain an array of size 10. All the elements of the array are initialized to zero.( This array will be called B)

Now, we traverse the array from begin to end. While we are traversing, we check the value in hand in A, and increment the corresponding index in B! i.e., If we find 4 as the element in the array A, we increment the 4th index value of B by 1. This is done for all the elements of the array A.

After the traversal of Array A, what we do is just print the index of B those many times as its value. i.e., if the index 7 in the array B is 2 we print 7 twice.

Lets populate the array B for the example array A. We'll have a better understanding. ( in this algorithm, we'll consider 1 indexed arrays i.e., arrays start with index 1, if implementing in languages like C, where array index starts with 0, just neglecting the first index will help a lot. So the traversal, we just list the index of B with index 1-10)

Before starting.
A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 0,0,0,0,0,0,0,0,0,0

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 0,0,1,1,0,0,0,0,0,0

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 0,0,1,1,0,0,0,0,1,0

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,1,1,0,0,0,0,1,0

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,1,1,0,0,0,1,1,0

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,1,1,0,1,0,1,1,0

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,1,2,0,1,0,1,1,0

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,1,2,0,1,0,1,1,1

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,1,2,0,1,0,1,2,1

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,1,2,0,2,0,1,2,1

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,1,2,0,2,1,1,2,1

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,1,3,0,2,1,1,2,1

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,2,3,0,2,1,1,2,1

A - 4,3,9,1,8,6,4,10,9,6,7,4,3,8
B - 1,0,2,3,0,2,1,2,2,1


So, after the first iteration, we have the B array as 1,0,2,3,0,2,1,2,2,1.

after this, we just print the indexes of the array B as many times as the value contained in that index. i.e.,

1 one time, 2 zero times, 3 two times ...........

that boils down to the sorted array,

1,3,3,4,4,4,6,6,7,8,8,9,9,10

Which is sorted. This will be of real help if your data is huge. No comparisons at all in this sort. Check if your data is in billions and your max element is say 100000. If space is not your constraint, then you can go for this sort easily as an efficient implementation.

Limitations of this sort is,

1.Needs lots of space.
2.Needs an index to sort.

(I remember only these at this point.)

probable pseudo code.

for all the elements in the array A
update the corresponding index in the array B

for all elements in B
print the index as many times as the value contained in it.


A probable C chunk of function would be,

const int arraySizeOfA = 1287381246783 ;
const int maximumNumberInArrayA = 10000;

int A[ arraySizeOfA ];
int B[ maximumNumberInArrayA + 1 ];

int i,j ;

for( j = 0; j <= maximumNumberInArrayA; ++j )
B[ j ] = 0;

for( i = 0; i < arraySizeOfA; ++i )
B[A[i]]++;

for( j = 1; j < maximumNumberInArrayA; ++j )
for( i = 0; i < B[ j ]; ++i )
printf("%d", j) ;

still have doubts?? contact me!