Arranging Lists - Part II

(Divide-and-Conquer Algorithms)

We saw a nice technique for solving problems: the Greedy approach. It is time for another powerful approach to solving problems: the divide-and-conquer technique. The basic strategy is to take a large size problem and break it down into a number of smaller problems; solve them; and, then combine the individual results to get the complete solution. These algorithms are also quite useful when we can put more than one computer on the job as each of the smaller problems can be solved on individual computers and then recombined.

Let us arrange the same list of eight numbers, from the basic algorithms section, in ascending order.

Practice questions

Search the web and make a list of five problems solved using divide-and-conquer algorithms.
What happens to Quick Sort if the input list is already in correct ascending order?
Search the web to find two weaknesses of Quick Sort.

Merge Sort

Merge Sort is the most intuitive divide-and-conquer algorithm and everyone without exception comes up with it when sorting problem is posed as follows. P number of people are together asked to arrange a list of N items in order. The solution: divide the N items equally among P people; each one arranges his/her share of roughly N/P items; finally, combine the individually sorted lists.

Merge Sort takes this idea to the extreme. Divide the original list into sublists (sometimes called runs) such that each sublist contains only one item (in which case, it is already sorted!) or two items (in which case, one comparison sorts the sublist).

Method I: Merge Sort

Input: List of items L

Let S = size of list L
if S = 1
then
Output L (no need to sort a list of size 1!)
else
1. Divide the list equally into two halves and call them F and G
2. call MergeSort on F
3. call MergeSort on G
4. Merge the two individually sorted lists F and G

(end of algorithm)

Output: Ordered List L

Merge procedure

Input: Two lists F and G

Let $L_f$ = size of F, $L_g$ = size of G
Make a new output list M = empty
if $L_f > L_g$
then
$l = L_f$
else
$l = L_g$
repeat until F or G is empty
1. if first item of F < first item of G
  then
  1. Add first item of F to M
  2. Delete the item from F
  else
  1. Add first item of G to M
  2. Delete the item from G

Output: The sorted output list M

Note the two highlighted lines in the algorithm for Merge Sort. These are the statements that solve the same sorting problem but on smaller lists and such statements are always found in divide-and-conquer algorithms.

Did you know ...

... that John von Neumann invented the Merge Sort in 1945?

Its analysis was first published by Goldstine and von Neumann in 1948.

von Neumann was not only the father of modern computing, developing an architecture that is the basis for even the modern computers that we build today, but also a mathematical genius and a child prodigy.

When he was sent to study advanced calculus, his teacher was so astounded by von Neumann's talent that he was brought to tears on their first meeting.

Nobel Laureate, Hans Bethe, wondered, "whether a brain like von Neumann's does not indicate a superior species".

On a lighter note, von Neumann liked to eat and drink and his wife said that "he could count everything except calories"!

How does Merge Sort work?

Divide-and-conquer algorithms are not as straightforward to analyse as the earlier, simpler ones. We first start Merge Sort with the single input list of 8 items. As S is not equal to 1 in Step 1, the list gets split into two lists F and G, each of size 4 items. We are doing Merge Sort on F in Step 2 (highlighted). Again, for this list, S is not equal to 1 and it gets split into two lists of size 2 items each. This happens until we have 8 sublists each of size 1 (shown below).

52
(A)

12
(B)

17
(C)

49
(D)

73
(E)

24
(F)

92
(G)

62
(H)

Once these eight sublists are formed, the Merge procedure (Step 4 of Merge Sort) comes into play. It first merges the pairs of lists (A, B), (C, D), (E, F), and (G, H). The procedure Merge procedure is followed for merging the sublists. At the end of the first Merge round, we get the following four lists:

12, 52
(AB)

17, 49
(CD)

24, 73
(EF)

62, 92
(GH)

In the next Merge round, Lists AB and CD are merged and also EF and GH.

12, 17, 49, 52
(ABCD)

24, 62, 73, 92
(EFGH)

In the third and final round of Merge, these two lists are merged to give the final list in ascending order.

How many comparisons did we do?

Notice carefully that all the comparisons are done only in the Merge procedure and nowhere else! As we split the input list into 8 sublists of one item each, there was no need for any comparisons.

Let us count the number of comparisons in Merge procedure. Each item is compared once when we merge the sublist in which it is present, with another sublist. Each item is merged into three sublists. For example, the item 17 is initially in (C). It is merged once to create (CD) and again for (ABCD) and then for the final list. Therefore, each item is compared three times or $\log_2{8}$ times. The total number of comparisons is \[3\times8=\fbox{24}\] comparisons. Make sure you understand this argument - it is critical!

It looks as if Merge Sort took more comparisons for our 8 item list. True, but see what happens as we work with larger input lists.

Merge sort takes $1000\times\log_2{1000} = 10,000$ comparisons for an input list of size 1000. Compare this against how many we needed for Selection or Bubble Sort!

What about a list of 100,000 items? If we approximate $\log_2{100,000} = 16$, then Merge Sort makes 1,600,000 comparisons for sorting the list. Bubble Sort took nearly 5 billion comparisons! So,

Merge Sort is more than 3000 times faster.

Logarithm Function

In mathematics, the logarithm function is defined as \[ \log_2 N = k, \mbox{if } 2^k = N\] Also, \[2^k = 2\times2\times2\times\ldots\times2, k \mbox{ times.}\] Given the above definitions, it is easy to see that $8 = 2\times2\times2 = 2^3$, and therefore, $\log_2{8} = 3$.

It also follows that if we divide N by 2, $\log_2{N}$ times, we get 1. Check: 8/2 = 4; 4/2 = 2; 2/2 = 1. After 3 divisions by 2, 8 becomes 1.

Some important logarithms

$\log_2{100} \approx 7$
$\log_2{500} \approx 9$
$\log_2{1000} \approx 10$
$\log_2{10000} \approx 14$
$\log_2{100,000} \approx 16$
$\log_2{1,000,000} \approx 20$
$\log_2{1,000,000,000} \approx 30$

$\approx$ symbol means "approximately equal to"

Remember that $\log_2$ of a number increases by 10 whenever the number is multiplied by 1000.

Quick Sort

Now we come to one of the most beautiful algorithms in Computer Science and one that is considered a classic, the Quick Sort. Quick Sort is also a divide-and-conquer algorithm that splits a given list of items into smaller lists and individually sorts them. It is similar to Merge Sort in the overall strategy but differs from it in a significant way.

Quick and Merge sorts are complementary in some ways. Merge Sort spends almost no time in splitting the list into sublists: it simply divides the list into two equal halves each time. Its comparisons occur in the Merge step which is time consuming. Quick Sort spends a lot of time in splitting the list into two parts such that one part contains small items and the other, large items. Its merge step is trivial: in fact, there is no real need for a merge.

The basic idea behind Quick Sort is to define a pivot item. All those items smaller than the pivot are moved to the left and those that are larger, to the right. Once the moves are completed, the list is split into two lists at the pivot. As the two sublists are already split in such a way that one contains only small items and the other, larger, there is no need for any merge step. The algorithm is given below. Note again the divide-and-conquer nature of the algorithm in Steps 3.3 and 3.4 in the Quick Sort algorithm.

Method II: Quick Sort

Input: List of items L

Let left = 0
Let right = length of the List L
if left < right
then
1. Let pivot = position for splitting L
2. Split list at pivot into F and S
3. call Quick Sort on F
4. call Quick Sort on S

(end of algorithm)
Output: Ordered List L

Partition procedure

Input: One list L
left: the leftmost position
right: the rightmost position

Let pivot = first item in L
Let llimit = left
Let hlimit = right
repeat forever
1. repeat while llimit item in L < pivot
2. repeat while rlimit item in L > pivot
3. if llimit >= rlimit
  output rlimit
  end the algorithm.
4. swap items at llimit and rlimit positions
(end of partition procedure)

How does Quick Sort work?

So, let us see how Quick Sort works. We first write the algorithm and start with the same list of eight numbers again.

The simplest version of Quick Sort chooses the first item as the pivot. Here the pivot is 52. Initially, Quick Sort is called with left as 0 and right as 7. When the partition procedure is called, llimit is initially 0 and rlimit is 7. llimit stays at 0 because the first element is the pivot and pivot is not less than pivot! rlimit decrements to 5 because 62 and 92 are both greater than the pivot (52) while 24 is smaller. Then Step 3.4 causes 52 to swap with 24 and we get

where the green colour is the llimit and pink colour is rlimit and the pivot remains as 52.

Step 3 starts again. Now, llimit stops at 4 in Step 3.1 because 12, 17 and 49 are smaller than the pivot while 73 is not. rlimit does not move and remains at 5 because 52 is itself the pivot. Step 3.4 swaps 73 and 52. The resulting list is

Step 3 restarts and now llimit stops at 5 where 73 is larger than the pivot 52. rlimit decrements to 4. Now, llimit is greater than rlimit and partition procedure returns the split position as 4.

Quick Sort splits the original list into two sublists F and S as shown below. You can see that the list F contains only numbers smaller than 52 while S comprises entirely of numbers larger than 52.

(F)

(G)

The procedure continues now with splitting list F with pivot as 24 and also splitting list G with pivot as 73. At the end, the entire list is sorted.

Quick Sort, as the name tells you, is really fast! It is almost 5 times faster than even Merge Sort for lists containing 2,00,000 items and it gets progressively better for larger lists. Quick Sort is the most widely used sorting algorithm in the world.

How many comparisons does Quick Sort do?

The analysis of Quick Sort is almost the same as that of Merge Sort. Note that there is no merge step and no comparisons are made except in the partition procedure. The partitioning splits the original list $\log_2{N}$ times. In each split, the items are compared once each to result in at most $\log_2{N}$ comparisons per item. Therefore the total number of comparisons is $N\times\log_2{N}$.

On the sidelines of Quick Sort

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”

Thus spoke the legendary computer scientist, Charles Antony Richard Hoare, the inventor of Quick Sort in 1959, while addressing a bunch of professional programmers.

C. A. R. Hoare is a British computer scientist and a Turing Award (computer science's equivalent of the Nobel Prize) winner in 1980 for his contributions to the design of programming languages.

Hoare is credited with Hoare Logic which allows formalization of interacting processes.