Sorting

By Gareth McCaughan

Converted to HTML by Steven Singer

This document is a copy of a set of postings Gareth McCaughan made on comp.sys.acorn.programmer in September 1996. Apart from formatting changes, and changes to the text to reflect the changes to the formatting, this is the original articles just joined together.

I. An overview of sorting methods

This is the first of four parts of a long and boring article about sorting. It's just a brief summary of most of the most important sorting algorithms.

For a pretty full account of this sort of thing, read volume 3 of Knuth. I'm just going to sketch things.

In all that follows we have an array of N items, numbered 1..N or 0..N-1 (whichever is more convenient), and arranged sequentially. (Sometimes linked lists are more efficient; not usually.)

The items could be anything, and the criterion for being "less than" could be anything (more or less). Depending on what these are, it might be very expensive to compare two items, or it might be very expensive to move an item from one place to another. Or both :-(.

In measuring the efficiency of a sorting technique, it's usual to say something like "it takes time of order N^2". This means that for large enough N the amount of work done by the sorting algorithm looks like some constant times N^2, plus smaller things. This work is made up partly of comparisons and partly of actually moving data around; in different applications these will have different relative costs. (So no one set of figures will be good enough for all cases.)

Here's a little summary of each of several important methods. I'll have more to say later.

Bubble sort

DON'T USE BUBBLE SORT. Everyone learns it because it's easy to write a bubble sort program, so it's become popular with the writers of textbooks. Unfortunately it's a very very bad algorithm. DON'T USE IT.

I'm certainly not going to tell you how it works. You don't need to know. DON'T USE IT.

Insertion sort

For each n:
- put the n'th item in the right place among items 1..n, by comparing it with each of the first n-1 in turn.

This takes time of order N^2 in the worst case, because if the list is in reverse order then you have to do a lot of comparing and moving things around. (Work it out.) It's actually of order N^2 on average too.

Insertion sort is very easy to program, and because of its simplicity it works well when N is very small. For moderate N it is one of the worst plausible algorithms.

Shellsort

The trouble with insertion sort is that everything it does is "local", so that if there are a lot of things in completely the wrong place insertion behaves very badly. So, a guy called Shell invented the following algorithm:

for d in some decreasing list ending with 1
- for each k in 0..d-1
  - do insertion sort on items k,k+d,k+2d,...

If you're careful, the code for this can be very nearly as short as that for insertion sort; and if you choose your list of values for d carefully it can perform very much better.

No one is quite sure what the theoretical performance of shellsort is. If you choose things carefully, for typical data the time it takes seems to go like N^1.3 or something.

Mergesort

If you're only sorting two items, do the obvious thing. Otherwise,
- sort the first half of the list using mergesort;
- sort the second half of the list using mergesort;
- go through the two lists in parallel, assembling the final sorted list.

Mergesort takes time of order N log N if you do it right. Unfortunately, it also requires quite a lot of extra storage.

Quicksort

Pick an item x somewhere in the middle of your range.
Divide your list into items <= x and items >x
Sort each of these sublists by quicksort.

Quicksort is for most applications the fastest known sorting method. It requires a small amount of extra storage, to keep track of the recursion. Its worst-case performance is awful, but if you do things right this more or less never happens. For typical data it takes time of order N log N.

Heapsort

The algorithm for heapsort is more complicated than any of those I've discussed, and won't fit into the amount of space I'm willing to use. You can find it discussed in any number of textbooks.

Heapsort, like quicksort and mergesort, takes time of order N log N. Unlike quicksort it always takes this long (there isn't that awful worst case). Unfortunately, it's usually slower than quicksort by a moderate-sized constant factor (2 or thereabouts). It also requires no extra storage at all.

Radix sort (trivial case)

This only works when your data are numbers in a restricted range, say 0..K-1.

Have K bins, numbered 0..K-1.
For each item of data,
- put it in the right bin.
Read out your items in order, one bin at a time.

This is clearly blindingly fast: of order N. Unfortunately, it's not common for K to be small enough to do this.

Radix sort (non-trivial case)

This only works where your data are numbers in a restricted range, but the range can be much larger. Say the range is 0..K^r-1, where K is small enough for the trivial case.

Do trivial radix sort on your numbers, looking only at the least significant digit (in base K).
Now do it again, looking at the next digit.
...
Finally, do it using the most significant digit.

This is very efficient, provided you have some way of adding numbers to bins without moving them around. Linked lists are good for this.

That concludes my brief outline of sorting methods. There are plenty more, but you are unlikely ever to need anything not in the list above.

II. An experiment

This is the second of four parts of a long and boring article about sorting. It describes an experiment I did to get some timings for several sorting routines, and briefly summarises the results.

The obvious experiment, really. For each sorting method, and for a range of values of N, make up an array of items and sort them. However, the performance of a sorting method can depend heavily on how the items are arranged (for instance, a really stupid implementation of quicksort may do really badly if it happens to be given an array that's already sorted!), so I did this for a variety of datasets for each N.

All my data items were ordinary 4-byte integers. This was just to make my job a bit easier. It would be interesting to do the same thing again, using some other kinds of data: this might make a significant difference.

Ascending order: Obvious.
Descending order: Obvious.
Random order: Completely random, that is. Just output from rand().
"Few swaps": Start with ascending order and do about sqrt(N) swaps.
"Small deviations": Nth item differs from N by random number of order sqrt(N).
"Shuffled": All the even-index numbers are smaller than all the odd numbers. Each of those sub-arrays is itself "shuffled".

The sorting routines I used were as follows. Ones written so as to allow sorting things other than integers are emphasised. Of course they take some performance loss for this.

A straightforward insertion sort.
A shellsort where the values of "d" go ...,16,8,4,2,1.
A shellsort where the values of "d" go ...,31,15,7,3,1.
A shellsort where the values of "d" go ...,40,13,4,1.
A fairly badly implemented quicksort.
A sensibly implemented quicksort. (Tuned experimentally.)
A "generic" quicksort capable of sorting any data, not just numbers.
A not-very-well-written heapsort.
A straightforward but quite highly tuned mergesort.
The Shared C Library's "qsort()" function.
The OS_HeapSort SWI, using the sorting-integers special case.
The OS_HeapSort SWI, using the generic case and a trivial comparator.

I'll say more about tuning quicksort later.

I didn't do a radix sort, because that really requires space to add link fields, and I wanted to work with simple arrays of integers.

The results

The complete figures are in the next part of this article. Here's a summary of the most interesting points.

Insertion sort is really efficient if your data already happen to be in ascending order. In the "small deviations" case, it's pretty good up to N=10000. Otherwise, it's a big loser.
If you're going to use shellsort, it really matters what sequence of "d" values you use. The three above are in increasing order of quality, and the differences are large. Do NOT give in to the temptation to use the powers of 2, even though it makes your program a couple of characters shorter: with "Shuffled" data that makes it run slower than insertion sort, and with any data it performs substantially worse than a better-chosen shellsort.
Shellsort is actually quite good: typically within a factor of 2 of the best of the routines.
OS_HeapSort is worse than my own (not very well-written, and in C rather than machine code) heapsort routine, by a factor of about 1.5 or worse.
On the other hand, OS_HeapSort doesn't suffer at all if required to sort integers using its generic code. This suggests to me that the special cases are actually implemented by calling the general case using suitable comparison functions. No wonder it's slow.
In any case, heapsort is worse than mergesort and quicksort, and not much better than shellsort, and the program is more complicated than any of those. It's probably never the best choice.
Mergesort is consistently outperformed by quicksort, though usually not by a large margin. (I have reason to believe that for certain kinds of data mergesort actually performs better than quicksort; specifically, when comparisons are expensive.)
The naive quicksort is significantly worse than the tuned one, but still better than everything else.
In more or less every case the tuned quicksort was the winner.
The "generic" quicksort pays a price for its generality: it takes about 1.7 times as long as the integer-only one in most cases.
qsort() is TERRIBLE. Looking at the code reveals that in my version of the Shared C Library it's actually a shellsort; and it pays a high price for its generality. For 300,000 random numbers it was slower than my generic quicksort by about a factor of 7.

For sorting 300,000 completely random integers, the ranking was as follows. The numbers are timings, in seconds. Blank lines indicate big drops in performance.

   Tuned quicksort     3.02
   Naive quicksort     3.46

   Generic quicksort   5.22
   Heapsort            6.20
   Mergesort           7.40

   OS_HeapSort         9.52
   Shellsort (good)    9.58
   Shellsort (OKish)  10.08

   qsort()            35.76
   Shellsort (bad)    47.67

   Insertion sort      huge (on the order of 20000, at a guess)

III. The results in full

This is the third of four parts of a long and boring article about sorting. It gives, in full gory horrible detail, the results of the experiment described in the second part.

Large datasets

Firstly, numbers obtained by running each sorting routine once on each sort of dataset. Small numbers should be taken with a pinch of salt for obvious reasons. All numbers are in seconds; blanks indicate huge times.

I'm afraid the figures for generic OS_HeapSort aren't here. That's because the experiment that produced this output didn't include that. Sorry. The results always seem to be within a few centiseconds of the non-generic OS_Heapsort anyway: see the previous part of this article.

1. Entirely random

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.01   0.11   1.16  14.09
Shellsort-2     0.00   0.01   0.01   0.06   0.34   1.52   9.36  47.67
Shellsort-2a    0.00   0.00   0.01   0.04   0.17   0.64   2.59  10.08
Shellsort-3     0.01   0.00   0.01   0.03   0.16   0.59   2.41   9.58
Mergesort       0.00   0.00   0.02   0.04   0.17   0.60   2.27   7.40
Quick(Naive)    0.00   0.00   0.00   0.02   0.08   0.29   1.06   3.46
Quick(M3,18I)   0.00   0.00   0.00   0.02   0.08   0.24   0.93   3.02
Quick(Gen.)     0.00   0.00   0.01   0.03   0.12   0.42   1.60   5.22
qsort()         0.00   0.00   0.04   0.13   0.54   2.08   8.79  35.76
OS_HeapSort     0.00   0.01   0.01   0.05   0.20   0.71   2.75   9.52
Heapsort        0.00   0.00   0.00   0.03   0.13   0.43   1.75   6.20

2. Ascending order

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.00   0.01   0.00   0.01   0.02   0.09   0.26
Shellsort-2     0.00   0.00   0.01   0.03   0.11   0.38   1.44   4.77
Shellsort-2a    0.00   0.00   0.01   0.02   0.10   0.35   1.35   4.53
Shellsort-3     0.00   0.00   0.00   0.01   0.06   0.23   0.89   2.96
Mergesort       0.00   0.01   0.01   0.02   0.10   0.34   1.25   4.03
Quick(Naive)    0.00   0.00   0.01   0.01   0.05   0.19   0.64   2.23
Quick(M3,18I)   0.00   0.00   0.00   0.01   0.04   0.14   0.57   1.82
Quick(Gen.)     0.00   0.00   0.01   0.02   0.08   0.26   1.03   3.27
qsort()         0.00   0.00   0.01   0.05   0.17   0.57   2.17   7.19
OS_HeapSort     0.00   0.00   0.02   0.05   0.20   0.64   2.40   7.86
Heapsort        0.00   0.01   0.00   0.02   0.11   0.38   1.38   4.52

3. Descending order

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.01   0.02   0.23   2.40  28.36
Shellsort-2     0.00   0.01   0.01   0.03   0.13   0.44   1.68   5.58
Shellsort-2a    0.00   0.00   0.01   0.03   0.12   0.40   1.57   5.24
Shellsort-3     0.00   0.00   0.00   0.03   0.09   0.32   1.18   3.91
Mergesort       0.00   0.00   0.01   0.04   0.14   0.48   1.81   5.82
Quick(Naive)    0.00   0.01   0.00   0.02   0.06   0.20   0.67   2.31
Quick(M3,18I)   0.00   0.00   0.00   0.02   0.05   0.16   0.60   1.90
Quick(Gen.)     0.00   0.00   0.01   0.02   0.08   0.28   1.06   3.35
qsort()         0.00   0.00   0.02   0.08   0.29   1.01   3.69  12.11
OS_HeapSort     0.00   0.00   0.02   0.04   0.18   0.61   2.28   7.56
Heapsort        0.00   0.00   0.01   0.03   0.11   0.36   1.33   4.39

4. Few swaps

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.00   0.01   0.05   0.32   1.67  11.10  56.22
Shellsort-2     0.00   0.00   0.01   0.03   0.16   0.61   2.54   7.91
Shellsort-2a    0.00   0.00   0.00   0.03   0.13   0.46   1.74   5.93
Shellsort-3     0.00   0.01   0.00   0.02   0.10   0.34   1.35   4.56
Mergesort       0.00   0.00   0.01   0.04   0.14   0.50   1.83   5.91
Quick(Naive)    0.00   0.00   0.01   0.02   0.05   0.19   0.65   2.23
Quick(M3,18I)   0.00   0.00   0.00   0.01   0.04   0.15   0.58   1.83
Quick(Gen.)     0.00   0.00   0.01   0.02   0.07   0.26   1.03   3.27
qsort()         0.00   0.00   0.03   0.08   0.33   1.12   4.40  15.54
OS_HeapSort     0.00   0.01   0.01   0.05   0.19   0.65   2.41   7.88
Heapsort        0.00   0.00   0.00   0.03   0.11   0.38   1.38   4.52

5. Small deviations

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.00   0.00   0.01   0.08   0.40   2.40  12.37
Shellsort-2     0.00   0.00   0.01   0.03   0.13   0.46   1.83   6.38
Shellsort-2a    0.00   0.00   0.01   0.03   0.11   0.41   1.57   5.30
Shellsort-3     0.00   0.00   0.00   0.02   0.09   0.30   1.16   3.93
Mergesort       0.00   0.00   0.01   0.04   0.13   0.44   1.68   5.47
Quick(Naive)    0.00   0.00   0.01   0.02   0.07   0.23   0.88   2.87
Quick(M3,18I)   0.00   0.00   0.01   0.02   0.07   0.21   0.77   2.52
Quick(Gen.)     0.00   0.01   0.01   0.02   0.11   0.37   1.36   4.47
qsort()         0.00   0.00   0.02   0.07   0.27   0.92   3.61  12.36
OS_HeapSort     0.00   0.00   0.01   0.05   0.19   0.63   2.37   7.87
Heapsort        0.00   0.00   0.01   0.03   0.10   0.37   1.37   4.55

6. Shuffled

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.01   0.12   0.57   9.38
Shellsort-2     0.00   0.00   0.11   0.54  10.86
Shellsort-2a    0.00   0.00   0.01   0.05   0.21   0.89   4.27  17.62
Shellsort-3     0.01   0.00   0.01   0.03   0.13   0.43   1.77   5.83
Mergesort       0.00   0.00   0.01   0.04   0.17   0.59   2.03   7.10
Quick(Naive)    0.00   0.01   0.01   0.02   0.08   0.29   1.11   3.52
Quick(M3,18I)   0.00   0.01   0.01   0.01   0.07   0.25   1.56   3.07
Quick(Gen.)     0.00   0.01   0.01   0.03   0.12   0.43   2.82   6.04
qsort()         0.00   0.01   0.03   0.11   0.45   1.50   6.11  19.99
OS_HeapSort     0.00   0.00   0.01   0.06   0.22   0.76   2.99  10.35
Heapsort        0.00   0.01   0.01   0.03   0.13   0.47   1.96   6.94

Small datasets

Now, corresponding results for smaller datasets; these numbers are in centiseconds, and are a bit more reliable. Each routine was run enough times to give reasonable accuracy.

1. Entirely random

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.12   1.02  11.34 117.35
Shellsort-2     0.06   0.28   1.45   6.71
Shellsort-2a    0.05   0.21   0.96   3.94
Shellsort-3     0.05   0.18   0.85   3.43
Mergesort       0.09   0.33   1.30   4.67
Quick(Naive)    0.05   0.17   0.66   2.29
Quick(M3,18I)   0.04   0.13   0.54   1.91
Quick(Gen.)     0.06   0.23   0.94   3.27
qsort()         0.17   0.70   3.23  12.97
OS_HeapSort     0.09   0.33   1.36   5.07
Heapsort        0.05   0.18   0.77   3.06

2. Ascending order

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.01   0.02   0.08   0.26
Shellsort-2     0.04   0.16   0.71   2.80
Shellsort-2a    0.04   0.14   0.63   2.55
Shellsort-3     0.03   0.10   0.44   1.72
Mergesort       0.06   0.20   0.78   2.70
Quick(Naive)    0.03   0.10   0.44   1.34
Quick(M3,18I)   0.02   0.06   0.29   1.14
Quick(Gen.)     0.03   0.13   0.51   2.00
qsort()         0.07   0.27   1.16   4.27
OS_HeapSort     0.09   0.34   1.40   5.02
Heapsort        0.05   0.19   0.80   2.92

3. Descending order

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.22   1.99  22.57 239.92
Shellsort-2     0.05   0.19   0.87   3.28
Shellsort-2a    0.04   0.17   0.78   3.04
Shellsort-3     0.04   0.14   0.61   2.32
Mergesort       0.08   0.28   1.05   3.80
Quick(Naive)    0.03   0.11   0.46   1.41
Quick(M3,18I)   0.02   0.07   0.31   1.22
Quick(Gen.)     0.03   0.14   0.54   2.09
qsort()         0.12   0.46   2.03   7.35
OS_HeapSort     0.08   0.30   1.27   4.60
Heapsort        0.04   0.17   0.72   2.67

4. Few swaps

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.04   0.17   0.97   5.43
Shellsort-2     0.06   0.22   0.95   3.76
Shellsort-2a    0.05   0.19   0.82   3.20
Shellsort-3     0.04   0.16   0.65   2.45
Mergesort       0.10   0.29   1.09   3.85
Quick(Naive)    0.04   0.12   0.45   1.37
Quick(M3,18I)   0.03   0.08   0.32   1.16
Quick(Gen.)     0.04   0.15   0.56   2.04
qsort()         0.14   0.52   2.24   8.18
OS_HeapSort     0.10   0.35   1.42   5.02
Heapsort        0.06   0.20   0.82   2.93

5. Small deviations

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.01   0.07   0.30   1.51
Shellsort-2     0.05   0.20   0.80   3.06
Shellsort-2a    0.04   0.18   0.69   2.78
Shellsort-3     0.03   0.15   0.55   2.26
Mergesort       0.08   0.26   0.99   3.46
Quick(Naive)    0.05   0.16   0.55   2.01
Quick(M3,18I)   0.03   0.12   0.43   1.53
Quick(Gen.)     0.05   0.20   0.80   2.89
qsort()         0.10   0.41   1.73   6.44
OS_HeapSort     0.10   0.36   1.36   4.78
Heapsort        0.05   0.20   0.80   2.89

6. Shuffled

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.11   1.00  11.20 115.92
Shellsort-2     0.15   1.14  11.75 116.92
Shellsort-2a    0.05   0.22   1.07   4.59
Shellsort-3     0.04   0.18   0.78   2.92
Mergesort       0.09   0.33   1.32   4.70
Quick(Naive)    0.05   0.16   0.68   2.36
Quick(M3,18I)   0.04   0.13   0.55   1.92
Quick(Gen.)     0.06   0.23   0.97   3.32
qsort()         0.16   0.69   2.90  10.26
OS_HeapSort     0.09   0.33   1.38   5.17
Heapsort        0.05   0.19   0.79   3.09

Sorting: IV. Tuning quicksort

This is the fourth and last part of a long and boring article about sorting. It describes how to make a quicksort routine go faster.

There are two problems with quicksort. Firstly, and more importantly, if you choose your "pivot" x badly then you can lose badly: if you chop the list into one very small part and one very big part then you haven't gained much. In the worst case, you chop the list into parts of size 1 and N-1. Ouch.

Secondly, the bookkeeping involved in doing all the recursive stuff gets a bit expensive when you're sorting very short lists. It pays to "truncate" and either (i) sort short lists by insertion, or (ii) not sort short lists at all until your recursion is finished, and then do them all at once by doing a single insertion-sort pass over the whole dataset.

To deal with the first problem, there are two common approaches: you can choose a random element of your list as a pivot each time (ugh), or you can examine more than one element and take something in the middle. What my tuned quicksort does is to look at the first, last and middle elements of the sublist being sorted, and choose the median of those three (i.e., if they're A,B,C and A<=B<=C then use B.)

To deal with the second, it turns out that on my machine it's best to truncate when you get to lists of size 18 or smaller, and to do a single sorting pass at the end. (On machines with virtual memory it's sometimes better to do lots of small insertion sorts instead, to improve locality. On machines without VM, like the Risc PC, that doesn't matter and the deciding factor is the decrease in the amount of bookkeeping that results from just doing one insertion sort. I'm not sure what happens on machines with moderately large caches; this might also make the lots-of-little-insertion-sorts approach a win.)

The other issue with quicksort is stack overflow. If you're really unlucky in your choice of pivots (or if you choose them stupidly) then the recursion can get very deep, and this can be a problem. It's therefore safest to do the recursion explicitly, maintaining a stack of things-to-be-done. Then, if the stack is about to overflow, you can just not bother stacking the relevant thing; that means that a sublist won't get sorted. That's sort of OK, because it will get fixed by the insertion sort at the end. This means that really bad choice of pivots just makes the sort take a long time, rather than crashing your machine. If you make your stack a reasonable size (which needn't actually be very big) then overflow is very very very unlikely ever to occur, anyway.

Original article by Gareth McCaughan Dept. of Pure Mathematics & Mathematical Statistics, Cambridge University, England. Comments about the content of this document should go to him.

Comments about the formatting and other aspects of the HTML should go to Steven Singer.

Up to the welcome page.