Sorting

By Gareth McCaughan

Converted to HTML by Steven Singer

This document is a copy of a set of postings Gareth McCaughan made on comp.sys.acorn.programmer in September 1996. Apart from formatting changes, and changes to the text to reflect the changes to the formatting, this is the original articles just joined together.

I. An overview of sorting methods

This is the first of four parts of a long and boring article about sorting. It's just a brief summary of most of the most important sorting algorithms.

For a pretty full account of this sort of thing, read volume 3 of Knuth. I'm just going to sketch things.

In all that follows we have an array of N items, numbered 1..N or 0..N-1 (whichever is more convenient), and arranged sequentially. (Sometimes linked lists are more efficient; not usually.)

The items could be anything, and the criterion for being "less than" could be anything (more or less). Depending on what these are, it might be very expensive to compare two items, or it might be very expensive to move an item from one place to another. Or both :-(.

In measuring the efficiency of a sorting technique, it's usual to say something like "it takes time of order N^2". This means that for large enough N the amount of work done by the sorting algorithm looks like some constant times N^2, plus smaller things. This work is made up partly of comparisons and partly of actually moving data around; in different applications these will have different relative costs. (So no one set of figures will be good enough for all cases.)

Here's a little summary of each of several important methods. I'll have more to say later.

Bubble sort

DON'T USE BUBBLE SORT. Everyone learns it because it's easy to write a bubble sort program, so it's become popular with the writers of textbooks. Unfortunately it's a very very bad algorithm. DON'T USE IT.

I'm certainly not going to tell you how it works. You don't need to know. DON'T USE IT.

Insertion sort

This takes time of order N^2 in the worst case, because if the list is in reverse order then you have to do a lot of comparing and moving things around. (Work it out.) It's actually of order N^2 on average too.

Insertion sort is very easy to program, and because of its simplicity it works well when N is very small. For moderate N it is one of the worst plausible algorithms.

Shellsort

The trouble with insertion sort is that everything it does is "local", so that if there are a lot of things in completely the wrong place insertion behaves very badly. So, a guy called Shell invented the following algorithm:

If you're careful, the code for this can be very nearly as short as that for insertion sort; and if you choose your list of values for d carefully it can perform very much better.

No one is quite sure what the theoretical performance of shellsort is. If you choose things carefully, for typical data the time it takes seems to go like N^1.3 or something.

Mergesort

Mergesort takes time of order N log N if you do it right. Unfortunately, it also requires quite a lot of extra storage.

Quicksort

Quicksort is for most applications the fastest known sorting method. It requires a small amount of extra storage, to keep track of the recursion. Its worst-case performance is awful, but if you do things right this more or less never happens. For typical data it takes time of order N log N.

Heapsort

The algorithm for heapsort is more complicated than any of those I've discussed, and won't fit into the amount of space I'm willing to use. You can find it discussed in any number of textbooks.

Heapsort, like quicksort and mergesort, takes time of order N log N. Unlike quicksort it always takes this long (there isn't that awful worst case). Unfortunately, it's usually slower than quicksort by a moderate-sized constant factor (2 or thereabouts). It also requires no extra storage at all.

Radix sort (trivial case)

This only works when your data are numbers in a restricted range, say 0..K-1.

This is clearly blindingly fast: of order N. Unfortunately, it's not common for K to be small enough to do this.

Radix sort (non-trivial case)

This only works where your data are numbers in a restricted range, but the range can be much larger. Say the range is 0..K^r-1, where K is small enough for the trivial case.

This is very efficient, provided you have some way of adding numbers to bins without moving them around. Linked lists are good for this.

That concludes my brief outline of sorting methods. There are plenty more, but you are unlikely ever to need anything not in the list above.

II. An experiment

This is the second of four parts of a long and boring article about sorting. It describes an experiment I did to get some timings for several sorting routines, and briefly summarises the results.

The obvious experiment, really. For each sorting method, and for a range of values of N, make up an array of items and sort them. However, the performance of a sorting method can depend heavily on how the items are arranged (for instance, a really stupid implementation of quicksort may do really badly if it happens to be given an array that's already sorted!), so I did this for a variety of datasets for each N.

All my data items were ordinary 4-byte integers. This was just to make my job a bit easier. It would be interesting to do the same thing again, using some other kinds of data: this might make a significant difference.

Ascending order
Obvious.
Descending order
Obvious.
Random order
Completely random, that is. Just output from rand().
"Few swaps"
Start with ascending order and do about sqrt(N) swaps.
"Small deviations"
Nth item differs from N by random number of order sqrt(N).
"Shuffled"
All the even-index numbers are smaller than all the odd numbers. Each of those sub-arrays is itself "shuffled".

The sorting routines I used were as follows. Ones written so as to allow sorting things other than integers are emphasised. Of course they take some performance loss for this.

I'll say more about tuning quicksort later.

I didn't do a radix sort, because that really requires space to add link fields, and I wanted to work with simple arrays of integers.

The results

The complete figures are in the next part of this article. Here's a summary of the most interesting points.

For sorting 300,000 completely random integers, the ranking was as follows. The numbers are timings, in seconds. Blank lines indicate big drops in performance.

   Tuned quicksort     3.02
   Naive quicksort     3.46

   Generic quicksort   5.22
   Heapsort            6.20
   Mergesort           7.40

   OS_HeapSort         9.52
   Shellsort (good)    9.58
   Shellsort (OKish)  10.08

   qsort()            35.76
   Shellsort (bad)    47.67

   Insertion sort      huge (on the order of 20000, at a guess)

III. The results in full

This is the third of four parts of a long and boring article about sorting. It gives, in full gory horrible detail, the results of the experiment described in the second part.

Large datasets

Firstly, numbers obtained by running each sorting routine once on each sort of dataset. Small numbers should be taken with a pinch of salt for obvious reasons. All numbers are in seconds; blanks indicate huge times.

I'm afraid the figures for generic OS_HeapSort aren't here. That's because the experiment that produced this output didn't include that. Sorry. The results always seem to be within a few centiseconds of the non-generic OS_Heapsort anyway: see the previous part of this article.

1. Entirely random

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.01   0.11   1.16  14.09
Shellsort-2     0.00   0.01   0.01   0.06   0.34   1.52   9.36  47.67
Shellsort-2a    0.00   0.00   0.01   0.04   0.17   0.64   2.59  10.08
Shellsort-3     0.01   0.00   0.01   0.03   0.16   0.59   2.41   9.58
Mergesort       0.00   0.00   0.02   0.04   0.17   0.60   2.27   7.40
Quick(Naive)    0.00   0.00   0.00   0.02   0.08   0.29   1.06   3.46
Quick(M3,18I)   0.00   0.00   0.00   0.02   0.08   0.24   0.93   3.02
Quick(Gen.)     0.00   0.00   0.01   0.03   0.12   0.42   1.60   5.22
qsort()         0.00   0.00   0.04   0.13   0.54   2.08   8.79  35.76
OS_HeapSort     0.00   0.01   0.01   0.05   0.20   0.71   2.75   9.52
Heapsort        0.00   0.00   0.00   0.03   0.13   0.43   1.75   6.20

2. Ascending order

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.00   0.01   0.00   0.01   0.02   0.09   0.26
Shellsort-2     0.00   0.00   0.01   0.03   0.11   0.38   1.44   4.77
Shellsort-2a    0.00   0.00   0.01   0.02   0.10   0.35   1.35   4.53
Shellsort-3     0.00   0.00   0.00   0.01   0.06   0.23   0.89   2.96
Mergesort       0.00   0.01   0.01   0.02   0.10   0.34   1.25   4.03
Quick(Naive)    0.00   0.00   0.01   0.01   0.05   0.19   0.64   2.23
Quick(M3,18I)   0.00   0.00   0.00   0.01   0.04   0.14   0.57   1.82
Quick(Gen.)     0.00   0.00   0.01   0.02   0.08   0.26   1.03   3.27
qsort()         0.00   0.00   0.01   0.05   0.17   0.57   2.17   7.19
OS_HeapSort     0.00   0.00   0.02   0.05   0.20   0.64   2.40   7.86
Heapsort        0.00   0.01   0.00   0.02   0.11   0.38   1.38   4.52

3. Descending order

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.01   0.02   0.23   2.40  28.36
Shellsort-2     0.00   0.01   0.01   0.03   0.13   0.44   1.68   5.58
Shellsort-2a    0.00   0.00   0.01   0.03   0.12   0.40   1.57   5.24
Shellsort-3     0.00   0.00   0.00   0.03   0.09   0.32   1.18   3.91
Mergesort       0.00   0.00   0.01   0.04   0.14   0.48   1.81   5.82
Quick(Naive)    0.00   0.01   0.00   0.02   0.06   0.20   0.67   2.31
Quick(M3,18I)   0.00   0.00   0.00   0.02   0.05   0.16   0.60   1.90
Quick(Gen.)     0.00   0.00   0.01   0.02   0.08   0.28   1.06   3.35
qsort()         0.00   0.00   0.02   0.08   0.29   1.01   3.69  12.11
OS_HeapSort     0.00   0.00   0.02   0.04   0.18   0.61   2.28   7.56
Heapsort        0.00   0.00   0.01   0.03   0.11   0.36   1.33   4.39

4. Few swaps

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.00   0.01   0.05   0.32   1.67  11.10  56.22
Shellsort-2     0.00   0.00   0.01   0.03   0.16   0.61   2.54   7.91
Shellsort-2a    0.00   0.00   0.00   0.03   0.13   0.46   1.74   5.93
Shellsort-3     0.00   0.01   0.00   0.02   0.10   0.34   1.35   4.56
Mergesort       0.00   0.00   0.01   0.04   0.14   0.50   1.83   5.91
Quick(Naive)    0.00   0.00   0.01   0.02   0.05   0.19   0.65   2.23
Quick(M3,18I)   0.00   0.00   0.00   0.01   0.04   0.15   0.58   1.83
Quick(Gen.)     0.00   0.00   0.01   0.02   0.07   0.26   1.03   3.27
qsort()         0.00   0.00   0.03   0.08   0.33   1.12   4.40  15.54
OS_HeapSort     0.00   0.01   0.01   0.05   0.19   0.65   2.41   7.88
Heapsort        0.00   0.00   0.00   0.03   0.11   0.38   1.38   4.52

5. Small deviations

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.00   0.00   0.01   0.08   0.40   2.40  12.37
Shellsort-2     0.00   0.00   0.01   0.03   0.13   0.46   1.83   6.38
Shellsort-2a    0.00   0.00   0.01   0.03   0.11   0.41   1.57   5.30
Shellsort-3     0.00   0.00   0.00   0.02   0.09   0.30   1.16   3.93
Mergesort       0.00   0.00   0.01   0.04   0.13   0.44   1.68   5.47
Quick(Naive)    0.00   0.00   0.01   0.02   0.07   0.23   0.88   2.87
Quick(M3,18I)   0.00   0.00   0.01   0.02   0.07   0.21   0.77   2.52
Quick(Gen.)     0.00   0.01   0.01   0.02   0.11   0.37   1.36   4.47
qsort()         0.00   0.00   0.02   0.07   0.27   0.92   3.61  12.36
OS_HeapSort     0.00   0.00   0.01   0.05   0.19   0.63   2.37   7.87
Heapsort        0.00   0.00   0.01   0.03   0.10   0.37   1.37   4.55

6. Shuffled

                 100    300   1000   3000  10000  30000 100000 300000
              ------ ------ ------ ------ ------ ------ ------ ------
Insertion       0.00   0.01   0.12   0.57   9.38
Shellsort-2     0.00   0.00   0.11   0.54  10.86
Shellsort-2a    0.00   0.00   0.01   0.05   0.21   0.89   4.27  17.62
Shellsort-3     0.01   0.00   0.01   0.03   0.13   0.43   1.77   5.83
Mergesort       0.00   0.00   0.01   0.04   0.17   0.59   2.03   7.10
Quick(Naive)    0.00   0.01   0.01   0.02   0.08   0.29   1.11   3.52
Quick(M3,18I)   0.00   0.01   0.01   0.01   0.07   0.25   1.56   3.07
Quick(Gen.)     0.00   0.01   0.01   0.03   0.12   0.43   2.82   6.04
qsort()         0.00   0.01   0.03   0.11   0.45   1.50   6.11  19.99
OS_HeapSort     0.00   0.00   0.01   0.06   0.22   0.76   2.99  10.35
Heapsort        0.00   0.01   0.01   0.03   0.13   0.47   1.96   6.94

Small datasets

Now, corresponding results for smaller datasets; these numbers are in centiseconds, and are a bit more reliable. Each routine was run enough times to give reasonable accuracy.

1. Entirely random

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.12   1.02  11.34 117.35
Shellsort-2     0.06   0.28   1.45   6.71
Shellsort-2a    0.05   0.21   0.96   3.94
Shellsort-3     0.05   0.18   0.85   3.43
Mergesort       0.09   0.33   1.30   4.67
Quick(Naive)    0.05   0.17   0.66   2.29
Quick(M3,18I)   0.04   0.13   0.54   1.91
Quick(Gen.)     0.06   0.23   0.94   3.27
qsort()         0.17   0.70   3.23  12.97
OS_HeapSort     0.09   0.33   1.36   5.07
Heapsort        0.05   0.18   0.77   3.06

2. Ascending order

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.01   0.02   0.08   0.26
Shellsort-2     0.04   0.16   0.71   2.80
Shellsort-2a    0.04   0.14   0.63   2.55
Shellsort-3     0.03   0.10   0.44   1.72
Mergesort       0.06   0.20   0.78   2.70
Quick(Naive)    0.03   0.10   0.44   1.34
Quick(M3,18I)   0.02   0.06   0.29   1.14
Quick(Gen.)     0.03   0.13   0.51   2.00
qsort()         0.07   0.27   1.16   4.27
OS_HeapSort     0.09   0.34   1.40   5.02
Heapsort        0.05   0.19   0.80   2.92

3. Descending order

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.22   1.99  22.57 239.92
Shellsort-2     0.05   0.19   0.87   3.28
Shellsort-2a    0.04   0.17   0.78   3.04
Shellsort-3     0.04   0.14   0.61   2.32
Mergesort       0.08   0.28   1.05   3.80
Quick(Naive)    0.03   0.11   0.46   1.41
Quick(M3,18I)   0.02   0.07   0.31   1.22
Quick(Gen.)     0.03   0.14   0.54   2.09
qsort()         0.12   0.46   2.03   7.35
OS_HeapSort     0.08   0.30   1.27   4.60
Heapsort        0.04   0.17   0.72   2.67

4. Few swaps

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.04   0.17   0.97   5.43
Shellsort-2     0.06   0.22   0.95   3.76
Shellsort-2a    0.05   0.19   0.82   3.20
Shellsort-3     0.04   0.16   0.65   2.45
Mergesort       0.10   0.29   1.09   3.85
Quick(Naive)    0.04   0.12   0.45   1.37
Quick(M3,18I)   0.03   0.08   0.32   1.16
Quick(Gen.)     0.04   0.15   0.56   2.04
qsort()         0.14   0.52   2.24   8.18
OS_HeapSort     0.10   0.35   1.42   5.02
Heapsort        0.06   0.20   0.82   2.93

5. Small deviations

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.01   0.07   0.30   1.51
Shellsort-2     0.05   0.20   0.80   3.06
Shellsort-2a    0.04   0.18   0.69   2.78
Shellsort-3     0.03   0.15   0.55   2.26
Mergesort       0.08   0.26   0.99   3.46
Quick(Naive)    0.05   0.16   0.55   2.01
Quick(M3,18I)   0.03   0.12   0.43   1.53
Quick(Gen.)     0.05   0.20   0.80   2.89
qsort()         0.10   0.41   1.73   6.44
OS_HeapSort     0.10   0.36   1.36   4.78
Heapsort        0.05   0.20   0.80   2.89

6. Shuffled

                 100    300   1000   3000
              ------ ------ ------ ------
Insertion       0.11   1.00  11.20 115.92
Shellsort-2     0.15   1.14  11.75 116.92
Shellsort-2a    0.05   0.22   1.07   4.59
Shellsort-3     0.04   0.18   0.78   2.92
Mergesort       0.09   0.33   1.32   4.70
Quick(Naive)    0.05   0.16   0.68   2.36
Quick(M3,18I)   0.04   0.13   0.55   1.92
Quick(Gen.)     0.06   0.23   0.97   3.32
qsort()         0.16   0.69   2.90  10.26
OS_HeapSort     0.09   0.33   1.38   5.17
Heapsort        0.05   0.19   0.79   3.09

Sorting: IV. Tuning quicksort

This is the fourth and last part of a long and boring article about sorting. It describes how to make a quicksort routine go faster.

There are two problems with quicksort. Firstly, and more importantly, if you choose your "pivot" x badly then you can lose badly: if you chop the list into one very small part and one very big part then you haven't gained much. In the worst case, you chop the list into parts of size 1 and N-1. Ouch.

Secondly, the bookkeeping involved in doing all the recursive stuff gets a bit expensive when you're sorting very short lists. It pays to "truncate" and either (i) sort short lists by insertion, or (ii) not sort short lists at all until your recursion is finished, and then do them all at once by doing a single insertion-sort pass over the whole dataset.

To deal with the first problem, there are two common approaches: you can choose a random element of your list as a pivot each time (ugh), or you can examine more than one element and take something in the middle. What my tuned quicksort does is to look at the first, last and middle elements of the sublist being sorted, and choose the median of those three (i.e., if they're A,B,C and A<=B<=C then use B.)

To deal with the second, it turns out that on my machine it's best to truncate when you get to lists of size 18 or smaller, and to do a single sorting pass at the end. (On machines with virtual memory it's sometimes better to do lots of small insertion sorts instead, to improve locality. On machines without VM, like the Risc PC, that doesn't matter and the deciding factor is the decrease in the amount of bookkeeping that results from just doing one insertion sort. I'm not sure what happens on machines with moderately large caches; this might also make the lots-of-little-insertion-sorts approach a win.)

The other issue with quicksort is stack overflow. If you're really unlucky in your choice of pivots (or if you choose them stupidly) then the recursion can get very deep, and this can be a problem. It's therefore safest to do the recursion explicitly, maintaining a stack of things-to-be-done. Then, if the stack is about to overflow, you can just not bother stacking the relevant thing; that means that a sublist won't get sorted. That's sort of OK, because it will get fixed by the insertion sort at the end. This means that really bad choice of pivots just makes the sort take a long time, rather than crashing your machine. If you make your stack a reasonable size (which needn't actually be very big) then overflow is very very very unlikely ever to occur, anyway.


Original article by Gareth McCaughan Dept. of Pure Mathematics & Mathematical Statistics, Cambridge University, England. Comments about the content of this document should go to him.

Comments about the formatting and other aspects of the HTML should go to Steven Singer.


[Up] Up to the welcome page.