1

While implementing improvements to quicksort partitioning,I tried to use Tukey's ninther to find the pivot (borrowing almost everything from sedgewick's implementation in QuickX.java)

My code below gives different results each time the array of integers is shuffled.

import java.util.Random;
public class TukeysNintherDemo{    
    public static int tukeysNinther(Comparable[] a,int lo,int hi){
        int N = hi - lo + 1;
        int mid = lo + N/2;
        int delta = N/8;
        int m1 = median3a(a,lo,lo+delta,lo+2*delta);
        int m2 = median3a(a,mid-delta,mid,mid+delta);
        int m3 = median3a(a,hi-2*delta,hi-delta,hi);
        int tn = median3a(a,m1,m2,m3);
        return tn;
    }

    // return the index of the median element among a[i], a[j], and a[k]
    private static int median3a(Comparable[] a, int i, int j, int k) {
        return (less(a[i], a[j]) ?
               (less(a[j], a[k]) ? j : less(a[i], a[k]) ? k : i) :
               (less(a[k], a[j]) ? j : less(a[k], a[i]) ? k : i));
    }

    private static boolean less(Comparable x,Comparable y){
        return x.compareTo(y) < 0;
    }
    public static void shuffle(Object[] a) {
    Random random = new Random(System.currentTimeMillis());
        int N = a.length;
        for (int i = 0; i < N; i++) {
            int r = i + random.nextInt(N-i);     // between i and N-1
            Object temp = a[i];
            a[i] = a[r];
            a[r] = temp;
        }
    }
    public static void show(Comparable[] a){    
        int N = a.length;
        if(N > 20){
            System.out.format("a[0]= %d\n", a[0]);
            System.out.format("a[%d]= %d\n",N-1, a[N-1]);
        }else{
            for(int i=0;i<N;i++){
                System.out.print(a[i]+",");
            }
        }
        System.out.println();

    }
    public static void main(String[] args) {
        Integer[] a = new Integer[]{17,15,14,13,19,12,11,16,18};
        System.out.print("data= ");
        show(a);
        int tn = tukeysNinther(a,0,a.length-1);
        System.out.println("ninther="+a[tn]);
    }
}

Running this a cuople of times gives

data= 11,14,12,16,18,19,17,15,13,
ninther=15

data= 14,13,17,16,18,19,11,15,12,
ninther=14

data= 16,17,12,19,18,13,14,11,15,
ninther=16

Will tuckey's ninther give different values for different shufflings of the same dataset? when I tried to find the median of medians by hand ,I found that the above calculations in the code are correct.. which means that the same dataset yield different results unlike a median of the dataset.Is this the proper behaviour? Can someone with more knowledge in statistics comment?

damon
  • 7,127
  • 15
  • 62
  • 105
  • A sorted array is a sorted array. If you shuffle data then sort it you should always end up with the same data. The Median is the middle value; in the same data it should also always be the same. Just like the Mean or Mode. Maybe look at [this](http://stackoverflow.com/questions/12545795/explanation-of-the-median-of-medians-algorithm) and [this](http://stackoverflow.com/questions/9489061/understanding-median-of-medians-algorithm). – Boris the Spider Jul 13 '13 at 08:23
  • On another note; I suppose this is some sort of exercise or test? Because Java has perfectly good methods in [`Arrays`](http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html) and [`Collections`](http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html) to do many of the things you have coded yourself here. `Arrays.toString`, `Collections.shuffle` and, of course, `Arrays.sort`. – Boris the Spider Jul 13 '13 at 08:29
  • I am trying to learn some algorithms stuff ..not trying to solve any problem using java library.. – damon Jul 13 '13 at 09:29
  • your logic for median will work only on sorted array. – crazypaladin Jul 13 '13 at 10:01
  • median3a() compares all the three elements to get the median.. doesn't need sorted array – damon Jul 13 '13 at 11:25

1 Answers1

2

Tukey's ninther examines 9 items and calculates the median using only those.

For different random shuffles, you may very well get a different Tukey's ninther, because different items may be examined. After all, you always examine the same array slots, but a different shuffle may have put different items in those slots.

The key here is that Tukey's ninther is not the median of the given array. It is an attempted appromixation of the median, made with very little effort: we only have to read 9 items and make 12 comparisons to get it. This is much faster than getting the actual median, and has a smaller chance of resulting in an undesirable pivot compared to the 'median of three'. Note that the chance still exists.

Does this answer you question?

On a side note, does anybody know if quicksort using Tukey's ninther still requires shuffling? I'm assuming yes, but I'm not certain.

Timo
  • 6,027
  • 4
  • 38
  • 52
  • I just realize that deterministic (i.e. *non*-shuffled) quicksort using Tukey's ninther is, of course, vulnerable to malicious input. So shuffling is indeed required. – Timo Nov 06 '15 at 10:25