9

I have a task to find the difference between every pair of integers in an array of random numbers and return the lowest difference. One requirement is that the integers can be anywhere between 0 and `int.MaxValue`, and the array will contain 1 million integers.

I put some code together that works fine for a small number of integers, but it takes a very long time to process a million (so long that most of the time I give up waiting). My code is below; I'm looking for some insight into how I can improve its performance.

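    // Setup assumed here; it is not shown in the question.
    int lowestDiff = int.MaxValue;
    var Pairs = new List<NumberPair>();

    for (int i = 0; i < _RandomIntegerArray.Count(); i++) {
      for (int ii = i + 1; ii < _RandomIntegerArray.Count(); ii++) {
        // Equal values are skipped, so a difference of 0 is never recorded.
        if (_RandomIntegerArray[i] == _RandomIntegerArray[ii]) continue;

        int currentDiff = Math.Abs(_RandomIntegerArray[i] - _RandomIntegerArray[ii]);

        // A strictly smaller difference invalidates every pair collected so far.
        if (currentDiff < lowestDiff) {
          Pairs.Clear();
        }

        if (currentDiff <= lowestDiff) {
          Pairs.Add(new NumberPair(_RandomIntegerArray[i], _RandomIntegerArray[ii]));
          lowestDiff = currentDiff;
        }
      }
    }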

Apologies to everyone who has pointed out that I don't sort; unfortunately, sorting is not allowed.

halfer
Richard Banks
  • It seems that your code currently works, and you are looking to improve it. Generally these questions are too opinionated for this site, but you might find better luck at [CodeReview.SE](//codereview.stackexchange.com/tour). Remember to read [their requirements](//codereview.stackexchange.com/help/on-topic) as they are a bit more strict than this site. – kayess Aug 15 '17 at 11:01
  • Hint: suppose you were to sort the array first... – Jon Skeet Aug 15 '17 at 11:01
  • How about splitting up the array into `arraysize/number of processors` sized chunks and running each chunk in a different thread. – Neil Aug 15 '17 at 11:04
  • Maybe you just need to find the two lowest integers and find the difference between them? – Gabriel Aug 15 '17 at 11:07
  • @Gabriel No, if the array has `int.MaxValue` and `int.MaxValue-1`, that would be the smallest difference (assuming no duplicates) – DavidG Aug 15 '17 at 11:08
  • Hmm you're right. Thank you. @DavidG – Gabriel Aug 15 '17 at 11:09
  • I can't see from the requirements that the array should be sorted. – Clay Aug 15 '17 at 11:17
  • What happens if you have two or more random integers with the same value? Do you return 0 as the difference? @JonSkeet's suggestion of sorting works very well. – Steve Ford Aug 15 '17 at 11:30
  • I'm afraid that any better solution will probably just be a slightly optimized O(N^2) algorithm, or something that resembles some kind of sort. Can you use SortedSet or something similar by any chance? – 5ar Aug 19 '17 at 14:01
  • Have you tried Parallel.For? – Steve Ford Oct 03 '17 at 08:59
  • Look at https://stackoverflow.com/questions/12405938/save-time-with-parallel-for-loop – Steve Ford Oct 03 '17 at 09:05
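For the Parallel.For suggestion in the comments, here is a minimal sketch, not a tuned implementation. It assumes the data is a plain `int[]` with values in [0, `int.MaxValue`], so `a - b` cannot overflow. Like the question's loop, it skips equal values. `ParallelMinGap` and `MinGap` are illustrative names that do not come from the question. Each worker keeps its own running minimum through the `localInit`/`localFinally` overload of `Parallel.For`, and those minimums are merged under a lock at the end. This spreads the same O(N²) comparisons across cores; it does not change the complexity.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelMinGap
{
    // Still O(N^2) work: each outer index i is one unit of work for Parallel.For.
    // Assumes values are in [0, int.MaxValue]; equal values are skipped as in the question.
    // Returns int.MaxValue if no two distinct values exist.
    public static int MinGap(int[] a)
    {
        int best = int.MaxValue;
        object gate = new object();

        Parallel.For(0, a.Length,
            () => int.MaxValue,                       // per-thread running minimum
            (i, state, localMin) =>
            {
                int x = a[i];
                for (int j = i + 1; j < a.Length; j++)
                {
                    int d = Math.Abs(x - a[j]);       // cannot overflow for non-negative inputs
                    if (d != 0 && d < localMin) localMin = d;
                }
                return localMin;
            },
            localMin =>
            {
                // Merge each thread's minimum into the shared result.
                lock (gate)
                {
                    if (localMin < best) best = localMin;
                }
            });

        return best;
    }
}
```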

6 Answers

14

Imagine that you have already found a pair of integers a and b from your random array such that a > b and a-b is the lowest among all possible pairs of integers in the array.

Does an integer c exist in the array such that a > c > b, i.e. c goes between a and b? Clearly, the answer is "no", because otherwise you'd pick the pair {a, c} or {c, b}.

This gives an answer to your problem: a and b must be next to each other in a sorted array. Sorting can be done in O(N log N), and the search can then be done in O(N), which is an improvement over the O(N²) algorithm that you have.
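The adjacency argument can be checked empirically with a small randomized sketch. It compares a brute-force O(N²) minimum with a scan over neighbours in a sorted copy of the array. The class and method names are hypothetical. Values are drawn from a small range so that duplicates (difference 0) actually occur. Unlike the question's loop, equal values are *not* skipped here.

```csharp
using System;
using System.Linq;

public static class AdjacencyCheck
{
    // O(N^2): smallest |x - y| over all pairs of positions (duplicates give 0).
    public static long BruteForce(int[] a)
    {
        long best = long.MaxValue;
        for (int i = 0; i < a.Length; i++)
            for (int j = i + 1; j < a.Length; j++)
                best = Math.Min(best, Math.Abs((long)a[i] - a[j]));
        return best;
    }

    // O(N log N): only neighbours in sorted order are compared.
    public static long SortedNeighbours(int[] a)
    {
        int[] s = (int[])a.Clone();   // leave the caller's array untouched
        Array.Sort(s);
        long best = long.MaxValue;
        for (int i = 1; i < s.Length; i++)
            best = Math.Min(best, (long)s[i] - s[i - 1]);
        return best;
    }

    // True if both methods agree on `trials` random arrays of length `size`.
    public static bool Agree(int trials, int size, int seed)
    {
        var rng = new Random(seed);
        for (int t = 0; t < trials; t++)
        {
            int[] a = Enumerable.Range(0, size).Select(_ => rng.Next(0, 1000)).ToArray();
            if (BruteForce(a) != SortedNeighbours(a)) return false;
        }
        return true;
    }
}
```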

Sergey Kalinichenko
2

As per @JonSkeet, try sorting the array first and then compare only consecutive array items, which means that you only need to iterate over the array once:

    Array.Sort(_RandomIntegerArray);
    for (int i = 1; i < _RandomIntegerArray.Length; i++)
    {
        int currentDiff = _RandomIntegerArray[i] - _RandomIntegerArray[i-1];
        if (currentDiff < lowestDiff)
        {
            Pairs.Clear();
        }

        if (currentDiff <= lowestDiff)
        {
            Pairs.Add(new NumberPair(_RandomIntegerArray[i], _RandomIntegerArray[i-1]));
            lowestDiff = currentDiff;
        }

    }

In my testing this results in < 200 ms elapsed for 1 million items.

Steve Ford
1

You've got a million integers out of a possible 2.15 or 4.3 billion values (signed or unsigned). By the pigeonhole principle, the largest possible minimum distance is therefore about 2150 or 4300. Let's call this maximum possible minimum distance D.

Divide the range of legal integers into consecutive groups of length D. Create a hash table h keyed on integers, with arrays of ints as values. Process your array by taking each element x and adding it to h[x/D].

The point of doing this is that any candidate pair of points (one at distance at most D) is either contained in h[k] for some k, or split between h[k] and h[k+1].

Find your pair of points by going through the keys of the hash and checking the points associated with each key and the adjacent key. You can sort if you like, or use a bit vector, or any other method, but now you're dealing with small arrays (on average 1 element per array).
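Here is a sketch of the bucketing idea. It relies on a few assumptions that are not in the answer:

- The values are first de-duplicated with a `HashSet<int>`, because the question's loop skips equal values.
- The bucket width D is computed from the actual minimum, maximum and distinct count, not fixed at about 2150.
- A `Dictionary<long, List<int>>` stands in for the hash h.

The names are illustrative. Only a bucket and its successor are compared, and each tiny bucket is sorted, which the answer explicitly allows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BucketMinGap
{
    // Smallest positive difference between distinct values; -1 if fewer than two distinct values.
    public static long MinGap(int[] values)
    {
        var distinct = new HashSet<int>(values);
        int m = distinct.Count;
        if (m < 2) return -1;

        long lo = distinct.Min();
        long hi = distinct.Max();
        // Pigeonhole: m distinct values in [lo, hi] include two neighbours at most
        // (hi - lo) / (m - 1) apart, so D = ceil of that bounds the minimum gap.
        long d = Math.Max(1L, (hi - lo + m - 2) / (m - 1));

        // Bucket k holds the values in [lo + k*D, lo + (k+1)*D).
        var buckets = new Dictionary<long, List<int>>();
        foreach (int x in distinct)
        {
            long key = (x - lo) / d;
            if (!buckets.TryGetValue(key, out List<int> bucket))
            {
                bucket = new List<int>();
                buckets[key] = bucket;
            }
            bucket.Add(x);
        }

        long best = long.MaxValue;
        foreach (KeyValuePair<long, List<int>> kv in buckets)
        {
            List<int> list = kv.Value;
            list.Sort();                                   // about one value per bucket on average
            for (int i = 1; i < list.Count; i++)
                best = Math.Min(best, (long)list[i] - list[i - 1]);

            // A closer pair can also straddle bucket k and bucket k + 1.
            if (buckets.TryGetValue(kv.Key + 1, out List<int> next))
                best = Math.Min(best, (long)next.Min() - list[list.Count - 1]);
        }
        return best;
    }
}
```

Because the minimum gap is at most D, two values that are that close differ in bucket index by at most 1. That is why buckets further apart never need to be compared.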

Dave
0

Since the elements of the array lie between 0 and int.maxvalue, I suppose maxvalue will be less than 1 million. If so, you just need to initialise an array of size maxvalue + 1 to 0 and then, as you read the 1 million values, increment the entry for each value.

Now read this array in order and find the lowest difference as described by others, as if all the values were sorted. If any value is present more than once, its count will be > 1, so you can immediately say that the minimum difference is 0.

NOTE: This method is only worthwhile if you cannot sort and, more importantly, if int.maxvalue is much less than 10^6 (1 million).
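Here is a minimal sketch of this counting approach. It assumes every value is known to lie in [0, `maxValue`] for some *small* `maxValue`, which is a hypothetical parameter. For the question's actual range of 0 to `int.MaxValue`, the count array would be far too large. As the answer describes, a repeated value makes the minimum difference 0.

```csharp
using System;

public static class CountingMinGap
{
    // Assumes every value is in [0, maxValue] with maxValue small.
    // Returns int.MaxValue if there are fewer than two values.
    public static int MinGap(int[] values, int maxValue)
    {
        var counts = new int[maxValue + 1];   // one counter per possible value
        foreach (int x in values)
            counts[x]++;

        int best = int.MaxValue;
        int prev = -1;                         // last present value seen while walking upwards
        for (int v = 0; v <= maxValue; v++)
        {
            if (counts[v] == 0) continue;
            if (counts[v] > 1) return 0;       // a repeated value: difference 0
            if (prev >= 0) best = Math.Min(best, v - prev);
            prev = v;
        }
        return best;
    }
}
```

Walking the count array in index order visits the values in sorted order, so consecutive present values play the role of sorted neighbours without an explicit sort.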

0

It helps a little if you do not call Count() on each iteration:

    int countIntegers = _RandomIntegerArray.Count();
    for (int i = 0; i < countIntegers; i++) {
        //...
        for (int ii = i + 1; ii < countIntegers; ii++) {
            //...

This assumes that Count() merely returns the number of ints in the array on each call, without modifying the array or caching its result.

Nitin
  • (Come on! You can see perfectly well that the inner loop won't be entered if `i == countIntegers-1`, so why not terminate the outer loop before that?) – greybeard Feb 17 '19 at 00:06
  • The probability of two values being the same is not known, unless I am missing something. – Nitin Feb 17 '19 at 16:00
  • It is you presenting code which lets `i` assume that value in the last trip of the outer loop - nothing probabilistic I can see. OTOH, that was just me trying to pull your leg for adding a "micro-efficiency" consideration to a question 18 months old. – greybeard Feb 17 '19 at 16:06
  • Haha, you got me! But I am wondering (though I currently lack proof) whether it could save quite a lot of processing time if the array is not counted a million times in the outer loop, plus another million times on each pass of the inner loop. It seems excessive. – Nitin Feb 17 '19 at 17:16
  • There is also no information on how count is implemented. Better be safe;) – Nitin Feb 17 '19 at 17:42
  • One could avoid establishing the count multiple times by just counting down towards zero. Comparing to zero sure is *faster*, anyway. But wait, what if *count* changes during iteration? Better worried than sorry… – greybeard Feb 17 '19 at 18:27
  • You are right. Thanks for the zero check, I did not know that. – Nitin Feb 17 '19 at 20:55
  • (Still jesting (at least trying to).) That said, there *have been* (compilers &) architectures where that *is* true (after all, one immediate value less in the stream of machine code), and there *still may be* occasions where it holds: think of caching & branch prediction getting in the way. – greybeard Feb 17 '19 at 22:43
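Combining the hoisted count with greybeard's remark about the outer loop, the question's O(N²) loop could be sketched as follows. It assumes a plain `int[]`, so `Length` replaces LINQ's `Count()`, with values in [0, `int.MaxValue`]. Equal values are skipped as in the question, and only the minimum is returned instead of building the `Pairs` list. `BruteForceMinGap` is an illustrative name. These tweaks shave constant factors only; the running time is still quadratic.

```csharp
using System;

public static class BruteForceMinGap
{
    // Returns the smallest non-zero difference, or int.MaxValue if none exists.
    public static int MinGap(int[] a)
    {
        int n = a.Length;                      // read once; no per-iteration Count() call
        int best = int.MaxValue;
        for (int i = 0; i < n - 1; i++)        // the last i would have an empty inner loop
        {
            int x = a[i];                      // load the outer element once per row
            for (int j = i + 1; j < n; j++)
            {
                int d = Math.Abs(x - a[j]);    // cannot overflow for non-negative inputs
                if (d != 0 && d < best) best = d;
            }
        }
        return best;
    }
}
```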
-1

How about splitting up the array into arraysize/number of processors sized chunks and running each chunk in a different thread. (Neil)

Assume three parts A, B and C of sizes as close to equal as possible.
For each part, find the minimum "in-part" difference and the minimum difference of pairs with the first component from the current part and the second from the next part (A being the next after C).
With a method taking O(n²) time, a part of size n/3 should take one ninth of the time; done 2*3 times, this amounts to two thirds, plus change for combining the results.
This calls to be applied recursively - remember Карацу́ба/Karatsuba multiplication?
Wait, maybe use two parts after all, for three fourths of the effort: very close to "Karatsuba". (When I could not see how to use an even number of parts, I was thinking of multiprocessing with every processor doing "the same".)
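Here is a sketch of the two-part split, applied recursively. The names and the `cutoff` parameter (below which a range is compared pair by pair) are illustrative assumptions, and equal values count as a difference of 0. Counting pairs, the two halves plus the cross comparisons still cover every pair exactly once, n(n-1)/2 comparisons in all. So the gain from splitting would come from the three sub-problems at each level being independent, which lets them go to different processors, rather than from less total work.

```csharp
using System;

public static class SplitMinGap
{
    // Smallest |x - y| over all pairs; long.MaxValue if fewer than two elements.
    public static long MinGap(int[] a, int cutoff = 64) =>
        Solve(a, 0, a.Length, Math.Max(cutoff, 2));   // cutoff >= 2 guarantees termination

    static long Solve(int[] a, int lo, int hi, int cutoff)
    {
        if (hi - lo <= cutoff) return Within(a, lo, hi);
        int mid = lo + (hi - lo) / 2;
        // These three sub-problems do not depend on each other.
        long left = Solve(a, lo, mid, cutoff);
        long right = Solve(a, mid, hi, cutoff);
        long cross = Across(a, lo, mid, hi);
        return Math.Min(cross, Math.Min(left, right));
    }

    // All pairs inside a[lo..hi).
    static long Within(int[] a, int lo, int hi)
    {
        long best = long.MaxValue;
        for (int i = lo; i < hi; i++)
            for (int j = i + 1; j < hi; j++)
                best = Math.Min(best, Math.Abs((long)a[i] - a[j]));
        return best;
    }

    // Pairs with one element in a[lo..mid) and the other in a[mid..hi).
    static long Across(int[] a, int lo, int mid, int hi)
    {
        long best = long.MaxValue;
        for (int i = lo; i < mid; i++)
            for (int j = mid; j < hi; j++)
                best = Math.Min(best, Math.Abs((long)a[i] - a[j]));
        return best;
    }
}
```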

greybeard
  • (Can't seem to wrap my head around using an even number of parts. (With *n* a power of 10, it may be wiser to try and use five parts instead of 3. Three fifths the effort with every split, but fewer splits to go.)) – greybeard Aug 15 '17 at 21:56
  • (calls to be applied recursively? calls for recursive application? asks? begs? arrgh) – greybeard Aug 15 '17 at 22:09