14

In my program, I have a set of points. For rescaling purposes, I am searching for the two points that are furthest apart, and then computing a factor by which to multiply all coordinates so that the maximum distance equals some predefined value.

The algorithm I am using to find the two points furthest apart is, however, problematic for large sets of points, as it is O(n^2); in pseudocode (pairs whose distance was already calculated are skipped):

    for each index i in points:
        for each index j > i in points:   # pairs already measured (in reverse order) are skipped
            if distance(points[i], points[j]) > max:
                max = distance(points[i], points[j])

Is there something faster?

houbysoft
  • 29,496
  • 22
  • 91
  • 153
  • possible duplicate of [Efficient algorithm for finding spheres farthest apart in large collection](http://stackoverflow.com/questions/2276488/efficient-algorithm-for-finding-spheres-farthest-apart-in-large-collection) – Beta Jun 29 '11 at 16:59
  • I posted an O(n) algorithm that includes some error margin that may be useful to your particular problem of scaling. – gradbot Jun 29 '11 at 18:17
  • I think this will skip repeated indexes and previously visited (reverse-order) pairs of indexes: for (int i = 0, j; i < (count - 1); i++) for (j = i + 1; j < count; j++) // the outer loop can use (count - 1) or count, because the inner loop will not run on the last iteration anyway. – FocusedWolf Jul 01 '15 at 21:07
  • See [John Feminella's excellent answer](https://stackoverflow.com/questions/2276488/efficient-algorithm-for-finding-spheres-farthest-apart-in-large-collection) to this similar question. You should get O(n log n) for the average case. – Roger Jun 29 '11 at 16:58
  • It's still n^2 worst-case, though, and it's easy to find instances where it's worse than the naive code. So it's good if you know your points are 'randomly' distributed but bad if they're adversarially chosen. – Charles Jun 29 '11 at 17:15
  • True. I'd agree with Martin that clustering is a big help, I use geohashing for this in a number of geo aware applications. – Roger Jun 29 '11 at 17:24
  • The convex hull can be calculated in O(n ln n) worst case. It takes O(n) time to find the diameter of the set by searching over antipodal points in the convex hull. More details are in my answer – Kipton Barros Jun 29 '11 at 17:27

7 Answers

18

If you just need the scale and not the exact points, you can do this in O(n) time with some error margin. Think about the simple case of making a bounding box: calculate the minimum x value from all the points, the maximum x, the minimum y, and the maximum y. These four numbers give you a maximum bounding box around your points with a max error of 1 - (1/sqrt(2)), about 30%. You can reduce this by adding more sides to your square; think about the case of an octagon. To calculate the min and max values for the other sides, you have to rotate your coordinate system.
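
As an illustration, here is a minimal Python sketch of this direction-sweep idea (the function name and interface are my own, not from the answer; the exact error constants depend on how the estimate is formed and may differ from the table below): project every point onto sides/2 evenly spaced directions, keep the min and max projection per direction, and take the largest spread as the size estimate.

    import math

    def approx_diameter(points, sides=8):
        """Estimate the diameter of a 2-D point set in O(n * sides)
        by measuring its width along sides/2 evenly spaced directions."""
        best = 0.0
        for k in range(sides // 2):
            theta = math.pi * k / (sides // 2)
            dx, dy = math.cos(theta), math.sin(theta)
            proj = [x * dx + y * dy for x, y in points]
            best = max(best, max(proj) - min(proj))
        return best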

Error vs. run time breaks down like this (shape - run time - max error):

  • square - 2N - 30%
  • octagon - 4N - 16%
  • 16 sides - 8N - 4%
  • 32 sides - 16N - 1%

Here's the equation for the max error I came up with:

    angle = 180° / sides
    max_error = (1 / cos angle) - cos angle

For example, the octagon gives angle = 22.5°, so max_error = 1/cos(22.5°) - cos(22.5°) ≈ 1.082 - 0.924 ≈ 0.16, which is the 16% in the table above.

Let me know if I should add a diagram explaining this.

gradbot
  • 13,374
  • 5
  • 34
  • 68
  • Great idea! The points are actually in 3D, but the principle works too. I don't really care too much about the error (it's just so it's easier to see, basically), so a cube will be simple and fast to implement. Thanks! – houbysoft Jun 29 '11 at 18:24
16

The following may help to put the average-case linear-time algorithms (for the diameter of a finite set) in a clearer light, as well as contrast multi-dimensional and plane-geometry problems.

In 1983 Megiddo gave a deterministic linear-time algorithm for the smallest enclosing circle (or sphere in higher dimensions).
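
Megiddo's deterministic construction is intricate; as a concrete stand-in, here is a Welzl-style randomized incremental sketch in 2-D (expected linear time after shuffling, not a deterministic bound; the helper names are mine, and collinear degeneracies are only crudely guarded):

    import math
    import random

    def _circumcircle(p, q, r):
        # Circle through three points; None if they are (nearly) collinear.
        (ax, ay), (bx, by), (cx, cy) = p, q, r
        d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
        if abs(d) < 1e-12:
            return None
        pa, pb, pc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
        ux = (pa * (by - cy) + pb * (cy - ay) + pc * (ay - by)) / d
        uy = (pa * (cx - bx) + pb * (ax - cx) + pc * (bx - ax)) / d
        return (ux, uy), math.dist((ux, uy), p)

    def _inside(circle, p):
        return circle is not None and math.dist(circle[0], p) <= circle[1] + 1e-9

    def enclosing_circle(points):
        """Randomized incremental minimal enclosing circle, expected O(n)."""
        pts = list(points)
        random.shuffle(pts)  # the shuffle is what makes the expected time linear
        circle = None
        for i, p in enumerate(pts):
            if _inside(circle, p):
                continue
            circle = (p, 0.0)                        # p lies on the boundary
            for j, q in enumerate(pts[:i]):
                if _inside(circle, q):
                    continue
                mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
                circle = (mid, math.dist(p, q) / 2)  # p and q on the boundary
                for r in pts[:j]:
                    if not _inside(circle, r):
                        circle = _circumcircle(p, q, r) or circle
        return circle  # (center, radius)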

In general position the enclosing circle will have two or three points on its boundary, and thus finding the two farthest apart can be done "on average" in constant time once the bounding circle is known. In higher dimensions the number of points in general position needed on the boundary of the sphere increases (D+1 points for dimension D), and indeed the cost of computing distance between a single pair of points rises linearly with dimension.

The subset of points lying on the bounding circle or sphere is also found in linear time. In the theoretical worst case all points would lie on the bounding circle or sphere, but this is at least a stricter condition than merely having all points on the convex hull. If the points on the sphere are independently perturbed, say along radial lines, then general position is assured with probability 1, and an approximate diameter can be found from just D+1 points on the revised enclosing sphere. This randomized approximation has quadratic dependence on dimension but only linear complexity in the number of points.

If the points lying on a bounding circle are "sorted" (cyclically, of course), finding the pair farthest apart can be done in linear time, relying upon the "unimodality" of the circle (meaning that distances from a fixed point rise monotonically until the antipode and then fall) to amortize the cost of computing distances. Unfortunately that sorting would introduce a step with O(n log n) time complexity, and this proves to be worst-case optimal for exact deterministic methods in the planar case.
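
To illustrate that linear pass, here is a small sketch (assuming at least two input points, all lying on one circle and already sorted by angle; indices wrap around, and the function name is mine):

    import math

    def diameter_on_circle(points):
        """Farthest pair among points sorted by angle around a circle.
        Distances from a fixed point rise until its antipode and then
        fall, so the second pointer only ever advances: O(n) overall."""
        n = len(points)
        d = lambda i, j: math.dist(points[i % n], points[j % n])
        best, j = 0.0, 1
        for i in range(n):
            while d(i, j + 1) > d(i, j):
                j += 1
            best = max(best, d(i, j))
        return best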

In 2001 Ramos succeeded in showing an O(n log n) deterministic algorithm for three-dimensional sets, but the technique is so involved that an implementation may be impractical, or slower than brute-force all-pairs search except on very large datasets.

For higher dimensions many authors have considered randomized or approximate algorithms. See Piotr Indyk's thesis (2000) for approximate methods with only polynomial dependence on dimension for various proximity problems.

hardmath
  • 8,423
  • 2
  • 32
  • 60
13

As mentioned in this answer, you are seeking the "diameter" of the set of N points, a well-known problem in computational geometry. There are basically two steps:

  1. Find the convex hull of the points. Algorithms exist that are O(N ln N) worst case. In practice, QuickHull is usually a fast choice, although it is potentially O(N^2) in the worst case. The Qhull implementation is convenient to call from the command line; the CGAL library provides a C++ implementation.

  2. Antipodal pairs of points on the convex hull are the candidates for the furthest pair. One can search over the antipodal pairs using an algorithm like rotating calipers in O(N) time; see the sketch after this list.
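
A compact Python sketch of the two steps (assuming SciPy, whose ConvexHull wraps Qhull; for brevity it scans all pairs of hull vertices instead of running rotating calipers, which costs O(h^2) in the hull size h rather than O(h), but h is usually far smaller than N):

    import math
    from itertools import combinations

    from scipy.spatial import ConvexHull  # Qhull under the hood

    def diameter(points):
        """Step 1: convex hull. Step 2: widest pair of hull vertices."""
        hull = ConvexHull(points)
        verts = [points[i] for i in hull.vertices]
        return max(math.dist(p, q) for p, q in combinations(verts, 2))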

The problem can be generalized to an "all-farthest pairs" problem: for each point i, find the most distant point j, so that we are now seeking N pairs of points. The solution again uses the convex hull, but now the second part can be done with a matrix-searching algorithm.

Kipton Barros
  • 20,144
  • 3
  • 63
  • 77
4

Not really - a common approach is to group points into clusters and then store the distances between clusters.

That way you don't need to check whether a certain house in New York is furthest from Paris if you have already determined that Australia is further away.
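
To make the idea concrete, here is a rough sketch (a uniform grid stands in for the clustering, and all names are my own): bucket the points into cells, bound the best possible distance between each pair of cells, and only compare individual points while a cell pair can still beat the best distance found so far.

    import math
    from collections import defaultdict
    from itertools import combinations_with_replacement

    def farthest_pair_grid(points, cell=1.0):
        """Grid-cluster pruning sketch. Worst case is still O(n^2), but
        distant clusters are settled without comparing each of their
        points against every point elsewhere."""
        cells = defaultdict(list)
        for x, y in points:
            cells[(math.floor(x / cell), math.floor(y / cell))].append((x, y))

        def bound(a, b):
            # Upper bound on the distance between any point of cell a
            # and any point of cell b (one extra cell width per axis).
            return math.hypot((abs(a[0] - b[0]) + 1) * cell,
                              (abs(a[1] - b[1]) + 1) * cell)

        pairs = sorted(combinations_with_replacement(cells, 2),
                       key=lambda ab: bound(*ab), reverse=True)
        best, result = 0.0, None
        for a, b in pairs:
            if bound(a, b) <= best:
                break  # no remaining cell pair can do better
            for p in cells[a]:
                for q in cells[b]:
                    d = math.dist(p, q)
                    if d > best:
                        best, result = d, (p, q)
        return best, result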

Martin Beckett
  • 90,457
  • 25
  • 178
  • 252
3

The distance from A to B is the same as the distance from B to A. You can easily modify the algorithm to eliminate half of the computations this way. It'll still be O(n^2) but will be twice as fast.

That is, instead of computing all off-diagonal elements of the distance matrix P x P:

P = {A, B, C, D, ...}

  + A + B + C + D + ...
A |   | * | * | * | ...
B | * |   | * | * | ...
C | * | * |   | * | ...
D | * | * | * |   | ...
  |   |   |   |   |

you can compute either the upper triangle:

  + A + B + C + D + ...
A |   | * | * | * | ...
B |   |   | * | * | ...
C |   |   |   | * | ...
D |   |   |   |   | ...
  |   |   |   |   |

or the lower triangle:

  + A + B + C + D + ...
A |   |   |   |   | ...
B | * |   |   |   | ...
C | * | * |   |   | ...
D | * | * | * |   | ...
  |   |   |   |   |
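
In code, the upper-triangle scan might look like this minimal Python sketch (the function name is mine):

    import math

    def farthest_pair(points):
        """Visit each unordered pair exactly once by starting j at
        i + 1, i.e. only the upper triangle of the distance matrix."""
        best, pair = 0.0, None
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                d = math.dist(points[i], points[j])
                if d > best:
                    best, pair = d, (points[i], points[j])
        return best, pair
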
Matt Ball
  • 332,322
  • 92
  • 617
  • 683
2

If you perform this query often but the points do not change much, you can precalculate data that speeds things up.

Each point can store the farthest point from it; on every point addition, recheck whether the new point is farther for any existing point.

When you query, you just go through all the points and look at their cached farthest points.

You end up with O(n) for each new point insertion and O(n) for a farthest-apart query.
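
A minimal sketch of that caching scheme (class and method names are mine; it assumes points are only ever added, never moved or removed, since the cached distances would otherwise go stale):

    import math

    class FarthestCache:
        """Per-point farthest-neighbor cache: O(n) insert, O(n) query."""

        def __init__(self):
            self.points = []
            self.cache = []  # cache[i] = (distance, index of farthest point)

        def add(self, p):
            k = len(self.points)
            best = (0.0, k)
            for i, q in enumerate(self.points):
                d = math.dist(p, q)
                if d > self.cache[i][0]:
                    self.cache[i] = (d, k)  # new point is farther for q
                if d > best[0]:
                    best = (d, i)
            self.points.append(p)
            self.cache.append(best)

        def farthest_pair(self):
            i = max(range(len(self.points)), key=lambda i: self.cache[i][0])
            d, j = self.cache[i]
            return d, self.points[i], self.points[j]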

Variant
  • 16,291
  • 4
  • 37
  • 64
1

I'm not sure, but putting the points into a spatial index and querying it might lead to an O(n log n) algorithm.

Peter G.
  • 13,888
  • 6
  • 51
  • 75