
I want to generate a large set of random points in the 2D plane that is non-degenerate (no 3 points in the whole set lie on a straight line). I have a naive solution which generates a random float pair P_new(x, y) and checks, against every pair of points (P1, P2) generated so far, whether (P1, P2, P_new) lie on the same line. This takes O(n^2) checks for each new point added to the list, making the whole complexity O(n^3), which is very slow if I want to generate more than 4000 points (it takes more than 40 minutes). Is there a faster way to generate such a set of non-degenerate points?

unkulunkulu
alpha_cod
  • What kind of random distribution, if any, does the point cloud have to adhere to? Should it be white, pink, Brownian or some other kind of noise? It's comparatively easy to design an algorithm such that any new point added is guaranteed not to lie on a line with two other points (just distribute them along the circumference of any positive-curvature curve, for example a circle), but this random point cloud might not be quite what you're looking for. – Martin Sojka Jul 26 '11 at 07:04
  • If you are generating points on a 2D plane of *real numbers*, the probability of three random points aligning on a line is theoretically zero in a cloud containing a finite number of points. – Tugrul Ates Jul 26 '11 at 07:22
  • @junjanes: Practically, it's on the order of 1 / combinations(2 ^ (number of bits in the mantissa + 1), 3) for floating point numbers (for three points; for more, see the Birthday Paradox). – Martin Sojka Jul 26 '11 at 07:32
  • I get it: practically the probability is very low, but I want to check for the possibility to avoid degeneracy in my point set. – alpha_cod Jul 26 '11 at 10:23

5 Answers


Instead of checking the points for collinearity on each cycle iteration, you could compute and compare the coefficients of the linear equations. These coefficients should be stored in a container with quick lookup. I would consider std::set, but unordered_set could fit as well and could lead to even better results.

To sum it up, I suggest the following algorithm:

  1. Generate a random point p;
  2. Compute the coefficients of the lines crossing p and each existing point (I mean the usual A, B & C). Here you need to do n computations;
  3. Try to find the newly computed values in the set of previously computed ones. This step requires at most n*log(n^2) operations;
  4. If the search comes up empty, accept the new point and add its coefficients to the corresponding sets. Each insertion costs about O(log(n)) too.

The whole complexity is reduced to O(n^2*log(n)). The algorithm requires storing an additional n^2*sizeof(Coefficient) of memory, but this seems acceptable if you are only trying to compute 4000 points.
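A minimal C++ sketch of this idea (the unit-length normalization and the `1e6` quantization grid are my own assumptions for making floating-point coefficients comparable, not part of the answer; std::set keeps each lookup at O(log n)):

```cpp
#include <cmath>
#include <cstdlib>
#include <set>
#include <tuple>
#include <vector>

struct Pt { double x, y; };

// Canonical key for the line through p and q: normalize (A, B, C) to unit
// length, fix the sign, then quantize so tiny rounding differences between
// two computations of the same line map to the same key. The 1e6 grid is an
// arbitrary choice trading false rejections against missed near-collinearity.
std::tuple<long, long, long> lineKey(const Pt& p, const Pt& q) {
    double A = q.y - p.y, B = p.x - q.x, C = -(A * p.x + B * p.y);
    double n = std::sqrt(A * A + B * B);          // assumes p != q
    A /= n; B /= n; C /= n;
    if (A < 0 || (A == 0 && B < 0)) { A = -A; B = -B; C = -C; }
    const double scale = 1e6;
    return { std::lround(A * scale), std::lround(B * scale), std::lround(C * scale) };
}

// Generate n points, rejecting any candidate that falls on a line already
// spanned by two accepted points.
std::vector<Pt> generate(int n) {
    std::vector<Pt> pts;
    std::set<std::tuple<long, long, long>> lines;  // grows to O(n^2) keys
    while ((int)pts.size() < n) {
        Pt p{ double(std::rand()) / RAND_MAX, double(std::rand()) / RAND_MAX };
        std::vector<std::tuple<long, long, long>> keys;
        bool ok = true;
        for (const Pt& q : pts) {
            auto k = lineKey(p, q);
            if (lines.count(k)) { ok = false; break; }  // p would be a 3rd point on k
            keys.push_back(k);
        }
        if (!ok) continue;                 // reject p, draw a fresh candidate
        lines.insert(keys.begin(), keys.end());
        pts.push_back(p);
    }
    return pts;
}
```

Each accepted point inserts n new keys, so the set grows to O(n^2) entries, matching the memory estimate above.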

eugene_che
  • Great idea, but you have to pay attention to degenerate cases (points on the same vertical line). – grep Jul 26 '11 at 06:40
  • @unkulunkulu, @grep: Actually vertical lines are not a very serious problem, because I suggest using the linear equation form `Ax+By+C=0`, not `y=kx+b`. The vertical line equation is `Ax+C=0` (`B=0`), and there are no infinite coefficients here. – eugene_che Jul 26 '11 at 06:50
  • There are several representations for each line; I guess you must normalize somehow, otherwise searching might be difficult. – Tobias Langner Jul 26 '11 at 07:13
  • Just to make sure: in step 3, is the complexity n*log(n^2) or (n^2)*log(n^2)? – A. K. Jul 26 '11 at 07:17
  • @Tobias, nice catch. Still, normalization does not seem to be difficult. We can always separate the equations into 3 groups, for example `Ax+0y+C=0`, `0x+By+C=0` & `Dx+By+C=0`, where `D` is a given constant. – eugene_che Jul 26 '11 at 07:22
  • @Aditya: There are at most `n^2` elements in the set, thus the binary search complexity for every pattern is `log(n*n)`. There are n patterns, so the overall complexity per point is `O(n*log(n))`. – eugene_che Jul 26 '11 at 07:26
  • Normalization is a bad idea as it introduces a square root, which will yield very tricky precision problems. – unkulunkulu Jul 26 '11 at 07:29
  • Alternatively, store ɸ and y(x=0). This collapses `A` and `B` into a single angle ɸ. The advantage is that you calculate ɸ only O(n*n) times, instead of having to normalize `A` and `B` O(n*n*log n) times. – MSalters Jul 26 '11 at 07:51
  • @tzy: a) I think you could normalize with Ax+By+C=0 where C is a given constant or 0 (2 groups). Problems might arise where A or B get too large because of normalization. Your approach has the same problem with B or C getting too large because of normalization. This might or might not be an issue depending on the size of the plane. – Tobias Langner Jul 26 '11 at 08:38
  • No sqrt normalization: `max(|A|, |B|) = max(A, B) = 1` – salva Jul 26 '11 at 09:02
  • Use a hash to get rid of the O(N*log(N)) search. That would make the full algorithm O(N^2). – salva Jul 26 '11 at 09:12
  • @salva, I mentioned that unordered_set is a good approach, but it is a less common container, so I decided to explain the algorithm using std::set. Now I clearly understand that it was a strange decision :-). – eugene_che Jul 26 '11 at 09:32
  • @tyz, anyway, IMO we are ignoring the hardest part of this problem: how to handle rounding errors right. Some criterion should be defined in advance to determine whether three points are on the same line, for instance `π - max_angle(P0, P1, P2) < ε`. And STL containers are not able to cope with that kind of approximate matching directly. – salva Jul 26 '11 at 10:11
  • Thanks for the idea, I'll try coding it. I also want to scale the solution to compute at least 1 million points. – alpha_cod Jul 26 '11 at 10:35
  • @salva, you're right, my algorithm is ok with that, because it doesn't deal with angles and is a classical one, we implemented it on one of the ACM contests. – unkulunkulu Jul 26 '11 at 11:07

An O(n^2 log n) algorithm can easily be constructed in the following way:

For each point P in the set:

  1. Sort other points by polar angle (cross-product as a comparison function, standard idea, see 2D convex hull gift-wrapping algorithm for example). In this step you should consider only points Q that satisfy

    Q.x > P.x || (Q.x == P.x && Q.y > P.y)
    
  2. Iterate over the sorted list; points with equal angles lie on the same line through P.

Sorting is done in O(n log n) and step 2 is O(n). This gives O(n^2 log n) overall for detecting degenerate points.
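A sketch of this idea in C++ (integer coordinates are my assumption here so the cross products stay exact; folding opposite rays into one half-plane replaces the Q.x/Q.y condition above and makes the angular comparator a strict ordering):

```cpp
#include <algorithm>
#include <vector>

struct Pt { long long x, y; };  // integer coordinates keep cross products exact

long long cross(Pt u, Pt v) { return u.x * v.y - u.y * v.x; }

// Returns true if some three points of pts lie on a common line.
// Assumes all points are distinct. O(n^2 log n) overall.
bool hasCollinearTriple(const std::vector<Pt>& pts) {
    const int n = (int)pts.size();
    for (int i = 0; i < n; ++i) {
        const Pt p = pts[i];
        std::vector<Pt> dirs;  // direction vectors from p
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            Pt d{ pts[j].x - p.x, pts[j].y - p.y };
            // Fold opposite rays together so every direction lands in one
            // half-plane; collinear points then get equal angles.
            if (d.x < 0 || (d.x == 0 && d.y < 0)) { d.x = -d.x; d.y = -d.y; }
            dirs.push_back(d);
        }
        // Sort by polar angle using the cross product as the comparator.
        std::sort(dirs.begin(), dirs.end(),
                  [](Pt a, Pt b) { return cross(a, b) > 0; });
        // Equal angles end up adjacent: a zero cross product means two
        // other points are collinear with p.
        for (size_t k = 1; k < dirs.size(); ++k)
            if (cross(dirs[k - 1], dirs[k]) == 0) return true;
    }
    return false;
}
```

To generate a non-degenerate cloud, run this as the rejection test for each candidate point, or once over a finished set to weed out degenerate points.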

unkulunkulu
  • @salva, how exactly do you use radix to sort vectors by polar angle? – unkulunkulu May 03 '12 at 15:07
  • briefly, you transform the doubles into a representation that allows comparing them as strings, then you perform a most significant digit radix sort. In practice, the recursive variant is probably the faster. See the [wikipedia page](http://en.wikipedia.org/wiki/Radix_sort#Most_significant_digit_radix_sorts) and its links section. – salva May 03 '12 at 16:02
  • @salva, sorry, but this should be some kind of joke, because converting doubles to strings would kill everything radix offers. Moreover, in this particular algorithm we don't have explicit doubles to compare: the cross-product sign serves as the generalized comparison function, so we have to use one of the general sorts that use comparisons only, that's why I wondered about the radix applicability to this task. – unkulunkulu May 04 '12 at 11:04
  • You don't have to convert the doubles to their string representation, but to apply a transformation that allows comparing them byte by byte. In practice that means performing some basic operations (i.e. and, or, not, neg) over the doubles' native binary representation. – salva May 04 '12 at 11:29
  • @salva, ok, can you just link me to a paper or a webpage where someone does this. Anyway, again, I don't have any doubles to sort, so this is of no help to this particular answer. – unkulunkulu May 04 '12 at 12:03
  • Google is your friend! Anyway, see [this](http://stackoverflow.com/questions/4640906/radix-sort-sorting-a-float-data). My own reply to the OP also references a Perl module implementing it. – salva May 04 '12 at 21:34

Determining whether a set of points is degenerate is a 3SUM-hard problem. (The very first problem listed is determining whether three lines contain a common point; the equivalent problem under projective duality is whether three points belong to a common line.) As such, it's not reasonable to hope that a generate-and-test solution will be significantly faster than n^2.

What are your requirements for the distribution?

heavy
  1. Generate a random point Q.

  2. For every previous point P, calculate (dx, dy) = P - Q

  3. and B = (abs(dx) > abs(dy) ? dy/dx : dx/dy).

  4. Sort the list of points P by their B value, so that points forming a line with Q end up in nearby positions inside the sorted list.

  5. Walk over the sorted list, checking whether Q forms a line with the current P value being considered and the following values whose B is nearer than a given distance.

Perl implementation:

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use Math::Vector::Real;
use Math::Vector::Real::Random;
use Sort::Key::Radix qw(nkeysort);

use constant PI => 3.14159265358979323846264338327950288419716939937510;

@ARGV <= 2 or die "Usage:\n  $0 [n_points [tolerance]]\n\n";

my $n_points = shift // 4000;
my $tolerance = shift // 0.01;

$tolerance = $tolerance * PI / 180;
my $tolerance_arctan = 3 / 2 * $tolerance;
#     I got to that relation using not-so-basic maths in a hurry.
#     It may be wrong!

my $tolerance_sin2 = sin($tolerance) ** 2;

sub cross2d {
    my ($p0, $p1) = @_;
    $p0->[0] * $p1->[1] - $p1->[0] * $p0->[1];
}

sub line_p {
    my ($p0, $p1, $p2) = @_;
    my $a0 = $p0->abs2 || return 1;
    my $a1 = $p1->abs2 || return 1;
    my $a2 = $p2->abs2 || return 1;
    my $cr01 = cross2d($p0, $p1);
    my $cr12 = cross2d($p1, $p2);
    my $cr20 = cross2d($p2, $p0);
    $cr01 * $cr01 / ($a0 * $a1) < $tolerance_sin2 or return;
    $cr12 * $cr12 / ($a1 * $a2) < $tolerance_sin2 or return;
    $cr20 * $cr20 / ($a2 * $a0) < $tolerance_sin2 or return;
    return 1;
}

my ($c, $f1, $f2, $f3) = (0, 1, 1, 1);

my @p;
GEN: for (1..$n_points) {
    my $q = Math::Vector::Real->random_normal(2);
    $c++;
    $f1 += @p;
    my @B = map {
        my ($dx, $dy) = @{$_ - $q};
        abs($dy) > abs($dx) ? $dx / $dy : $dy / $dx;
    } @p;

    my @six = nkeysort { $B[$_] } 0..$#B;

    for my $i (0..$#six) {
        my $B0 = $B[$six[$i]];
        my $pi = $p[$six[$i]];
        for my $j ($i + 1..$#six) {
            last if $B[$six[$j]] - $B0 > $tolerance_arctan;
            $f2++;
            my $pj = $p[$six[$j]];
            if (line_p($q - $pi, $q - $pj, $pi - $pj)) {
                $f3++;
                say "BAD: $q $pi-$pj";
                redo GEN;
            }
        }
    }
    push @p, $q;
    say "GOOD: $q";
    my $good = @p;
    my $ratiogood = $good/$c;
    my $ratio12 = $f2/$f1;
    my $ratio23 = $f3/$f2;
    print STDERR "gen: $c, good: $good, good/gen: $ratiogood, f2/f1: $ratio12, f3/f2: $ratio23                                  \r";
}
print STDERR "\n";

The tolerance indicates the acceptable error in degrees when considering whether three points are on a line, measured as π - max_angle(Q, Pi, Pj).

It does not take into account the numerical instabilities that can happen when subtracting vectors (i.e. |Pi-Pj| may be several orders of magnitude smaller than |Pi|). An easy way to eliminate that problem would be to also require a minimum distance between any two given points.

Setting tolerance to 1e-6, the program just takes a few seconds to generate 4000 points. Translating it to C/C++ would probably make it two orders of magnitude faster.

salva

An O(n) solution:

  1. Pick a random number r from 0..1
  2. The point added to the cloud is then P(cos(2 × π × r), sin(2 × π × r))
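A minimal sketch of this trick (C++; `std::mt19937` and the fixed default seed are assumptions of this sketch, not part of the answer):

```cpp
#include <cmath>
#include <random>
#include <vector>

struct Pt { double x, y; };

// Every point lies on the unit circle; a line meets a circle in at most two
// points, so no three generated points can be collinear.
std::vector<Pt> circleCloud(int n, unsigned seed = 42) {
    const double PI = std::acos(-1.0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<Pt> pts;
    for (int i = 0; i < n; ++i) {
        double r = dist(gen);  // random number r from 0..1
        pts.push_back({ std::cos(2 * PI * r), std::sin(2 * PI * r) });
    }
    return pts;
}
```

As the comments below note, two points could still coincide in principle, so a production version would also deduplicate the r values.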
Martin Sojka
  • Two of them could end up being the same anyway. – unkulunkulu Jul 26 '11 at 07:27
  • @unkulunkulu: A check for that would make the whole thing O(n log n) at most, though. :) Still I doubt that the result would be quite what alpha_cod thought of when he asked for a "random point cloud", but the question is a bit under-defined. – Martin Sojka Jul 26 '11 at 07:38
  • @Martin Sojka: It's O(N) if you use a hash or radix sort to eliminate duplicates. – salva Jul 27 '11 at 06:05