12

I am using the Rtree implementation of boost::geometry to store (lots of) 2D points. Now I need to do distance-based nearest neigbors queries.

However, the manual only describes queries as rectangular boxes (i.e. "Get me all the points that are inside this rectangle") or "KNN" queries ("Get me the nearest 'n' points from here).

What I want is actually "Get me the set of points that are at a distance less than 'n'".

I noticed that you can define a unary-predicate, but is is... unary (thus, not suitable for a condition on two points).

The manual documents some model::ring class that I thought at first might fit for a circle, but it is actually more a kind of a piece-wise line (a polygon). Is that assumption correct ?

Is there another way to process such a query ? Or is it simply not possible ?

kebs
  • 4,983
  • 3
  • 36
  • 61

2 Answers2

9

The last example in the documented "User-defined queries" shows how to use a lambda in the predicate. This lambda can bind other variables in the scope, for instance, the point whose neighbors you are looking for.

Here is a small example. The example looks for points that are closer to (5, 5) than 2 units, for a collection of points that lie on the straight y = x. It should be easy to modify in order to check first if the sought point is in the R-tree, or get it directly out of the R-tree.

#include <iostream>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point.hpp>
#include <boost/geometry/index/rtree.hpp>


namespace bg = boost::geometry;
namespace bgi = boost::geometry::index;

typedef bg::model::point<float, 2, bg::cs::cartesian> point;
typedef std::pair<point, unsigned> value;

int main(int argc, char *argv[])
{
    bgi::rtree< value, bgi::quadratic<16> > rtree;

    // create some values
    for ( unsigned i = 0 ; i < 10 ; ++i )
    {
        point p = point(i, i);
        rtree.insert(std::make_pair(p, i));
    }

    // search for nearest neighbours
    std::vector<value> returned_values;
    point sought = point(5, 5);
    rtree.query(bgi::satisfies([&](value const& v) {return bg::distance(v.first, sought) < 2;}),
                std::back_inserter(returned_values));

    // print returned values
    value to_print_out;
    for (size_t i = 0; i < returned_values.size(); i++) {
        to_print_out = returned_values[i];
        float x = to_print_out.first.get<0>();
        float y = to_print_out.first.get<1>();
        std::cout << "Select point: " << to_print_out.second << std::endl;
        std::cout << "x: " << x << ", y: " << y << std::endl;
    }

    return 0;
}

Compile and run with Boost installed via Macports on OS X:

$ c++ -std=c++11 -I/opt/local/include -L/opt/local/lib main.cpp -o geom && ./geom
Select point: 4
x: 4, y: 4
Select point: 5
x: 5, y: 5
Select point: 6
x: 6, y: 6
logc
  • 3,415
  • 1
  • 15
  • 28
  • 1
    For future readers, this compiles easily on Debian/Ubuntu with `g++ -o main.o -c main.cpp -std=c++0x`, with gcc 4.6.3 and Boost 1.54 installed through `apt-get` – kebs Apr 08 '14 at 06:55
  • 10
    This solution will work but not efficiently. satisfies() predicate is checked only for Values which means that the indexing features of the rtree are not used, all Values must be checked if there is no additional spatial or nearest predicate. You want to return ALL values within some distance, not the NEAREST values. This is not a kNN query but a spatial query. The simplest and most efficient way of doing it is to search in a Box enclosing your area and then check the distance or pass additional satisfies() predicate e.g.: – Adam Wulkiewicz Apr 09 '14 at 09:45
  • 3
    you could just add bgi::within() to the above query to actually spatially search in the index: rt.query(bgi::within(enc_box) && bgi::satisfies(distance_pred), out); – Adam Wulkiewicz Apr 09 '14 at 09:53
  • 1
    An alternative would be to use not officially released extension - nsphere (https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/extensions/nsphere). If passed with the spatial predicate the query would return all Values within a circle. Last time I checked (some time ago) something like this worked but I can't guarantee that it still works: bgi::query(bgi::intersects(my_circle), out); – Adam Wulkiewicz Apr 09 '14 at 10:03
  • @Adam: your comments seemed right and I gave it a go, by generating 100k points uniformly distributed between 0 and 10. However, I was unable to measure a significant speed improvement. The code I have used is [here in this Gist](https://gist.github.com/logc/10272165), which turns into my version by commenting out line 39. Compiled and measured by: `c++ -std=c++11 -I/opt/local/include -L/opt/local/lib main.cpp -o geom && time ./geom` – logc Apr 09 '14 at 13:52
  • @logc: You could store 1M Points, run the query 100 times, just in case update some dummy variable defined as external each time to prevent optimization. Then you should see the difference. Btw, did you compile with optimizations enabled? Running query() with single satisfies() predicate is slightly slower than simple iterating over all of the values stored in the std::vector. – Adam Wulkiewicz Apr 09 '14 at 23:25
  • @logc: You measure the time of the execution of the whole program, is that right? In your example the time is dominated by the insertion of values into the rtree. You should probably only measure the time of querying. This should be a time of performing some number of them, not just one. E.g. just put boost::timer in there. – Adam Wulkiewicz Apr 09 '14 at 23:39
  • 3
    Look at my [fork of the test code](https://gist.github.com/weidenrinde/89e835611721f0c1d907). Here you find that the bounding box actually makes a difference. – Weidenrinde Sep 10 '14 at 19:58
5

The manual documents some model::ring class that I thought at first might fit for a circle, but it is actually more a kind of a piece-wise line (a polygon). Is that assumption correct ?

I think that's correct.

I noticed that you can define a unary-predicate, but is is... unary (thus, not suitable for a condition on two points).

Would the 'second' (or reference) point not be fixed? Because then you can use a simple bind expression to supply the reference point.


Additionally you can use the KNN algorithm with a very large n and add some kind of breaking condition on the predicate:

Breaking or pausing the query

for ( Rtree::const_query_iterator it = tree.qbegin(bgi::nearest(pt, 10000)) ;
      it != tree.qend() ; ++it )
{
    // do something with value
    if ( has_enough_nearest_values() )
        break;
}

This could work pretty well, assuming that the algorithm already traverses the points in order of ascending distance (you will want to check that assumption of course).

sehe
  • 328,274
  • 43
  • 416
  • 565
  • Thanks for that answer, actually you are right, I mismatched things about the predicate, a unary predicate with proper binding should do the job, I will test this ASAP. Your second proposal is interesting too, but I don't think it applies because the points density is not (at all...) homogeneous, and some request will return 0 while others would returns a very large number of points. – kebs Apr 07 '14 at 12:33
  • 1
    @kebs I meant that you can use a standard NNK and cut it after the distance threshold. This assumes that points are visited in nearest-to-farthest order (which might be true, in a spatial index, but might not - I know little about this part of the library) – sehe Apr 07 '14 at 12:41
  • Unfortunatly, this latter point (order of visiting points) cannot be guaranteed, so it could "miss" some points. Thanks anyway for answering. – kebs Apr 08 '14 at 06:57
  • @cv_and_he Interesting code too, I suggest you add it as an answer ? – kebs Apr 08 '14 at 07:00
  • @kebs I was planning to, but it's basically what sehe proposes in this answer, so I decided to simply add the code in case it helps. – llonesmiz Apr 08 '14 at 07:27
  • 1
    Have in mind that kNN query would be slower since more things are done internally than in the case of a spatial query. – Adam Wulkiewicz Apr 09 '14 at 10:07
  • 1
    In the case of nearest query iterators it's guaranteed to get the closest points first. In the case of the query() function you may assume that the order of the output is random (though it's not entirely true). – Adam Wulkiewicz Apr 11 '14 at 00:09
  • @AdamWulkiewicz thanks for these additions, Adam. It's what I expected. Is there a place where you found this in the documentation (or did you discover this by analyzing the code)? – sehe Apr 11 '14 at 06:34
  • 4
    @sehe I wrote it. I'll put some additional info in the docs. Btw, Boost is now available on GitHub so if you think that something should be added/extended feel free to fork the repo and create a pull request. Docs are in the QuickBook format, e.g. the chapter about queries can be found here: https://github.com/boostorg/geometry/blob/develop/doc/index/rtree/query.qbk If you wanted to build the docs by yourself and had problems you might contact us on the mailing list: http://lists.boost.org/mailman/listinfo.cgi/geometry – Adam Wulkiewicz Apr 11 '14 at 11:12
  • @AdamWulkiewicz I guess that counts as "I analyzed the code" xD Thanks again! – sehe Apr 11 '14 at 11:47
  • @AdamWulkiewicz thank you for your comments, I didn't expect my question would raise so much interest ;-) – kebs Apr 11 '14 at 14:59