How should I write a FindClosestMatching function?

Question

Of all objects matching a condition, find the one closest to a position. Supposedly a very common problem. The current code looks like this:

protected Foo FindClosestFooMatching (Vec pos, Func<Foo, bool> matches)
{
    float bestDist = float.MaxValue;
    Foo bestFoo = null;

    foreach (Foo foo in AllFoos()) {
        if (matches (foo)) {
            float dist = pos.Dist (foo.Center);
            if (dist <= bestDist) {
                bestDist = dist;
                bestFoo = foo;
            }
        }
    }
    return bestFoo;
}

I'm thinking of several ways to refactor this code, but failed to find a really nice one yet. If you've got a minute, give it a shot. :-)

Edit: Regarding Eric's questions. It's a standard 3D space with Euclidian metric (= fast). Point clustering and junk query likelihood is unknown.

Don't delete it! rather post your solution as an answer, I quite like it :) — Paolo Tedesco, Oct 23 '09 at 12:52
Oops, sorry, I erased my earlier comment. Since you're interested I won't delete. — mafu, Oct 23 '09 at 12:58
So, are you looking to rewrite the function for maximum performance? Do you want it to have minimal lines of code? Do you need a suggestion of a better datastructure than a list? Specifically, what kind of refactor are you looking for? — Juliet, Oct 23 '09 at 13:36
Evenly balanced between performance, loc and maintainability. The code provided in the OP is too long in any case. — mafu, Oct 23 '09 at 15:50
This is an extremely naive "best matching" algorithm which might not perform well in metric spaces with high dimensionality or otherwise have expensive distance metrics. It also potentially performs poorly if the likelihood of the query point being "junk" -- ie, matching no point in the data set -- is high and there are a lot of points. Can you tell us more about the metric space, its dimensionality, the distance metric, the clustering of points in that space, and the likelihood of junk queries? I'm thinking that a euclidean distance metric might be an entirely wrong approach for you. — Eric Lippert, Oct 23 '09 at 16:40

score 1 · Answer 1 · answered Oct 23 '09 at 13:07

1

return (from f in AllFoos()
        where matches(f)
        orderby f.Center.Dist(pos)
        select f).FirstOrDefault();

Or

return AllFoos().Where(matches)
                .OrderBy(f => f.Center.Dist(pos))
                .FirstOrDefault();

answered Oct 23 '09 at 13:07

Romain Verdier

12,297
7
53
77

Ah, just thought of the same ;P – mafu Oct 23 '09 at 13:13
This is a sort then pick top most, it would perform slower than the original answer O(n). Unless there is some magic under the hood of LINQ. – Bill Yang Oct 23 '09 at 18:33
No, there's no magic. If what you want is the max then use the Max extension method; that's what it's for. – Eric Lippert Oct 23 '09 at 20:52
`Min` would provide the distance, not the element that is associated with that distance. – mafu Nov 02 '09 at 11:43

score 1 · Answer 2 · answered Oct 23 '09 at 18:50

I think your solution is fast (O(n)), simple & stupid(in term of KISS) enough that even a first year CS student could understand.

One problem I can see is that you should take declaration of

float dist;

out of the loop. It may cause memory fragmentation. This is really a minimal concern since it's just a float.

The other would be changing

if (dist <= bestDist)

to

if (dist < bestDist)

to save some CPU cycles of variable assignment. But this has some side impacts because it will change the returned object from last best match to first best match.

score 0 · Accepted Answer · edited May 23 '17 at 11:48

Can't compile right now, but I hope this does the job.

var foos = AllFoos().Where (x => matches (x));
return foos.OrderBy (x => pos.Dist (x.Center)).FirstOrDefault ();

The important point to notice here is that every operation, including OrderBy, is subject to deferred execution, so the performance should be ok.

This would be a nicer looking solution:

var res = from foo in AllFoos()
    where matches (foo)
    orderby pos.Dist (foo.Center)
    select foo;
return res.FirstOrDefault ();

Edit: I found this question to be a valuable source in this issue. Since OrderBy actually orders the whole list first, it's of course slow (as Eric said, O(n log n). A better solution would be to use Aggregate or Jon Skeet's propsed MinBy extension (see above link for examples).

I actually love the MoreLinq solution, as it reduces the code to:

return World.AllNodes
    .Where (x => matches (x.T))
    .MinBy (x => pos.DistSq (x.T.Volume.Center));

Deferred execution is not magic dust that eliminates costs. It just moves costs around. This algorithm is still a worst case O(n log n) algorithm whereas the original algorithm was O(n), worst case. — Eric Lippert, Oct 23 '09 at 20:54
Indeed, I assumed orderby would not sort the whole list before providing the first element. I addressed this issue in the edit. I'm going to accept this answer (since it's the most extensive) unless the other answers are updated. — mafu, Nov 02 '09 at 12:08

How should I write a FindClosestMatching function?

3 Answers3