Looking for a data structure that is optimized for finding the next closest element

Question

I have two classes, let's call them foo and bar, that both have a DateTime property called ReadingTime. I then have long lists of these classes, let's say foos and bars, where foos is List<foo>, bars is List<bar>.

My goal is, for every element in foos, to find the events in bars that happened right before and right after it.

Some code to clarify:

var foos = new List<foo>();
var bars = new List<bar>();

...

foreach (var foo in foos)
{
    bar before = bars.Where(b => b.ReadingTime <= foo.ReadingTime).OrderByDescending(b => b.ReadingTime).FirstOrDefault();
    bar after = bars.Where(b => b.ReadingTime > foo.ReadingTime).OrderBy(b => b.ReadingTime).FirstOrDefault();
    ...
}

My issue here is performance. Is it possible to use some other data structure than a list to speed up the comparisons? In particular, the OrderBy call on every single iteration seems like a huge waste; having the data pre-ordered should also speed up the comparisons, right?

I just don't know which data structure is best: SortedList, SortedSet, SortedDictionary, etc.; there seem to be so many. Also, all the information I find is about lookups, inserts, deletes, etc. No one writes about finding the next closest element, so I'm not sure whether anything is optimized for that.

I'm on .NET Core 3.1 if that matters.

Thanks in advance!

Edit: Okay, so to wrap this up: First I tried implementing @derloopkat's approach. For this I figured I needed a data type that could keep the data in sorted order, so I just left it as IOrderedEnumerable (which is what LINQ returns). Probably not very smart, as that actually brought things to a crawl. I then tried going with SortedList. I had to remove some duplicates first, which was no problem in my case. Thanks for the help @Olivier Rogier! This got me up to roughly 2x the original performance, though I suspect that's mostly due to the removed LINQ OrderBys. For now this is good enough; if/when I need more performance I'm going to go with what @CamiloTerevinto suggested. Lastly, @Aldert, thank you for your time, but I'm too noob and under too much time pressure to understand what you suggested. Still appreciate it and might revisit this later.

Edit2: Ended up going with @CamiloTerevinto's suggestion. Cut my runtime down from 10 hours to a couple of minutes.

If the lists were ordered, knowing the index of the elements would give you a constant time for finding the previous and next items given that they'd be (index - 1) and (index + 1) — Camilo Terevinto, Sep 13 '20 at 07:48
@CamiloTerevinto my problem is a bit harder as the two lists have an unequal number of elements and the timestamps are irregular for both of them i.e. between foo[i] and foo[i+1] there might be 0, 1 or many entries of bar. But the idea behind it is not bad, i.e. moving through `bar` only once after sorting it. — R D, Sep 13 '20 at 08:08
Not only that, if `foo` is sorted, you know that the next `bar` items to take will certainly be after the last items you took, so you don't have to scan the entire list twice again — Camilo Terevinto, Sep 13 '20 at 08:12
Whether you sort both `foo` and `bar` or use a different data structure like in the answer provided, that's up to you :) We don't know enough of your system to tell you — Camilo Terevinto, Sep 13 '20 at 08:36
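
A minimal sketch of the single-pass idea from the comments above, assuming both lists can be sorted up front. The class name NeighborSweep and the method FindNeighbors are hypothetical, and plain DateTime values stand in for foo.ReadingTime and bar.ReadingTime to keep it short; it is not the code the asker ended up with.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NeighborSweep
{
    // Hypothetical helper (not from the thread): for each time in foos, returns the
    // latest bar time <= it and the earliest bar time > it (null when none exists).
    // Results come back in ascending foo order, not in the original input order.
    public static List<(DateTime Foo, DateTime? Before, DateTime? After)> FindNeighbors(
        IEnumerable<DateTime> foos, IEnumerable<DateTime> bars)
    {
        // Sort once and materialize, so nothing is re-sorted inside the loop.
        List<DateTime> sortedFoos = foos.OrderBy(t => t).ToList();
        List<DateTime> sortedBars = bars.OrderBy(t => t).ToList();
        var result = new List<(DateTime Foo, DateTime? Before, DateTime? After)>(sortedFoos.Count);

        int next = 0; // index of the first bar strictly after the current foo
        foreach (DateTime foo in sortedFoos)
        {
            // The cursor only ever moves forward, so bars is scanned once in total.
            while (next < sortedBars.Count && sortedBars[next] <= foo)
                next++;

            DateTime? before = next > 0 ? sortedBars[next - 1] : (DateTime?)null;
            DateTime? after = next < sortedBars.Count ? sortedBars[next] : (DateTime?)null;
            result.Add((foo, before, after));
        }
        return result;
    }
}
```

After the two sorts, the loop does O(foos + bars) work in total, because the cursor into bars is never reset. It also copes with 0, 1 or many bars between two consecutive foos: the inner while loop simply advances by that many entries.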

Olivier Rogier · Answer 1 · 2020-09-13T08:02:40.127

1

For memory efficiency and strong typing, you can use the generic SortedDictionary<TKey, TValue> or SortedList<TKey, TValue>; the non-generic SortedList stores its items as object. Because you are comparing DateTime values, you don't need to implement a comparer.

What's the difference between SortedList and SortedDictionary?

SortedList<>, SortedDictionary<> and Dictionary<>

Difference between SortedList and SortedDictionary in C#

For speed optimization you can use a doubly linked list, where each item references the next and the previous items:

Doubly Linked List in C#

Linked List Implementation in C#

Using a singly or doubly linked list requires more memory, because the next and previous references are stored in a node that wraps each instance. In exchange, it can sometimes be the fastest way to traverse and compare data, as well as to search, sort, reorder, add, remove and move items, because you manipulate linked references rather than an array.

You can also build powerful trees and manage data in a better way than with arrays.

edited Sep 13 '20 at 08:02

answered Sep 13 '20 at 07:57


Thanks for the comprehensive answer. If I were to use something like `SortedList`, what is my key here? The `DateTime` column? Do I essentially copy it to use it as a key? Does this work if I have duplicate DateTimes? – R D Sep 13 '20 at 08:16
Yes, if you compare and sort by DateTime, use that: the datetime property of TValue is used as TKey. But no duplicates... https://stackoverflow.com/questions/5716423/c-sharp-sortable-collection-which-allows-duplicate-keys & https://stackoverflow.com/questions/11801314/equivalent-to-a-sorted-dictionary-that-allows-duplicate-keys & https://gist.github.com/Vaskivo/ce0d2f39ecbb91367aa7 & https://www.codeproject.com/Articles/274486/A-Better-Sorted-List-and-Dictionary – Olivier Rogier Sep 13 '20 at 08:23

Daniel Manta · Accepted Answer · 2020-09-13T09:20:38.357

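You don't need to sort bars ascending and descending on each iteration. Order bars just once before the loop by calling .OrderBy(b => b.ReadingTime).ToList() and then use LastOrDefault() and FirstOrDefault(). The ToList() call matters: OrderBy is lazily evaluated, so without materializing the result the sort would run again every time bars is enumerated.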

foreach (var foo in foos)
{
    bar before = bars.LastOrDefault(b => b.ReadingTime <= foo.ReadingTime);
    bar after = bars.FirstOrDefault(b => b.ReadingTime > foo.ReadingTime);
    //...
}

This produces the same output as your code and runs faster.

Aldert · Answer 3 · 2020-09-13T10:45:00.757

You can use a binary search for quick lookups. The code below sorts bars and looks up each foo in it. You can read up on binary searches and enhance the code by also sorting foos; in that case you can narrow the search range in bars...

The code generates two lists of 100 items each, then sorts bars and performs a binary search for each of the 100 foos.

using System;
using System.Collections.Generic;


namespace ConsoleApp2
{
    class BaseReading
    {
        private DateTime readingTime;

        public BaseReading(DateTime dt)
        {
            readingTime = dt;
        }

        public DateTime ReadingTime
        {
            get { return readingTime; }
            set { readingTime = value; }
        }

    }

    class Foo:BaseReading
    {
        public Foo(DateTime dt) : base(dt)
        { }
    }

    class Bar: BaseReading
    {
        public Bar(DateTime dt) : base(dt)
        { }
    }

    class ReadingTimeComparer: IComparer<BaseReading>
    {
        public int Compare(BaseReading x, BaseReading y)
        {
            return x.ReadingTime.CompareTo(y.ReadingTime);
        }
    }

    class Program
    {
        static private List<BaseReading> foos = new List<BaseReading>();
        static private List<BaseReading> bars = new List<BaseReading>();

        static private Random ran = new Random();


        static void Main(string[] args)
        {
            for (int i = 0; i < 100; i++)
            {
                foos.Add(new BaseReading(GetRandomDate()));
                bars.Add(new BaseReading(GetRandomDate()));
            }

            var rtc = new ReadingTimeComparer();

            bars.Sort(rtc);
            
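            foreach (BaseReading br in foos)
            {
                // index >= 0: a bar with the same ReadingTime was found at that position.
                // index < 0: no exact match; ~index is the position of the first
                // bar with a larger ReadingTime (or bars.Count if there is none).
                int index = bars.BinarySearch(br, rtc);
            }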

        }
        static DateTime GetRandomDate()
        {
            long randomTicks = ran.Next((int)(DateTime.MaxValue.Ticks >> 32));
            randomTicks = (randomTicks << 32) + ran.Next();
            return new DateTime(randomTicks);
        }

    }
}

score 0 · Answer 4 · answered Sep 21 '20 at 01:28

The only APIs available in the .NET platform for finding the next closest element, with a computational complexity better than O(N), are the List.BinarySearch and Array.BinarySearch methods:

// Returns the zero-based index of item in the sorted List<T>, if item is found;
// otherwise, a negative number that is the bitwise complement of the index of
// the next element that is larger than item or, if there is no larger element,
// the bitwise complement of Count.
public int BinarySearch (T item, IComparer<T> comparer);

These APIs are not 100% robust, because the correctness of the results depends on whether the underlying data structure is already sorted, and the platform does not check or enforce this condition. It's up to you to ensure that the list or array is sorted with the correct comparer, before attempting to BinarySearch on it.

These APIs are also cumbersome to use: when a direct match is not found, you get the index of the next larger element as a bitwise complement, which is a negative number, and you have to apply the ~ operator to get the actual index. You then subtract one to get the closest element in the other direction.
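
To make the index arithmetic concrete, here is a minimal sketch that wraps List<T>.BinarySearch to return the positions of the "before" (last element <= foo) and "after" (first element > foo) entries. The class BinarySearchNeighbors and its Find method are hypothetical names, and the list holds plain DateTime values rather than the asker's bar objects; it assumes the list is already sorted ascending.

```csharp
using System;
using System.Collections.Generic;

public static class BinarySearchNeighbors
{
    // Hypothetical helper: sortedBars must already be sorted ascending.
    // Returns indexes into sortedBars; -1 means there is no such element.
    public static (int Before, int After) Find(List<DateTime> sortedBars, DateTime foo)
    {
        int i = sortedBars.BinarySearch(foo);
        if (i < 0)
        {
            // No exact match: ~i is the index of the first element larger
            // than foo, or Count when every element is smaller.
            int firstLarger = ~i;
            int after = firstLarger < sortedBars.Count ? firstLarger : -1;
            return (firstLarger - 1, after); // firstLarger - 1 is -1 when nothing is smaller
        }

        // Exact match: with duplicate values BinarySearch may land on any of
        // them, so step forward to the last element equal to foo.
        while (i + 1 < sortedBars.Count && sortedBars[i + 1] == foo)
            i++;
        return (i, i + 1 < sortedBars.Count ? i + 1 : -1);
    }
}
```

Each call costs O(log N), plus a short linear step over any run of duplicate timestamps. Returning -1 for a missing neighbor is just one possible convention; nullable indexes would work equally well.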

If you don't mind adding a third-party dependency to your app, you could consider the C5 library, which contains the TreeDictionary collection, with the interesting methods below:

// Find the entry in the dictionary whose key is the predecessor of the specified key.
public bool TryPredecessor(K key, out SCG.KeyValuePair<K, V> res);

// Find the entry in the dictionary whose key is the successor of the specified key.
public bool TrySuccessor(K key, out SCG.KeyValuePair<K, V> res);

There are also the TryWeakPredecessor and TryWeakSuccessor methods available, that consider an exact match as a predecessor or successor respectively. In other words they are analogous to the <= and >= operators.

C5 is a powerful, feature-rich library that offers many specialized collections; its main drawback is a somewhat idiosyncratic API.

You should get excellent performance with any of these options.
