18

I've been looking for a way to store and retrieve values on more than the single key that C#'s generic Dictionary class provides.

Searching around the web (and on SO itself) has shown me a couple options:

Tuple Based Dictionaries

.NET 4.0 makes it easy to support a generic Tuple<,> class. This means you can make a Dictionary out of any arbitrary Tuple, i.e.,

  • var myDict = new Dictionary<Tuple<Char, Int>, MyClass>();

Nested Dictionaries

I've learned you can also nest Dictionaries within Dictionaries, which makes accessing the stored result similar to accessing an N-Dimensional array. For instance:

Dictionary<int, Dictionary<int, Dictionary<Char, MyClass>>>

which could then be accsessed by: MyClass foo = MyData[8][3]['W'];

Delimited Concatenated Key Dictionaries

But while both work well for complex data and custom classes, I wonder if they're always necessary. For primitive data, at least, it would seem that concatenating the keys with a delimiter is just as effective.

//keys are char + int
Dictionary<string, MyClass> myDict = New Dictionary<string, Myclass>();
String input = myChar + "|" + myInt
MyClass foo = myDict[input]

Are there any scenarios which make one of these methods superior to the other? Will they have similar performance times? Or should the focus be instead on which method provides the cleanest, easiest to maintain, code?

Thoughts?

Community
  • 1
  • 1
Raven Dreamer
  • 6,600
  • 13
  • 61
  • 96
  • 1
    You assume that everything that you would use as a key is easily convertable to a short string representation of that object. – Servy Aug 10 '12 at 20:47
  • Always go for the clean, maintainable code first. Then look at performance tweaks if prformance is an issue. – Sam Axe Aug 10 '12 at 20:48
  • This question is does not lend itself to a specific answer. http://stackoverflow.com/faq#dontask That being said I'd use multi-dictionaries as it's easier to pull out the keys if you need them. – Thinking Sites Aug 10 '12 at 20:48
  • 1
    @Servy Hence, "For primitive data, at least,..." :) – Raven Dreamer Aug 10 '12 at 21:16
  • This is very similar to [tuples-or-arrays-as-dictionary-keys-in-c-sharp](http://stackoverflow.com/questions/955982/tuples-or-arrays-as-dictionary-keys-in-c-sharp) – nawfal Jan 13 '14 at 12:14
  • char(31) works well as a delimiter in my experience for anyone reading. It is fairly unusual to occur (ymmv). – Simon P May 18 '16 at 00:24

4 Answers4

15

Delimited Concatenated Key Dictionaries

There are at least three reasons why I would avoid this approach:

  • It is magic. There is nothing in the type of the key that tells you how to construct it or what it represents.
  • If the delimiter accidentally appears as one of the values, your approach fails.
  • Conversion to strings, and comparison of these strings is likely to be (slightly) slower than using two primitive types.

Nested Dictionaries

This solves the problem with the delimiter, but introduce some new problems:

  • Insertion of new values is difficult because for each nested level you have to check whether that key already exists. If not, you would need to create a new dictionary as the value. This makes using the dictionary more difficult.
  • There will be a further memory and performance overhead.

Tuple Based Dictionaries

Of the approaches you posted, this is probably the best.

But you could take it one step further and create a named immutable struct for your key. This will make your dictionary easier to use because the parts of the key can have useful names.

Mark Byers
  • 719,658
  • 164
  • 1,497
  • 1,412
  • 14
    I worked at a company that used pipe separated values. One day, we got a customer called Acme || (yes, they thought using two pipes rather than II would be cool). – Eric J. Aug 10 '12 at 20:48
  • How is using a `struct` different than using a class (like the Tuple)? – Raven Dreamer Aug 10 '12 at 21:17
  • 3
    @RavenDreamer: The struct is not the important thing in "named immutable struct". There are two important things. a) It has a name. b) It is immutable. The less important thing is c) That it's a struct, which will give you a sensible `Equals` and `GetHashCode` for free, but there are also situations where a class is better. – Mark Byers Aug 10 '12 at 21:24
  • @RavenDreamer if you dont require the dictionary to be a class level member, then anonymous type is also an option as key. `var dict = new object[0].ToDictionary(x => new { char, int }, x => MyClass)` – nawfal Jan 12 '14 at 14:44
  • 1
    @MarkByers the free `Equals` and `GetHashCode` on `struct` use reflection internally and are the worst performing dictionary keys. When creating a `struct` for use as a key, always override this methods. – andrewjsaid Jan 21 '16 at 16:06
  • Note that with the advent of value tuples, it is possible to have meaningful names with the built-in tuple type as well. A named struct is fine as well, but if I were to create a simple named struct in today's .NET, I'd back the data with a value tuple and delegate properties, `Equals()`, and `GetHashCode()` to it. – Peter Duniho Aug 14 '19 at 21:33
5

Or should the focus be instead on which method provides the cleanest, easiest to maintain, code?

Unless your focus is on writing nightmarish, intimidating code you should be avoiding the string delimiting and concatenating approach which is evil which goes without saying.

Choosing between tuple and nested dictionaries based approaches depends on your context. Tweak for performance? Or tweak for readability? I will talk about latter first.

From maintainability point of view,

  • Its much easier to implement a functionality that looks like:

    var myDict = new Dictionary<Tuple<char, int>, MyClass>();
    

    than

    var myDict = new Dictionary<char, Dictionary<int, MyClass>>();
    

    from the callee side. In the second case each addition, lookup, removal etc require action on more than one dictionary.

  • Furthermore, if your composite key require one more (or less) field in future, you will need to change code a significant lot in the second case (nested dictionary) since you have to add further nested dictionaries and subsequent checks.

From performance perspective, the best conclusion you can reach is by measuring it yourself. But there are a few theoretical limitations which you can consider beforehand:

  • In the nested dictionary case, having an additional dictionary for every keys (outer and inner) will have some memory overhead (more than what creating a tuple would have).

  • In the nested dictionary case, every basic action like addition, updation, lookup, removal etc need to be carried out in two dictionaries. Now there is a case where nested dictionary approach can be faster, ie, when the data being looked up is absent, since the intermediate dictionaries can bypass the full hash code computation & comparison, but then again it should be timed to be sure. In presence of data, it should be slower since lookups should be performed twice (or thrice depending on nesting).

  • Regarding tuple approach, .NET tuples are not the most performant when they're meant to be used as keys in sets since its Equals and GetHashCode implementation causes boxing for value types.

On the whole, I find very little need for nested dictionary approach. Odds are one would not want it. I would prefer tuple based approach, but you should write one your own tuple with a better implementation, and in this one case of char and int keys, I prefer making it a (immutable) struct.

A very related question: Tuples( or arrays ) as Dictionary keys in C#

Community
  • 1
  • 1
nawfal
  • 62,042
  • 48
  • 302
  • 339
3

I wanted to add to the above answers, that there are some scenarios (depends on how the data is distributed) in which a nested dictionary is much better than a composite key dictionary in terms of memory footprint (which in turn might lead to better performance overall). The reason for that is that the nesting can save you the need of saving duplicated values for the keys, which in large dictionaries will make the footprint of the extra dictionaries negligible.

For example, say I need a dictionary with a composite key of (male/female),(baby/young/old),(age).

Let's save some values with a dictionary of composite keys:

(male, baby, 1)
(male, baby, 2)
(male, baby, 3)
(male, young, 21)
(male, young, 22)
(male, young, 23)
(male, old, 91)
(male, old, 92)
(male, old, 93)
(female, baby, 1)
(female, baby, 2)
(female, baby, 3)
(female, young, 21)
(female, young, 22)
(female, young, 23)
(female, old, 91)
(female, old, 92)
(female, old, 93)

Now let's save the same values in a dictionary of dictionaries:

male -> baby ->  1
                 2
                 3
        young -> 21
                 22
                 23
        old  ->  91
                 92
                 93
female -> baby ->1
                 2
                 3
        young -> 21
                 22
                 23
        old  ->  91
                 92
                 93

In the composite key approach I save a copy of "male" and "female" 9 times, as opposed to a single copy in the dictionary of dictionaries. In fact, I saved 54 items vs 26 items, getting twice the memory footprint. The example also helps to visualize the difference, see how much "empty" space there is in the second sample compared to the first, those are all values that we didn't need to save.

And for those which are still not convinced, here's a sample test:

    Dictionary<Tuple<int, int, int>, int> map1 = new Dictionary<Tuple<int, int, int>, int>();
    Dictionary<int, Dictionary<int, Dictionary<int, int>>> map2 = new Dictionary<int, Dictionary<int, Dictionary<int, int>>>();

    public void SizeTest()
    {
        for (int x = 0; x < 30; x++)
        {
            for (int y = 0; y < 100; y++)
            {
                for (int z = 0; z < 600; z++)
                {
                    addToMap1(x, y, z, 0);
                    addToMap2(x, y, z, 0);
                }
            }
        }
        int size1 = GetObjectSize(map1);
        int size2 = GetObjectSize(map2);

        Console.WriteLine(size1);
        Console.WriteLine(size2);
    }

    private void addToMap1(int x, int y, int z, int value)
    {
        map1.Add(new Tuple<int, int, int>(x, y, z), value);
    }

    private void addToMap2(int x, int y, int z, int value)
    {
        map2.GetOrAdd(x, _ => new Dictionary<int, Dictionary<int, int>>())
            .GetOrAdd(y, _ => new Dictionary<int, int>())
            .GetOrAdd(z, _ => value);
    }

    private int GetObjectSize(object TestObject)
    {
        BinaryFormatter bf = new BinaryFormatter();
        MemoryStream ms = new MemoryStream();
        byte[] Array;
        bf.Serialize(ms, TestObject);
        Array = ms.ToArray();
        return Array.Length;
    }

    public static TResult GetOrAdd<TKey, TResult>(this Dictionary<TKey, TResult> map, TKey key, Func<TKey, TResult> addIfMissing)
    {
        TResult result;
        if (!map.TryGetValue(key, out result))
        {
            result = addIfMissing(key);
            map[key] = result;
        }
        return result;
    }

This test returns ~30MB vs ~70MB in favor of the dictionary of dictionaries.

tzachs
  • 2,826
  • 24
  • 25
  • This is a useful comparison, but I wanted to point out that this can depend on how sparse the data you are storing is. In the scenario that I'm looking at, 99% of the keys are going to be unique. Only a very small number of the keys for a given `x` are going to have more than one value for `y`. In this case the overhead of the extra dictionary outweighs the extra storage used by duplicate keys. – Ben Randall Jun 22 '18 at 15:05
2

All of the options you have described are fairly similar - as for performance, you would need to test each for your specific usage scenarios, but for small collections they are unlikely to have much of a difference.

They also all suffer from readability - it is difficult to construct them and to tease out meaning out of the types.

Instead, it is better to create a type that directly describes the data - good naming goes a long way.

Oded
  • 463,167
  • 92
  • 837
  • 979