27

This is something I had not noticed until today. Apparently, the .NET implementation of the much used tuple classes (Tuple<T>, Tuple<T1, T2> etc) causes boxing penalties for value types when equality based operations are performed.

Here is how the class is kind of implemented in the framework (source via ILSpy):

public class Tuple<T1, T2> : IStructuralEquatable 
{
    public T1 Item1 { get; private set; }
    public T2 Item2 { get; private set; }

    public Tuple(T1 item1, T2 item2)
    {
        this.Item1 = item1;
        this.Item2 = item2;
    }

    public override bool Equals(object obj)
    {
        return this.Equals(obj, EqualityComparer<object>.Default);
    }

    public override int GetHashCode()
    {
        return this.GetHashCode(EqualityComparer<object>.Default);
    }

    public bool Equals(object obj, IEqualityComparer comparer)
    {
        if (obj == null)
        {
            return false;
        }

        var tuple = obj as Tuple<T1, T2>;
        return tuple != null 
            && comparer.Equals(this.Item1, tuple.Item1) 
            && comparer.Equals(this.Item2, tuple.Item2);
    }

    public int GetHashCode(IEqualityComparer comparer)
    {
        int h1 = comparer.GetHashCode(this.Item1);
        int h2 = comparer.GetHashCode(this.Item2);

        return (h1 << 5) + h1 ^ h2;
    }
}

The problem I see is it causes a two stage boxing-unboxing, say for Equals calls, one, at the comparer.Equals which boxes the item, two, the EqualityComparer<object> calls the non-generic Equals which in turn will internally have to unbox the item to orginal type.

Instead why wouldn't they do something like:

public override bool Equals(object obj)
{
    var tuple = obj as Tuple<T1, T2>;
    return tuple != null
        && EqualityComparer<T1>.Default.Equals(this.Item1, tuple.Item1)
        && EqualityComparer<T2>.Default.Equals(this.Item2, tuple.Item2);
}

public override int GetHashCode()
{
    int h1 = EqualityComparer<T1>.Default.GetHashCode(this.Item1);
    int h2 = EqualityComparer<T2>.Default.GetHashCode(this.Item2);

    return (h1 << 5) + h1 ^ h2;
}

public bool Equals(object obj, IEqualityComparer comparer)
{
    var tuple = obj as Tuple<T1, T2>;
    return tuple != null
        && comparer.Equals(this.Item1, tuple.Item1)
        && comparer.Equals(this.Item2, tuple.Item2);
}

public int GetHashCode(IEqualityComparer comparer)
{
    int h1 = comparer.GetHashCode(this.Item1);
    int h2 = comparer.GetHashCode(this.Item2);

    return (h1 << 5) + h1 ^ h2;
}

I was surprised to see equality implemented this way in .NET tuple class. I was using tuple type as a key in one of the dictionaries.

Is there any reason why this has to be implemented as shown in the first code? Its a bit discouraging to make use of this class in that case.

I dont think code refactoring and non-duplicating data should have been the major concerns. The same non-generic/boxing implementation has gone behind IStructuralComparable too, but since IStructuralComparable.CompareTo is less used its not a problem often.


I benchmarked the above two approaches with a third approach which is still less taxing, like this (only the essentials):

public override bool Equals(object obj)
{
    return this.Equals(obj, EqualityComparer<T1>.Default, EqualityComparer<T2>.Default);
}

public bool Equals(object obj, IEqualityComparer comparer)
{
    return this.Equals(obj, comparer, comparer);
}

private bool Equals(object obj, IEqualityComparer comparer1, IEqualityComparer comparer2)
{
    var tuple = obj as Tuple<T1, T2>;
    return tuple != null
        && comparer1.Equals(this.Item1, tuple.Item1)
        && comparer2.Equals(this.Item2, tuple.Item2);
} 

for a couple of Tuple<DateTime, DateTime> fields a 1000000 Equals calls. This is the result:

1st approach (original .NET implementation) - 310 ms

2nd approach - 60 ms

3rd approach - 130 ms

The default implementation is about 4-5 times slower than the optimal solution.

Community
  • 1
  • 1
nawfal
  • 62,042
  • 48
  • 302
  • 339
  • 1
    I don't see any reason why `IEqualityComparer` was used, but I'm not saying there is none. But you can still make it a little better: http://pastebin.com/tNA2FYjq – MarcinJuraszek Jan 13 '14 at 05:49
  • 1
    @MarcinJuraszek How is that better? `Tuple` implements `IStructuralEquatable` which has these definitions `bool Equals(object other, IEqualityComparer comparer); int GetHashCode(IEqualityComparer comparer);` – nawfal Jan 13 '14 at 05:54
  • 1
    Do a test on your second example where the `Tuple` types are different. Personally if you are using a tuple for a key, it is likely a candidate for creating a properly named class, with a definitive equality implementation. This would improve the readability. Also, can you provide the code you used to test the performance? I've encountered poor hash-code related performance issues like this in the past, and especially on `struct` keys, providing a custom type with equality implemented, or a custom equality comparer drastically improves performance as you've seen. – Adam Houldsworth Jan 14 '14 at 10:20
  • @AdamHouldsworth that's a great point, but I dont think it would make a difference since in my tests I have run it once already before benchmarking (which means all those static comparers are already loaded). Yes I would post the benchmarking later, but I hope its of no use since the boxing case is evident in the code itself. Hmmm, I have used it in a trivial case where it (tuple) formed the composite key of a multikey dictionary. It's in the callee side, so no way have a major effect on readability. But yes rolling one myself would be the best option. – nawfal Jan 14 '14 at 10:30
  • 1
    @nawfal We use a lot of dictionaries internally as a poor-man's cache on our server, we recently purged poorly defined composite keys (currently was our own implementation of `Tuple` on .NET 2.0) and reviewed all cases where composite dictionaries were used. We landed on two approaches. 1) A composite key in the dictionary, made from a custom class or struct with custom equality on the type itself, or on the dictionary. 2) A composite dictionary structure of nested dictionaries. Creates a lot of dictionaries depending on the key breadth, but very fast access. Performance jumped. – Adam Houldsworth Jan 14 '14 at 10:33
  • @AdamHouldsworth that's a little unbelievable, but the possibilities I see are, either you have pitted the nested dictionary approach against a poorly implemented composite key, say like a .NET tuple, or majority of the test inputs were absent in the dictionary (which means a full hash code computation wasn't necessary since intermediate dictionary lookup will by pass them). In positive cases, ie when data is present, I expect the performance to drop. – nawfal Jan 14 '14 at 10:41
  • @nawfal Which bit is unbelievable? Our original tuple implementation didn't have a very good hash code implementation and wasn't very readable when you wanted to know what the key was. The nested dictionaries approach works brilliantly for adding and retrieving keys, but is expensive on the amount of dictionaries it creates if you are not careful. It is surprising, but I did performance testing on some cases prior to migrating them. Other cases were using a `struct` which has a strange custom implementation of hash-code, providing a custom comparer for that improved performance a lot also. – Adam Houldsworth Jan 14 '14 at 10:47
  • 1
    @nawfal The difference between a class and struct custom key was negligible and depended on certain things like struct size and end usage, but what was commonly important was providing a good equality implementation. In answer to your question, the `Tuple` default code might be less than optimal, but unless the person who coded it / approved the code turns up, you are unlikely to understand the drive behind the resulting implementation. – Adam Houldsworth Jan 14 '14 at 10:50
  • @AdamHouldsworth That's surprising. I mean nested dictionary pattern beating a lateral composite key structure is surprising, lil' unbelievable. I will get back after some performance testing. All I can guesstimate now is that your composite key must not have had the best equals/hash code implementation. I cant imagine at all how even addition is more performant in nested dictionary case.. – nawfal Jan 14 '14 at 10:54
  • @nawfal It depends a lot on a large number of things, such as resulting dictionary size. Our general case was: take a composite key of 2 items; item 1 only has 5 distinct values meaning a nested dictionary structure of 5 + 1 dictionaries if item 1 is the first key. It benefited a lot when item 2 had many values. Fortunately our structure is this way in many key places. Yes, our first implementation (being many many years old) was very likely sub-optimal. Unfortunately it never appeared on hot tracks while tracing because in isolation it is "fast", collectively they had a non-trivial impact. – Adam Houldsworth Jan 14 '14 at 10:58

1 Answers1

13

You wondered if it 'has to' be implemented that way. In short, I would say no: there are many functionally equivalent implementations.

But why does the existing implementation make such explicit usage of EqualityComparer<object>.Default? It may just be a case of the person who wrote this mentally optimizing for the 'wrong', or at least different thing than your scenario of speed in an inner loop. Depending on their benchmark it may appear be the 'right' thing.

But what benchmark scenario could lead them to make that choice? Well the optimization they have targeted seems to be to optimize for the minimum number of EqualityComparer class template instantiations. They might likely choose this because template instantiation comes with memory or load-time costs. If so, we can guess their benchmark scenario could have been based on app-startup-time or memory usage rather than some tight looping scenario.

Here is one knowledge point to support the theory (found by using confirmation bias :) - EqualityComparer implementations method bodies cannot be shared if T is a struct. Excerpted from http://blogs.microsoft.co.il/sasha/2012/09/18/runtime-representation-of-genericspart-2/

When the CLR needs to create an instance of a closed generic type, such as List, it creates a method table and EEClass based on the open type. As always, the method table contains method pointers, which are compiled on the fly by the JIT compiler. However, there is a crucial optimization here: compiled method bodies on closed generic types that have reference type parameters can be shared. [...] The same idea does not work for value types. For example, when T is long, the assignment statement items[size] = item requires a different instruction, because 8 bytes must be copied instead of 4. Even larger value types may even require more than one instruction; and so on.

Tim Lovell-Smith
  • 13,077
  • 11
  • 67
  • 89