781

Right, so I have an enumerable and wish to get distinct values from it.

Using System.Linq, there's, of course, an extension method called Distinct. In the simple case, it can be used with no parameters, like:

var distinctValues = myStringList.Distinct();

Well and good, but if I have an enumerable of objects for which I need to specify equality, the only available overload is:

var distinctValues = myCustomerList.Distinct(someEqualityComparer);

The equality comparer argument must be an instance of IEqualityComparer<T>. I can do this, of course, but it's somewhat verbose and, well, kludgy.

What I would have expected is an overload that would take a lambda, say a Func<T, T, bool>:

var distinctValues = myCustomerList.Distinct((c1, c2) => c1.CustomerId == c2.CustomerId);

Anyone know if some such extension exists, or some equivalent workaround? Or am I missing something?

Alternatively, is there a way of specifying an IEqualityComparer inline (embarrass me)?

Update

I found a reply by Anders Hejlsberg to a post in an MSDN forum on this subject. He says:

The problem you're going to run into is that when two objects compare equal they must have the same GetHashCode return value (or else the hash table used internally by Distinct will not function correctly). We use IEqualityComparer because it packages compatible implementations of Equals and GetHashCode into a single interface.

I suppose that makes sense.
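To make the point in that quote concrete, here is a minimal sketch. The `Name` property and both comparer classes are hypothetical, invented only for this illustration. The first comparer deliberately breaks the rule (Equals looks at `CustomerId`, GetHashCode looks at something else), and the second derives both members from the same key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }   // hypothetical property, for illustration only
}

// Contrived on purpose: Equals compares CustomerId, but the hash code is
// computed from the name length, so "equal" customers can hash differently.
public sealed class InconsistentComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y) => x.CustomerId == y.CustomerId;
    public int GetHashCode(Customer obj) => obj.Name.Length;
}

// Both members are derived from the same key, so they always agree.
public sealed class ConsistentComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y) => x.CustomerId == y.CustomerId;
    public int GetHashCode(Customer obj) => obj.CustomerId.GetHashCode();
}

public static class HashCodeDemo
{
    // Counts the distinct customers in a fixed sample that contains
    // two entries with CustomerId 1 and one entry with CustomerId 2.
    public static int CountDistinct(IEqualityComparer<Customer> comparer)
    {
        var customers = new List<Customer>
        {
            new Customer { CustomerId = 1, Name = "Ann" },
            new Customer { CustomerId = 1, Name = "Ann (duplicate)" },
            new Customer { CustomerId = 2, Name = "Bob" },
        };
        return customers.Distinct(comparer).Count();
    }
}
```

With `ConsistentComparer` the count is 2. With `InconsistentComparer` the two customers with `CustomerId` 1 have different hash codes (3 and 15), so Distinct never calls Equals on them and the count is 3.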

Grigory Zhadko
Tor Haugen
  • 2
    see http://stackoverflow.com/questions/1183403/how-to-get-distinct-instance-from-a-list-by-lamba-or-linq for a solution using GroupBy –  Sep 13 '11 at 17:47
  • 19
    Thanks for the Anders Hejlsberg update! – Tor Haugen Sep 28 '11 at 08:17
  • Nope, it doesn't make sense - how could two objects which contain identical values return two different hash codes? – G.Y Nov 06 '15 at 13:15
  • It could help - [solution](http://stackoverflow.com/a/3719802/2122718) for `.Distinct(new KeyEqualityComparer(c1 => c1.CustomerId))`, and explain why GetHashCode() is important to work properly. – marbel82 Oct 12 '16 at 12:22
  • Related / possible duplicate of: [LINQ's Distinct() on a particular property](https://stackoverflow.com/q/489258/3258851) – Marc.2377 Dec 13 '18 at 01:55
  • @G.Y It makes sense because equality is not absolute. One could, for example, regard "Hello" and "hello" as equal in a given context, and that is the whole point of being able to provide your own equality comparer: to provide a definition of equality tailored to the domain / context you are in. – AnorZaken Mar 03 '20 at 09:48

18 Answers

1077
IEnumerable<Customer> filteredList = originalList
  .GroupBy(customer => customer.CustomerId)
  .Select(group => group.First());
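Which duplicate survives? A small sketch, assuming LINQ to Objects and a hypothetical `Customer` class with a `Name` property added for illustration. In LINQ to Objects, GroupBy yields groups in the order their keys first appear and keeps source order inside each group, so `First()` picks the earliest element for each `CustomerId`.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }   // hypothetical property, for illustration only
}

public static class GroupByDemo
{
    // Returns the names of the customers that survive the GroupBy/First filter.
    public static List<string> SurvivingNames()
    {
        var originalList = new List<Customer>
        {
            new Customer { CustomerId = 3, Name = "Cy" },
            new Customer { CustomerId = 1, Name = "Ann" },
            new Customer { CustomerId = 3, Name = "Cy (later duplicate)" },
            new Customer { CustomerId = 1, Name = "Ann (later duplicate)" },
        };

        IEnumerable<Customer> filteredList = originalList
            .GroupBy(customer => customer.CustomerId)
            .Select(group => group.First());

        return filteredList.Select(c => c.Name).ToList();
    }
}
```

`SurvivingNames()` returns "Cy" followed by "Ann": one customer per id, in order of first appearance. Other LINQ providers (LINQ to SQL, Entity Framework) translate the query and may behave differently.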
Konrad Viltersten
Carlo Bos
  • 13
    Excellent! This is really easy to encapsulate in an extension method too, like `DistinctBy` (or even `Distinct`, since the signature will be unique). – Tomas Aschan Jul 27 '11 at 11:33
  • 3
    Doesn't work for me ! Even I tried 'FirstOrDefault' it didn't work. – JatSing Sep 25 '11 at 13:55
  • 68
    @TorHaugen: Just be aware that there's a cost involved in creating all those groups. This cannot stream the input, and will end up buffering all the data before returning anything. That may not be relevant for your situation of course, but I prefer the elegance of DistinctBy :) – Jon Skeet Sep 28 '11 at 11:52
  • 2
    @JonSkeet: This is good enough for VB.NET coders who do not want to import an additional library for just one feature. Without the Async CTP, VB.NET does not support the `yield` statement, so streaming is technically not possible. Thanks for your answer though. I'll use it when coding in C#. ;-) – Alex Essilfie Dec 11 '11 at 14:46
  • @AlexEssilfie: It's technically possible, just not as easy as with language support. There's nothing to *stop* you from implementing `IEnumerable` yourself in VB... – Jon Skeet Dec 11 '11 at 17:59
  • @JatSing: You must be using Linq to Entities. – Josh Mouch Feb 14 '12 at 14:45
  • Would the performance of this be any better using the two-parameter GroupBy method as suggested here: http://stackoverflow.com/a/1183877/8479 ? – Rory Aug 08 '12 at 13:27
  • Fails if `CustomerId` is a reference type which doesn't have an implementation for value equality comparing, such as `cust.CustomerId=new[] { '1', '2', '3' }`. – Ken Kin Aug 04 '13 at 10:20
  • Inspired by your post, why not select first and then use distinct? Thinking it's less overhead. myCustomerList.Select(cust => cust.CustomerId).Distinct(); – Ben Gripka Sep 06 '14 at 18:59
  • 4
    @BenGripka: That's not quite the same. It only gives you the customer ids. I want the whole customer :) – ryanman Oct 16 '14 at 14:57
  • I'm choosing this approach over the HashSet approach, because I'm working with lists with generally less than 5 items, and almost always less than about 10, so I'd expect the overhead of hashing to be higher than the loops and comparison. – David Apr 24 '17 at 06:40
  • I prefer to use in this way using .KEY, (g => g.ProcessName).Select(s => s.Key).ToList(); – Lucas Rodrigues Sena May 02 '17 at 14:21
  • This solution doesn't answer the question, but proposes an alternative, or, at least, doesn't argument inadequateness of the question. People do not understand the difference between "what is the answer for my question" and "please, suggest an alternative". – impulsgraw Dec 14 '20 at 09:41
513

It looks to me like you want DistinctBy from MoreLINQ. You can then write:

var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId);

Here's a cut-down version of DistinctBy (no nullity checking and no option to specify your own key comparer):

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
     (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> knownKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (knownKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}
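For the two things the cut-down version leaves out, a possible sketch with argument checks and an optional key comparer follows. This is not MoreLINQ's actual source. The name `DistinctByKey` is hypothetical, chosen so it does not clash with MoreLINQ's `DistinctBy` or with the `Enumerable.DistinctBy` built into .NET 6 and later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctByExtensions
{
    // Eager argument validation, then a lazy iterator, so that bad arguments
    // throw at the call site rather than on first enumeration.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> keyComparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterator(source, keySelector, keyComparer);
    }

    private static IEnumerable<TSource> Iterator<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> keyComparer)
    {
        // Passing null makes HashSet fall back to EqualityComparer<TKey>.Default.
        var knownKeys = new HashSet<TKey>(keyComparer);
        foreach (var element in source)
        {
            // Add returns false when the key has been seen before.
            if (knownKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}
```

For example, `new[] { "apple", "Avocado", "banana", "Blueberry" }.DistinctByKey(s => s.Substring(0, 1), StringComparer.OrdinalIgnoreCase)` yields "apple" and "banana". Because the method is an iterator, elements are yielded as they are read, without buffering the whole input first.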
Shimmy Weitzhandler
Jon Skeet
  • 16
    I knew the best answer would be posted by Jon Skeet simply by reading the title of the post. If its got anything to do with LINQ, Skeet's your man. Read 'C# In Depth' to attain God-like linq knowledge. – Shawn J. Molloy Jan 16 '14 at 03:03
  • 2
    great answer!!! also, for all VB_Complainers about the `yield` + extra lib, foreach can be re-written as `return source.Where(element => knownKeys.Add(keySelector(element)));` – denis morozov Mar 04 '14 at 16:51
  • 1
    I am getting Exception http://stackoverflow.com/questions/13405568/linq-unable-to-create-a-constant-value-of-type-xxx-only-primitive-types-or-enu, when using this DistinctBy in LinqToSQL query. the below code is working for me. public static IEnumerable DistinctBy(this IEnumerable list, Func propertySelector) { return list.GroupBy(propertySelector).Select(x => x.FirstOrDefault()); } – sudhAnsu63 Apr 15 '15 at 09:52
  • 6
    @sudhAnsu63 this is a limitation of LinqToSql (and other linq providers). The intent of LinqToX is to translate your C# lambda expression into the native context of X. That is, LinqToSql converts your C# into SQL and executes that command natively wherever possible. This means any method that resides in C# can't be "passed through" a linqProvider if there's no way to express it in SQL (or whatever linq provider you're using). I see this in extension methods to convert data objects to view models. You can work around this by "materializing" the query, calling ToList() before DistinctBy(). – Michael Blackburn Nov 16 '15 at 22:18
  • 1
    And whenever I come back to this question I keep on wondering why don't they adopt at least some of MoreLinq into the BCL. – Shimmy Weitzhandler May 22 '17 at 11:16
  • 2
    @Shimmy: I'd certainly welcome that... I'm not sure what the feasibility is. I can raise it in the .NET Foundation though... – Jon Skeet May 22 '17 at 11:39
  • @JonSkeet btw, I deleted my first comment. – Shimmy Weitzhandler May 22 '17 at 12:20
  • @JonSkeet this was really professional answer. Thanks – Selman Mar 17 '18 at 00:13
  • For anyone worried about VB.NET yield, it's been added (https://docs.microsoft.com/en-us/dotnet/visual-basic/language-reference/statements/yield-statement) – Brian J Jul 27 '18 at 18:00
  • 1
    Any solution for Linq to SQL? – Shimmy Weitzhandler Sep 03 '19 at 03:17
  • 2
    @Shimmy: The answer from Carlo may work in LINQ to SQL... I'm not sure. – Jon Skeet Sep 03 '19 at 05:37
42

To wrap things up: I think most people who came here, like me, want the simplest possible solution, without using any libraries and with the best possible performance.

(The accepted GroupBy method is, I think, overkill in terms of performance.)

Here is a simple extension method using the IEqualityComparer interface, which also works for null values.

Usage:

var filtered = taskList.DistinctBy(t => t.TaskExternalId).ToArray();

Extension Method Code

public static class LinqExtensions
{
    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
    {
        GeneralPropertyComparer<T, TKey> comparer = new GeneralPropertyComparer<T,TKey>(property);
        return items.Distinct(comparer);
    }   
}
public class GeneralPropertyComparer<T,TKey> : IEqualityComparer<T>
{
    private Func<T, TKey> expr { get; set; }
    public GeneralPropertyComparer (Func<T, TKey> expr)
    {
        this.expr = expr;
    }
    public bool Equals(T left, T right)
    {
        var leftProp = expr.Invoke(left);
        var rightProp = expr.Invoke(right);
        if (leftProp == null && rightProp == null)
            return true;
        else if (leftProp == null ^ rightProp == null)
            return false;
        else
            return leftProp.Equals(rightProp);
    }
    public int GetHashCode(T obj)
    {
        var prop = expr.Invoke(obj);
        return (prop==null)? 0:prop.GetHashCode();
    }
}
Anestis Kivranoglou
20

No, there is no such extension method overload for this. I've found this frustrating myself in the past, and as such I usually write a helper class to deal with this problem. The goal is to convert a Func<T,T,bool> to an IEqualityComparer<T>.

Example

public class EqualityFactory {
  private sealed class Impl<T> : IEqualityComparer<T> {
    private readonly Func<T,T,bool> m_del;
    private readonly IEqualityComparer<T> m_comp;
    public Impl(Func<T,T,bool> del) {
      m_del = del;
      m_comp = EqualityComparer<T>.Default;
    }
    public bool Equals(T left, T right) {
      return m_del(left, right);
    }
    // Caveat: the default hash code is not derived from m_del, so it can
    // disagree with Equals (see the comments on this answer).
    public int GetHashCode(T value) {
      return m_comp.GetHashCode(value);
    }
  }
  public static IEqualityComparer<T> Create<T>(Func<T,T,bool> del) {
    return new Impl<T>(del);
  }
}

This allows you to write the following

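// T cannot be inferred from an untyped lambda, so it is given explicitly.
// Customer is assumed to be the element type of myCustomerList.
var distinctValues = myCustomerList
  .Distinct(EqualityFactory.Create<Customer>((c1, c2) => c1.CustomerId == c2.CustomerId));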
Uwe Keim
JaredPar
  • 8
    That has a nasty hash code implementation though. It's easier to create an `IEqualityComparer` from a projection: http://stackoverflow.com/questions/188120/can-i-specify-my-explicit-type-comparator-inline – Jon Skeet Aug 19 '09 at 14:05
  • 7
    (Just to explain my comment about the hash code - it's very easy with this code to end up with Equals(x, y) == true, but GetHashCode(x) != GetHashCode(y). That basically breaks anything like a hashtable.) – Jon Skeet Aug 19 '09 at 14:06
  • I agree with the hash code objection. Still, +1 for the pattern. – Tor Haugen Aug 19 '09 at 14:11
  • @Jon, yeah I agree the original implementation of GetHashcode is less than optimal (was being lazy). I switched it to essentially use now EqualityComparer.Default.GetHashcode() which is slightly more standard. Truthfully though, the only guaranteed to work GetHashcode implementation in this scenario is to simply return a constant value. Kills hashtable lookup but is guaranteed to be functionally correct. – JaredPar Aug 19 '09 at 14:13
  • 1
    @JaredPar: Exactly. The hash code has to be consistent with the equality function you're using, which presumably *isn't* the default one otherwise you wouldn't bother :) That's why I prefer to use a projection - you can get both equality and a sensible hash code that way. It also makes the calling code have less duplication. Admittedly it only works in cases where you want the same projection twice, but that's every case I've seen in practice :) – Jon Skeet Aug 19 '09 at 14:16
  • I only get this working if I replace `<T,T>` by `<T>`. Otherwise it has compilation errors. Am I missing something? – Uwe Keim Jul 10 '14 at 19:47
  • As was already mentioned in other comments, this implementation is broken because the `IEqualityComparer<>` produced uses a `GetHashCode(T)` which is incompatible with (i.e. in disagreement) with the `Equals(T, T)`. Using with `.Distinct` will not work correctly. For example, in the answer we put all instances with the same `.CustomerId` into the same equality class. But all those will not have the same hash code, So they will be "lost" in the hash table used by `Distinct`. As a result, different instances with the same `.CustomerId` will be yielded. – Jeppe Stig Nielsen Dec 18 '17 at 20:31
  • It won't work if you compare only a member of a object, the hash code will totally miss it. Better to force providing a lambda also for it, or instead use the lambda and reflection to get the member accessor – UberFace Nov 28 '18 at 14:18
18

Shorthand solution

myCustomerList.GroupBy(c => c.CustomerId, (key, c) => c.FirstOrDefault());
tdog
Arasu RRK
13

This will do what you want but I don't know about performance:

var distinctValues =
    from cust in myCustomerList
    group cust by cust.CustomerId
    into gcust
    select gcust.First();

At least it's not verbose.

Gordon Freeman
12

Here's a simple extension method that does what I need...

public static class EnumerableExtensions
{
    public static IEnumerable<TKey> Distinct<T, TKey>(this IEnumerable<T> source, Func<T, TKey> selector)
    {
        return source.GroupBy(selector).Select(x => x.Key);
    }
}

It's a shame they didn't bake a distinct method like this into the framework, but hey ho.

David Kirkland
4

Something I have used which worked well for me.

/// <summary>
/// A class to wrap the IEqualityComparer interface into matching functions for simple implementation
/// </summary>
/// <typeparam name="T">The type of object to be compared</typeparam>
public class MyIEqualityComparer<T> : IEqualityComparer<T>
{
    /// <summary>
    /// Create a new comparer based on the given Equals and GetHashCode methods
    /// </summary>
    /// <param name="equals">The method to compute equals of two T instances</param>
    /// <param name="getHashCode">The method to compute a hashcode for a T instance</param>
    public MyIEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        if (equals == null)
            throw new ArgumentNullException("equals", "Equals parameter is required for all MyIEqualityComparer instances");
        EqualsMethod = equals;
        GetHashCodeMethod = getHashCode;
    }
    /// <summary>
    /// Gets the method used to compute equals
    /// </summary>
    public Func<T, T, bool> EqualsMethod { get; private set; }
    /// <summary>
    /// Gets the method used to compute a hash code
    /// </summary>
    public Func<T, int> GetHashCodeMethod { get; private set; }

    bool IEqualityComparer<T>.Equals(T x, T y)
    {
        return EqualsMethod(x, y);
    }

    int IEqualityComparer<T>.GetHashCode(T obj)
    {
        if (GetHashCodeMethod == null)
            return obj.GetHashCode();
        return GetHashCodeMethod(obj);
    }
}
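A usage sketch for a two-delegate comparer of this kind. To keep it self-contained it uses a compact stand-in with the hypothetical name `DelegateComparer<T>` rather than the class above, and it requires the hash delegate instead of falling back to `obj.GetHashCode()`, on the assumption that a fallback hash would rarely agree with a custom Equals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact stand-in for the comparer above (hypothetical name).
public sealed class DelegateComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _getHashCode;

    public DelegateComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        _equals = equals ?? throw new ArgumentNullException(nameof(equals));
        _getHashCode = getHashCode ?? throw new ArgumentNullException(nameof(getHashCode));
    }

    public bool Equals(T x, T y) => _equals(x, y);
    public int GetHashCode(T obj) => _getHashCode(obj);
}

public static class DelegateComparerDemo
{
    // Treats strings that differ only by case as equal. Both delegates use
    // the same ordinal, case-insensitive rule, so Equals and GetHashCode agree.
    public static List<string> DistinctIgnoringCase(IEnumerable<string> words)
    {
        var comparer = new DelegateComparer<string>(
            (a, b) => string.Equals(a, b, StringComparison.OrdinalIgnoreCase),
            s => StringComparer.OrdinalIgnoreCase.GetHashCode(s));
        return words.Distinct(comparer).ToList();
    }
}
```

Calling `DelegateComparerDemo.DistinctIgnoringCase(new[] { "Hello", "hello", "World" })` returns two strings, one spelling of "hello" plus "World".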
Kleinux
  • @Mukus I'm not sure why you are asking about the class name here. I needed to name the class something in order to implement IEqualityComparer so I just prefixed the My. – Kleinux Jun 11 '15 at 14:20
4

All solutions I've seen here rely on selecting an already comparable field. If one needs to compare in a different way, though, this solution here seems to work generally, for something like:

somedoubles.Distinct(new LambdaComparer<double>((x, y) => Math.Abs(x - y) < double.Epsilon)).Count()
Dmitry Ledentsov
3

Another way:

var distinctValues = myCustomerList
    .Select(x => x._myCaustomerProperty)
    .Distinct();

This returns the distinct values of the property '_myCaustomerProperty', not the objects themselves.

Bob
  • 1
    Came here to say this. *THIS* should be the accepted answer – Still.Tony Dec 18 '18 at 18:34
  • 8
    No, this should not be the accepted answer, unless all you want is distinct values of the custom property. The general OP question was how to return distinct *objects* based on a specific property of the object. – tomo May 27 '19 at 19:39
3

You can use LambdaEqualityComparer:

var distinctValues
    = myCustomerList.Distinct(new LambdaEqualityComparer<OurType>((c1, c2) => c1.CustomerId == c2.CustomerId));


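public class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equalsFunction;

    public LambdaEqualityComparer(Func<T, T, bool> equalsFunction)
    {
        _equalsFunction = equalsFunction;
    }

    public bool Equals(T x, T y)
    {
        return _equalsFunction(x, y);
    }

    public int GetHashCode(T obj)
    {
        // A constant is the only hash code guaranteed to agree with an arbitrary
        // equality lambda; obj.GetHashCode() would let Distinct miss items the
        // lambda treats as equal. The cost is that every item is compared with
        // Equals, so Distinct becomes O(n^2).
        return 0;
    }
}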
2

You can use InlineComparer

public class InlineComparer<T> : IEqualityComparer<T>
{
    //private readonly Func<T, T, bool> equalsMethod;
    //private readonly Func<T, int> getHashCodeMethod;
    public Func<T, T, bool> EqualsMethod { get; private set; }
    public Func<T, int> GetHashCodeMethod { get; private set; }

    public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
    {
        if (equals == null) throw new ArgumentNullException("equals", "Equals parameter is required for all InlineComparer instances");
        EqualsMethod = equals;
        GetHashCodeMethod = hashCode;
    }

    public bool Equals(T x, T y)
    {
        return EqualsMethod(x, y);
    }

    public int GetHashCode(T obj)
    {
        if (GetHashCodeMethod == null) return obj.GetHashCode();
        return GetHashCodeMethod(obj);
    }
}

Usage sample:

  var comparer = new InlineComparer<DetalleLog>((i1, i2) => i1.PeticionEV == i2.PeticionEV && i1.Etiqueta == i2.Etiqueta, i => i.PeticionEV.GetHashCode() + i.Etiqueta.GetHashCode());
  var peticionesEV = listaLogs.Distinct(comparer).ToList();
  Assert.IsNotNull(peticionesEV);
  Assert.AreNotEqual(0, peticionesEV.Count);

Source: https://stackoverflow.com/a/5969691/206730
Using IEqualityComparer for Union
Can I specify my explicit type comparator inline?

Community
Kiquenet
1

A tricky way to do this is to use the Aggregate() extension, with a dictionary as the accumulator and the key-property values as keys:

var customers = new List<Customer>();

var distincts = customers.Aggregate(new Dictionary<int, Customer>(), 
                                    (d, e) => { d[e.CustomerId] = e; return d; },
                                    d => d.Values);

A GroupBy-style solution uses ToLookup():

var distincts = customers.ToLookup(c => c.CustomerId).Select(g => g.First());
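The two variants do not keep the same duplicate, which the following sketch tries to show. The `Customer` class and its `Name` property are hypothetical. In the Aggregate version each later element overwrites the dictionary entry for its key, so the last duplicate wins. In the ToLookup version `First()` keeps the earliest one.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }   // hypothetical property, for illustration only
}

public static class KeepFirstOrLastDemo
{
    private static List<Customer> Sample() => new List<Customer>
    {
        new Customer { CustomerId = 1, Name = "first" },
        new Customer { CustomerId = 2, Name = "only" },
        new Customer { CustomerId = 1, Name = "second" },
    };

    // Aggregate with a dictionary accumulator: d[key] = e overwrites,
    // so the LAST customer seen for each id is kept.
    public static Dictionary<int, string> ViaAggregate() =>
        Sample()
            .Aggregate(new Dictionary<int, Customer>(),
                       (d, e) => { d[e.CustomerId] = e; return d; },
                       d => d.Values)
            .ToDictionary(c => c.CustomerId, c => c.Name);

    // ToLookup keeps source order inside each group,
    // so First() keeps the FIRST customer seen for each id.
    public static Dictionary<int, string> ViaToLookup() =>
        Sample()
            .ToLookup(c => c.CustomerId)
            .Select(g => g.First())
            .ToDictionary(c => c.CustomerId, c => c.Name);
}
```

For id 1, `ViaAggregate()` maps to "second" while `ViaToLookup()` maps to "first". Both map id 2 to "only".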
Arturo Menchaca
0

I'm assuming you have an IEnumerable, and in your example delegate, you would like c1 and c2 to be referring to two elements in this list?

I believe you could achieve this with a self join var distinctResults = from c1 in myList join c2 in myList on

MattH
0

The Microsoft System.Interactive package has a version of Distinct that takes a key selector lambda. This is effectively the same as Jon Skeet's solution, but it may be helpful for people to know, and to check out the rest of the library.

Niall Connaughton
0

IEnumerable lambda extension:

public static class ListExtensions
{        
    public static IEnumerable<T> Distinct<T>(this IEnumerable<T> list, Func<T, int> hashCode)
    {
        Dictionary<int, T> hashCodeDic = new Dictionary<int, T>();

        list.ToList().ForEach(t => 
            {   
                var key = hashCode(t);
                if (!hashCodeDic.ContainsKey(key))
                    hashCodeDic.Add(key, t);
            });

        return hashCodeDic.Select(kvp => kvp.Value);
    }
}

Usage:

class Employee
{
    public string Name { get; set; }
    public int EmployeeID { get; set; }
}

//Add 5 employees to List
List<Employee> lst = new List<Employee>();

Employee e = new Employee { Name = "Shantanu", EmployeeID = 123456 };
lst.Add(e);
lst.Add(e);

Employee e1 = new Employee { Name = "Adam Warren", EmployeeID = 823456 };
lst.Add(e1);
//Add a space in the Name
Employee e2 = new Employee { Name = "Adam  Warren", EmployeeID = 823456 };
lst.Add(e2);
//Name is different case
Employee e3 = new Employee { Name = "adam warren", EmployeeID = 823456 };
lst.Add(e3);            

//Distinct (without IEqualityComparer<T>) - Returns 4 employees
var lstDistinct1 = lst.Distinct();

//Lambda Extension - Return 2 employees
var lstDistinct = lst.Distinct(employee => employee.EmployeeID.GetHashCode() ^ employee.Name.ToUpper().Replace(" ", "").GetHashCode()); 
Quality Catalyst
Shantanu
0

Here's how you can do it:

public static class Extensions
{
    public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query,
                                                    Func<T, V> f, 
                                                    Func<IGrouping<V,T>,T> h=null)
    {
        if (h==null) h=(x => x.First());
        return query.GroupBy(f).Select(h);
    }
}

This method allows you to use it by specifying one parameter like .MyDistinct(d => d.Name), but it also allows you to specify a having condition as a second parameter like so:

var myQuery = (from x in _myObject select x).MyDistinct(d => d.Name,
        x => x.FirstOrDefault(y=>y.Name.Contains("1") || y.Name.Contains("2"))
        );

N.B. This would also allow you to specify other functions like for example .LastOrDefault(...) as well.


If you want to expose just the condition, you can have it even simpler by implementing it as:

public static IEnumerable<T> MyDistinct2<T, V>(this IEnumerable<T> query,
                                                Func<T, V> f,
                                                Func<T,bool> h=null
                                                )
{
    if (h == null) h = (y => true);
    return query.GroupBy(f).Select(x=>x.FirstOrDefault(h));
}

In this case, the query would just look like:

var myQuery2 = (from x in _myObject select x).MyDistinct2(d => d.Name,
                    y => y.Name.Contains("1") || y.Name.Contains("2")
                    );

N.B. Here, the expression is simpler, but note .MyDistinct2 uses .FirstOrDefault(...) implicitly.


Note: The examples above are using the following demo class

class MyObject
{
    public string Name;
    public string Code;
}

private MyObject[] _myObject = {
    new MyObject() { Name = "Test1", Code = "T"},
    new MyObject() { Name = "Test2", Code = "Q"},
    new MyObject() { Name = "Test2", Code = "T"},
    new MyObject() { Name = "Test5", Code = "Q"}
};
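One consequence of the implicit `.FirstOrDefault(...)` is worth spelling out with the demo data: a group in which no element satisfies the condition still produces a result, namely the default value (null for a class). The sketch below repeats `MyDistinct2` and the demo class so it can stand alone. The wrapper class `MyDistinct2Demo` and its `Run` method are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public string Name;
    public string Code;
}

public static class MyDistinct2Demo
{
    // Copy of MyDistinct2 from above, so the sketch is self-contained.
    public static IEnumerable<T> MyDistinct2<T, V>(this IEnumerable<T> query,
                                                   Func<T, V> f,
                                                   Func<T, bool> h = null)
    {
        if (h == null) h = (y => true);
        return query.GroupBy(f).Select(x => x.FirstOrDefault(h));
    }

    // Runs the query from the answer and renders each result as "Name/Code",
    // or "<null>" when a group had no element matching the condition.
    public static List<string> Run()
    {
        MyObject[] myObject = {
            new MyObject() { Name = "Test1", Code = "T"},
            new MyObject() { Name = "Test2", Code = "Q"},
            new MyObject() { Name = "Test2", Code = "T"},
            new MyObject() { Name = "Test5", Code = "Q"}
        };

        return myObject
            .MyDistinct2(d => d.Name,
                         y => y.Name.Contains("1") || y.Name.Contains("2"))
            .Select(o => o == null ? "<null>" : o.Name + "/" + o.Code)
            .ToList();
    }
}
```

`Run()` returns "Test1/T", "Test2/Q" and "<null>": the "Test5" group has no element whose name contains "1" or "2", so it contributes null. Appending `.Where(o => o != null)` to the query is one way to drop such entries.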
Matt
0

If Distinct() doesn't produce unique results, try this one:

var filteredWC = tblWorkCenter.GroupBy(cc => cc.WCID_I).Select(grp => grp.First()).Select(cc => new Model.WorkCenter { WCID = cc.WCID_I }).OrderBy(cc => cc.WCID); 

ObservableCollection<Model.WorkCenter> WorkCenter = new ObservableCollection<Model.WorkCenter>(filteredWC);
Jon Egerton