I'm filtering a list by grouping on two properties and selecting the most recent item in each group based on CreateDate (using First()). This eliminates duplicates on the x.Application and x.ExternalID properties.

var list = ((List<SomeType>)xDic)
            .GroupBy(x => new {x.Application, x.ExternalID})
            .OrderByDescending(z => z.First().CreateDate)
            .Select(y => y.First()).ToList();

What I am having trouble with is adding a second combination of properties (x.Application and x.ExternalDisplayId) to group by and filter on, again taking the first item of each group.

To summarize, I need to get a unique List of SomeTypes by filtering out any duplicates based on the ((x.application/x.externalid) OR (x.application/x.externaldisplayid)) combinations.

Example set:
{ "extID": 1234, "extDspID" : 111, "App" : "Test", "CreateDate": 2/01/2015}
{ "extID": 1234, "extDspID" : 5, "App" : "Test", "CreateDate": 1/01/2015}
{ "extID": 012, "extDspID" : 90, "App" : "Mono", "CreateDate": 6/06/2015}
{ "extID": 999, "extDspID" : 78, "App" : "Epic", "CreateDate": 8/08/2015}
{ "extID": 333, "extDspID" : 78, "App" : "Epic", "CreateDate": 8/12/2015}
{ "extID": 345, "extDspID" : 33, "App" : "Test", "CreateDate": 2/01/2015}
{ "extID": 666, "extDspID" : 33, "App" : "Test", "CreateDate": 1/01/2015}

desired result:
{ "extID": 1234, "extDspID" : 111, "App" : "Test", "CreateDate": 2/01/2015}
{ "extID": 012, "extDspID" : 90, "App" : "Mono", "CreateDate": 6/06/2015}
{ "extID": 333, "extDspID" : 78, "App" : "Epic", "CreateDate": 8/12/2015}
{ "extID": 345, "extDspID" : 33, "App" : "Test", "CreateDate": 2/01/2015}
Ben Smith
melmack
  • I am trying to understand your question. Can you provide the definition of your class (`SomeType`)? – Yacoub Massad Nov 10 '15 at 23:20
  • Looks like duplicate of http://stackoverflow.com/questions/11811110/select-distinct-by-two-properties-in-a-list (but not completely sure) – Alexei Levenkov Nov 10 '15 at 23:22
  • `public class SomeType { public string ExternalDisplayId { get; set; } public string ExternalID { get; set; } public string Application { get; set; } public DateTime CreateDate { get; set; } }` – melmack Nov 10 '15 at 23:27
  • Is "Distinct" in your future? My situation does differ, but this might give you an idea: List ItemDescs = _itemsForMonthYearList.Select(x => x.ItemDescription).Distinct().ToList(); – B. Clay Shannon Nov 10 '15 at 23:32
  • @AlexeiLevenkov - I am achieving the same result as a DistinctBy(). The tricky part is defining an additional property set to define the uniqueness and filter on it further. – melmack Nov 10 '15 at 23:32
  • @B.ClayShannon - I have to account for the most recent createdate dimension, so the distinct doesn't address my problem fully. – melmack Nov 10 '15 at 23:34
  • Can you explain exactly what `((x.application/x.externalid) OR (x.application/x.externaldisplayid)) combinations` means? – Yacoub Massad Nov 10 '15 at 23:38
  • Also, I don't understand what `.OrderByDescending(z => z.First().CreateDate)` does. Why are you choosing some random item from the group? Is `xDic` already sorted? – Yacoub Massad Nov 10 '15 at 23:39
  • @YacoubMassad - I need to filter out all duplicates in the List based on a combination of properties: first, delete all duplicates that have the same values for their Application and ExternalID properties. Second, delete all remaining duplicates that have the same values for their Application and ExternalDisplayID properties. – melmack Nov 10 '15 at 23:40
  • What about ordering? – Yacoub Massad Nov 10 '15 at 23:42
  • @YacoubMassad I need to keep the object - in the event of duplicates - that has the most recent createdate. – melmack Nov 10 '15 at 23:44
  • @YacoubMassad - I find duplicates based on a combination(2) of properties. Externalid and externaldisplayid will have different values. I have to find matched combinations with those two properties in each scenario. (application and externalid ) or (application and externaldisplayid) – melmack Nov 10 '15 at 23:48
  • I suspect that the fact you can't explain what you want is the problem :) Are all 3 of these duplicates {A1,id1,disp1}, {A1,id1,disp2}, {A1,id2,disp2}? – Alexei Levenkov Nov 10 '15 at 23:55
  • There are way too many similar questions to this on SO. – ataravati Nov 11 '15 at 00:10
  • @AlexeiLevenkov - { "extID": 1234, "extDID" : 111, "App" : "Test", "CreateDate": 2/1/2015} { "extID": 1234, "extDID" : 5, "App" : "Test", "CreateDate": 1/1/2015} { "extID": 012, "extDID" : 90, "App" : "Mono", "CreateDate": 6/6/2015} { "extID": 999, "extDID" : 78, "App" : "Epic", "CreateDate": 8/8/2015} { "extID": 333, "extDID" : 78, "App" : "Epic", "CreateDate": 8/12/2015} result: { "extID": 1234, "extDID" : 111, "App" : "Test", "CreateDate": 2/1/2015} { "extID": 012, "extDID" : 90, "App" : "Mono", "CreateDate": 6/6/2015} { "extID": 333, "extDID" : 78, "App" : "Epic", "CreateDate": 8/12/2015} – melmack Nov 11 '15 at 00:58
  • sorry for the formatting. copy paste and carriage return on each "}" :) – melmack Nov 11 '15 at 00:59
  • Can you please edit your post showing an example of duplicates? – Alexei Levenkov Nov 11 '15 at 01:02
  • @ataravati - please direct me in the way to fashion my search terms in order to find appropriate search results and not waste any of your precious time. – melmack Nov 11 '15 at 01:04
  • Are you sure the third item in the desired result set should be { "extID": 333, "extDspID" : 78, "App" : "Epic", "CreateDate": 8/12/2015}? – Ben Smith Nov 11 '15 at 02:14
  • @BenSmith - yes since it has the most recent createdate out of the two: { "extID": 999, "extDspID" : 78, "App" : "Epic", "CreateDate": 8/08/2015} { "extID": 333, "extDspID" : 78, "App" : "Epic", "CreateDate": 8/12/2015} – melmack Nov 11 '15 at 02:17

2 Answers

First, declare two equality comparers to specify your two conditions like this:

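using System.Collections.Generic;

// Treats two items as equal when Application and ExternalID both match.
public class MyEqualityComparer1 : IEqualityComparer<SomeType>
{
    public bool Equals(SomeType x, SomeType y)
    {
        return x.Application == y.Application && x.ExternalID == y.ExternalID;
    }

    // Must be consistent with Equals: equal items produce the same hash code.
    public int GetHashCode(SomeType obj)
    {
        return (obj.Application + obj.ExternalID).GetHashCode();
    }
}

// Treats two items as equal when Application and ExternalDisplayId both match.
public class MyEqualityComparer2 : IEqualityComparer<SomeType>
{
    public bool Equals(SomeType x, SomeType y)
    {
        return x.Application == y.Application && x.ExternalDisplayId == y.ExternalDisplayId;
    }

    // Must be consistent with Equals: equal items produce the same hash code.
    public int GetHashCode(SomeType obj)
    {
        return (obj.Application + obj.ExternalDisplayId).GetHashCode();
    }
}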

Then, order your list by CreateDate (descending) and use Distinct to filter it like this:

var result = xDic
    .OrderByDescending(x => x.CreateDate)
    .Distinct(new MyEqualityComparer1())
    .Distinct(new MyEqualityComparer2());

In practice, Distinct keeps the first occurrence of each element and removes the later ones, so after OrderByDescending the items it removes are those with the less recent CreateDate.

However, since the documentation of Distinct does not guarantee this, you can use a custom distinct method like this:

public static class Extensions
{
    public static IEnumerable<T> OrderedDistinct<T>(this IEnumerable<T> enumerable, IEqualityComparer<T> comparer)
    {
        HashSet<T> hash_set = new HashSet<T>(comparer);

        foreach(var item in enumerable)
            if (hash_set.Add(item))
                yield return item;
    }
}

And use it like this:

var result = xDic
    .OrderByDescending(x => x.CreateDate)
    .OrderedDistinct(new MyEqualityComparer1())
    .OrderedDistinct(new MyEqualityComparer2());
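
For reference, here is a minimal self-contained sketch of the approach above, run against the sample data from the question. Several things in it are assumptions rather than part of the original answer: the comparer names `ByExternalId` and `ByExternalDisplayId`, the `Demo` class with its `Sample` and `Deduplicate` helpers, and the reading of the dates as month/day/year. The hash codes use an anonymous type instead of string concatenation, which is a swapped-in choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SomeType
{
    public string ExternalDisplayId { get; set; }
    public string ExternalID { get; set; }
    public string Application { get; set; }
    public DateTime CreateDate { get; set; }
}

// Hypothetical name: equal when Application and ExternalID match.
public class ByExternalId : IEqualityComparer<SomeType>
{
    public bool Equals(SomeType x, SomeType y) =>
        x.Application == y.Application && x.ExternalID == y.ExternalID;

    public int GetHashCode(SomeType obj) =>
        new { obj.Application, obj.ExternalID }.GetHashCode();
}

// Hypothetical name: equal when Application and ExternalDisplayId match.
public class ByExternalDisplayId : IEqualityComparer<SomeType>
{
    public bool Equals(SomeType x, SomeType y) =>
        x.Application == y.Application && x.ExternalDisplayId == y.ExternalDisplayId;

    public int GetHashCode(SomeType obj) =>
        new { obj.Application, obj.ExternalDisplayId }.GetHashCode();
}

public static class Extensions
{
    // Yields the first occurrence of each item, in input order.
    public static IEnumerable<T> OrderedDistinct<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        var seen = new HashSet<T>(comparer);
        foreach (var item in source)
            if (seen.Add(item))
                yield return item;
    }
}

public static class Demo
{
    private static SomeType Make(string id, string disp, string app, int y, int m, int d) =>
        new SomeType { ExternalID = id, ExternalDisplayId = disp, Application = app,
                       CreateDate = new DateTime(y, m, d) };

    // Sample data from the question; dates are assumed to be month/day/year.
    public static List<SomeType> Sample() => new List<SomeType>
    {
        Make("1234", "111", "Test", 2015, 2, 1),
        Make("1234", "5",   "Test", 2015, 1, 1),
        Make("012",  "90",  "Mono", 2015, 6, 6),
        Make("999",  "78",  "Epic", 2015, 8, 8),
        Make("333",  "78",  "Epic", 2015, 8, 12),
        Make("345",  "33",  "Test", 2015, 2, 1),
        Make("666",  "33",  "Test", 2015, 1, 1),
    };

    // Newest first, then drop duplicates on each key combination in turn.
    public static List<SomeType> Deduplicate(IEnumerable<SomeType> items) =>
        items.OrderByDescending(x => x.CreateDate)
             .OrderedDistinct(new ByExternalId())
             .OrderedDistinct(new ByExternalDisplayId())
             .ToList();

    public static void Main()
    {
        foreach (var item in Deduplicate(Sample()))
            Console.WriteLine($"{item.ExternalID} {item.ExternalDisplayId} {item.Application} {item.CreateDate:yyyy-MM-dd}");
    }
}
```

Under those assumptions, the four surviving records are the ones with ExternalID 333, 012, 1234 and 345, which is the same set as the desired result in the question, listed newest first rather than in the question's order.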
Yacoub Massad
  • Thanks for the response @Yacoub. I will try it as soon as I get the chance. – melmack Nov 11 '15 at 01:01
  • Interesting - does `Distinct(x => {x.p1}).Distinct(x=>{x.p2})` give the same result as `Distinct(x => {x.p2}).Distinct(x=>{x.p1})` (which should be true for what OP is looking for). – Alexei Levenkov Nov 11 '15 at 01:08
  • @AlexeiLevenkov, Yes. All duplicates are removed and this is guaranteed by invoking the two `Distinct` operations (regardless of the order of invocation). The only thing that remains is to make sure that we always delete the least recent item. Since we order the list, we are sure that any item that is removed by any of the `Distinct` operations has a similar item earlier in the sequence with a more recent `CreateDate`. – Yacoub Massad Nov 11 '15 at 01:26

The current accepted answer will not sort your "SomeType" objects correctly and so won't produce your desired result set.

I've implemented a solution here:

https://dotnetfiddle.net/qBkIXo

I too based my solution on Distinct (see MSDN documentation here). The way I generate the hash is based on this neat approach which uses an anonymous type e.g.

public int GetHashCode(SomeType sometype)
{
    // Calculate the hash code for the SomeType.
    return new { sometype.Application, sometype.ExternalID }.GetHashCode();
}
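
As a small illustration of why this works: C# anonymous types get compiler-generated Equals and GetHashCode implementations based on their property values, so two instances with the same values compare equal and share a hash code. The sketch below is only a demonstration; `AnonymousKeyDemo` and `SameKey` are hypothetical names, not part of the fiddle.

```csharp
using System;

public static class AnonymousKeyDemo
{
    // True when both keys are equal and also produce the same hash code.
    public static bool SameKey(string app1, string id1, string app2, string id2)
    {
        var a = new { Application = app1, ExternalID = id1 };
        var b = new { Application = app2, ExternalID = id2 };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }

    public static void Main()
    {
        Console.WriteLine(SameKey("Test", "1234", "Test", "1234")); // True
        Console.WriteLine(SameKey("Test", "1234", "Test", "345"));  // False
    }
}
```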

To achieve the correct desired results a combination of grouping, ordering and use of distinct needs to be applied e.g.

    var noduplicates = products.GroupBy(x => new {x.Application, x.ExternalDisplayId})
        .Select(y => y.OrderByDescending(x => x.CreateDate).First())
        .ToList()
        .Distinct(new ApplicationExternalDisplayIdComparer())
        .GroupBy(x => new {x.Application, x.ExternalID})
        .Select(y => y.OrderByDescending(x => x.CreateDate).First())
        .ToList()
        .Distinct(new ApplicationExternalIDComparer());

As you'll see in the fiddle output, this gives the results you are expecting.

Ben Smith
  • Why do you think that my answer does not sort the objects correctly? Can you please explain? – Yacoub Massad Nov 12 '15 at 15:30
  • Hi Yacoub. I implemented the solution in the same way as you, but I found that simply ordering the dates prior to executing the distinct statements won't give the OP's desired results. Try creating a .NET fiddle of your answer and use the OP's example data in his question, and you'll see that what you have won't give his expected results. – Ben Smith Nov 12 '15 at 16:50
  • Hello Ben. [Here](https://dotnetfiddle.net/xSkjaS) it is. It gives the same results. – Yacoub Massad Nov 12 '15 at 17:19
  • I thought the OP wants the results in the exact same order he specified in his question. To achieve that ordering you need to group, order by date and then apply distinct. I too had your result set which your fiddle gives but I thought the OP wanted the specified ordering; my answer gives the results in that order. – Ben Smith Nov 12 '15 at 17:24
  • I see. It doesn't seem to me that OP has such requirement. – Yacoub Massad Nov 12 '15 at 17:25
  • Regarding ordering, it seems to me that OP's requirement is just to make sure that we don't remove the item with the most recent `CreateDate`. – Yacoub Massad Nov 12 '15 at 17:27
  • I'm pretty sure he does as that's why he's also grouping and ordering his data. We'll find out when he gives us an update. Whatever, it's been an interesting question! – Ben Smith Nov 12 '15 at 17:27
  • @YacoubMassad is right. I didn't have the requirement to have the filtered set to be ordered. – melmack Nov 12 '15 at 20:46
  • @melmack You say you didn't have a requirement, so why were you grouping and ordering in your question? Your desired result had a distinct ordering which can only be achieved if you grouped and ordered like in my answer i.e. the desired result set is not simply ordered by date descending. – Ben Smith Nov 13 '15 at 17:35
  • @BenSmith - the main requirement was to remove duplicates. The secondary requirement was to only include the record that is the most recent - of each duplicate. – melmack Nov 15 '15 at 04:41