36

I've inherited a database that wasn't designed exactly optimally, and I need to manipulate some data. Let me give a more common analogy of the kind of thing I have to do:

Let's say we have a Student table, a StudentClass table keeping record of all the classes he attended, and a StudentTeacher table that stores all the teachers who taught this student. Yes, I know it's a dumb design and it would make more sense to store the teacher on the Class table - but that's what we're working with.

I now want to clean up the data, and I want to find all the places where a student has a teacher but no classes, or a class but no teachers. SQL thus:

select *
from StudentClass sc
full outer join StudentTeacher st on st.StudentID = sc.StudentID
where st.id is null or sc.id is null

How do you do that in Linq?

Shaul Behr
  • 33,989
  • 61
  • 233
  • 360
  • 11
    Note: this is not really a full outer join - since you want to exclude rows where the inner join succeeded. I'm just mentioning this since this is a top search result for 'full outer join linq' - so if that's what someone's looking for then the answers may not be right – Simon_Weaver Oct 30 '11 at 05:04

5 Answers5

28

I think I have the answer here, which is not as elegant as I'd hoped, but it should do the trick:

var studentIDs = StudentClasses.Select(sc => sc.StudentID)
  .Union(StudentTeachers.Select(st => st.StudentID);
  //.Distinct(); -- Distinct not necessary after Union
var q =
  from id in studentIDs
  join sc in StudentClasses on id equals sc.StudentID into jsc
  from sc in jsc.DefaultIfEmpty()
  join st in StudentTeachers on id equals st.StudentID into jst
  from st in jst.DefaultIfEmpty()
  where st == null ^ sc == null
  select new { sc, st };

You could probably squeeze these two statements into one, but I think you'd sacrifice code clarity.

Shaul Behr
  • 33,989
  • 61
  • 233
  • 360
19

Extension method:

public static IEnumerable<TResult> FullOuterJoin<TOuter, TInner, TKey, TResult>(this IEnumerable<TOuter> outer, IEnumerable<TInner> inner, Func<TOuter,TKey> outerKeySelector, Func<TInner,TKey> innerKeySelector, Func<TOuter,TInner,TResult> resultSelector)
                where TInner : class
                where TOuter : class
            {
                var innerLookup = inner.ToLookup(innerKeySelector);
                var outerLookup = outer.ToLookup(outerKeySelector);

                var innerJoinItems = inner
                    .Where(innerItem => !outerLookup.Contains(innerKeySelector(innerItem)))
                    .Select(innerItem => resultSelector(null, innerItem));

                return outer
                    .SelectMany(outerItem =>
                        {
                            var innerItems = innerLookup[outerKeySelector(outerItem)];

                            return innerItems.Any() ? innerItems : new TInner[] { null };
                        }, resultSelector)
                    .Concat(innerJoinItems);
            }

Test:

[Test]
public void CanDoFullOuterJoin()
{
    var list1 = new[] {"A", "B"};
    var list2 = new[] { "B", "C" };

    list1.FullOuterJoin(list2, x => x, x => x, (x1, x2) => (x1 ?? "") + (x2 ?? ""))
         .ShouldCollectionEqual(new [] { "A", "BB", "C"} );
}
andrey.tsykunov
  • 2,758
  • 2
  • 30
  • 21
  • 1
    +1 for the concept of the extension method! I have a feeling it could be optimised internally, but a good answer nonetheless. – Shaul Behr Jan 12 '11 at 07:27
18

for the given 2 collections a and b, a required full outer join might be as following:

a.Union(b).Except(a.Intersect(b));

If a and b are not of the same type, then 2 separate left outer joins are required:

var studentsWithoutTeachers =
    from sc in studentClasses
    join st in studentTeachers on sc.StudentId equals st.StudentId into g
    from st in g.DefaultIfEmpty()
    where st == null
    select sc;
var teachersWithoutStudents =
    from st in studentTeachers
    join sc in studentClasses on st.StudentId equals sc.StudentId into g
    from sc in g.DefaultIfEmpty()
    where sc == null
    select st;

here is a one line option using Concat():

(from l in left
 join r in right on l.Id equals r.Id into g
 from r in g.DefaultIfEmpty()
 where r == null
 select new {l, r})
     .Concat(
     from r in right
     join sc in left on r.Id equals sc.Id into g
     from l in g.DefaultIfEmpty()
     where l == null
     select new {l, r});
Boris Lipschitz
  • 8,262
  • 5
  • 45
  • 56
  • It's a good semantic statement of the problem, but it doesn't help, because for that to work in Linq, a and b have to be of the same type, which is not the case here. – Shaul Behr Jan 19 '10 at 17:24
  • This is WRONG. Correct outer join statement is available here: http://msdn.microsoft.com/en-us/library/vstudio/bb397895.aspx. Null check is not necessary and another variable has to be introduced to select from group – grzegorz_p Feb 05 '13 at 14:23
  • 1
    @grzegorz_p msdn example shows LEFT Outer Join. The question was about FULL outer join – Boris Lipschitz Feb 06 '13 at 05:50
  • What about `a.Except(b).Concat(b.Except(a))` ? – tigrou Aug 13 '13 at 08:01
1

Based on Shaul's answer, but with a little streamlining:

var q =
  from id in studentIDs
  join sc in StudentClasses on id equals sc.StudentID into jsc
  join st in StudentTeachers on id equals st.StudentID into jst
  where jst.Any() ^ jsc.Any() //exclusive OR, so one must be empty

  //this will return the group with the student's teachers, and an empty group
  //   for the student's classes - 
  //   or group of classes, and empty group of teachers
  select new { classes = jsc, teachers = jst };

  //or, if you know that the non-empty group will always have only one element:
  select new { class = jsc.DefaultIfEmpty(), teacher = jst.DefaultIfEmpty() };

Note that for a full outer join, this can work, too. Leave out the where clause and use the first select above, rather than the second.

sq33G
  • 3,174
  • 1
  • 22
  • 35
1

A start...

 var q = from sc in StudentClass
            join st in StudentTeachers on sc.StudentID equals st.StudentID into g
            from st in g.DefaultIfEmpty()
            select new {StudentID = sc.StudentID, StudentIDParent = st == null ? "(no StudentTeacher)" : st.StudentID...........};

See also http://www.linqpad.net/ for more samples Good tool to play with

salgo60
  • 947
  • 5
  • 15
  • Creative, but not as elegant as I'd hoped. I'll give you +1 for the link to LinqPad, which looks like a pretty cool piece of software. :) – Shaul Behr Jan 19 '10 at 17:21
  • ;-)) You have more elegant examples in LinqPad It has a cool db connection and you have possibilities to link in your dll:s etc... The author has also written the best book C# in a nutshell http://www.youtube.com/watch?v=Z6-iUNfJsJw&feature=channel – salgo60 Jan 19 '10 at 23:24
  • 3
    Two points to note: 1) This generates a LEFT OUTER JOIN not a full outer join and 2) the st == null check is not required in linq-to-sql, instead you can just do st.StudentID ?? "(no StudentTeacher)" – Marty Neal Feb 20 '12 at 23:18
  • This would not create a full outer join as @Martin explained – Shyamal Parikh Dec 19 '15 at 12:50