8

I have a Linq query that basically counts how many entries were created on a particular day, which is done by grouping by year, month, day. The problem is that because some days won't have any entries I need to back fill those missing "calendar days" with an entry of 0 count. My guess is that this can probably be done with a Union or something, or maybe even some simple for loop to process the records after the query.

Here is the query:

from l in context.LoginToken
 where l.CreatedOn >= start && l.CreatedOn <= finish
 group l by
 new{l.CreatedOn.Year, l.CreatedOn.Month, l.CreatedOn.Day} into groups
 orderby groups.Key.Year , groups.Key.Month , groups.Key.Day
     select new StatsDateWithCount {
                                    Count = groups.Count(),
                                     Year =  groups.Key.Year,
                                    Month = groups.Key.Month,
                                      Day = groups.Key.Day
                                                                  }));

If I have data for 12/1 - 12/4/2009 like (simplified):

12/1/2009 20
12/2/2009 15
12/4/2009 16

I want an entry with 12/3/2009 0 added by code.

I know that in general this should be done in the DB using a denormalized table that you either populate with data or join to a calendar table, but my question is how would I accomplish this in code?
Can it be done in Linq? Should it be done in Linq?

Greg Roberts
  • 2,544
  • 1
  • 20
  • 23
  • Possible duplicate of [LINQ group by date - include empty days WITHOUT using join](http://stackoverflow.com/questions/17086120/linq-group-by-date-include-empty-days-without-using-join) – Robert Synoradzki May 10 '17 at 11:56

4 Answers4

2

I just did this today. I gathered the complete data from the database and then generated a "sample empty" table. Finally, I did an outer join of the empty table with the real data and used the DefaultIfEmpty() construct to deal with knowing when a row was missing from the database to fill it in with defaults.

Here's my code:

int days = 30;

// Gather the data we have in the database, which will be incomplete for the graph (i.e. missing dates/subsystems).
var dataQuery =
    from tr in SourceDataTable
    where (DateTime.UtcNow - tr.CreatedTime).Days < 30
    group tr by new { tr.CreatedTime.Date, tr.Subsystem } into g
    orderby g.Key.Date ascending, g.Key.SubSystem ascending
    select new MyResults()
    {
        Date = g.Key.Date, 
        SubSystem = g.Key.SubSystem,
        Count = g.Count()
    };

// Generate the list of subsystems we want.
var subsystems = new[] { SubSystem.Foo, SubSystem.Bar }.AsQueryable();

// Generate the list of Dates we want.
var datetimes = new List<DateTime>();
for (int i = 0; i < days; i++)
{
    datetimes.Add(DateTime.UtcNow.AddDays(-i).Date);
}

// Generate the empty table, which is the shape of the output we want but without counts.
var emptyTableQuery =
    from dt in datetimes
    from subsys in subsystems
    select new MyResults()
    {
        Date = dt.Date, 
        SubSystem = subsys,
        Count = 0
    };

// Perform an outer join of the empty table with the real data and use the magic DefaultIfEmpty
// to handle the "there's no data from the database case".
var finalQuery =
    from e in emptyTableQuery
    join realData in dataQuery on 
        new { e.Date, e.SubSystem } equals 
        new { realData.Date, realData.SubSystem } into g
    from realDataJoin in g.DefaultIfEmpty()
    select new MyResults()
    {
        Date = e.Date,
        SubSystem = e.SubSystem,
        Count = realDataJoin == null ? 0 : realDataJoin.Count
    };

return finalQuery.OrderBy(x => x.Date).AsEnumerable();
1

Essentially what I ended up doing here is creating a list of the same type with all the dates in the range and 0 value for the count. Then union the results from my original query with this list. The major hurdle was simply creating a custom IEqualityComparer. For more details here: click here

Greg Roberts
  • 2,544
  • 1
  • 20
  • 23
1

I made a helper function which is designed to be used with anonymous types, and reused in as generic way as possible.

Let's say this is your query to get a list of orders for each date.

var orders = db.Orders
             .GroupBy(o => o.OrderDate)
             .Select(o => new 
             {
                OrderDate = o.Key,
                OrderCount = o.Count(),
                Sales = o.Sum(i => i.SubTotal)
             }
             .OrderBy(o => o.OrderDate);

For my function to work please note this list must be ordered by date. If we had a day with no sales there would be a hole in the list.

Now for the function that will fill in the blanks with a default value (instance of anonymous type).

    private static IEnumerable<T> FillInEmptyDates<T>(IEnumerable<DateTime> allDates, IEnumerable<T> sourceData, Func<T, DateTime> dateSelector, Func<DateTime, T> defaultItemFactory)
    {
        // iterate through the source collection
        var iterator = sourceData.GetEnumerator();
        iterator.MoveNext();

        // for each date in the desired list
        foreach (var desiredDate in allDates)
        {
            // check if the current item exists and is the 'desired' date
            if (iterator.Current != null && 
                dateSelector(iterator.Current) == desiredDate)
            {
                // if so then return it and move to the next item
                yield return iterator.Current;
                iterator.MoveNext();

                // if source data is now exhausted then continue
                if (iterator.Current == null)
                {
                    continue;
                }

                // ensure next item is not a duplicate 
                if (dateSelector(iterator.Current) == desiredDate)
                {
                    throw new Exception("More than one item found in source collection with date " + desiredDate);
                }
            }
            else
            {
                // if the current 'desired' item doesn't exist then
                // create a dummy item using the provided factory
                yield return defaultItemFactory(desiredDate);
            }
        }
    }

The usage is as follows:

// first you must determine your desired list of dates which must be in order
// determine this however you want    
var desiredDates = ....; 

// fill in any holes
var ordersByDate = FillInEmptyDates(desiredDates, 

                               // Source list (with holes)
                               orders, 

                               // How do we get a date from an order
                               (order) => order.OrderDate,

                               // How do we create an 'empty' item 
                               (date) => new 
                               {
                                     OrderDate = date,
                                     OrderCount = 0,
                                     Sales = 0
                               });
  • Must make sure there are no duplicates in the desired dates list
  • Both desiredDates and sourceData must be in order
  • Because the method is generic if you are using an anonymous type then the compiler will automatically tell you if your 'default' item is not the same 'shape' as a regular item.
  • Right now I include a check for duplicate items in sourceData but there is no such check in desiredDates
  • If you want to ensure the lists are ordered by date you will need to add extra code
Simon_Weaver
  • 120,240
  • 73
  • 577
  • 618
  • I think this is such a specific 'business' scenario that I think trying to squeeze it into an 'elegant' linq construct is counterproductive - but this was the second most elegant thing I could come up with – Simon_Weaver Dec 11 '16 at 01:33
0

You can generate the list of dates starting from "start" and ending at "finish", a then step by step check the number of count for each date separately

Gacek
  • 9,526
  • 9
  • 51
  • 78