3

While there are a few cases where I'll write something using the method chains (especially if it's just one or two methods, like foo.Where(..).ToArray()), in many cases I prefer the LINQ query comprehension syntax instead ("query expressions" in the spec), so something like:

var query =
    from filePath in Directory.GetFiles(directoryPath)
    let fileName = Path.GetFileName(filePath)
    let baseFileName = fileName.Split(' ', '_').First()
    group filePath by baseFileName into fileGroup
    select new
    {
        BaseFileName = fileGroup.Key,
        Count = fileGroup.Count(),
    };

In some fairly sizable chunk of those, I need to take the resulting IEnumerable and eager-load it into a data structure (array, list, whatever). This usually means either:

  1. adding another local variable like var queryResult = query.ToArray(); or

  2. wrapping the query with parens and tagging on ToArray (or ToList or whatever).

var query = (
    from filePath in Directory.GetFiles(directoryPath)
    let fileName = Path.GetFileName(filePath)
    let baseFileName = fileName.Split(' ', '_').First()
    group filePath by baseFileName into fileGroup
    select new
    {
        BaseFileName = fileGroup.Key,
        Count = fileGroup.Count(),
    }
).ToArray();

I'm trying to find out what options others are either 1) already using or 2) could think of as feasible to have some additional "contextual keywords" added - just things that would transform to extension methods the same way the existing ones do, as if the LINQ keywords were 'natively' extensible :)

I realize that most likely this is going to mean either some sort of preprocessing (not sure what's out there in this realm for C#) or changing the compiler used to something like Nemerle (I think it would be an option, but not really sure?). I don't know enough about what Roslyn does/will support yet, so if someone knows whether it could allow someone to 'extend' C# like this, please chime in!

The ones I'd likely use most are below. There are surely many others, but these should get the idea across:

ascount - transforms to Count()

int zFileCount =
    from filePath in Directory.GetFiles(directoryPath)
    where filePath.StartsWith("z")
    select filePath ascount;

This would "transform" into the following (how it gets there doesn't matter, as long as this is the end result):

int zFileCount = (
    from filePath in Directory.GetFiles(directoryPath)
    where filePath.StartsWith("z")
    select filePath
).Count();

Similarly:

  • asarray - transforms to ToArray()
  • aslist - transforms to ToList()

(you could obviously keep going for First(), Single(), Any(), etc, but trying to keep question scope in check :)

I'm only interested in the extension methods that don't need parameters passed. I'm not looking for trying to do this kind of thing with (for instance) ToDictionary or ToLookup. :)

So, in summary:

  • want to add 'ascount', 'aslist', and 'asarray' into linq query expressions
  • don't know if this has already been solved
  • don't know if Nemerle is a good choice for this
  • don't know if the Roslyn story would support this kind of use
John Saunders
James Manning
  • Roslyn does *not* allow you to extend the C# language! See [Eric Lippert's blog](http://blogs.msdn.com/b/ericlippert/archive/2011/10/19/the-roslyn-preview-is-now-available.aspx). – user703016 Jan 07 '12 at 23:10
  • @Cicada: It would be more accurate to say that adding language extensions is not the *purpose* of Roslyn; the purpose of Roslyn is to expose the compiler's lexical, syntactic and semantic analysis engines to users. If they find some crazy way to use those engines to extend the language, good for them. But we are not designing Roslyn as a mechanism for adding language extensions; we are designing it as a mechanism for users to consume the same analysis engines the compiler and IDE teams consume. – Eric Lippert Jan 08 '12 at 15:49
  • Personally I'd separate your query from the counting: `var query = from [...];` `int count = query.Count();` I think it looks not as ugly as wrapping the whole query and moreover you can reuse the query if you need it for other things beside counting in the future. – ordag Jan 08 '12 at 16:49
  • Nemerle can compile C# code plus you can use Nemerle's metaprogramming features on top of C# code. – emperon Jan 11 '12 at 11:28
  • @emperon - thanks! That's awesome. :) – James Manning Jan 11 '12 at 17:43

3 Answers

13

Not an answer to your question, but rather some musings on your design. We strongly considered adding such a feature to C# 4 but cut it because we did not have the time and resources available.

The problem with the query comprehension syntax is, as you note, that it is ugly to mix the "fluent" and "comprehension" syntaxes. You want to know how many different last names your customers have in London and you end up writing this ugly thing with parentheses:

d = (from c in customers 
     where c.City == "London" 
     select c.LastName)
    .Distinct()
    .Count();

Yuck.

We considered adding a new contextual keyword to the comprehension syntax. Let's say for the sake of argument that the keyword is "with". You could then say:

d = from c in customers 
    where c.City == "London" 
    select c.LastName
    with Distinct() 
    with Count();

and the query comprehension rewriter would rewrite that into the appropriate fluent syntax.
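As a rough illustration only (the `with` keyword was never added, so this is not actual compiler output), the rewrite would presumably end up equivalent to a plain method chain. The `Customer` type, the `WithSketch` class, and the sample data below are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer type, not from the original answer.
public sealed class Customer
{
    public string City { get; set; }
    public string LastName { get; set; }
}

public static class WithSketch
{
    // Hypothetical sample data.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { City = "London", LastName = "Smith" },
            new Customer { City = "London", LastName = "Jones" },
            new Customer { City = "London", LastName = "Smith" },
            new Customer { City = "Paris",  LastName = "Martin" },
        };
    }

    // What you write today: a comprehension wrapped in parentheses.
    public static int Mixed(IEnumerable<Customer> customers)
    {
        return (from c in customers
                where c.City == "London"
                select c.LastName)
               .Distinct()
               .Count();
    }

    // The purely fluent form that a "with" rewrite would presumably target.
    public static int Fluent(IEnumerable<Customer> customers)
    {
        return customers
            .Where(c => c.City == "London")
            .Select(c => c.LastName)
            .Distinct()
            .Count();
    }

    public static void Main()
    {
        var customers = SampleCustomers();
        Console.WriteLine(Mixed(customers));  // 2
        Console.WriteLine(Fluent(customers)); // 2
    }
}
```

Both methods count the distinct last names among the London customers, so they return the same value for any input.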

I really like this feature but it did not make the cut for C# 4 or 5. It would be nice to get it into a hypothetical future version of the language.

As always, Eric's musings about hypothetical features of unannounced products that might never exist are for entertainment purposes only.

Eric Lippert
  • Thanks for the info! In terms of potentially using Roslyn for adding this kind of keyword support, I realize it isn't a 'supported' operation as-is, but one thing you had posted about was that the Roslyn data structures needed to support the concepts of broken code. This made me wonder if Roslyn could treat an 'asarray' keyword on the end as a certain kind of broken code that the consumer code (whatever is invoking Roslyn) could then find and 'fix' the code such that it would do a compile-time 'rewrite' of the expression to (...).ToArray(), then let Roslyn continue along like normal? – James Manning Jan 11 '12 at 17:46
  • @JamesManning: The Roslyn parser will preserve the broken code in the grammatical analysis that it gives you, but we make no guarantees as to the fitness of that analysis for any particular purpose you might have for it. For example, when faced with "var q = from a in b select c garbage;" the syntactic analyzer would be perfectly within its rights to say there's a semicolon missing after c, insert a virtual semicolon there and then start analyzing "garbage;" as a new statement, and then give you an additional error that "garbage;" is not a legal expression statement. – Eric Lippert Jan 11 '12 at 18:04
  • @JamesManning: The broken code analyzer could choose some completely other interpretation of that code as well; it could instead decide that what is missing is the "into" that could legally follow the "select". Analyzing broken code is more art than science and we have a lot of wacky heuristics that can give unexpected results. – Eric Lippert Jan 11 '12 at 18:09
  • Years later (as seems the case with LINQ and C# a lot) it would be nice if the `with` keyword also removed the need for the function call parentheses. – NetMage Mar 21 '18 at 21:47
3

One idea is that you could write your own query provider that wraps the version in System.Linq and then calls ToArray in its Select method. Then you would just have a using YourNamespace; instead of using System.Linq.

Roslyn does not allow you to extend the syntax of C#, but you can write a SyntaxRewriter that changes the semantics of a C# program as a pre-build step.

Kevin Pilch
  • Calling `ToArray()` in `Select()` wouldn't work in general, because `Select()` isn't called if the final projection is identity. – svick Jan 08 '12 at 03:30
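The point in that comment can be checked with a small tracing wrapper. This is only a sketch: `Traced<T>` and `IdentitySelectDemo` are hypothetical names invented here, and the wrapper implements just enough of the query pattern (`Where` and `Select`) to record which methods the compiler actually calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that records which query-pattern methods get called.
public sealed class Traced<T>
{
    private readonly IEnumerable<T> _items;
    public List<string> Calls { get; }

    public Traced(IEnumerable<T> items, List<string> calls)
    {
        _items = items;
        Calls = calls;
    }

    public Traced<T> Where(Func<T, bool> predicate)
    {
        Calls.Add("Where");
        return new Traced<T>(_items.Where(predicate), Calls);
    }

    public Traced<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        Calls.Add("Select");
        return new Traced<TResult>(_items.Select(selector), Calls);
    }
}

public static class IdentitySelectDemo
{
    public static List<string> WhereThenIdentity()
    {
        var source = new Traced<int>(Enumerable.Range(0, 10), new List<string>());
        var q = from i in source
                where i % 2 == 0
                select i;        // identity projection after a where
        return q.Calls;
    }

    public static List<string> WhereThenProjection()
    {
        var source = new Traced<int>(Enumerable.Range(0, 10), new List<string>());
        var q = from i in source
                where i % 2 == 0
                select i * 2;    // a real projection
        return q.Calls;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WhereThenIdentity()));   // Where
        Console.WriteLine(string.Join(", ", WhereThenProjection())); // Where, Select
    }
}
```

In the first query the compiler drops the trailing `select i`, so a `Select` that calls `ToArray()` would never run there.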
2

As others said, Roslyn is not what you probably think it is. It can't be used to extend C#.

All of the following code should be considered more brainstorming and less recommendation. It changes how LINQ behaves in unexpected ways and you should think really hard before using anything like it.

One way to solve this would be to modify the select clause like this:

int count = from i in Enumerable.Range(0, 10)
            where i % 2 == 0
            select new Count();

The implementation could look like this:

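using System;
using System.Collections.Generic;
using System.Linq;

// Marker type: selecting it means "give me the count instead of a sequence".
public class Count
{
}

public static class LinqExtensions
{
    // Picked by overload resolution only when the selector returns Count;
    // the selector itself is never invoked.
    public static int Select<T>(
        this IEnumerable<T> source, Func<T, Count> selector)
    {
        return source.Count();
    }
}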

If you put anything that isn't Count in the select, it would behave as usual.

Doing something similar for arrays would take more work, since you need the select to specify both that you want an array and the selector of items you want in there, but it's doable. Or you could use two selects: one chooses the item and the other says you want an array.
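A minimal sketch of the two-selects variant, under the same caveats as the rest of this answer. The `AsArray` marker type and the class names are hypothetical, and the second select is chained with `into`, which is one way (an assumption, not the only way) to get two selects into a single query expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical marker type: selecting it means "materialize as an array".
public sealed class AsArray { }

public static class ArraySelectExtensions
{
    // Chosen over Enumerable.Select only when the selector returns AsArray;
    // the selector itself is never invoked.
    public static T[] Select<T>(this IEnumerable<T> source, Func<T, AsArray> selector)
    {
        return source.ToArray();
    }
}

public static class TwoSelectsDemo
{
    public static int[] EvenSquares()
    {
        return from i in Enumerable.Range(0, 10)
               where i % 2 == 0
               select i * i into square   // first select: the items
               select new AsArray();      // second select: "I want an array"
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", EvenSquares())); // 0, 4, 16, 36, 64
    }
}
```

The first select is bound to the normal `Enumerable.Select`, because its lambda does not return `AsArray`; only the last one hits the custom overload.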

Another option (similar to Kevin's suggestion) would be to have extension method like AsCount() which you could use like this:

int count = from i in Enumerable.Range(0, 10).AsCount()
            where i % 2 == 0
            select i;

You could implement it like this:

public static class LinqExtensions
{
    public static Countable<T> AsCount<T>(this IEnumerable<T> source)
    {
        return new Countable<T>(source);
    }
}

public class Countable<T>
{
    private readonly IEnumerable<T> m_source;

    public Countable(IEnumerable<T> source)
    {
        m_source = source;
    }

    public Countable<T> Where(Func<T, bool> predicate)
    {
        return new Countable<T>(m_source.Where(predicate));
    }

    public Countable<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new Countable<TResult>(m_source.Select(selector));
    }

    // other LINQ methods

    public static implicit operator int(Countable<T> countable)
    {
        return countable.m_source.Count();
    }
}

I'm not sure I like it this way. Especially the implicit cast feels wrong, but I think there is no other way.

svick