Nested generic syntax ambiguity >>

Question

Apparently, C# is as susceptible to '>>' lexer dilemma as is C++.

This C# code is pretty valid, it compiles and runs just fine:

var List = new Dummy("List");
var Nullable = new Dummy("Nullable");
var Guid = new Dummy("Guid");

var x = List<Nullable<Guid>> 10;
var y =  List<Nullable<Guid>> .Equals(10,20);

You'd have to overload '<' and '>>' operators for the Dummy class above.

But the compiler manages to guess that in 'x' case the meaning is to use List, Nullable and Guid local variables. And in 'y' case it suddenly decides to treat them as names of well-known types.

Here's a bit more detailed description with another example: http://mihailik.blogspot.co.uk/2012/05/nested-generics-c-can-be-stinky.html

The question is: how does C# compiler resolve 'a<b<c>>' to arithmetic expression or generic type/method?

Surely it doesn't try to have multiple 'goes' over the text of the program until it succeeds, or does it? That would require unbounded look-ahead, and a very complex too.

"pick one meaning over another?" ... What do you mean? There is a single, clear meaning here.... The only lack of clarity is due to variable name choices. — Reed Copsey, May 16 '12 at 00:12
Meaning of 'a>' is so heavily context-dependent. It's unusual for programming languages to allow that. Well, actually the main question is how the compiler decides whether it's arithmetic shift or part of generic type specification. — Oleg Mihailik, May 16 '12 at 00:15
If you're crazy enough to overload '>' you should expect what you get. — Eric Dahlvang, May 16 '12 at 00:22
It's not about me being crazy enough to use the legitimate features of the language, it's more about how does the compiler resolves the riddle. You and me look at the syntax and apply a common sense, but the compiler needs a clear rule, it doesn't have a common sense. — Oleg Mihailik, May 16 '12 at 00:26
Well, I guess it tries both interpretations... if one fails to give anything meaningful, it tries the other. — Thomas Levesque, May 16 '12 at 00:34
In that case to succeed or fail of parsing here, it needs to parse Dummy class first. But then we might be able to construct the liar paradox: the meaning of one chunk of code would depend on another and vice versa, rendering the whole unparseable. — Oleg Mihailik, May 16 '12 at 01:00
@OlegMihailik In my answer I show you that the C# grammar is responsible for choosing the right option, and I explain you why. When the expression tree is built, it's macthed against the grammar in a univoque way. — JotaBe, May 16 '12 at 12:54
This might provide some helpful information on the internals of the c# compiler: http://blogs.msdn.com/b/ericlippert/archive/2010/02/04/how-many-passes.aspx — Eric Dahlvang, May 16 '12 at 13:38

score 7 · Accepted Answer · edited Jun 20 '20 at 09:12

I've been directed to the paragraph 7.6.4.2 in C# language spec:

http://download.microsoft.com/download/0/B/D/0BDA894F-2CCD-4C2C-B5A7-4EB1171962E5/CSharp%20Language%20Specification.htm

The productions for simple-name (§7.6.2) and member-access (§7.6.4) can give rise to ambiguities in the grammar for expressions.

...

If a sequence of tokens can be parsed (in context) as a simple-name (§7.6.2), member-access (§7.6.4), or pointer-member-access (§18.5.2) ending with a type-argument-list (§4.4.1), the token immediately following the closing > token is examined. If it is one of

( ) ] } : ; , . ? == != | ^

then the type-argument-list is retained as part of the simple-name, member-access or pointer-member-access and any other possible parse of the sequence of tokens is discarded. Otherwise, the type-argument-list is not considered to be part of the simple-name, member-access or pointer-member-access, even if there is no other possible parse of the sequence of tokens. Note that these rules are not applied when parsing a type-argument-list in a namespace-or-type-name (§3.8).

So, there may indeed an ambiguity arise when type-argument-list is involved, and they've got a cheap way to resolve it, by looking one token ahead.

It's still an unbound look ahead, because there might be a megabyte worth of comments between '>>' and following token, but at least the rule is more or less clear. And most importantly there is no need for speculative deep parsing.

JotaBe · Answer 2 · 2012-05-16T12:49:34.623

-2

EDIT: I insist there's no ambiguity: In your example there's no ambiguity at all. This could never be evaluated as a List<Guid?>. The context (the extra 10) shows the compiler how to interpret it.

var x = List<Nullable<Guid>> 10;

Would the compiler compile this?:

var x = List<Guid?> 10;

It's clear it wouldn't. So Im'm still looking for the ambiguity.

OTOH, the second expression:

var y =  List<Nullable<Guid>> .Equals(10,20);

has to be evaluated as a List<Guid?>, because you're invoking the .Equals method. Again, this can be interpreted in any other way.

There is no paradox at all. The compiler parses it perfectly. I'm still wondering which the apradox is.

You're on a big mistake. The compiler interprets whole expressions, and uses the language grammar to understand them. It doesn't look at a fragment of code, like you're doing, without taking the rest of the expression into account.

These expressions are parsed according to C# grammar. And the grammar is clear enough to interpret the code correctly. I.e. in

var x = List<Nullable<Guid>> 10;

It's clear that 10 is a literal. If you follow up the grammar you'll find this: 10 is a *literal, so it's *primary-no-array-creation-expression, which is a *primary-expression, which is a *unary-expression, which is a *multiplicative-expression, which is an *additive-expression. If you look for an additive expression on the right side of a *>>, you find it must be a *shift-expression, so the left side of the *>> must be interpreted as an *additive-expression and so on.

If you were able to find a different way to use the grammar and get a different result for the same expression, then, I would have to agree with you, but let me disagree!

Finally:

is very confusing for humans
is absolutely clear and unambiguous to the compiler

Because:

we humans identify patterns taking fragments of the whole text which are familiar to us, like List<Nullable<Guid>> and interpret them like we want
compilers doesn't interptret the code like us, taking familiar fragments like List<Nullable<Guid>>. They take the whole expression and match it to the language grammar.

edited May 16 '12 at 12:49

answered May 16 '12 at 01:20

JotaBe

34,736
7
85
109

I am saying it compiles because it does. This very code, I compiled and ran it in Visual Studio 2010. I know it doesn't sound plausible and that's why it's such a disturbing thing. The Dummy class is there following that link, I skipped it for the brevity here. – Oleg Mihailik May 16 '12 at 07:25
I've tried the code, and `var x = List < Nullable < Guid >> 10;` gives the error 'Struct name is nto valid at this point' referring to Nullable. I'd really like to see it working, but I cannot. Can you show the whole code, including the "using" part? I can't see it in your blog either. – JotaBe May 16 '12 at 12:12
Mmmm... too strange. It was Rehsarper which showed up the error. But VS compiles it. – JotaBe May 16 '12 at 12:19
@OlegMihailik: I apologize for thinking it couldn't be compiled. And challenging you to do it. In fact, Resharper also misinterprets the code and shows errors on it, but VS compiles it. (Also VS 2008). – JotaBe May 16 '12 at 12:57
1

No problem at all, if we didn't make mistakes we wouldn't need StackOverflow :-) – Oleg Mihailik May 16 '12 at 18:54
The meaning you deduce is by trial and error parsing of the expression under different interpretations. That is normal for human language, where ambiguity is a part of the code. Compilers cannot go back and forth trying to reinterpret things. They cannot do it not because it's impossible, but because it doesn't scale. The parsing is left-to-right, only a very limited fallback (or rather look-ahead) to re-parse being practically possible. – Oleg Mihailik May 16 '12 at 19:00
Do you know for sure that C# compiler doesn't have backtracking abilities? I have not been able to find its implementation, but I dare say it can backtrack. When parsing from left to right it can find to possible paths. If it gets to the end of a path and the found code isn't correct, it can go back and try the other path. You must consider that today computers have enough power to do that. It could even walk both paths in parallel until it finds the correct one. continues... – JotaBe May 16 '12 at 23:47
Try this: type `var z = List`. Put the mouse pointer on List: it's Dummy. Type the `. Now it's a generic List again... mmm... the compiler isn't simply going left to right and taking the "firs guess" but, in some way, perhaps backtracking, it's changing its decision. Although my answer is simplistic, if backtracking is applied, my answer explains the compiler behaviour and the lack of ambiguity. This isn'tan old compiler, but a modern, hi-tech one. – JotaBe May 16 '12 at 23:51
That's not a compiler, that's VS intellisense. Intellisense is much more powerful and cunning, because it needs to return reasonably good results for broken syntax. – Oleg Mihailik May 17 '12 at 09:48
Mainstream compilers do quite limited look ahead. Most importantly, the tokenization does not rely on meaning of the tokens deduced by the later phases of parsing, name binding etc. In this specific case, as you can see from the spec they use a trivial character-based rule. – Oleg Mihailik May 17 '12 at 09:51
Have you checked how trivial the compiler is in this reference posted by Mehrdad: http://blogs.msdn.com/b/ericlippert/archive/2010/02/04/how-many-passes.aspx ? Just a singel pass from left to right... Do you really think that Intellisense uses a different compiler engine? Would anyone build two different compilers for the same thing? – JotaBe May 17 '12 at 10:55

Nested generic syntax ambiguity >>

2 Answers2