
When using lambda expressions or anonymous methods in C#, we have to be wary of the "access to modified closure" pitfall. For example:

foreach (var s in strings)
{
   query = query.Where(i => i.Prop == s); // access to modified closure
   ...
}

Due to the modified closure, the above code will cause all of the Where clauses on the query to be based on the final value of s.

As explained here, this happens because the s variable declared in the foreach loop above is translated by the compiler like this:

string s;
while (enumerator.MoveNext())
{
   s = enumerator.Current;
   ...
}

instead of like this:

while (enumerator.MoveNext())
{
   string s;
   s = enumerator.Current;
   ...
}
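
To make the difference concrete, here is a minimal sketch that writes both expansions out by hand and captures s in a lambda. The names ClosureDemo, SharedVariable, FreshVariable and Invoke are made up for illustration; this is not what the compiler literally emits, only the same variable placement.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureDemo
{
    // First expansion: s is declared once, outside the loop,
    // so every lambda captures the same variable.
    public static List<string> SharedVariable(IEnumerable<string> strings)
    {
        var getters = new List<Func<string>>();
        using (IEnumerator<string> enumerator = strings.GetEnumerator())
        {
            string s;
            while (enumerator.MoveNext())
            {
                s = enumerator.Current;
                getters.Add(() => s);
            }
        }
        return Invoke(getters);
    }

    // Second expansion: s is declared inside the loop body,
    // so each iteration gets a new variable for its lambda to capture.
    public static List<string> FreshVariable(IEnumerable<string> strings)
    {
        var getters = new List<Func<string>>();
        using (IEnumerator<string> enumerator = strings.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                string s;
                s = enumerator.Current;
                getters.Add(() => s);
            }
        }
        return Invoke(getters);
    }

    // Runs the lambdas only after the loop has finished.
    private static List<string> Invoke(List<Func<string>> getters)
    {
        var results = new List<string>();
        foreach (Func<string> getter in getters)
        {
            results.Add(getter());
        }
        return results;
    }
}
```

With the input "a", "b", "c", SharedVariable returns "c", "c", "c", while FreshVariable returns "a", "b", "c".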

As pointed out here, there are no performance advantages to declaring a variable outside the loop, and under normal circumstances the only reason I can think of for doing this is if you plan to use the variable outside the scope of the loop:

string s;
while (enumerator.MoveNext())
{
   s = enumerator.Current;
   ...
}
var finalString = s;

However, variables defined in a foreach loop cannot be used outside the loop:

foreach(string s in strings)
{
}
var finalString = s; // won't work: you're outside the scope.

So the compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.

Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable, or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?

StriplingWarrior
  • 4
    What's wrong with `String s; foreach (s in strings) { ... }`? – Brad Christie Jan 17 '12 at 17:27
  • 6
    @BradChristie the OP is not really talking about `foreach` but about lambda expressions resulting in similar code as shown by the OP... – Yahia Jan 17 '12 at 17:30
  • 22
    @BradChristie: Does that compile? (_Error: Type and identifier are both required in a foreach statement_ for me) – Austin Salonen Jan 17 '12 at 17:32
  • I think it is important to note that the ASM generated for the two cases you propose here will probably be equal; the reference to string 's' will occupy 1 slot of the stack no matter which of the two scopes it is declared in. The question is if the compiler (or JIT'ter?) figures out to reuse the stack slot in the first case better than in the second case. – jakobbotsch Jan 17 '12 at 17:44
  • 32
    @JakobBotschNielsen: It's a closed-over outer local of a lambda; why are you assuming that it is going to be on the stack at all? Its lifetime is *longer than the stack frame*! – Eric Lippert Jan 21 '12 at 04:10
  • 4
    @EricLippert : I'm confused. I understand that lambda captures a reference to the foreach variable (which is internally declared *outside* the loop) and therefore you end up comparing against its final value; that I get. What I don't understand is how declaring the variable *inside* the loop will make any difference at all. From a compiler-writer point of view I am only allocating one string reference (var 's') on the stack regardless of whether the declaration is inside or outside the loop; I certainly wouldn't want to push a new reference onto the stack every iteration! – Anthony Sep 18 '13 at 20:44
  • 1
    @EricLippert : (continued) So is this treated as a special case where the compiler sees that what would normally be a local variable 's' is captured by a lambda expression, and therefore 'lifts' it off the stack and creates a new variable on the heap during each iteration? This is the only way I can see this working... – Anthony Sep 18 '13 at 20:47
  • What exactly is a closure? – Aditya Bokade Dec 23 '15 at 09:22
  • 1
    @AdityaBokade: See http://programmers.stackexchange.com/questions/40454/what-is-a-closure and http://stackoverflow.com/questions/36636/what-is-a-closure – StriplingWarrior Dec 23 '15 at 23:15
  • 2
    @Anthony: Since the number of variables in a method is known at compile-time, the execution stack space allocated for the method is constant--you're not pushing and popping values as the method advances. Plus, there are compiler optimizations that make it [not](https://stackoverflow.com/a/2388644/120955) [matter](https://stackoverflow.com/a/7383090/120955) where the variable is declared if it's not captured by a closure. The creation of a closure copies a reference of the variable to the heap. The variable's scope will determine *which* reference is copied. – StriplingWarrior Jun 29 '17 at 15:01
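
The "lifting" discussed in the comments above can be sketched by hand. The C# compiler moves captured locals into a hidden, compiler-generated class; the class below (CapturedLocals) is a hypothetical stand-in for it, and the real generated names and layout differ. The point is only that the scope of the variable decides whether one such object is shared by all iterations or a new one is created each time.

```csharp
using System;
using System.Collections.Generic;

public static class LoweringSketch
{
    // Hypothetical stand-in for the hidden class that holds captured locals.
    private sealed class CapturedLocals
    {
        public string s;
        public string Read() => s;
    }

    // Variable scoped outside the loop: one heap object, shared by every delegate.
    public static List<string> OneVariableForAllIterations(IEnumerable<string> strings)
    {
        var delegates = new List<Func<string>>();
        var locals = new CapturedLocals();
        foreach (string item in strings)
        {
            locals.s = item;            // overwrites the shared field
            delegates.Add(locals.Read);
        }
        return Run(delegates);
    }

    // Variable scoped inside the loop: a new heap object per iteration.
    public static List<string> OneVariablePerIteration(IEnumerable<string> strings)
    {
        var delegates = new List<Func<string>>();
        foreach (string item in strings)
        {
            var locals = new CapturedLocals();
            locals.s = item;
            delegates.Add(locals.Read);
        }
        return Run(delegates);
    }

    // Invokes the delegates after the loop has completed.
    private static List<string> Run(List<Func<string>> delegates)
    {
        var results = new List<string>();
        foreach (Func<string> d in delegates)
        {
            results.Add(d());
        }
        return results;
    }
}
```

For the input "a", "b", "c" the first method returns "c", "c", "c" and the second returns "a", "b", "c". No per-iteration stack growth is involved in either case; the only difference is how many heap objects hold the captured value.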

4 Answers

1442

The compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.

Your criticism is entirely justified.

I discuss this problem in detail here:

Closing over the loop variable considered harmful

Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable? or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?

The latter. The C# 1.0 specification actually did not say whether the loop variable was inside or outside the loop body, as it made no observable difference. When closure semantics were introduced in C# 2.0, the choice was made to put the loop variable outside the loop, consistent with the "for" loop.

I think it is fair to say that all regret that decision. This is one of the worst "gotchas" in C#, and we are going to take the breaking change to fix it. In C# 5 the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time.

The for loop will not be changed, and the change will not be "back ported" to previous versions of C#. You should therefore continue to be careful when using this idiom.
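
A small sketch of what that means in practice, assuming the code is compiled with C# 5 or later. The class and method names (LoopCapture and friends) are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class LoopCapture
{
    // foreach: from C# 5 on, each iteration has its own 'value' variable.
    public static List<int> CaptureInForeach()
    {
        var getters = new List<Func<int>>();
        foreach (int value in new[] { 7, 9, 13 })
        {
            getters.Add(() => value);
        }
        return Invoke(getters);
    }

    // for: 'i' is still a single variable shared by all iterations.
    public static List<int> CaptureInFor()
    {
        var getters = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            getters.Add(() => i);
        }
        return Invoke(getters);
    }

    // for with a per-iteration copy: the usual workaround.
    public static List<int> CaptureCopyInFor()
    {
        var getters = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            getters.Add(() => copy);
        }
        return Invoke(getters);
    }

    // Invokes the lambdas after the loop has finished.
    private static List<int> Invoke(List<Func<int>> getters)
    {
        var results = new List<int>();
        foreach (Func<int> getter in getters)
        {
            results.Add(getter());
        }
        return results;
    }
}
```

CaptureInForeach returns 7, 9, 13. CaptureInFor returns 3, 3, 3, because every lambda reads i after the loop has ended. CaptureCopyInFor returns 0, 1, 2.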

Eric Lippert
  • So, no chance of a syntax to close on values? (Of course, that has a "no-one would use it" issue, because the syntax to close on variables is more natural and makes no difference 90% of the time) – Random832 Jan 17 '12 at 18:38
  • 7
    @Random832: Unlikely. We are, however, considering for Roslyn adding a static analyzer that determines if the closed-over variable is ever written to after the construction of the closure; if it is not, then we could close over the value rather than the variable. – Eric Lippert Jan 17 '12 at 19:13
  • 6
    Actually, there is an indirect reference in the 1.x spec; if you look at the definite assignment rules, it gives an example of the compiler interpretation, and IIRC it is declared **inside** the loop. This is indirect and circumstantial, though. Not an explicit spec statement. – Marc Gravell Jan 17 '12 at 19:13
  • 185
    We did in fact push back on this change in C# 3 and C# 4. When we designed C# 3 we did realize that the problem (which already existed in C# 2) was going to get worse because there would be so many lambdas (and query comprehensions, which are lambdas in disguise) in foreach loops thanks to LINQ. I regret that we waited for the problem to get sufficiently bad to warrant fixing it so late, rather than fixing it in C# 3. – Eric Lippert Jan 17 '12 at 19:21
  • 81
    And now we will have to remember `foreach` is 'safe' but `for` is not. – leppie Jan 18 '12 at 05:58
193

What you are asking is thoroughly covered by Eric Lippert in his blog post Closing over the loop variable considered harmful and its sequel.

For me, the most convincing argument is that having a new variable in each iteration would be inconsistent with a for(;;)-style loop. Would you expect to have a new int i in each iteration of for (int i = 0; i < 10; i++)?

The most common problem with this behavior is making a closure over the iteration variable, and it has an easy workaround:

foreach (var s in strings)
{
    var s_for_closure = s; // fresh variable for each iteration
    query = query.Where(i => i.Prop == s_for_closure); // captures the copy, not the loop variable
}

My blog post about this issue: Closure over foreach variable in C#.

Krizz
  • 19
    Ultimately, what people actually _want_ when they write this isn't to have multiple variables, it's to close over the _value_. And it's hard to think of a usable syntax for that in the general case. – Random832 Jan 17 '12 at 17:44
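
There is no dedicated syntax for closing over a value, but one way to approximate it is to build the lambda in a helper method, since each call gets its own parameter variable. This is only a sketch; ValueCapture, EqualTo and Demo are hypothetical names, and the shared variable s below imitates the pre-C# 5 foreach behavior.

```csharp
using System;
using System.Collections.Generic;

public static class ValueCapture
{
    // Each call has its own 'expected' parameter, so the returned lambda
    // closes over a variable that nothing else writes to.
    public static Func<string, bool> EqualTo(string expected)
    {
        return candidate => candidate == expected;
    }

    public static List<bool> Demo()
    {
        var predicates = new List<Func<string, bool>>();
        string s = null; // one shared variable, as in the pre-C# 5 foreach expansion
        foreach (string item in new[] { "a", "b" })
        {
            s = item;
            predicates.Add(EqualTo(s)); // passes the current value, not the variable
        }

        // Test every predicate against "a" after the loop has finished.
        var results = new List<bool>();
        foreach (Func<string, bool> predicate in predicates)
        {
            results.Add(predicate("a"));
        }
        return results;
    }
}
```

Demo returns True, False: the first predicate still compares against "a" even though s ended up as "b". Had the lambdas captured s directly, both would compare against "b".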
106

Having been bitten by this, I make a habit of declaring a local variable in the innermost scope and passing that to the closure instead of the loop variable. In your example:

foreach (var s in strings)
    query = query.Where(i => i.Prop == s); // access to modified closure

I do:

foreach (var s in strings)
{
    string search = s;
    query = query.Where(i => i.Prop == search); // New definition ensures unique per iteration.
}        

Once you have that habit, you can deliberately skip it in the very rare case where you actually intend to bind to the outer scope. To be honest, I don't think I have ever done so.

Godeke
  • 25
    That is the typical workaround. Thanks for the contribution. Resharper is smart enough to recognize this pattern and bring it to your attention too, which is nice. I haven't been bitten by this pattern in a while, but since it is, in Eric Lippert's words, "the single most common incorrect bug report we get," I was curious to know the *why* more than the *how to avoid it*. – StriplingWarrior Jan 17 '12 at 17:53
65

In C# 5.0, this problem is fixed and you can close over loop variables and get the results you expect.

The language specification says:

8.8.4 The foreach statement

(...)

A foreach statement of the form

foreach (V v in x) embedded-statement

is then expanded to:

{
  E e = ((C)(x)).GetEnumerator();
  try {
      while (e.MoveNext()) {
          V v = (V)(T)e.Current;
          embedded-statement
      }
  }
  finally {
      … // Dispose e
  }
}

(...)

The placement of v inside the while loop is important for how it is captured by any anonymous function occurring in the embedded-statement. For example:

int[] values = { 7, 9, 13 };
Action f = null;
foreach (var value in values)
{
    if (f == null) f = () => Console.WriteLine("First value: " + value);
}
f();

If v was declared outside of the while loop, it would be shared among all iterations, and its value after the for loop would be the final value, 13, which is what the invocation of f would print. Instead, because each iteration has its own variable v, the one captured by f in the first iteration will continue to hold the value 7, which is what will be printed. (Note: earlier versions of C# declared v outside of the while loop.)

Paolo Moretti
  • 1
    Why did this early version of C# declare v inside of the while loop? http://msdn.microsoft.com/en-GB/library/aa664754.aspx – colinfang Apr 29 '13 at 16:00
  • 5
    @colinfang Be sure to read [Eric's answer](http://stackoverflow.com/a/8899347/63011): The C# 1.0 specification (_in your link we are talking about VS 2003, i.e. C# 1.2_) actually **did not say** whether the loop variable was inside or outside the loop body, as it made no observable difference. When closure semantics were introduced in C# 2.0, the choice was made to put the loop variable outside the loop, consistent with the "for" loop. – Paolo Moretti Apr 29 '13 at 18:35
  • 1
    So you are saying that the examples in the link were not definitive spec at that time? – colinfang Apr 29 '13 at 18:47
  • 5
    @colinfang They were definitive specifications. The problem is that we are talking about a feature (i.e. function closures) that was introduced later (with C# 2.0). When C# 2.0 came out they decided to put the loop variable outside the loop. And then they changed their mind again with C# 5.0 :) – Paolo Moretti Apr 29 '13 at 22:01