Does capture by reference turn into capture by value when the reference goes out of context in Javascript?

Question

The following Javascript program:

function f() {
  function g() { console.log(x); }
  let x = 0;
  g();  // prints 0
  x = 1;
  g();  // prints 1
  return g;
}

let g = f();
g();  // prints 1

outputs:

0
1
1

So it seems that g first captures x by reference (since inside f, g() prints 0 then 1 when x is reassigned), which means that g's closure environment looks something like {'x': x}, and then by value (since outside f, g() still prints 1 after x goes out of scope at the end of f's body), which means that g's closure environment looks something like {'x': 1}.

I am trying to relate this behaviour to C++ lambdas, which provide capture by reference and by value but, contrary to JavaScript, do not allow a capture by reference to outlive the scope of the referenced variable by turning it into a capture by value (instead, calling the lambda becomes undefined behaviour).

Is this a correct interpretation of JavaScript captures?

If that interpretation is correct, that would explain clearly how captures of block scope variables (let) work in for loops:

let l = [];

for (let x = 0; x < 3; ++x) {
  l.push(function () { console.log(x); });
}

l[0]();  // prints 0
l[1]();  // prints 1
l[2]();  // prints 2

Pointy · Answer 1 · 2020-08-11T14:04:02.897

2

In JavaScript, there's really no difference between what happens when g() refers to variable x in an expression whether g() is called from inside f() or not. There's just one variable x, and getting to it is the same internal operation whenever the code of g() runs.

JavaScript is quite a bit different from C++; superficial cosmetic similarity can be deceptive. Also the term "capture" is seldom used (in my experience, here on Stack Overflow for example) when discussing JavaScript semantics, though the spec uses it in its thorough description of what happens upon entry into a scope. The relevant word here is closure, as in "x is in the closure of g()". (I'm sloppy with terminology so someone might improve on my phrasing.)

More: note that we can modify g() to demonstrate that x can still be not only accessed to obtain its value, but also modified:

    function f() {
      function g() { console.log(x = x + 1); }
      let x = 0;
      g();  // prints 1
      x = 1;
      g();  // prints 2
      return g;
    }
    
    let g = f();
    g();  // prints 3
    g();  // prints 4
    g();  // prints 5

The variable x continues to behave as an ordinary variable always behaves.
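
The `for` loop from the question follows the same rule. Here is a minimal sketch, assuming only standard `let`/`var` semantics (the names `withLet` and `withVar` are made up for illustration): with `let`, the loop scope is entered anew on each iteration, so every function sees its own `x`; with `var` there is a single variable shared by all three functions.

```javascript
const withLet = [];
for (let x = 0; x < 3; ++x) {
  // each iteration has its own binding of x
  withLet.push(function () { return x; });
}

const withVar = [];
for (var y = 0; y < 3; ++y) {
  // one function-scoped y, shared by every closure
  withVar.push(function () { return y; });
}

console.log(withLet.map(fn => fn()));  // [ 0, 1, 2 ]
console.log(withVar.map(fn => fn()));  // [ 3, 3, 3 ]
```

Nothing is copied in either case; the functions simply refer to different variables in the first loop and to the same variable in the second.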

edited Aug 11 '20 at 14:04

answered Aug 11 '20 at 12:27

Pointy


But `x` is undefined outside `f`, so how can `g` access it? "Capture" is actually just a synonym of "binding", and "closure" is a pair of a function and its environment (functions are implemented using closures in Javascript). – Maggyero Aug 11 '20 at 12:41
Because `x` is in the lexical scope of `g()`, and when `f()` is called memory is allocated for it. That memory is not returned to the heap because `g()` "leaks out" of the call to `f()`. – Pointy Aug 11 '20 at 12:44
I'll extend the answer, hold on. It's a good idea to resist the temptation to try and find explanations for JavaScript semantics in terms of C++. – Pointy Aug 11 '20 at 12:45
To be fair the term "binding" _is_ used in the specification. @Maggyero Keep in mind the exact implementation used to create closures is left up to the browser as long as it's observably equivalent to what is in the specification. – Patrick Roberts Aug 11 '20 at 12:51
@PatrickRoberts yea "binding" is fine. And yes, my wording about allocating memory is intended to be a conceptual statement; the variable "lives" *somewhere*, and it's not necessary to worry about precisely how the mechanism works. – Pointy Aug 11 '20 at 12:53
But since every function is a closure in Javascript (a function–environment pair) and `x` is a free variable in `g`, the environment holding a *reference* to `x` is created at the first call `g()` inside `f`, not later when `f` returns, right? – Maggyero Aug 11 '20 at 12:55
It's created when `f()` is called. Forget the C++ terminology; it does not apply in any direct way. The environment available to code inside function `g()` has a variable called `x`, and that's that. – Pointy Aug 11 '20 at 12:56
@Maggyero you're forgetting that JavaScript has a garbage collector and C++ does not – Patrick Roberts Aug 11 '20 at 12:56
Are you sure that the closure is only created at the end of function `f` and not before? For instance [in Python](https://www.python.org/dev/peps/pep-0227/#implementation) the closure is created as long as there is a free variable: "The implementation for C Python uses flat closures [1]. Each def or lambda expression that is executed will create a closure if the body of the function or any contained function has free variables." – Maggyero Aug 11 '20 at 12:59
@Pointy by the way, you should update the comments in your answer, first two outputs are `1` and `2`. – Patrick Roberts Aug 11 '20 at 13:00
@Maggyero he didn't say the closure was created at the end of function `f`, he said it was created when `f()` is called. – Patrick Roberts Aug 11 '20 at 13:01
@Maggyero the closure is created when `f()` is *called* because that's when the lexical scope of `f()` is entered at runtime. Similarly, in your `for` loop, that scope is entered on each iteration of the loop. Honestly trying to think about it in too much detail can get confusing; it's a lot easier to just believe in the magic :) – Pointy Aug 11 '20 at 13:02
So when the closure `g` (a function–environment pair) is created, that is, when execution reaches the definition of `g` while evaluating the call `f()`, what are the contents of `g`'s environment? `{'x': x}`? And what does it become after execution reaches the end of `f`? `{'x': 2}`? Hence my question: does the capture (binding) start *by reference* (inside `f`: `{'x': x}`) and become *by value* (outside `f`: `{'x': 2}`) like I think (since outside `f`, `x` does not exist anymore)? – Maggyero Aug 11 '20 at 13:08
Well the scope of `g()` itself has nothing in it; it's the scope of `f()` that contains the variable in this example. If `g()` also had a local variable, that would be allocated on the calls to `g()` (both inside and outside `f()`). As you can see if you run the code snippet in my answer, `x` most certainly does exist and can be used like any other variable. – Pointy Aug 11 '20 at 13:15
We are talking about closures (the function–environment pair for implementing nested functions that can capture their lexical context in lexical scoped languages), so the environment attached to a function is for binding free variables of the function, not local variables of the function—the latter are stored on the call stack, not attached to any function. `f` and `g` are closures, so they both have an environment attached to them for capturing their lexical context. `f` has no free variables so its environment is `{}`. `g` has a free variable `x` so its environment is `{'x': something}`. – Maggyero Aug 11 '20 at 13:31
**One more time**: C++ and JavaScript are **very different**. All JavaScript variables in all functions work exactly the same way. – Pointy Aug 11 '20 at 13:32
I was talking about JavaScript, not C++. Do you agree that JavaScript implements all functions as closures (function–environment pairs)? Almost all lexically scoped languages that allow nested functions do this. – Maggyero Aug 11 '20 at 13:35
Ah OK sorry, I read it wrong. Yes exactly, in JavaScript the "allocation" of space for variables is the same for all variables, at least conceptually and in terms of specified behavior. What the actual runtime does internally is hard to say, but unless you're actually working on a JavaScript runtime that doesn't matter. – Pointy Aug 11 '20 at 13:53
Was looking through the specification, looks like it does use the term "capture" by the way: https://tc39.es/ecma262/#sec-abstract-closure – Patrick Roberts Aug 11 '20 at 13:54
Oh cool, yea I've read through that (or an older version) a long time ago, and it was probably for a question like this :) I'll edit @PatrickRoberts – Pointy Aug 11 '20 at 14:01

score 1 · Accepted Answer · edited Aug 18 '20 at 13:36

In short

You are almost correct except for how it works when it goes out of scope.

More details

How are variables "captured" in JavaScript?

JavaScript uses lexical environments to determine which function uses which variable. Lexical environments are represented by environment records. In your case:

there is a global environment;
the function f() defines its lexical environment, in which x is defined, even though x is declared after g();
the inner function g() defines its lexical environment which is empty.

So g() uses x. Since there is no binding for x there, JavaScript looks for x in the enclosing environment. Since it is found therein, the x in g() will use the binding of x in f(). This looks like lexically scoped binding.
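
The lookup described above can be sketched with plain objects. This is only a conceptual model, not how an engine actually stores environments; `makeEnv` and `lookup` are hypothetical helpers invented for the illustration.

```javascript
// Hypothetical model: an environment holds its own bindings plus a link
// to the enclosing (outer) environment.
function makeEnv(outer) {
  return { bindings: new Map(), outer };
}

// Walk outward until some environment has a binding for `name`.
function lookup(env, name) {
  for (let e = env; e !== null; e = e.outer) {
    if (e.bindings.has(name)) return e;
  }
  throw new ReferenceError(name + " is not defined");
}

const globalEnv = makeEnv(null);
const fEnv = makeEnv(globalEnv);  // created when f() is called
fEnv.bindings.set("x", 0);
const gEnv = makeEnv(fEnv);       // g() has no bindings of its own

const owner = lookup(gEnv, "x");
console.log(owner === fEnv);                       // true
owner.bindings.set("x", 1);                        // like `x = 1` inside f()
console.log(lookup(gEnv, "x").bindings.get("x"));  // 1
```

In this model the outer link of g()'s environment is fixed by where g() was defined, not by where it is called, which is why the place of invocation does not matter.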

If you later define an x in the environment where g() is invoked, g() is still bound to the x in f():

function f() {
  function g() { console.log(x); }
  let x = 0;
  g();  // prints 0
  x = 1;
  g();  // prints 1
  return g;
}

let x = 4;
let g = f();
g();  // prints 1 (the last known value in f before returning)

This shows that the binding is static and will always refer to the x known in the lexical scope where g() was defined.

This excellent article explains in detail how this works, with very nice graphics. It is meant for closures (i.e. anonymous functions with their execution context) but is also applicable to normal functions.

How come that the value of a variable gone out of scope is preserved?

How can we explain this very special behavior, where JavaScript always takes the current value of x as long as x remains in scope (like a reference in C++), whereas it takes the last known value when x is out of scope (where an out-of-scope reference in C++ would be UB)? Does JavaScript copy the value into the closure when the variable dies? No, it is simpler than that!

This has to do with garbage collection: g() is returned to an outer context. Since g() uses the x in f(), the garbage collector will realize that this x object of f() is still in use. So, as long as g() is accessible, the x in f() will be kept alive and remain accessible for its still active bindings. So no need to copy the value: the x object will just stay (unmodified).
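
A small sketch of the same idea, assuming a hypothetical `makeCounter` function that is not part of the question: every call creates a fresh variable, and that variable lives on for as long as the returned functions are reachable.

```javascript
function makeCounter() {
  let count = 0;  // a fresh binding for every call to makeCounter()
  return {
    increment() { count += 1; },
    read() { return count; },
  };
}

const a = makeCounter();
const b = makeCounter();
a.increment();
a.increment();
b.increment();
console.log(a.read());  // 2
console.log(b.read());  // 1
```

The two counters do not interfere because each pair of functions refers to the `count` of its own call, which has outlived that call.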

As a proof that it is not a copy, you can study the following code. It defines a second function in the context of f() that is able to change the (same) x:

let h;

function f() {
  function g() { console.log(x); }
  h = function () { x = 27; }
  let x = 0;
  g();  // prints 0
  x = 1;
  g();  // prints 1
  x = 3;
  return g;
}

let x = 4;
let g = f();
g();  // prints 3
h();
g();  // prints 27

Edit: Additional bonus article that explains this phenomenon, in a slightly more complex context. Interestingly it explains that this situation can lead to memory leaks if no precaution is taken.

edited Aug 18 '20 at 13:36

Maggyero


answered Aug 11 '20 at 21:19

Christophe


Awesome answer Chris! The proof that `x` is not copied is excellent. The first article conclusion is important: "All functions at the creation stage statically (lexically) captures the outer binding of their parent environment. This allows the nested function to access the outer binding **even if the parent context is wiped out from the execution stack**. This mechanism is the foundation of closures in JavaScript." So `g` can access the lexical environment of `f` through `g` outer reference. Do lexical environments live the whole program or only until they are garbage collected (no outer ref)? – Maggyero Aug 12 '20 at 14:58
@Maggyero Thank you. I’m not an expert in JS, but from what I understand, I’d distinguish between the lexical context and the bound variables: the first is needed to parse/interpret/JIT compile the blocks and make the bindings, whereas only the second seems needed for the execution itself (i.e. once all variables are bound). If this understanding is correct, only the needed variables would survive the exited scope. But I cannot think of an experiment that could verify this hypothesis. – Christophe Aug 12 '20 at 15:22
This excellent [article](http://dmitrysoshnikov.com/ecmascript/es5-chapter-3-1-lexical-environments-common-theory/) states: "The main difference, is that in contrast with C, ECMAScript does not remove the activation object from the memory if there is a closure. And the most important case is when this closure is some inner function which uses variables of the parent function in which it’s created, and this inner function is returned upwards to the outside." – Maggyero Aug 12 '20 at 23:01
… "That means that the activation object should be stored not in the stack itself, but rather in the heap (a dynamically allocated memory; sometimes such languages are called heap-based languages — in contrast with stack-based languages). And it is stored there until there are references from closures which use (free) variables from this activation object. Moreover, not only one activation object is saved, but if needed (in case of several nested levels) — all parent activation objects." – Maggyero Aug 12 '20 at 23:01
… "[…] A closure is a pair consisting of the function code and the environment in which the function is created. As we mentioned above, closures are invented as a solution for the “funarg problem”. Let’s recall it in order to have the complete understanding. […] An alternative way to save all free variables is to create one big environment frame which contains all, but only needed free variables collected from different enclosing environments." – Maggyero Aug 12 '20 at 23:09
… "[…] I.e. the main difference is that model of chained environment frames (used in ECMAScript) optimizes the moment of function creation, however at the identifier resolution the whole scope chain, considering all environment frames (until the needed binding will be found or the ReferenceError will be thrown), should be traversed." – Maggyero Aug 12 '20 at 23:10
… "Meanwhile the model of the single environment frame optimizes the execution (all identifiers are resolved in the nearest single frame without long scope chain lookup), however requires more complex algorithm of the function creation with parsing all inner function and determining which variables should be saved and which are not." – Maggyero Aug 12 '20 at 23:10
@Maggyero Interesting and precise! Thanks for this feedback. – Christophe Aug 12 '20 at 23:20
You are welcome. So for supporting nested functions with lexical scoping, it seems that each lexical environment with a function definition holding free variables is created on the heap instead of on the stack, and a closure is created at the function definition with a reference to the heap-allocated outer lexical environment. The outer lexical environment is finally reclaimed by the garbage collector—for garbage collected languages—when there is no reference to it. – Maggyero Aug 12 '20 at 23:46
Now what about non garbage collected languages supporting nested functions with lexical scoping like C++ (with lambdas)? The outer lexical environments of lambdas capturing their free variables by value or by reference do not seem to be automatically stored on the heap with the lambda closure keeping a reference to it, like in Javascript or Python. Since with capture by value/reference a lambda stores a copy of the value/a reference to the value of the outer lexical environment, I think one can emulate Javascript or Python behaviour by capturing by reference a manually heap-allocated value. – Maggyero Aug 13 '20 at 00:09
In other words, JavaScript and Python closures can be explained in terms of C++ lambda capture by reference of heap-allocated values of the outer lexical environment (the only difference is that, contrary to C++, the allocation/deallocation is automatic, thanks to the garbage collector). So C++ lambda capture gives you more control (capture by value, and manual allocation/deallocation for capture by reference). Thus I think all this validates the interpretation in your answer (JavaScript captures by reference all the way and the referent is not garbage collected). – Maggyero Aug 13 '20 at 00:32
Do you agree with all this? If so, could you update your answer by mentioning the automatic allocation on the heap instead of the stack of lexical environments holding closures in Javascript? I am quite satisfied by our understanding now, so I am going to accept it. – Maggyero Aug 13 '20 at 00:35
@Maggyero I agree in principle with this. However, heap and stack are implementation means. For example, for JS I think it is possible to put the lexical environment (or a reference to it) on the stack when the function is invoked and use a mix of stack and heap to manage them. And there is no lexical environment at runtime in C++. – Christophe Aug 13 '20 at 08:11
@Maggyero I think the answer already addresses the question, since it explains the questioned behavior based on the specification and provides experimental, observable confirmation. I would feel uncomfortable adding implementation-specific hypotheses now, which may be right for some implementations but could be done differently in others. Moreover, I am not inclined to claim things in answers that I cannot verify in any way. If you edit your question to raise the link to C++, I could edit the answer to confirm your view. – Christophe Aug 13 '20 at 08:22
Okay, that is reasonable. Accepted! Thanks again for this answer, I have learnt a lot. – Maggyero Aug 13 '20 at 12:42
