Lifespan of JS closure context objects? - javascript

Background
I'm trying to port Elixir's actor-model language primitives to JS. I came up with a solution (in JS) to emulate Elixir's receive keyword, using a "receiver" function and a generator.
Here's a simplified implementation and demo to show you the idea.
APIs:
type ActorRef = { send(msg: any): void }
type Receiver = (msg: any) => Receiver
/**
* `spawn` takes an `initializer` and returns an `actorRef`.
* `initializer` is a factory function that should return a `receiver` function.
* `receiver` is called to handle `msg` sent through `actorRef.send(msg)`
*/
function spawn(initializer: () => Receiver): ActorRef
Demo:
function* coroutine(ref) {
  let result
  while (true) {
    const msg = yield result
    result = ref.receive(msg)
  }
}

function spawn(initializer) {
  const ref = {}
  const receiver = initializer()
  ref.receive = receiver

  const gen = coroutine(ref)
  gen.next()

  function send(msg) {
    const ret = gen.next(msg)
    const nextReceiver = ret.value
    ref.receive = nextReceiver
  }

  return { send }
}
function loop(state) {
  console.log('current state', state)
  return function receiver(msg) {
    if (msg.type === 'ADD') {
      return loop(state + msg.value)
    } else {
      console.log('unhandled msg', msg)
      return loop(state)
    }
  }
}
function main() {
  const actor = spawn(() => loop(42))
  actor.send({ type: 'ADD', value: 1 })
  actor.send({ type: 'BLAH', value: 1 })
  actor.send({ type: 'ADD', value: 1 })
  return actor
}
window.actor = main()
Concern
The model above works. However, I'm a bit concerned about the performance impact of this approach; in particular, I'm not clear about the memory impact of all the closure contexts it creates.
function loop(state) {
  console.log('current state', state)  // <--- `state` in a closure context <─┐ <─────┐
  return function receiver(msg) {      // ---> `receiver` closure reference ──┘       │
    if (msg.type === 'ADD') {          //                                              │
      return loop(state + msg.value)   // ---> create another context that link to this one???
    } else {
      console.log('unhandled msg', msg)
      return loop(state)
    }
  }
}
loop is the "initializer" that returns a "receiver". In order to maintain an internal state, I keep it (the state variable) inside the closure context of the "receiver" function.
When a message is received, the current receiver can modify the internal state, pass it to loop, and recursively create a new receiver to replace the current one.
Apparently the new receiver also has a new closure context that keeps the new state. It seems to me this process may create a deep chain of linked context objects that prevents GC?
I know that context objects referenced by closures can be linked under some circumstances, and if they're linked, they obviously aren't released before the innermost closure is released. According to this article, V8's optimization is very conservative in this regard, so the picture doesn't look pretty.
Questions
I'd be very grateful if someone can answer these questions:
Does the loop example create deeply linked context objects?
What does the lifespan of a context object look like in this example?
If the current example does not, can this receiver-creates-receiver mechanism end up creating deeply linked context objects in other situations?
If "yes" to question 3, can you please show an example to illustrate such a situation?
Follow-Up 1
A follow-up question for @TJCrowder.
Closures are lexical, so the nesting of them follows the nesting of the source code.
Well said; that's something obvious, but I missed it 😅
I just want to confirm my understanding is correct, with an unnecessarily complicated example (please bear with me).
These two are logically equivalent:
// global context here
function loop_simple(state) {
  return msg => {
    return loop_simple(state + msg.value)
  }
}

// Notations:
// `c` for context, `s` for state, `r` for receiver.
function loop_trouble(s0) {       // c0 : { s0 }
  // return r0
  return msg => {                 // c1 : { s1, gibberish } -> c0
    const s1 = s0 + msg.value
    const gibberish = "foobar"
    // return r1
    return msg => {               // c2 : { s2 } -> c1 -> c0
      const s2 = s1 + msg.value
      // return r2
      return msg => {
        console.log(gibberish)
        // c3 is not created, since there's no closure
        const s3 = s2 + msg.value
        return loop_trouble(s3)
      }
    }
  }
}
However, the memory impact is totally different.
Step into loop_trouble: c0 is created holding s0; it returns r0 -> c0.
Step into r0: c1 is created holding s1 and gibberish; it returns r1 -> c1.
Step into r1: c2 is created holding s2; it returns r2 -> c2.
I believe that in the above case, when r2 (the innermost arrow function) is used as the "current receiver", it's actually not just r2 -> c2 but r2 -> c2 -> c1 -> c0; all three context objects are kept (correct me if I'm already wrong here).
Question: which case is true?
All three context objects are kept simply because of the gibberish variable that I deliberately put in there.
Or they're kept even if I remove gibberish. In other words, the dependency s1 = s0 + msg.value is enough to link c1 -> c0.
Follow-Up 2
So the environment record as a "container" is always retained, whereas what "content" is included in the container might vary across engines, right?
A very naive, unoptimized approach could be to blindly include in the "content" all local variables, plus arguments and this, since the spec doesn't say anything about optimization.
A smarter approach could be to peek into the nested function and check what exactly is needed, then decide what to include in the content. This is referred to as "promotion" in the article I linked, but that piece of info dates back to 2013 and I'm afraid it might be outdated.
By any chance, do you have more up-to-date information on this topic to share? I'm particularly interested in how V8 implements such a strategy, because my current work heavily relies on the Electron runtime.

Note: This answer assumes you're using strict mode. Your snippet doesn't. I recommend always using strict mode, by using ECMAScript modules (which are automatically in strict mode) or putting "use strict"; at the top of your code files. (I'd have to think more about arguments.callee.caller and other such monstrosities if you wanted to use loose mode, and I haven't below.)
Does the loop example create deeply linked context objects?
Not deeply, no. The inner calls to loop don't link the contexts those calls create to the context where the call to them was made. What matters is where the function loop was created, not where it was called from. If I do:
const r1 = loop(1);
const r2 = r1({type: "ADD", value: 2});
That creates two functions, each of which closes over the context in which it was created. That context is the call to loop. That call context links to the context where loop is declared — global context in your snippet. The contexts for the two calls to loop don't link to each other.
What does the lifespan of a context object look like in this example?
Each of them is retained as long as the receiver function referring to it is retained (at least in specification terms). When the receiver function no longer has any references, it and the context are both eligible for GC. In my example above, r1 doesn't retain r2, and r2 doesn't retain r1.
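To illustrate with a small sketch (the variable name here is mine, not from your snippet):
let receiver = loop(1);                          // call #1: creates a context holding state = 1
receiver = receiver({ type: "ADD", value: 2 });  // call #2: creates a new context holding state = 3
// Nothing references the receiver from call #1 any more, so that function and the
// context (environment record) it closed over are both eligible for GC; only the
// context created by the latest call to loop is still retained.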
If the current example does not, can this receiver-creates-receiver mechanism end up creating deeply linked context objects in other situations?
It's hard to rule everything out, but I wouldn't think so. Closures are lexical, so the nesting of them follows the nesting of the source code.
If "yes" to question 3, can you please show an example to illustrate such situation?
N/A
Note: In the above I've used "context" the same way you did in the question, but it's probably worth noting that what's retained is the environment record, which is part of the execution context created by a call to a function. The execution context isn't retained by the closure, the environment record is. But the distinction is a very minor one, I mention it only because if you're delving into the spec, you'll see that distinction.
Re your Follow-Up 1:
c3 is not created, since there's no closure
c3 is created, it's just that it isn't retained after the end of the call, because nothing closes over it.
Question: which case is true?
Neither. All three contexts (c0, c1, and c2) are kept (at least in specification terms) regardless of whether there's a gibberish variable or an s0 parameter or s1 variable, etc. A context doesn't have to have parameters or variables or any other bindings in order to exist. Consider:
// ge = global environment record
function f1() {
  // Environment record for each call to f1: e1(n) -> ge
  return function f2() {
    // Environment record for each call to f2: e2(n) -> e1(n) -> ge
    return function f3() {
      // Environment record for each call to f3: e3(n) -> e2(n) -> e1(n) -> ge
    };
  };
}
const f = f1()();
At least, that's how the specification defines it. In theory those environment records could be optimized away, but that's a detail of the JavaScript engine implementation. V8 did some closure optimization at one stage but backed off most of it because (as I understand it) it cost more in execution time than it made up for in memory reduction. But even when they were optimizing, I think it was the contents of the environment records they optimized (removing unused bindings, that sort of thing), not whether they continued to exist. See below, I found a blog post from 2018 indicating that they do leave them out entirely sometimes.
Re Follow-Up 2:
So the environment record as a "container" is always retained...
In specification terms, yes; that isn't necessarily what engines literally do.
...whereas what "content" is included in the container might vary across engines, right?
Right, all the spec dictates is behavior, not how you achieve it. From the section on environment records linked above:
Environment Records are purely specification mechanisms and need not correspond to any specific artefact of an ECMAScript implementation.
...but that piece of info dates back to 2013 and I'm afraid it might be outdated.
I think so, yes, not least because V8 has changed engines entirely since then, replacing Full-codegen and Crankshaft with Ignition and TurboFan.
By any chance, do you have more up-to-date information on this topic to share?
Not really, but I did find this V8 blog post from 2018 which says they do "elide" context allocation in some cases. So there is definitely some optimization that goes on.
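For what it's worth, here's a rough sketch of the kind of thing that elision is about (an illustration of the general idea, not a guarantee about what any particular V8 version does):
function outer() {
  const big = new Array(1e6).fill(0); // not used by inner
  const answer = 42;                  // used by inner
  return function inner() {
    return answer;
  };
}
const keep = outer();
// In specification terms, outer's environment record is retained as long as `keep` exists.
// In practice the engine only needs the bindings `inner` actually uses, so it can drop
// `big` from the retained context, and it can skip allocating a context entirely for
// calls whose locals are never closed over.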

Related

Memory handling vs. performance

I'm building a WebGL game and I've come far enough that I've started to investigate performance bottlenecks. I can see there are a lot of small dips in FPS when GC is going on. Hence, I created a small memory pool handler. I still see a lot of GC after starting to use it, so I suspect I've got something wrong.
My memory pool code looks like this:
function Memory(Class) {
  this.Class = Class;
  this.pool = [];

  Memory.prototype.size = function() {
    return this.pool.length;
  };

  Memory.prototype.allocate = function() {
    if (this.pool.length === 0) {
      var x = new this.Class();
      if (typeof(x) == "object") {
        x.size = 0;
        x.push = function(v) { this[this.size++] = v; };
        x.pop = function() { return this[--this.size]; };
      }
      return x;
    } else {
      return this.pool.pop();
    }
  };

  Memory.prototype.free = function(object) {
    if (typeof(object) == "object") {
      object.size = 0;
    }
    this.pool.push(object);
  };

  Memory.prototype.gc = function() {
    this.pool = [];
  };
}
I then use this class like this:
game.mInt = new Memory(Number);
game.mArray = new Memory(Array); // this will have a new push() and size property.

// Allocate a number
var x = game.mInt.allocate();
// <do something with it, for loop etc>
// Free variable and push into mInt pool to be reused.
game.mInt.free(x);
My memory handling for an array is based on using myArray.size instead of length, which keeps track of the actual current array size in an overdimensioned array (that has been reused).
So to my actual question:
Using this approach to avoid GC and keep memory during play-time: will the variables I declare with "var" inside functions still be GC'd even though they are returned as new Class() from my Memory function?
Example:
var x = game.mInt.allocate();
for (x = 0; x < 100; x++) {
  ...
}
x = game.mInt.free(x);
Will this still cause garbage collection of the "var" due to some memcopy behind the scenes (which would make my memory handler useless)?
Is my approach good/meaningful for my case, a game in which I'm trying to get high FPS?
So you let JS instantiate a new Object:
var x = new this.Class();
then add anonymous methods to this object, making it one of a kind:
x.push = function...
x.pop = function...
so that now every place you use this object is harder for the JS engine to optimize, because these objects have distinct interfaces/hidden classes (equal ain't identical).
Additionally, every place you use these objects will have to perform additional typecasts to convert the Number object back into a primitive, and typecasts ain't free either. Like, in every iteration of a loop? Maybe even multiple times?
And all this overhead just to store a 64-bit float?
game.mInt = new Memory(Number);
And since you cannot change the internal state, and therefore the value, of a Number object, these values are basically static, like their primitive counterparts.
TL;DR:
Don't pool native types, especially not primitives. These days, JS is pretty good at optimizing code if it doesn't have to deal with surprises. Surprises like distinct objects with distinct interfaces that first have to be cast to a primitive value before they can be used.
Array resizing ain't free either. Although JS optimizes this and usually pre-allocates more memory than the Array may need, you may still hit that limit and thereby force the engine to allocate new memory, move all the values to that new memory, and free the old one.
I usually use linked lists for pools.
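For example, a minimal free-list sketch (the Particle/ParticlePool names are just for illustration):
function Particle() {
  this.x = 0;
  this.y = 0;
  this.next = null; // doubles as the free-list pointer while the object is pooled
}

function ParticlePool() {
  this.head = null; // top of the free list
}
ParticlePool.prototype.allocate = function() {
  if (this.head === null) return new Particle();
  var p = this.head;
  this.head = p.next;
  p.next = null;
  return p;
};
ParticlePool.prototype.free = function(p) {
  p.next = this.head; // push back onto the free list; no array resizing involved
  this.head = p;
};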
Don't try to pool everything. Think about which objects can really be reused, and which you are bending to fit into this narrative of "reusability".
I'd say: if you have to do as little as add a single new property to an object (after it has been constructed), and therefore need to delete this property for cleanup, that object should not be pooled.
Hidden Classes: When talking about optimizations in JS you should know this topic at least at a very basic level
summary:
don't add new properties after an object has been constructed.
and to extend this first point: no deletes!
the order in which you add properties matters
changing the value of a property (even its type) doesn't matter! Except when we talk about properties that contain functions (aka methods). The optimizer may be a bit picky here when we're talking about functions attached to objects, so avoid it.
And last but not least: distinguish between optimized and "dictionary" objects. First in your concepts, then in your code.
There's no benefit in trying to fit everything into a pattern with static interfaces (this is JS, not Java). But static types make life easier for the optimizer, so compose the two.
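A small sketch to make those points concrete (Vec2 is a made-up example):
// Good: every instance gets the same properties, in the same order, in the constructor.
function Vec2(x, y) {
  this.x = x;
  this.y = y;
}
var a = new Vec2(1, 2);
var b = new Vec2(3, 4); // a and b share the same hidden class

// Bad: changing the shape after construction creates new hidden classes
// (or drops the object into slow "dictionary" mode).
var c = new Vec2(5, 6);
c.z = 7;                        // property added after construction
delete c.z;                     // deletes tend to force dictionary mode
c.x = function() { return 1; }; // swapping a data property for a method is deopt-prone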

Javascript overwrite the scope of a callback

I stumbled into this problem recently and after a while of reading around I couldn't find an answer that satisfies this use case in particular.
I am trying to achieve the following behaviour in JavaScript:
// Let's assume we have some variable defined in global scope
var a = {val: 0}

// What I want here is a function that sets a.val = newVal
// and then calls the callback.
var start = function(newVal, cb) {
  ???
}

// such that
start(1, function() {
  setTimeout(function() {
    console.log(a.val) // 1
  }, 1000)
})

// and
start(2, function () {
  console.log(a.val) // 2
})

// but in the original scope
console.log(a.val) // 0
In other words I am looking for a way to "wrap" a callback in a different global scope. I am aware that you can do something similar by passing an environment around or using this; but such methods always force the callback functions to refer to the environment explicitly, turning the callback code into something like:
start(2, function () {
  console.log(env.a.val) // 2
})
I am specifically looking for a solution that preserves the possibility to use the global reference directly from within the callback of start.
Feel free to use any ES6/ES7 feature that can somehow be shimmed in or is compatible with Node; this is not meant for production code, just a fun exercise.
EDIT:
I will explain the general problem since many people suggested this solution might not be what I am actually looking for.
I recently learned about STM (https://wiki.haskell.org/Software_transactional_memory)
and wanted to play around with a similar idea in JS.
Of course JS runs on a single thread, but the idea was to provide the same level of isolation to different callbacks running in atomic blocks.
The user has some kind of shared transactional variable. Operations on this
variable must be wrapped in atomically blocks. What happens under the hood is that operations in the atomically block are not performed on the actual TVar but on some MockTVar which simply records all the reads and writes in a log.
When you call done, the log is checked to see if the operations performed are consistent with the current state of the TVars; if so, the updates are now performed on the actual TVars and we are done (this is called a commit). If not, the log is discarded and the callback is run again. This is a small example of the code:
var x = new TVar(2)

// this is process a
process.nextTick(function() {
  atomically(x, function(x, done) {
    a = x.readTVar()
    setTimeout(function() {
      x.writeTVar(a + 1)
      console.log('Process a increased, x = ', x.readTVar())
      done()
    }, 2000)
  })
})

// this is process b
process.nextTick(function() {
  atomically(x, function(x, done) {
    var a = x.readTVar()
    x.writeTVar(a + 1)
    console.log('Process b increased, x = ', x.readTVar())
    done()
  })
})
In this example process a will try to commit, but since process b changed the value of x (and committed that change before a), the commit will fail and the callback will run once more.
As you can see I am returning the mockTVars in the callback, but I find this a bit ugly for two reasons:
1) If you want to lock more than one variable (and you generally do), I have no choice but to return an array of mockTVars, forcing the user to extract them one by one if he wants to use them cleanly.
2) It is up to the user to make sure that the name of the mockTVar which is passed to the callback matches the name of the actual TVar if he wants to be able to reason about what's happening without losing his mind. What I mean is that in this line
atomically(x, function(x, done) {..})
It is up to the user to use the same name to refer both to the actual TVar and the mocked TVar (the name is x in this example).
I hope this explanation is helpful. Thanks to everybody who took the time to help me out.
I still wish you would describe the actual problem you're trying to solve, but here's one idea: make a copy of the global object, pass it to the callback, and have the callback use the same name as the global; it will then "override" access to the global for that scope only.
var a = {val: 0, otherVal: "hello"};

function start(newVal, cb) {
  var copy = {};
  Object.assign(copy, a);
  copy.val = newVal;
  cb(copy);
}

log("Before start, a.val = " + a.val);

start(1, function(a) {
  // locally scoped copy of "a" here that is different than the global "a"
  log("Beginning of start, a.val = " + a.val) // 1
  a.val = 2;
  log("End of start, a.val = " + a.val) // 2
});

log("After start, a.val = " + a.val);

function log(x) {
  document.write(x + "<br>");
}

I was reading about javascript functions/prototypes, and (fiddle provided)

I stumbled across a section of code¹ (at the bottom of the page, in the function prototype section) that I'm curious about:
function Employee(name, salary)
{
  this.name = name;
  this.salary = salary;
}

/*line7*/ Employee.prototype.getSalary = function getSalaryFunction()
{
  return this.salary;
}

/*line12*/ Employee.prototype.addSalary = function addSalaryFunction(addition)
{
  this.salary = this.salary + addition;
}
1) I'm wondering if the same thing could be written as follows, and whether either one would be more functional:
i) add
this.getSalary = getSalaryFunction;
this.addSalary = addSalaryFunction;
to what I'm assuming is now the prototype (after the name and salary properties),
and
ii) replace the original lines 7 and 12 respectively with:
function getSalaryFunction()
and
function addSalaryFunction(addition)
. Also, /*return?*/Employee.salary+=addition; would be shorthand for this.salary = this.salary + addition; in this case, right?
jsfiddle that represents my idea: http://jsfiddle.net/4fL8v69b/1/
¹ http://www.permadi.com/tutorial/jsFunc/index.html (Web Developers Introduction To Features of JavaScript "Function" Objects.)
In your example, you are adding the functions getSalaryFunction and addSalaryFunction to the global (or at least outer) scope, where they can be called independently (and probably return undefined).
In the original example, the functions only exist as part of their parent object, so they are more likely to do the right thing* when called.
* I'm hand-waving over the complexities of this in JS.
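A short sketch of the difference, using the Employee example from the question (the test values are mine):
function Employee(name, salary) {
  this.name = name;
  this.salary = salary;
}
Employee.prototype.getSalary = function getSalaryFunction() {
  return this.salary;
};

var e1 = new Employee("Ann", 100);
var e2 = new Employee("Bob", 200);

// One shared function object on the prototype; `this` is bound at call time.
console.log(e1.getSalary === e2.getSalary); // true
console.log(e1.getSalary());                // 100

// Assigning this.getSalary = ... in the constructor would instead create a separate
// function object per instance (it works, but e1.getSalary !== e2.getSalary).

// A free-standing global function has no useful `this`:
function getSalary() { return this.salary; }
console.log(getSalary()); // undefined in sloppy mode, TypeError in strict mode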

Persistence framework for JavaScript / Google v8

Is there any kind of persistence framework for JavaScript and/or the Google v8 engine?
I want to store (serialize) a whole graph of objects (including, e.g., functions) and re-load it later. JSON is not sufficient, since it does not permit functions to be stored and permits only a tree-like structure (i.e. no two objects referencing the same object).
I need to be able to do that generically (i.e. without knowing the JavaScript code at the time at which I write my program embedding v8), since I want the user of my program to be able to customize it with JavaScript, but I need to store the state of my program (including the state of the customization) and re-load it later. Hence I need to store the state of the JavaScript engine.
Edit:
Example:
Suppose we have the following code:
var obj = { a: 4, b: function (x) { return x + this.a; } }
// ...
if ( ... ) { obj.a = 5; }
// ...
if ( ... ) { var c = 1; obj.b = function (x) { return x + this.a + c; } }
// ...
// now I want to serialize obj
Then is it (without any meta-information about the logic of the program) possible to serialize obj and later deserialize it such that obj.b(2) delivers the same result after deserialization as it did before serialization?
Second Edit: Note the closure.
Unfortunately, what you're trying to do is not currently possible in Javascript. The reason is that closures are not just objects, they're objects bound to an execution context.
Getting past the "this can't be done in javascript" issue and moving into the "what if wrote a patch for V8 to allow this" phase of the answer, this is conceptually difficult. Essentially, for every closure you'd serialize, you would have to serialize the Context object that the closure exists in. It'd be nice to be able to just serialize the HandleScope, but the nature of closures is that you can't reach inside them.
Okay, so let's say you've written a function that can serialize the Context that the closure exists in, and you can even deserialize it. What do you do with it?
The answer to that is 'not much'. JavaScript can only be executed in a single context at a time. The closure that you've deserialized doesn't exist in the context that you're trying to pull it back into. You can't really pass data between contexts, and if your function has data bound to free variables, do you use the ones that exist in the deserializer-invoking context, or do you overwrite them with the deserialized context? Conceptually, this is a nightmare.
ECMAScript Harmony had considered giving us nearly-first-class continuations, but it's been pushed from the discussion, which I rant about here, and this isn't going to happen any time soon.
HTML5 local storage allows persistence at the client level through JavaScript.
I'm not sure if it will fit your needs; to store a function you'll need to give it some markup that allows you to deserialize it when retrieving it from storage (or maybe just store it as plain text and try to eval it on retrieval).
http://diveintohtml5.info/storage.html
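A minimal sketch of that "store as text, eval on retrieval" idea (the storage key is made up, and it only round-trips the function's source text, so closed-over variables like c in the question's example are lost):
var obj = { a: 4, b: function (x) { return x + this.a; } };

// Serialize: plain data as-is, functions as source text.
localStorage.setItem("obj", JSON.stringify({
  a: obj.a,
  b: obj.b.toString()
}));

// Deserialize: rebuild the function from its source.
var raw = JSON.parse(localStorage.getItem("obj"));
var revived = {
  a: raw.a,
  b: eval("(" + raw.b + ")") // any free variables the original closed over are gone
};
console.log(revived.b(2)); // 6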
I don't think persisting functions is good practice. I can suggest the approach below. Turn your JSON data into, let's say, some class like "MyData". You can find two functions, fromJSON and toJSON, which will do the magic you want.
var MyData = function(props) {
  this.temp = "a";

  this.getTemp = function() {
    return this.temp;
  }

  this.fromJSON = function(props) {
    if (props) {
      this.temp = props.temp;
    }
  }

  this.toJSON = function() {
    var props = {};
    props.temp = this.temp;
    return props;
  }

  this.fromJSON(props);
}

var obj = new MyData({"temp" : "b"});
var state = obj.toJSON();
// persist state about the object as JSON string
LOCALSTORAGE.put(state); // You can write some HTML5 local storage stuff to persist
var persistedState = LOCALSTORAGE.get(); // You can use the above HTML5 local storage stuff to read the persisted stuff
var newBornObj = new MyData(persistedState);

Dynamic Object Creation

I have a function that takes a string object name, and I need the function to create a new instance of an object that has the same name as the value of the string.
For example,
function Foo() {}

function create(name) {
  return new name();
}

create('Foo'); // should be equivalent to new Foo();
While I know this would be possible via eval, it would be good to try to avoid using it. I am also interested if anyone has alternative ideas for the problem (below).
I have a database and a set of (using classical OO methodology) classes, roughly one for each table, that define common operations on that table. (Very similar to Zend_Db for those who use PHP.) As everything is asynchronous, doing tasks based on the result of the last one can lead to very indented code:
var table1 = new Table1Db();
table1.doFoo({
  success: function() {
    var table2 = new Table2Db();
    table2.doBar({
      notFound: function() {
        doStuff();
      }
    });
  }
});
The obvious solution is to create helper methods that abstract the asynchronous nature of the code:
Db.using(db) // the database object
  .require('Table1', 'doFoo', 'success') // table name, function, expected callback
  .require('Table2', 'doBar', 'notFound')
  .then(doStuff);
Which simplifies things. However, the problem is that I need to be able to create the table classes, the names of which can be inferred from the first argument passed to require, which leads me to the problem above...
Why not simply pass the constructor function into the require method? That way you sidestep the whole issue of converting from name to function. Your example would then look like:
Db.using(db) // the database object
  .require(Table1Db, 'doFoo', 'success') // table constructor, function name, expected callback
  .require(Table2Db, 'doBar', 'notFound')
  .then(doStuff);
However, if you really want to use a string...
Why are you dead set on avoiding eval? It is a tool in the language and every tool has its purpose (just as every tool can be misused). If you're concerned about allowing arbitrary execution, a simple regular expression test should render your usage safe.
If you're dead-set on avoiding eval and if all of your constructor functions are created in the default global scope (i.e. the window object), this would work:
function create(name) {
  return new window[name]();
}
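For example, assuming Foo ends up as a property of window (a top-level function declaration in a classic script does; one inside a module or another function does not):
function Foo() { this.kind = "foo"; }

var instance = create("Foo");
console.log(instance instanceof Foo); // true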
If you want to get fancy and support namespaced objects (i.e. create('MyCompany.MyLibrary.MyObject')), you could do something like this:
function create(name) {
  var current,
      parts,
      constructorName;

  parts = name.split('.');
  constructorName = parts[parts.length - 1];

  current = window;
  for (var i = 0; i < parts.length - 1; i++) {
    current = current[parts[i]];
  }

  return new current[constructorName]();
}
You were at the gate of completeness. While Annabelle's solution lets you do what you wanted in the way you wanted (passing strings), let me offer you an alternative (passing function references).
function Foo() {}

function create(name) {
  return new name();
}

create(Foo); // IS equivalent to new Foo();
And voilà, it works :) I told you, you were at the doorstep of the solution.
What happened is that you tried to do this:
new 'Foo'()
Which doesn't make much sense, does it? But now you pass the function by reference, so the line return new name(); will effectively run as return new Foo();, just as you would expect.
And now the door is open to abstracting away the asynchronous nature of your application. Have fun!
Appendix: functions are first-class objects, which means they can be stored by reference, passed as arguments by reference, or returned by other functions as values.
