Experimental JavaScript implementation in Rust
License: MIT
A major problem with the current GC is that the VM has no way of telling that a variable in a native function is holding a reference to a JavaScript object. If a GC cycle happens to trigger (e.g. due to calling a JS function), the object referenced by the variable is mistakenly deallocated.
Fixing this likely requires major changes to how values are represented all over the VM and how native code works with them.
One possible way to fix this is to have a separate vector of "external references" (i.e. objects referenced by bindings on the native stack). The current VM actually already has something similar, but it requires manually adding values to it. It would be nice if we could make this sort of pattern a compile error (pseudocode):
fn native_function() {
    let o = create_object();
    js_function(); // calls some JavaScript function; GC triggers in here, `o` is not marked and gets deallocated
    print(o); // use of a deallocated object
}
We could perhaps make some kind of macro that also adds the assigned value to the external refs vector, used like so:
fn native_function() {
    letroot!(o = create_object()); // allocate object AND add it to the vector of external refs
    js_function(); // GC triggers, but `o` is in the external refs vector and is not deallocated
    print(o); // ok
}
Currently, we cannot do a lot with JS values coming from WASM. VM.prototype.exec immediately inspects the value and returns it as a string, so the end user doesn't even get to decide what to do with the value. Instead of automatically inspecting values when executing a VM, we should return a Handle<Value> that users can manually inspect, or do something else with. The downside is that we can't really do RAII patterns in JS (FinalizationRegistry?), so users would have to manually free JS values.
Code:
import * as http from '@std/http';
let _t = 0;
http.listen(3000, async function (ctx) {
    _t++;
    ctx.respond({});
});
Output:
dash has unexpectedly panicked! this is a bug!
panicked at 'internal error: entered unreachable code', crates/dash_vm/src/dispatch.rs:1280:55
set RUST_BACKTRACE=1 to print a backtrace
dash has unexpectedly panicked! this is a bug!
panicked at 'called `Result::unwrap()` on an `Err` value: RecvError(())', crates/dash_rt_http/src/lib.rs:107:45
set RUST_BACKTRACE=1 to print a backtrace
dash has unexpectedly panicked! this is a bug!
panicked at 'assertion failed: `(left == right)`
left: `ThreadId(1)`,
right: `ThreadId(2)`', crates/dash_middle/src/util.rs:153:9
set RUST_BACKTRACE=1 to print a backtrace
thread panicked while panicking. aborting.
Aborted
Currently, type inference runs "at the same time" as const eval.
This leads to problems similar to #48: some optimizations only turn out to be invalid at a later point, after they have already been applied, because the type becomes more general. One such misoptimization:
let i = 0;
// `i` is inferred as number to this point,
// so this optimizes to: `() => 'number'`
let x = () => typeof i;
// Type is inferred as number | boolean only **here**
i = true;
// The closure should return 'boolean', but returns 'number'
console.log(x())
To fix this, type inference needs to run entirely before const eval, so that i is already known to be number | boolean, and not just number, by the time const eval runs.
... to be able to properly show errors past the parsing stage to the user (location)
; returns undefined (as there is no value), but is usable in expressions such as let x = (;); or while (;) {}
Following code segfaults (in release mode; panics in debug mode):
if (true) {} else {}
Seems like it wrongly pops the index off the stack that is required by a shortjmp to jump over the else branch when the condition is true.
(Originally found in a test262 case: annexB/language/global-code/if-decl-else-decl-a-global-existing-fn-update.js)
Stack overflows are not handled well. There is only an assert!() that makes sure we don't overflow (which would normally panic anyway). Stack::push should return an Option, and the call site in the VM should handle it by throwing an error.
Lines 69 to 74 in 949a588
We should attempt to walk the AST after parsing and apply certain optimizations by evaluating the value of constant expressions at compile time
Currently ; is a value-producing expression.
To speed up startup times, it may be useful to make it possible to snapshot the JavaScript heap and compiled bytecode.
This code causes a panic:
for (let i = 0; i < 2; i++) {
    (function () {
        i;
    });
}
dash has unexpectedly panicked! this is a bug!
panicked at 'internal error: entered unreachable code', crates/dash_vm/src/dispatch.rs:1338:55
set RUST_BACKTRACE=1 to print a backtrace
Normally it shouldn't emit ipostinclocal for external variables, but here it does, because the incrementing part of the loop does not know about the external-ness of the local variable i. This can probably be fixed by tracking external variables in the type infer pass (so i actually gets inferred as External(Number) instead of Number, and specialization can avoid optimizing it).
dash/crates/dash_vm/src/gc/handle.rs
Lines 123 to 125 in 530e010
This is wrong for object types with a higher alignment than usize / the max alignment of all other fields. It assumes that GcNode<()> and GcNode<ConcreteType> can be used interchangeably for computing the offset of the value field, which can be totally wrong due to padding inserted after the preceding field to align the T correctly.
std::any::TypeId will be (is, on nightly) 128 bits, to mitigate collisions (IIUC).
I'm not sure if this will have a perf impact (I've seen u128 be significantly slower than u64 in specific benchmarks, probably because it's emulated using two u64s), but we should move away from it if at all possible. It also doesn't fully fix the issue: collisions are still possible, just far less likely than they already were with a u64.
We currently use it a lot, for example to downcast an object to a specific object type to access "internal" object state (e.g. the underlying buffer of an ArrayBuffer).
We already have a BuiltinCapable trait to which we can add methods like as_arraybuffer, and that should allow us to get rid of most instances.
We probably can't fully get rid of it, because external crates can define their own object types and store custom state in them that they may want to cast JS objects to, so we need to keep it, but we should avoid using it in dash itself.
Assignments are currently incorrectly implemented, and there are quite a few bugs because of it, which are hard to solve with the current implementation. One of them is setting new properties. Assignment works by taking the last value on the stack and updating it. That's fine for normal variables that actually exist, but not for setting a new property on an object that didn't exist before: the property lookup will return undefined, and assigning to undefined is a no-op. For now, Object.defineProperty(O, K, P) can be used to get around this.
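The Object.defineProperty workaround from the paragraph above, shown in a runnable form. The behavior shown is that of a spec-compliant engine such as Node; in dash, the commented-out plain assignment would currently be a no-op:

```javascript
const obj = {};

// obj.answer = 42; // in dash, assignment to a not-yet-existing property is currently a no-op

// Workaround: define the property via its descriptor directly.
Object.defineProperty(obj, 'answer', {
  value: 42,
  writable: true,
  enumerable: true,
  configurable: true,
});

console.log(obj.answer); // 42
```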
This seems to be a feature that is not very easy to implement given the current way parameters are handled, but it should be worth it, especially considering that we already support rest parameters and that tsc desugars those to arguments when targeting <ES2015.
The core crate name collides with Rust's core crate. Instead, it should be named dash_core.
dash_sys. These bindings also make it possible to use dash in C code (it just needs an interface on the C end), so this is not strictly related to WebAssembly.
Code:
'\�'
Crash:
dash has unexpectedly panicked! this is a bug!
panicked at 'Invalid UTF8: Utf8Error { valid_up_to: 0, error_len: Some(1) }', crates/dash_middle/src/util.rs:72:28
set RUST_BACKTRACE=1 to print a backtrace
The first part where things start going wrong:
dash/crates/dash_lexer/src/lib.rs
Lines 242 to 246 in 059cc81
Declaration hoisting already partly works (you can refer to a var before it is initialized; it contains the undefined value if uninitialized). Function declarations work differently, however, in that they are immediately initialized with the function, and you can call them even before their declarations are "reached":
(function() {
    return b();
    var a = c();
    function b() {
        return c();
    }
    function c() {
        return 3;
    }
})()
Semantics should be equivalent to this:
(function() {
    // Function declarations
    var b;
    var c;
    // Variable declarations
    var a;
    // Function initialization
    b = function() {
        return c();
    }
    c = function() {
        return 3;
    }
    return b();
    // The assignment stays in place (unreachable here)
    a = c();
})();
When the other end disconnects, TcpStream will continue yielding empty ArrayBuffers. This is probably wrong.
Code:
// foo.js
function foo(a) {
    if (a === 0) throw new Error("bad");
    const b = {}; // <-- leaks. memory is never freed
    return foo(a - 1);
}
foo(1);
Running it through valgrind:
$ valgrind dashjs foo.js
# --- snip ---
==4513== HEAP SUMMARY:
==4513== in use at exit: 312 bytes in 3 blocks
==4513== total heap usage: 452 allocs, 449 frees, 106,823 bytes allocated
==4513==
==4513== LEAK SUMMARY:
==4513== definitely lost: 104 bytes in 1 blocks
==4513== indirectly lost: 208 bytes in 2 blocks
==4513== possibly lost: 0 bytes in 0 blocks
==4513== still reachable: 0 bytes in 0 blocks
==4513== suppressed: 0 bytes in 0 blocks
Emit a fast path for static property access calls (e.g. x.push()) under certain conditions, such as when x is likely an instance of Array (possibly determined through type inference).
The builtin/intrinsic operation can be translated to an index into an intrinsic op table, similar to how we already translate a variable reference to a stack index.
If the prediction turns out to be true (i.e. x instanceof Array) and no builtins were mutated, we can skip the slow property lookup process and execute the intrinsic op right away. Otherwise, fall back to the slow/unlikely path.
We might also be able to skip prediction guards entirely if we can prove at compile time that a variable is always going to be of a certain type, for example here:
const x = [];
for (let i = 0; i < 10000; i++)
    x.push(i);
x is declared const, so its type won't change. Even if it weren't declared const, we can tell that the code never assigns to x, so its type will always be array.
In this case, it would be nice if we could skip all of the property dispatch involved in looking up a property and directly execute Array.prototype.push.
I believe this can make a significant difference in programs that make use of a lot of builtins.
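A concrete reason the "no builtins were mutated" guard is required: user code can replace an intrinsic at any time, and a fast path compiled against the original builtin must then fall back to the generic lookup. A minimal demonstration (run on any spec-compliant engine):

```javascript
// Replace the builtin; any Array.prototype.push fast path is now invalid.
const original = Array.prototype.push;
Array.prototype.push = function () { return 'patched'; };

const xs = [];
const result = xs.push(1); // must dispatch to the patched function
console.log(result);       // 'patched', not the new array length
console.log(xs.length);    // 0 -- the patched push never mutated the array

Array.prototype.push = original; // restore the builtin
```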
Inputs:
let x = 3; 3x
3 x
5y
Expected Outputs:
Syntax Error: Invalid or Unexpected token
Syntax Error: Unexpected identifier 'x'
Syntax Error: Invalid or Unexpected token
Actual Outputs:
3
3
ReferenceError: y is not defined
It seems that (integer)(identifier) is getting parsed as (integer), (identifier), effectively discarding the integer. None of these inputs should pass the lexing stage, except in the case where there is whitespace between the two tokens (both are valid tokens individually).
try {
    (() => {
        [1].map(() => { throw new Error('x') });
    })();
} catch (e) {}
message: panicked at 'assertion failed: self.1 > 0', core/src/vm/stack.rs:141:9
set `BACKTRACE=1` environment variable to display a backtrace
try {
    [1].map(() => { throw new Error('x') });
} catch (e) {}
message: panicked at 'internal error: entered unreachable code', core/src/vm/mod.rs:910:8
set `BACKTRACE=1` environment variable to display a backtrace
... when it should be treated as setting its internal prototype slot. E.g.
let b = { __proto__: null };
console.log(b.__proto__); // not null!!
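For reference, the expected (spec) behavior: __proto__ in an object literal sets the internal prototype slot and does not create an own property. On a compliant engine:

```javascript
const b = { __proto__: null };

console.log(Object.getPrototypeOf(b));      // null -- the prototype slot was set
console.log(Object.getOwnPropertyNames(b)); // [] -- no own '__proto__' property
console.log(b.__proto__);                   // undefined (with a null prototype, there is no chain to find it on)
```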
Something like console.logg();
(intentional typo) currently throws this error:
TypeError: undefined is not a function
at <anonymous>
That's pretty useless, because there's no way to find out where in the source code it happened. "console.logg is not a function" would be a better error.
let x = {
    get f() { console.log(this); }
};
x.f;
This should log the object itself, but instead it logs undefined.
let f;
function F() {
    this.a = 1;
    f = () => console.log(this);
}
new F();
f();
This should print F { a: 1 } but prints undefined.
WeakSet/WeakMap entries should only exist for as long as the key is still a GC root / still has strong references. In the current implementation, entries are never removed, even when a key gets GC'd (i.e. the last strong Rc is dropped). This will probably be easier to fix with a proper GC, once we no longer use Rc:
Repro
let x;
{
    let y = {};
    x = new WeakSet([y]);
}
console.log(x) // WeakSet { <1 items> }
(This wouldn't necessarily be observable with a real GC, where values don't get freed immediately once unreachable, but we are using Rc, and because Weak<T> is used as the key, the WeakSet count should be at 0.)
var x;
var A;
x = function () {
    A;
}
A = function () {
}
console.log(new A())
This logs undefined when it should log {}. Strangely enough, commenting out A; or making other seemingly unrelated changes in function x (which is totally unused) gives the correct result.
let z = {};
let y = {__proto__:z};
z.__proto__ = y;
y.a
Running this crashes dash with a stack overflow, because it gets stuck in a recursive "loop", trying to look up property 'a' in the parent __proto__ object whenever it can't find it.
Other JS engines have guards against this. V8 throws "Cyclic __proto__ value".
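The guard in action on a compliant engine: the prototype assignment that would close the cycle throws a TypeError, so a cyclic chain never comes into existence in the first place.

```javascript
const z = {};
const y = { __proto__: z };

let error;
try {
  Object.setPrototypeOf(z, y); // would make the __proto__ chain cyclic
} catch (e) {
  error = e;
}

console.log(error instanceof TypeError); // true -- the cycle is rejected
```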
The various CLI components should use the inspect function (defined in crates/dash_rt/js/inspect.js). Right now they all simply call .toString() on the return value. That's fine for primitive values such as numbers and strings, but for objects it just prints [object Object], which is not really useful. The inspect function would also print the object's entries.
Right now, instructions have data associated with them, which makes the enum big. Ideally, generated bytecode should be a Vec<u8>. We can do this by having a separate value pool and storing indices into the pool in the bytecode, rather than directly storing values, which may be up to sizeof(usize) bytes big.
Currently, property lookup for builtin types is basically a bunch of match statements that, depending on the value type, try to find a property. That's not correct. Property lookups should go through the prototype chain, roughly like this:
function findProperty(o, p) {
    while (o !== null) {
        if (Object.prototype.hasOwnProperty.call(o, p)) return o[p];
        o = o.__proto__;
    }
}
foo[bar] === findProperty(foo, bar)
foo.bar === findProperty(foo, "bar")
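The sketch can be checked against a real prototype chain (findProperty is repeated here so the snippet is self-contained):

```javascript
function findProperty(o, p) {
  while (o !== null) {
    if (Object.prototype.hasOwnProperty.call(o, p)) return o[p];
    o = o.__proto__;
  }
}

const proto = { greeting: 'hi' };
const obj = Object.create(proto); // obj's prototype is `proto`
obj.own = 1;

console.log(findProperty(obj, 'own'));      // 1 -- found as an own property
console.log(findProperty(obj, 'greeting')); // 'hi' -- found one step up the chain
console.log(findProperty(obj, 'missing'));  // undefined -- chain exhausted
```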
Invoking a function with new should set this in the function to a new object and return that object. The __proto__ of that object should point to fn.prototype, fn.prototype.__proto__ should be Object.prototype, and Object.prototype.__proto__ should be null.
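The prototype relationships described above, checked explicitly on a compliant engine:

```javascript
function Fn() { this.a = 1; }
const obj = new Fn();

console.log(obj.a);                                                    // 1 -- `this` was a fresh object
console.log(Object.getPrototypeOf(obj) === Fn.prototype);              // true
console.log(Object.getPrototypeOf(Fn.prototype) === Object.prototype); // true
console.log(Object.getPrototypeOf(Object.prototype));                  // null
```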
The majority of the codebase is undocumented. This makes it very hard to understand certain parts of the code for people who have never seen the code before. We can later enforce public API documentation with #![deny(missing_docs)]
console.log(/.+a/.test('bba'));
This should return true, but currently returns false, because .+ eagerly consumes the whole string. Instead, it should work its way backwards and realize that .+ should only match bb.
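Backtracking made visible with a capture group: the greedy .+ first grabs the whole input, then gives the final character back so the literal a can match.

```javascript
const m = /(.+)a/.exec('bba');

console.log(/.+a/.test('bba')); // true on a backtracking engine
console.log(m[0]); // 'bba' -- the full match
console.log(m[1]); // 'bb'  -- what .+ ended up matching after giving back the final 'a'
```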
Consider the following code:
while (true) {
for (let i = 0; i < 3; ++i) console.log(i);
}
The first iteration of the outer loop runs as expected. Numbers 0 to 2 are printed.
On all other iterations, it seems to completely skip the inner loop.
Changing it to this seemingly equivalent code makes it run as expected.
while (true) {
- for (let i = 0; i < 3; ++i) console.log(i);
+ for (let i = 0 + 0; i < 3; ++i) console.log(i);
}
Looks like the inner loop mutates a constant, which is then reused in the next iteration of the outer loop. This causes the loop condition to never be true again in all subsequent iterations.
0 + 0 is a hack that forces a new number to be created, so mutating it no longer mutates the constant.
The same behavior can be observed when calling functions that create locals:
function wtf() {
    let i = 0;
    i++;
    return i;
}
console.log(wtf(), wtf(), wtf()); // expected: 1 1 1, got: 3 3 3
Currently dash tries to apply some optimizations at compile time, e.g. given
for (let i = 0; i < 1000; i++);
The comparison and increment get special opcodes that are faster than the generic cmp/inc opcodes, as they skip the type check and are optimized specifically to work with integers.
However, doing this statically is very limited and leads to many missed optimizations.
It would be better if we applied these optimizations at runtime, since we have far more information available as the code executes.
We already have some of the infrastructure required for doing this in the JIT, namely the dash_typed_cfg crate for getting a CFG with type information from bytecode, which should already allow for some really nice optimizations that we couldn't do at compile time.
Opts like constant propagation should still run at compile time, since there isn't any benefit to not doing them right away.
Currently, every value is reference counted through Rc<RefCell<T>>. This is not necessary for primitives such as numbers & bools.
const O = { ['a']: 42 };
console.log(O[42]); // expected: undefined, got: 'a'
console.log(O.a); // expected: 42, got: undefined
The lexer needs to be made aware of escape sequences (e.g. \n). Right now, "\n" is parsed as the two literal characters \ and n, not as the linebreak character.
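What correct escape handling looks like on a compliant engine: "\n" is a single linefeed character, while "\\n" really is a backslash followed by n.

```javascript
console.log("\n".length);        // 1 -- a single linefeed character
console.log("\n".charCodeAt(0)); // 10 -- the LF code point
console.log("\\n".length);       // 2 -- a literal backslash followed by 'n'
```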
It might be worth it in a lot of cases, especially for function calls. I'm guessing that the overwhelming majority of function calls in any JS program take at most 3-4 arguments. We'll have to experiment and see which numbers make sense (or whether it makes sense at all), but if we can avoid many of these tiny allocations, I think that would already be a good improvement.
I know that there are quite a few places in dash where we do something like value.call(this, vec![arg]) (i.e. a one-value vec).