
Introduction

WebAssembly Design

This repository contains documents describing the design and high-level overview of WebAssembly.

The documents and discussions in this repository are part of the WebAssembly Community Group.

Overview

WebAssembly or wasm is a new, portable, size- and load-time-efficient format suitable for compilation to the web.

WebAssembly is currently being designed as an open standard by a W3C Community Group that includes representatives from all major browsers. Expect the contents of this repository to be in flux: everything is still under discussion.

  • WebAssembly is efficient and fast: Wasm bytecode is designed to be encoded in a size- and load-time-efficient binary format. WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.

  • WebAssembly is safe: WebAssembly describes a memory-safe, sandboxed execution environment that may even be implemented inside existing JavaScript virtual machines. When embedded in the web, WebAssembly will enforce the same-origin and permissions security policies of the browser.

  • WebAssembly is open and debuggable: WebAssembly is designed to be pretty-printed in a textual format for debugging, testing, experimenting, optimizing, learning, teaching, and writing programs by hand. The textual format will be used when viewing the source of wasm modules on the web.

  • WebAssembly is part of the open web platform: WebAssembly is designed to maintain the versionless, feature-tested, and backwards-compatible nature of the web. WebAssembly modules will be able to call into and out of the JavaScript context and access browser functionality through the same Web APIs accessible from JavaScript. WebAssembly also supports non-web embeddings.

More Information

Resource                     Repository Location
High Level Goals             design/HighLevelGoals.md
Frequently Asked Questions   design/FAQ.md
Language Specification       spec/README.md

Design Process & Contributing

The WebAssembly specification is being developed in the spec repository. For now, high-level design discussions should continue to be held in the design repository, via issues and pull requests, so that the specification work can remain focused.

We've mapped out the features we expect to ship:

  1. An initial Minimum Viable Product (MVP) release;
  2. Additional features to follow soon after in future versions.

Join us:

When contributing, please follow our Code of Ethics and Professional Conduct.


Issues

SIMD

We currently suggest that we'll support SIMD.js (RFC).

The C++ standards committee is currently discussing adding explicit SIMD support as well as auto-vectorization hints to the language, and vector execution policies to executors. C++ isn't the only language that we want wasm to support, but we should make sure that what we implement can support C++! There may need to be some reconciliation between SIMD.js and C++ for the sake of wasm and not JavaScript.

Here are the recent relevant papers (older ones may also be relevant):

Feature detection and running Web Assembly

The initial polyfill will unconditionally load JavaScript code, then load the Web Assembly binary blob, and somehow translate it to a form that'll execute using the JavaScript VM.

Once we have native support, this polyfill will be a fallback: developers' code should only load the polyfill and use it if Web Assembly isn't natively supported.

How will we detect native Web Assembly support? Something like navigator.mimeTypes['application/x-wasm'] !== undefined?

How will we trigger the browser's native support? Use an <embed> or <script> tag?
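For illustration only, here is a sketch of what that detection and fallback could look like; the MIME-type check and the loadNativeModule/loadPolyfill helpers are assumptions, not a settled API:

function hasNativeWasm() {
  // Hypothetical detection via a registered MIME type.
  return navigator.mimeTypes['application/x-wasm'] !== undefined;
}

if (hasNativeWasm()) {
  loadNativeModule('app.wasm');              // hypothetical native path
} else {
  loadPolyfill().then(function (polyfill) {
    polyfill.instantiate('app.wasm');        // translate and run on the JS VM
  });
}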

Proposed next steps for 'spec' repo

So, we've reached a state where we have a bunch of high- to medium-level design ideas captured in this repo (not in a remotely final state, lots of TODOs, lots of issues to be revisited after more experiments) and I think soon we'll want to start working on incrementally distilling this down into concrete specs, tools (listed below) and tests. This work would be done in parallel with design-doc-level discussion and likely inform the discussion (as tests/code have a way of doing). As with the design docs, these concrete specs/tools would not represent any final decision, just the current state.

One idea that seems really attractive is having the specs, tests and tools (used to run the tests) all live in the same repo so that, for a single PR, you can see all the relevant changes.

For writing tests, a natural choice would be to use the wasm text format. To avoid early bikeshedding or accidentally freezing on something we haven't thought out fully, we can do what llvm-wasm is doing and just use S-Expressions. Later, when we pick a real text format, we should be able to automatically convert the existing tests. (Of course LISP said the same thing with M-Expressions ;-)

The supporting tools that make sense to me are:

  • wasm text-to-binary converter
  • a wasm interpreter shell
  • polyfill

(I expect the interpreter and polyfill can share some or all of the wasm binary decoding code.) Given these tools, the test harness would convert a test to a binary and then run it against the wasm interpreter, the polyfill (given a JS shell), and, in the future, JS shells and browsers with native support.
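As a rough sketch of that pipeline (every name below is a placeholder, not an agreed-upon tool):

// Hypothetical harness driver; all helpers are placeholders.
listTests('/tests').forEach(function (test) {
  var binary = textToBinary(readFile(test));  // wasm text -> wasm binary
  var expected = runInterpreter(binary);      // reference interpreter shell
  var actual = runPolyfill(binary);           // polyfill in a JS shell
  assertEqual(expected, actual, test);        // results must agree
});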

I think it makes sense to start with these new concrete specs and tools empty and grow them in lock-step. The new (initially empty) polyfill in this repo would then be the "official" polyfill, and the existing polyfill repo could be renamed prototype-polyfill and continue to be used for experiments while the official polyfill was forming.

The directory structure I was thinking about was:

  • / : Readme.md
  • /docs : all the other .md files
  • /tools : contains subdirs for converter, shell, polyfill, decoder
  • /tests : contains test harness and tests organized into subdirs

Thoughts?

Separate goals from strategy in the spec repo

Goals and strategy are pretty intermixed right now; we should separate them.

This is somewhat similar to #56, where we want to split out implementation details of the JavaScript polyfill from the feature list.

dynamic linking versus module loading: what's the difference?

#53 discusses dynamic linking (post-V1), and PRs such as #71 document module loading (V1 feature). Speaking in-person, @lukewagner, @kg, @ncbray and I don't quite agree on the differences.

  1. Does dynamic linking imply heap sharing? We seem to agree it does.
  2. Is module loading just FFI/syscall?
  3. If V1 had dynamic linking, how would uses of module loading be different?
    • Would developers hack-in dynamic linking support on top of V1 using module loading?
  4. Is module loading hindering non-browser uses?
    • If stdout and write were accessed through a dynamic library, that may be more portable than going through module loading and a baked-in console.log.

Maybe we should even consider supporting dynamic linking for V1.

Steady On: An alternative to Tilt

Here's one of the examples which motivated the Tilt discussion:

while (1) {
  ..
  if (...) {
    label = 1;
    break;
  }
  ..
  if (...) {
    label = 1;
    break;
  }
  ..
  if (...) {
    break;
  }
}
if (label === 1) {
  ...
}

It is possible to represent this without label variables, like this:

L1: {
  L0: while (1) {
    if (...)
      break L0;
    ..
    if (...)
      break L0;
    ..
    if (...)
      break L1;
  }
  ...
}

Furthermore, I have now worked out a constructive proof that it's possible to represent all reducible control flow using our existing structured forms without the use of label variables, by using this general technique as a fallback instead of label variables. In the worst case, it can generate deeply nested structures which may be too deep for some existing JS parsers to handle, so there are tradeoffs to consider when polyfilling to asm.js, but there are many cases where the nesting isn't deep and the technique should work well.

Consequently, I propose we hold off on Tilt for now. A hybrid Relooper implementation extended to use this technique in situations where it would be beneficial is worth pursuing, as it would allow Emscripten to generate better asm.js code, and allow llvm-wasm to generate better wasm code which polyfills to asm.js for v1.

Pondering the Stack and Globals

There's no direct question here, just pondering how things fit together.

For the most part, variables will live in the "shadow stack" (the JS stack, for many implementations). Manipulation of this stack is entirely implicit. Do a function call? New frame on the shadow stack. There will be some cases where the shadow stack is insufficient, however: for example, when the address of a stack-local variable is taken. To support this, a "user space" stack is needed that lives in the visible heap. OK... so how is this user-space stack implemented? Is it explicitly compiled into the program? Or is it supported by the system, with special opcodes?

If the user stack is an explicit part of the program, then how do you implement it? I believe Emscripten uses a global variable: STACKTOP. OK, that's kind of weird if you think about it... it's sort of a "shadow global". Can't take the address of it. Spiritually similar to the stack pointer on CPUs. A virtual register that is not confined to a particular stack frame? OK, how do those virtual registers get initialized? For example, when you launch a thread? I suppose they could be parameters passed in to thread initialization. (Although this raises the question of what the system/user interface for thread creation looks like.) Alternatively, you could just store STACKTOP in the user visible heap. Much simpler, doesn't require a separate concept. (The concept of shadow globals may be desirable. It isn't necessary, however.)

On the other hand - where do you store the thread-local STACKTOP global in user space? Some implementations may lower shadow globals into the heap, anyways, so the question of where the globals are allocated is relevant in multiple situations. If memory allocation is an explicit part of the program... how does the allocator know what parts of memory are safe to use vs. grabbed by some lower-level part of the system? Does there need to be a system-level "page allocation" API that user-level memory allocators build on top of?
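For concreteness, here is roughly what Emscripten-style code for an explicit user stack looks like; the layout and numbers are illustrative, not a spec:

var STACKTOP = 1024;               // the "shadow global" stack pointer

function f() {
  var sp = STACKTOP | 0;           // save the frame base
  STACKTOP = (STACKTOP + 16) | 0;  // allocate a 16-byte frame in the heap
  HEAP32[sp >> 2] = 42;            // a local whose address is taken lives here
  g(sp);                           // its address can be passed around freely
  STACKTOP = sp | 0;               // pop the frame on exit
}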

JF mentioned an example of split stacks being added to LLVM: http://reviews.llvm.org/D6095

Draft an email for the LLVM community

This documentation is great, but it's a bit TMI for the LLVM community and isn't the best way to say "hi, we want to work with you". I'll draft an email that we can send to the LLVM mailing list when we go public. The goal will be to extract specifically what they'll be interested in, to make sure we don't turn people off from the start.

The email will also serve as the main discussion point by LLVM folks, and I'm hoping to get good feedback from them early. We have good ideas about how we can avoid PNaCl and Emscripten's mistakes, we can write a good rationale, but I'm sure LLVM folks will find things to improve.

We want to involve them early, offer to change the design if needed, propose to upstream, and sign up to maintain the wasm backend.

Document what Web Assembly imposes on the web platform

Before going public we should understand what we're imposing on embedders, especially the web platform. Key questions to answer:

  1. How, if at all, is the web platform changing to support Web Assembly?
  2. What code changes are we imposing on browser implementations?

Theoretical examples:

  • An app which uses HTML+JS+WASM could relinquish WebGL-from-JS-main-thread-only, and move the WebGL control to a wasm thread instead.
  • Can a wasm app block the JS main thread?

Web Assembly will also drive / justify new features into the web platform, such as SharedArrayBuffer, SIMD.js, improved filesystem support, improved networking support, ... It's not just imposing itself onto the web platform!

Support int8 and int16?

Should Web Assembly explicitly support int8 and int16?

We've discussed this:

  • In issue #82 w.r.t. load/store and implicit truncation / extension.
  • In issue #81 because having int64 makes int8/int16 look more regular / less out of place.

A few thoughts:

  • I think supporting these types makes it easier to have a dumb and fast compiler.
  • It means more instructions to know about in the assembler, but less work to figure out patterns in the instruction selector.
  • It adds complexity to a register allocator if we try to treat types differently.
  • Opens the door for more advanced optimizations (value / bit tracking, or even vectorization).

We can also add these types to a later version of Web Assembly if just int32 / int64 prove insufficient.
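For context, int8/int16 arithmetic is already expressible today with int32 operations plus explicit narrowing, which is part of why deferral is viable; a sketch in asm.js-style idioms:

function addI8(a, b) {
  var sum = (a + b) | 0;
  return (sum << 24) >> 24;        // truncate and sign-extend to 8 bits
}

function addU16(a, b) {
  return ((a + b) & 0xffff) | 0;   // truncate to 16 unsigned bits
}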

Define "portable"

We currently don't really define what "portable" means. A few ideas:

  • Works within recent browsers, based on web standards.
  • Works on multiple mainstream OSs. Should we specify which (Windows (which?), OSX, Linux, ChromeOS)?
  • Works on multiple ISAs. Should we specify which? (x86-32, x86-64, ARMv7 ARM, ARMv8 A32/A64, MIPS).
  • Expects little endian.
  • Expects 32-bit address space (for now).

Should we discuss portable performance expectations? It's pretty handwavy, but if it's not pretty fast, it's not really working!

If that sounds good I'll send a PR.

Define instruction set

Although the AST semantics are a nice first step toward defining how programs will be structured in WebAssembly, it seems as though a more complete definition of statements (functions and control flow) and expressions (instructions) would be useful for helping people understand the design of WebAssembly, what is possible with it, and where it fits on the spectrum of abstraction.

Perhaps the AST document could be reorganized into more logical sections: types (integers, floats, pointers/memory), statements, and expressions. Currently it's a slightly confusing hodgepodge of all three and more.

What are Web Assembly threads?

Are Web Assembly threads specified in terms of WebWorkers, or are they different? What are the differences, or are WebWorkers just an implementation detail?

Maybe the polyfill is OK with WebWorkers, and Web Assembly does its own thing that's closer to pthreads.

We need to ensure that Web Assembly can work outside of a browser.

Postpone adding globals until dynamic linking

Split off from #139: I was thinking that perhaps globals should be removed from the MVP and instead go in with the dynamic linking feature. The reasoning is that, until we have to worry about dynamic linking, we can simply place globals in the heap and use compiler-chosen static offsets; indeed this is what we'll need to do anyway for any global whose address is taken (and what Emscripten does for most globals anyway, to avoid JS engine number-of-closure-variable limitations). Technically, globals have the advantage that they can't be aliased by loads/stores, but I don't expect this will win much in practice (esp. assuming a C++ compiler has already optimized the code).

Thinking forward to dynamic linking: since we have to deal with aliased globals anyway, it seems simpler and more orthogonal (not requiring a separate set of LoadGlobal/StoreGlobal ops) to have globals declare immutable pointers (i.e., integers) that would then be used as arguments to plain LoadHeap/StoreHeap ops (so LoadGlobal(a) => LoadHeap(Global(a)) where Global(a) is a const-expr of int32/int64 type). Even trivial backends should have no problem eliminating bounds checks.
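A sketch of that lowering in asm.js-style heap accesses (the offset is compiler-chosen and illustrative):

var GLOBAL_COUNTER = 64;                   // immutable pointer: a const int32

// LoadGlobal(counter) => LoadHeap(Global(counter)):
var v = HEAP32[GLOBAL_COUNTER >> 2] | 0;   // load through the global pointer
HEAP32[GLOBAL_COUNTER >> 2] = (v + 1) | 0; // store through the same pointer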

With this strategy, the question is of course where the memory pointed to by the global pointers comes from: the engine has no knowledge of how the memory in the [0, sbrk-max) range is being used by malloc et al. There is a lot of room for design here, but I think roughly what we need to do is let the application allocate the data and give it to the engine (either directly or by registering a global data allocator with the runtime). The important thing is that we keep allocation (and addresses) deterministic and under application control so that applications can do smart things with their address space (shadow stacks, emulating MAP_FIXED, etc).

Lastly, I think this same strategy applies to TLS variables: TLS variables would be pointers into the heap and we'd need a way for the user application to allocate the memory pointed to by these TLS variables (when new threads are created or for each thread when new modules are loaded with TLS variables; that's why I feel like these two issues are related and can have symmetric solutions).

When are we comfortable having a feature that can't be polyfilled?

At least for V1 everything will be polyfillable to JavaScript. Later on we'll add features, and some may not be polyfillable (or not polyfillable efficiently). We'll of course polyfill when possible.

When are we OK shipping a feature in Web Assembly that can't be polyfilled to JavaScript?

When there is such a feature, what do we do? Toolchains should make it obvious that using certain features will mean no polyfill.

We should make this explicit in our documentation before going public.

Discuss implementation defined behavior

Opening this bug as a reminder to go back and write documentation about this.

We want to avoid all forms of undefined behavior which can lead to nasal demons, and instead discuss how the wasm platform allows for implementation-defined behavior and what acceptable behavior is.

C/C++ UB is progressively refined by the compiler, and can be affected by tools such as sanitizers. The wasm platform then nails down some behaviors and leaves others open to the implementation. The implementation can then decide, based on the OS/ISA it's executing on, what the behavior is.

Note that behaviors include: "what happens if an enum is out of range", "shift by bitwidth or larger", "what do out-of-bounds accesses do", "what about unaligned accesses", "data races", and much more exciting things!

As a reference, PNaCl has a non-comprehensive list of undefined behavior.

Short-circuiting conditionals (&&, ||)

In a discussion about control flow it came up that asm.js currently doesn't support && or ||. The resulting control flow trees are pretty complicated sometimes. I think we should make sure short-circuiting boolean logic is expressible in webasm so that the resulting AST is denser and shallower. Thoughts?
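To illustrate the difference, here is a sketch of what a compiler must emit today versus what a short-circuiting form would allow:

// Today: '&&' must be expanded into nested ifs, deepening the tree.
if (a() | 0) {
  if (b() | 0) {
    c();
  }
}

// With short-circuiting forms: a denser, shallower AST.
if ((a() | 0) && (b() | 0)) {
  c();
}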

EDITED: clarity

Why WebAssembly?

The current repository does a good job of explaining how we'll build WebAssembly, but it's not great at saying why in an obvious place. We've got some explanation in binary encoding, FAQ, and use cases.

Any suggestions on how to make the why easier to find?

As we get more data we should also detail the gains WebAssembly provides (transfer / decoding / translation / runtime speed, multi-language support, ...).

Macro layer sketch for discussion, with implications for plain binary format

Having a "macro layer" that sits in-between the raw binary encoding and generic compression (e.g., lzham) has been something we've all talked about for a while now. Details for how it works determine what size-optimizations make sense in the raw binary encoding. In particular, a question I've had for a while is how small can a use of a macro be; if it can be a single byte, then maybe we can obviate the need for any optimizations that try to fold immediates into opcodes (even the modest Java 0-3 versions) which would be rather elegant.

Here's one scheme I was thinking about that achieves this. (Forget the context-dependent index space idea; I'll mention that at the end.) I'll start with just nullary macros (#define FOO (x+1) as opposed to #define FOO(x) ((x)+1)):

  1. A new "macros" module section contains a sequence of macro bodies which are simply little ASTs encoded the same way as they'd be in a function body.
  2. A use of a macro names the macro by its index in the sequence and the "semantics" of a macro use are to simply splat the body (without any recursive macro expansion) in place. Macros can also only be used in function bodies.
  3. Assumption: we don't want to "reserve" any indices/bits in the raw binary format since, after macro expansion, these would just be wasted. Also, this is clean separation of layers.
    • That means the macro layer has to use some sort of escape sequence
    • A simple two-byte scheme analogous to C string escapes is to reserve one byte value (say 255) for "macro use". Upon encountering this byte, the next var-length-int is the index of the macro. One such macro could be the opcode 255 (remember, no recursive expansion).
  4. To offer a one-byte macro form, we could instead reserve the top bit: if it is set, that means a macro use and the next 7 bits would either be:
    • a sentinel value to indicate that the macro index didn't fit in this byte, so the index is the following var-length-int.
    • a macro index 0-126.

What do we gain? This gives us a way to express 127 of the most common subtrees in a single byte. While there is an expected exponential die-off of hotness of opcodes, I expect we'll find that there are easily 127 hot subtrees in a module. In fact, based on this logic, we may want to reserve more than 127 values for macro indices. Importantly, the v.1 plan to implement the macro layer in user wasm code itself (a polyfill without native support) would allow us to experiment in this space a lot even after v.1 has shipped.

With this, we don't need single-byte GetLocal or LiteralImmediate forms; these can just be macros. Also, constant pools wouldn't be necessary. One temporary downside is that, as long as a polyfill is used for macros, the expanded form will take up more memory, which will hurt decode time (BinaryEncoding.md suggests the size hit would be around 26%, so not terrible). However, with a native macro layer, the macro body doesn't have to be copied; the native decoder can just recurse into the macro body, so this issue would disappear (and indeed get much better if the macro savings were significant).

Considering non-nullary macros: a macro could declare its arity in the module table and then, upon encountering a macro, the following n-arity subtrees would be passed as arguments. This is doable but rather more complex. A first-order approximation I'm much more interested in would be to only allow macros parameterized over local variable indices. That is, you could define #define ADD1(x) (x+1) with the limitation that x had to be a variable name. The macro body could use some special macro-layer-only GetMacroArg op (erased by the macro layer) to refer to x in the above example. More experimentation is needed, of course.

Given all this, the effect of context-dependent index spaces would be to allow the index space of i32 (almost always the hottest space) to be disjoint from the other expression types' spaces and thus more macros would be able to fit in a single byte overall. How much this wins would be highly dependent on the codebase, but it's quite possible it wouldn't matter that much since any truly hot non-int32 expressions would end up being given single-byte macro indices.
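A sketch of the one-byte decoding scheme from point 4, with all names hypothetical and index 127 assumed as the sentinel value:

function decodeOpcode(stream) {
  var b = stream.readByte();
  if (b & 0x80) {                     // top bit set: this is a macro use
    var idx = b & 0x7f;
    if (idx === 0x7f) {               // sentinel: index didn't fit in 7 bits
      idx = stream.readVarInt();
    }
    return expandMacro(idx);          // splat the body; no recursive expansion
  }
  return decodePlainOp(b, stream);    // ordinary opcode
}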

Thoughts?

What about subnormals?

Forking from: #141 (comment)

Should denormals be:

  • Unspecified (say so in Nondeterminism.md).
  • Fully specified as IEEE-754 compliance.
  • Fully specified as IEEE-754 compliance for scalars, and do something for SIMD because existing hardware (especially ARMv7 NEON) doesn't support denormals.
  • Specified as DAZ/FTZ (not IEEE-754 compliant).

We should probably let ourselves change this based on developer feedback, but I'd like to make some decision for the MVP. I suggested on esdiscuss that JavaScript go fully unspecified, and just do DAZ/FTZ because it's often faster. Yes, x86 is better than it used to be, but that's not universal, ignores current hardware, and doesn't look towards what new hardware will do. I like leaving the door open :-)

For @sunfishcode searchability, I'll use the word "subnormals" too :-)

Reference Interpreter

During our meeting yesterday, we discussed some future stuff, including having a reference interpreter which could be used as an oracle. I was talking about this with @LouisLaf afterwards, and now I'm curious what we have in mind for this. A standalone reference interpreter for a pure wasm environment wouldn't be so difficult for the MVP, but when we add in features like GCable objects and threading, it seems like a reference interpreter will become a pretty big burden. So is the plan to actually support this, GC and all, or am I thinking about this in the wrong way?

Should WebAssembly have spooky action at a distance?

(forked from #75 and #102, especially comment thread with @sunfishcode and @BrendanEich).

Should Web Assembly allow implementations which don't fit the traditional JavaScript security model? Can it run in a separate process (à la PNaCl), with NaCl-style SFI, with some other control-flow integrity approach, or in general be able to run outside the browser (say, node.js) where different threat models and performance concerns prevail?

So far we've discussed tying the spec to a non-browser shell which will allow us to have a small testable implementation, which we can cross-test with in-browser implementations.

Allowing this efficiently means that Web Assembly exposes more implementation-defined behavior (e.g., not guaranteeing a trusted call stack).

This isn't suggesting full-on nasal demons! Implementation-defined behavior isn't undefined behavior, and if specified properly it lets programs shoot themselves in the foot without taking the user's feet along.

@BrendanEich thinks this is scope-creep. Agreed, though I think the upsides are worthwhile.

@sunfishcode suggests node.js should just use native. I think Web Assembly's format is very compelling (portable, fast, well-supported, ...) and it would be a shame to turn node.js away from using Web Assembly.

I also think that Web Assembly should be able to run on small devices, or in phones/tablets, without expecting an entire browser to be present. Feature detection and dynamic linking will be our friends here!

What is Portability.md saying about APIs?

Portability.md currently has this paragraph:

Developer-exposed APIs (such as POSIX) are expected to be portable at a
source-code level through WebAssembly libraries which will use
feature detection. These libraries aren't necessarily
standardized: WebAssembly will follow the
extensible web manifesto's lead and expose
low-level capabilities that expose the possibilities of the underlying platform
as closely as possible. WebAssembly therefore standardizes a lower-level
abstraction layer, and expects libraries to offer portable APIs.

I'm not clear on what this paragraph is saying. What is this lower-level abstraction layer that WebAssembly will be standardizing here? Is POSIX meant to be an example of a higher-level API or a lower-level API, in Extensible Web Manifesto terms?

Cite pre-existing research

Before going public we should cite pre-existing research. We're not inventing in a vacuum, it should be apparent in our documentation.

Alignment / undefined behavior for loads & stores

Related to #23, the currently proposed bytecode has a single load op and a single store op. The load/store ops as proposed accept addresses in bytes (not elements).

Discussion with @lukewagner and @kripken in IRC suggests that the currently intended behavior is for aligned loads/stores to work, and for unaligned loads/stores to have undefined behavior, most likely one of the following:

  • The address is silently truncated to be aligned
  • The operation fails (unaligned store is discarded; unaligned load returns garbage)
  • All loads/stores involve a byte-wise memcpy to/from a temporary aligned buffer
  • The operation just works because we're compiling down to a native instruction set where aligned/unaligned are (at least in terms of instruction set, if not performance) equivalent

As proposed this would mean one of two things for the polyfill:

  • The polyfill fails to run any applications that perform unaligned loads/stores
  • All applications are slow in the polyfill because it has to generate unaligned handling (i.e. memcpy) for all heap accesses.

I believe this is unacceptable. I think we need a reasonable solution for this problem, so that the vast majority of applications can work in the polyfill and have acceptable performance. In the long run people will be running wasm applications via a native VM, at which point this won't matter, but it's my belief that the polyfill will stick around for years so it will still be important that these basic scenarios work acceptably.

My preferred solution would be to have 'aligned' and 'unaligned' load/store operations, similarly to movups/movaps on x86 and mirroring what is currently expressible in JS (HEAPU32[i] vs HEAPU32[i >> 2] vs (memcpy(temp, i, 4), temp[0])). This would allow compilers to produce code that will be functional in all cases, and efficient in scenarios where unaligned and aligned accesses are equivalent. In scenarios where they are not, this 'safe' code would still run, at some performance cost. In cases where performance is desirable, a compiler can generate aligned accesses that would have undefined behavior in edge cases, as we have in the current C/C++ spec.

Essentially this would allow the high performance of native C/C++ to be maintained by generating fast, unsafe 'aligned' operations, which matches the current behavior of emscripten and is conformant to the spec. Simultaneously it would allow languages without this UB black hole, like C#, to generate code that uses aligned stores in most cases and unaligned stores in the places where it needs them.
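In polyfill terms, the two operations would lower roughly like this (a sketch of the JS idioms mentioned above, little-endian assumed):

// Aligned 32-bit load: fast, but UB if i is not 4-byte aligned.
function load32Aligned(i) {
  return HEAP32[i >> 2] | 0;
}

// Unaligned 32-bit load: always correct, assembled byte-wise.
function load32Unaligned(i) {
  return (HEAPU8[i | 0] |
          (HEAPU8[(i + 1) | 0] << 8) |
          (HEAPU8[(i + 2) | 0] << 16) |
          (HEAPU8[(i + 3) | 0] << 24)) | 0;
}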

There are some alternate solutions here to consider, for example:

  • A 'fast'/'safe' mode switch for the polyfill, where 'fast' mode introduces silent undefined behavior for loads/stores that happen to not be aligned, and 'safe' mode makes all heap operations intolerably-slow-but-correct. This would at least provide a path forward where applications with unaligned accesses will function.
  • Users ship their own polyfill that does the right thing.

The proposed behavior isn't a divergence from current JS, since unaligned accesses against typed arrays don't exist. On the other hand, there are enough scenarios where unaligned accesses matter (file IO, byte-wise unpacking - like the proposed binary format - and GPU buffers among others) that I think we should handle it with care.

Anticipate differential updates to applications

Dynamic linking #53 will help reduce transfer size, but won't fix the differential update problem, e.g.: application fetches wasm.google.com/libm.1.2.so and already has wasm.google.com/libm.1.1.so, and could receive a differential update from 1.1 to 1.2 instead of downloading all of 1.2.

Web Assembly probably shouldn't be the one fixing this issue, maybe HTTP should, but it's a great usecase to move the web platform forward.

EDITS:

  • @kg points out later that differential updates may interact with small binary size. We should collect data on this to avoid regretting design decisions later.
  • @slightlyoff suggests that ServiceWorkers could help with differential updates.

As a reference, asm.js can do this. Trendy Entertainment does this by caching the asm.js module's source string in IndexedDB and computing their own patches.

It's possible that the Streams API may also help with this.

h/t @kg for raising this issue!

Document specific usecases

We currently document features we want to support, but don't have usecases that would speak to potential users who read our repo, and get them interested in participating.

We don't want to skew the repo by overemphasizing e.g. games. We do want something along the lines of "We want to bring you to the web!"

Tilt: a proposal for fast control flow

The goals of this proposal are

  • Keep representing control flow in a structured manner, as we already agree; the structure makes it compressible, easy to optimize on the client, and is nice for polyfilling too.
  • Reduce currently-existing control flow overhead, which we know exists in practice in solutions like Emscripten, and may be impossible to avoid (even if it is typically small).
  • Do so in a way that allows both (1) a super-simple VM implementation, which might not reduce that overhead, but shows this does not force undue complication on browsers, and (2) a clear optimization path to eliminating that overhead entirely.

This proposal introduces two new AST constructs, best explained with an example:

while (1) {
  if (check()) {
    tilt -> L1;
  } else if (other()) {
    tilt -> L2;
  } else {
    work();
  }
}
multiple {
  L1: { l_1(); }
  L2: { l_2(); }
}
next();

What happens here is if check(), then we execute l_1(), and if other() then we execute l_2(); otherwise we do some work() and go to the top of the loop. Conceptually, Tilt doesn't actually affect control flow - it just "goes with the flow" of where things are moving anyhow, but it can "tilt" things a little one way or another (like tilting a pinball machine) at specific locations. Specifically, if control flow reaches a multiple, then the tilt has an effect in picking out where we end up inside the multiple. But otherwise, Tilt doesn't cause control flow by itself.

In more detail, the semantics are fully defined as follows:

  • Define label as a "hidden" local variable on the stack, only visible to the next 2 bullet points.
  • tilt -> X sets label to a numerical id that represents the label X.
  • multiple { X1: { .. } X2: { .. } .. } checks the value of label. If it is equal to the numerical id of one of the labels X1, X2, ... then we execute the code in that label's block, set label to a null value (that is not the id of any label), and exit the multiple. Otherwise (if it is not equal to any of the ids), we just skip the multiple.

Note that Tilt does not move control flow to anywhere it couldn't actually get anyhow. Control flow still moves around using the same structured loops and ifs and breaks that we already have. Tilt only does a minor adjustment to control flow when we reach a multiple. The multiple can be seen as a "switch on labels", and the tilt picks which one we enter. But again, we could have reached any of those labels in the multiple anyhow through structured control flow (and picked which label in the multiple using a switch on the numerical id of the label).
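For concreteness, the lowering these semantics imply is a sketch like the following, where 0 is assumed as the null value that matches no label id:

var label = 0;                 // the hidden helper variable

// 'tilt -> L1' becomes:
label = 1;                     // the numerical id assigned to L1

// 'multiple { L1: { l_1(); } L2: { l_2(); } }' becomes:
switch (label) {
  case 1: label = 0; l_1(); break;
  case 2: label = 0; l_2(); break;
  // any other value: skip the multiple entirely
}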

The semantics described above also provide the "super-simple" implementation mentioned in the goals. It is trivial for a VM to implement that - just define the helper variable, set it and test it - and it would be correct; and control flow is still structured. But, it is also possible in a straightforward way to just emit a branch from the tilt to the right place. In the example above, that is perhaps too easy, so consider this more interesting example:

L1: while (1) {
  if (checkA()) {
    tilt -> L2;
  } else {
    if (checkB()) break;
    while (1) {
      work();
      if (checkC()) {
        tilt -> L3;
        break L1;
      }
    }
    never();
  }
}
multiple {
  L2: { print('checkA passed!') }
  L3: { print('checkC passed!') }
}

It is straightforward for the VM to see that tilt -> L2 will reach L2, and tilt -> L3 will reach L3 - note how we need a break after the tilt to achieve that - so it can just emit branches directly. The helper variable overhead can be eliminated entirely.

This idea is modeled on the Relooper algorithm from 2011. There is a proof there that any control flow can be represented in a structured way, using just the available control flow constructs in JavaScript, and using a helper variable like label mentioned in the Tilt semantics, without any code duplication (other approaches split nodes, and have bad worst-case code size situations). The relooper has also been implemented in Emscripten, and over the last 4 years we have gotten a lot of practical experience with it, showing that it gives good results in practice, typically with little usage of the helper variable.

Thus, we have both a solid theoretical result that shows we can represent any control flow - even irreducible - in this way, and experience showing that that helper variable overhead tends to be minimal. In the proposal here, that helper variable is, in essence, written in an explicit manner, which allows even that overhead to be optimized out nicely by the VM.

A note on irreducible control flow: As mentioned, Tilt can represent it, e.g.,

tilt -> L2;
while (1) {
  multiple {
    L1: {
      duffs();
      tilt -> L2;
    }
    L2: {
      device();
      if (done()) break;
      tilt -> L1;
    }
  }
}

This is not surprising, as the relooper proof guarantees that it can. And we can also make that irreducible control flow run at full speed (if the VM optimizes Tilt, instead of doing just the super-simple semantics as its implementation). Still, this is not quite as good as if we directly represented that irreducible control flow as basic blocks plus branches, since with Tilt we do need to analyze a little to find that underlying control flow graph. So something like proper tail calls, which can directly represent that graph, may still be useful, but it is debatable - at least a large set of cases of proper tail calls seem to be handled by Tilt (the cases of having heavily irreducible control flow), and without the limitations of proper tail calls like number of parameters. However, there is obviously a lot to debate on that.

In any case, putting aside irreducible control flow and proper tail calls, what Tilt is clearly great for is to eliminate any small amounts of helper variable usage, that occur often in practice, stemming from either small amounts of irreducible control flow (either a goto in the source, or a compiler optimization pass that complexified control flow for some reason), or reducible but complex enough control flow that the compiler didn't manage to remove all helper variable usages (it is an open question whether 100% can be removed in tractable time). Having Tilt in wasm would open the possibility for straightforward and predictable removal of that overhead.

To summarize this proposal, it can be seen as adding an "escape valve" to structured control flow, where code remains still structured, but we can express more complex patterns in a way that is at least easily optimizable without magical VM tricks.

That concludes the core of this proposal. I see two possible large open questions here:

  • The above cannot optimize indirect branches. Some interpreter loops really want that. I'm not sure if we want to get into that now. But it seems that a clear path is available - an "indirect Tilt" would use a variable instead of a label, together with a way to get the numeric id of a label.
  • In the proposal above, we define the semantics in a super-simple way. I like the simplicity and how it also shows how to implement the feature in a trivial way, so this should not slow down development of wasm VMs. But a downside to that is that it allows one to write nonsense like
tilt -> L1;
work();
L2: while (1) {
  more();
  tilt -> L2;
  if (check()) break;
}
multiple {
  L3: { never(); }
}

None of the tilts do anything; the multiple is ignored; just reading that makes me nauseous. But it is valid code to write, even if silly. It might be nice to get a parse-time error on such things, and it isn't hard. But to do so requires that we define what is errored on very precisely, which is a lot more detailed than the very simple (but fully defined!) semantics described above.

edit: that detailed description appears in WebAssembly/spec#33 (comment)

And that brings me to a final point here. We could in principle delay discussion of this to a later date. However, if we do want to add this later, and do want to error on such nonsense as the last example, then how easy and clean it is to define the semantics will depend on decisions that we do make now. In particular, the fewer control flow constructs we have, the easier and cleaner it will be; every construct may need to be involved in the description. Also, we might want to start out with simple constructs that make that easier (I have some ideas for those, but nothing too concrete yet).

In other words, what I am getting at is that if we design our control flow constructs as a whole, together with Tilt, then things will be much nicer than if we try to tack Tilt on later to something designed before it. For that reason, we might want to discuss it now.

Bikeshedding: Suggestions for better names are welcome. Another name I considered was slide -> X, as in "I totally meant to slide there".

Alignment will probably require implementation-defined behavior

It seems that some ARM implementations may ignore the low order bits of unaligned memory accesses and thus round down to the next aligned address. That would mean that every access that the engine cannot prove is properly aligned would need a dynamic check (since these processors won't cause a hardware fault). That may be too slow or too much code.

Would it be reasonable to spec aligned/unaligned accesses thusly?

  • All accesses require alignment to be specified.
  • Load/Store[aligned=true] have implementation-defined behavior when the offset is not actually aligned.
  • Load/Store[aligned=unknown] never have implementation-defined behavior, but may be slow on some architectures when the offset is not actually aligned.

For both kinds of accesses we could specify a sanitizer mode that will trap on Load/Store[aligned=true](actually not aligned) and profile or warn on Load/Store[aligned=unknown](actually not aligned).

The above would allow the engine to omit checks for the [aligned=true] case, accepting whatever the hardware does, but still require it to emit checks for [aligned=unknown] on these processors.
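As a sketch, the emitted code on such processors would look roughly like this ('slowPath' is a hypothetical helper):

// Load[aligned=unknown]: dynamic check, since the hardware won't fault.
function load32AlignedUnknown(addr) {
  if (addr & 3) {
    return slowPath(addr);       // byte-wise or otherwise-correct fallback
  }
  return HEAP32[addr >> 2] | 0;  // fast path for actually-aligned addresses
}
// Load[aligned=true] would omit the check and accept the hardware behavior.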

Define function signatures

It would be helpful in reasoning about ABIs if the Call section of the AST document could give some examples of function signature specifications.

How would this proposal handle variable-length parameter lists?
How do compilers place arrays on the stack? Looks like all of the local variable types are scalar.

specifying the C/C++ ABI (but maybe not in v.1)

I've been assuming we'd leave ABI up to compilers, but while thinking through the motivations for specifying a text format in #65, I realized the same line of reasoning applies to ABIs: basically, if there is some convention that everyone will be forced to choose from day 1 (in the case of #65, text projection), then we run the serious risk of having multiple diverging conventions appear which unnecessarily hurts usability of the ecosystem. In the case of ABIs, there have always been multiple ABIs in C/C++, so it's easy to take for granted that ABIs are just an essential annoyance. However, having multiple ABIs is just part of the inherited history of FORTRAN->C->C++ (and the reuse of shared linkers). With WebAssembly, we necessarily have a clean break in ABI, so we have an opportunity.

A question is whether we can safely delay specifying the ABI until after v.1. Since dynamic linking will put the ABI issue front and center, maybe we can just wait to specify the ABI until then. However, I was thinking that even static link libraries might lead to multiple competing ad hoc ABIs. In fact, with the same motivation as ABIs, I was thinking we should specify static link library container formats. This could make downloading and using a library in C++ (almost, headers) as easy as it is with some other languages. In some sense, these improved ergonomics come to us cheap, prepaid by WebAssembly being a portable assembly language.

I realize (and fully embrace) we're trying to be minimal in v.1, but, unlike most of the items we're able to punt to later versions, there is a real, long-term opportunity cost to punting if it leads to fragmentation. That being said, I'd be happy to delay this decision until "late" in v.1, after we have a lot more experience and understanding of the library situation.

Thoughts?

Does WASM share a stack with JS?

Does WASM share a stack with JS? As in, can JS call into WASM and WASM into JS, and all of those invocations occur on the same trusted stack? I think the answer to this speaks directly to fundamental assumptions about what WASM actually is.

Allowing synchronous calls between languages in both directions is equivalent to sharing a stack because of reentrancy and the inability to unwind the stacks independently.

The cost of sharing a stack is that it implicitly requires some meta-spec defining how a "common stack" behaves for WASM and JS. And that doesn't even consider WASM in non-JS environments. How do the following things work with a mixed-language stack?

  • Exception propagation and handling.
  • Crash dumps.
  • Stack inspection for GC.
  • Coroutines - would need to be supported by all languages on the stack.
  • Tail call elimination, inlining, etc.

For cases like C and Python, there is a strict embedder / embeddee relationship for language mixing that I don't see applying in this case. I can't think of a precedent, but maybe .NET has some examples? I am not familiar.

These issues could be avoided by keeping WASM’s stack separate from JS. This would look like isolates with async messaging. Basically, WASM threads would run in their own "worker" with postMessage, or some sort of program-defined inter-language RPC interface. (This RPC interface could use promises on the JS side to make it more palatable.)
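A sketch of what that promise-based surface might look like on the JS side (the worker wrapper and the message protocol are assumptions):

var worker = new Worker('app.wasm.js');   // hypothetical module wrapper
var nextId = 0, pending = {};

function callWasm(fn, args) {
  var id = nextId++;
  worker.postMessage({ id: id, fn: fn, args: args });
  return new Promise(function (resolve) { pending[id] = resolve; });
}

worker.onmessage = function (e) {
  pending[e.data.id](e.data.result);      // settle the matching call
  delete pending[e.data.id];
};

// usage: callWasm('computeHash', [buf]).then(render);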

Because WASM threads can block, it would still be possible to emulate sync outcalls from WASM to JS. (And deadlock when talking between WASM modules!) Sync outcalls from WASM could be explicitly supported by the system to allow optimizations (if desired) such as running JS on top of the WASM stack. (But since WASM is always on the bottom of the stack, the spec can pretend as if there is no stack sharing. Coroutines could be supported without taking JS into account, etc.)

Of course, at this point it should be obvious that not sharing takes some of the shine off of the “WASM libraries for JS” use case. Promise-based RPC could not be used to implement JS getters, etc. (Although plumbing APIs such as event handling into WASM as “system APIs” would allow WASM to produce synchronous responses to Web API callbacks. It’s only the user JS => user WASM calls that would need to be async to eliminate stack sharing.)

It might be possible to support stack sharing on some WASM threads but not on others... but that seems like complications would accumulate. You’d still need to specify mixed language stacks, there would be two execution modes, certain functionality wouldn’t work on certain threads, etc.

So, I think the question boils down to this: are we running two interoperable languages in the same VM, or are we running two separate languages that talk to each other? Or are we willing to pay the cost of trying to do both?

I currently believe there should be no stack sharing because trying to share is a huge can of worms we do not want to open. It’s always possible to add sync incalls / stack sharing later and it’s more difficult to take it away. On the other hand, I understand the charm of not making a hard distinction between WASM code and JS code and let them run in the same VM. I just don’t think that’s what we’re working towards. We’re working towards a world where native code can evolve on the web without being constrained by JS. Sharing a stack creates a fundamental coupling between WASM and JS.

JS interop and glue code

There's a gap in our current plans regarding JS interop and JS glue code.

At present, an emscripten-compiled module includes a bunch of JS interop and glue code that wraps the asm.js module. When you load an emscripten-compiled module, the glue code is what runs and does the work of turning that asm.js module into something that is exposed to javascript.

So far it seems that we are assuming that wasm will eliminate the need for most of this glue javascript. I agree with this. However, some of it is going to have to stick around no matter what, because it abstracts away the emscripten ABI and provides a surface the emscripten-compiled C++ can use to talk to the dom. Eliminating that glue will not be feasible until we've specified an ABI and webassembly can interact with GC objects.

At present the solution to these issues would probably be something like this:

  • Compilers like emscripten generate a wrapper javascript file (foo.wrap.js) next to every wasm module they generate (foo.wasm). Consumers have to import foo.wrap.js, not foo.wasm. The ABI is abstracted by each generated .wrap.js file.
  • The .wrap.js file exposes its interop code to the webasm... somehow? Right now it'd be passed in to the asm.js module via that globals/imports object, but we're doing away with that and having webassembly pull things in via ES6 imports. So does the webasm module import from the glue JS while the glue JS imports the webasm module?
  • The glue JS is required for an asm.js/webasm module to be able to interop with some DOM APIs, for example ones that take strings or arrays. Presently this is simple since emscripten just bundles the necessary wrapper code in; when we take that away, the glue code will be mandatory for those modules to work.

Aside from the complexity here, there are some real threats:

  • Once dynamic linking is introduced, webasm modules will need a way to import the unwrapped webasm module so they have direct access to the native entry points, instead of the wrapped JS ones being exposed by the wrapper JS module.
  • Developers may tire of the wrapper JS mess and decide to instead use a single common glue/wrapper library that imports all their modules directly. In this scenario, the emscripten ABI becomes a de-facto standard. (I consider this an extremely realistic threat, since that's the approach I went with for my compiler's emscripten interop.)

It's my opinion that we need to spec - even in the MVP - a basic mechanism for bundling a glue/wrapper JS file into a webasm module/executable. The glue/wrapper file will serve as an analogue to how COM type libraries (IDL descriptions) can be embedded into native win32 .dll and .exe files to allow a consumer to import directly from those executables without any supporting files. In native runtime environments (non-JS) and dynamic linking scenarios, the glue JS would be ignored since it serves no purpose.

My rough proposal for this would be:
Every webasm file has an optional 'glue' section that contains an ecmascript module. If provided, this module can be imported by the webasm code via a special name as if it were a regular ES6 module (i.e. import { supportFunction } from 'glue').

The glue module can expose a special function that is invoked after the webasm module has finished loading:
function doExport (nativeModule)
If provided, doExport is invoked and the native webasm module object is passed as a parameter. The return value of doExport is the actual object exported when JS imports the module. This allows you to add glue code as properties onto the asm module object, or return a special wrapper object instead (that hides methods, etc.)
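A sketch of the shape a bundled glue module might take under this proposal (the names and exports are illustrative, not a settled API):

// Hypothetical contents of the 'glue' section: an ES6 module.
export function doExport(nativeModule) {
  // Wrap raw native entry points in a friendlier surface before exporting.
  return {
    version: 1,
    run: function () {
      return nativeModule.main();   // assumed native export
    }
  };
}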

Performance criteria & conflicts

Good performance (by some set of metrics) is an absolute requirement for us to label a set of decisions as v1, and for us to ship a polyfill and native implementation, as I understand it.

We need to get some clarity on what those metrics are, and try to arrive at an understanding about which metrics matter the most to us. There is a balancing act necessary here where it will not be possible to achieve great results on all these metrics - optimizing really well for wire size can have negative impacts on decode speed or memory usage during decoding or streaming decode performance, etc.

There are also some existing assumptions that we're basing our performance planning on here. Most of these appear to be based on evidence, and we just need to document it. I'll enumerate the ones I've heard:

  • It's assumed that there is an efficient way to hook into the module/executable loading pipeline with user-authored JS or asm.js and use that to efficiently do transforms like macro expansion or applying deltas for updates.
  • It's assumed that reliable caching of some form is available, either by caching wasm executables, or writing some code that stores them into indexedDB and loads them later. This is particularly implied by any strategy that involves complex loader code/pre-filtering.
  • (edited for clarity) It's assumed that VMs and polyfills will be able to do streaming compilation of a wasm executable, and that they will want to do this before the file is fully downloaded and without keeping the full executable resident in memory.
  • (edited for clarity) It's assumed that even on mobile, the cost of keeping the wasm executable in memory doesn't cause any functional problems and is insignificant.
  • It's assumed that a polyfill needs to produce asm.js equivalent to what would be shipped now, which means producing a single module and compiling it in one go, and we assume that this is the best path for running these applications on mobile.
  • It's assumed that implementations will want (need?) to AOT compile an entire executable (many of the above assumptions strongly imply this on their own.)

I'm sure there is more I've overlooked, and I suspect a couple of these are partial understandings on my part. Most of them are things I've heard multiple times, however.

As far as performance criteria go, here are the criteria as I generally understand them:

  • Startup time is very important to us. We want excellent first-run startup time AND even better startup time on later runs (cached compilation, etc)
  • Memory usage to compile the decoded representation is extremely important to us. For existing asm.js applications this is a major issue and has already required extensive changes to JS runtimes.
  • Wire size for first run is very important to us (this feeds into first-run startup time). We want the asm to be small over the wire after transport compression. We don't care about pre-compression file size (I think?)
  • (edited for precision) Application run-time performance/throughput must be equal to or superior to asm.js in the mid term.
  • Being able to efficiently debug wasm applications is important to us (i.e. a userspace debugger isn't sufficient, we need a fast native one)
  • Interacting with code outside of the wasm executable must be reasonably efficient. It is unclear whether we are okay with eating some short/mid-term performance hits here in order to improve our design (i.e. having to copy data into/out of the wasm heap instead of aliasing the heap)
  • The decoding process needs to avoid performance/risk landmines - O(N*N*N) complexity, multiple-pass decoding, complex computation, etc.
  • Network traffic and I/O for later runs (downloading updates/patches off the network, loading quickly from cache storage) matters to us... but it's unclear how much it matters, and whether we're okay with letting that work itself out via userspace code & browser improvements.
  • Memory usage to decode the wire format matters to us: We want it to be reasonable.
  • We are okay with sacrificing performance on the compile side of the pipeline in order to meet our criteria on the decode/runtime side. At some point this may have to change (for JIT, etc) but we don't care about it right now.

Once we have a general consensus on all this I'll create a PR to document it.

Discuss dynamic linking

We had a meeting with Moz folks today about dynamic linking. I'm writing up meeting notes, and we made good progress. I'm opening this bug as a self-reminder to extract documentation from what we discussed.

Note: we're still targeting dynamic linking as a post-V1 feature, but it can affect some of our decisions so early agreement on the general direction will help us save time in the medium term, and avoid warts in the finalized wasm.

Engage other compilers?

When and how should we engage with other compilers for WebAssembly?

At the moment we plan to implement Web Assembly in LLVM because it's an obvious successor to PNaCl and Emscripten. We're not asking for input on the spec itself (the standards body is where this happens!), but we are asking to go into upstream LLVM because it reduces maintenance burden and is better for developers who want to use Web Assembly, since they can just use LLVM. We should engage LLVM folks early: saying "Web Assembly will be in LLVM" without getting buy-in isn't quite nice, and we think we can be positive contributors to LLVM overall.

Should we engage other C/C++ compilers? Specifically: GCC and MSVC. IMO we want to avoid skewing things towards LLVM too much, and developers win if more platforms support WebAssembly.
@LouisLaf contacted some MSVC folks.
Does anyone have opinions on how to contact GCC folks? Even better, someone who could implement Web Assembly for GCC?

Assumption: WebAssembly will initially target C/C++ because it's what we have experience with, and will support other languages later.

Should load/store be typed and have sext/zext/trunc?

Filing this issue to capture a discussion that occurred in PR #79.

load/store are often spec'd as having an explicit type, and as being able to implicitly sext/zext/trunc their inputs.

I'm wondering if instead load/store should:

  • Be sized to N bytes, and require an explicit type cast for load outputs and store inputs.
  • Require explicit trunc for stores, and sext/zext for loads.

This is different from some compiler IRs, including LLVM's. However, the current stated goal for this format is to keep it simple, and expect the macro layer to capture redundancy. It seems like separating conversion, extension and truncation out of load/store fits the goal.

Keeping this simple doesn't prevent optimization: it's pretty easy to detect this pattern when doing instruction selection, if such an instruction even exists on the target architecture. It does mean the IR has less magic, and is even easier to spec and implement in a dumb interpreter.
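
To make the split concrete, here is a sketch of the two designs side by side, using a DataView as a stand-in heap; the operation names are illustrative, not proposed opcodes.

```ts
// Stand-in heap for illustration.
const heap = new DataView(new ArrayBuffer(64));

// Fused design: the load itself carries the extension semantics.
const loadInt8Fused = (addr: number): number => heap.getInt8(addr); // load + sext in one op

// Split design: an untyped N-byte load plus an explicit extension op.
const load8 = (addr: number): number => heap.getUint8(addr); // raw byte, no extension
const sext8 = (x: number): number => (x << 24) >> 24;        // separate sign-extend
const loadInt8Split = (addr: number): number => sext8(load8(addr));
```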

Should we use ELF as a container format?

@sunfishcode and I met with Jim Grosbach (Apple) yesterday and discussed WebAssembly. Jim was asking what our container format would be, and suggested that we may want to consider ELF.

He points out that ELF is already supported in LLVM, which would make the backend less special.

I'm not particularly familiar with this topic, but I think it's good to list pros/cons of using ELF, and detail why we'd go with it or not.

WebAssembly logo


WebAssembly Logo Contest

⚠️⚠️⚠️⚠️⚠️
⚠️ UPDATE ⚠️ ✅ vote for your favorite logo in ➡️ this thread ⬅️
⚠️⚠️⚠️⚠️⚠️


ORIGINAL POST (now closed)


At the very start of the WebAssembly project, @sunfishcode hacked up a logo:
[original logo image]
It has nice ideas:

  • Arrows coming together to represent "assemble".
  • Arrows coming together can also represent compiling other languages to the web.

It would be nice to have something more fancy / web-y / designer-y, and keep it neutral so it belongs to the web and not one of the browser vendors.

Petr Hosek from the NaCl team proposed using the HTML5 technology class icons.


Here's where YOU come in!

Reply to this thread with your suggested WebAssembly logo.

We haven't decided how we'll pick the final logo, but it'll definitely be around "stable MVP" time.


WebAssembly abbreviation

Everyone seems good with WebAssembly as a formal name, but there is no single shared abbreviation (mostly my fault, since I started by saying "WebAsm" all the time before switching to "wasm", and now this crazy proposal... sorry).

So my current inclination is actually to simply use "asm" as the common abbreviation (pronounced "azum" or "azim"). Why? It seems like, when communicating, we're either in a Web context or a general context.

In a web context, "asm" is unambiguous and contrasts naturally with "JS"; from what I've seen, references to asm.js almost always include the ".js". Similarly, we say "workers" and "socket" and "cache" without any preceding "Web"; "asm" seems like sort of the obvious name in this context.

In a general context, "WebAssembly" would be used. Since it's two real words (not a pronunciation of an abbreviation), it has the added benefit of being easy to lookup for those unfamiliar who hear it in passing.

For a while I liked "wasm" (which I do think is a good file extension for binary modules; separate topic) but it seems to be in No Man's Land: not as short as "asm", not as clear as "WebAssembly". The same argument goes for "WebAsm", although "WebAsm" is admittedly fun to say (though less so if the listener isn't used to hearing "asm" as it seems to evoke other phonetically similar words).

Other opinions welcome; I'm not wed to any of this. Sorry to discuss colors of bike sheds, but it seems better now than later.

Feature detection and feature requirement fallback

Re: issue #84, we need a strategy for how to do feature detection and fallback when a required feature is unavailable.

I propose a small set of solutions for this:

First, we want to be able to fall back on a polyfill when a native implementation of webasm is unavailable. As suggested in #84, we should just have a simple way to detect whether a browser implements the format at all. Querying mimeTypes seems like an adequate solution; another option is to have something like navigator.webasm, since there's a need for a place to put webasm-related methods in the DOM anyway (so that webasm modules can import them, at the very least, just like Math, etc.)

When a native implementation of webasm is available, but one or two new/optional features are missing, the webasm application needs a way to detect this and fall back to alternate implementations of the particular functions that depend on them - the equivalent of an x86 application checking for SSE2 or SSE3 and using different implementations of math/graphics routines. I think we should solve this by exposing a simple function for doing named feature detection, i.e. something like navigator.webasm.featureSupported('native-simd') or navigator.webasm.instructionSupported('vec4-add'). Feature detection outside in the JS realm (switching between entire executables) would use these DOM-exposed APIs, while feature detection inside a webasm executable would import them, call them, and use the results inside if statements. An ideal runtime can inline the constant return values of those queries into the executable and entirely kill dead branches, producing zero-runtime-cost fallback.
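
A sketch of how that detection flow might look from JS, under the assumption that the navigator.webasm surface proposed above exists (it is hypothetical, so the sketch reaches it through a cast):

```ts
// Hypothetical shape of the proposed API; not a shipped interface.
interface WebAsmNamespace {
  featureSupported(name: string): boolean;
}
const webasm: WebAsmNamespace | undefined =
  (navigator as unknown as { webasm?: WebAsmNamespace }).webasm;

function pickSimdKernel(): "vector-impl" | "scalar-impl" {
  // No native webasm at all: take the polyfill/scalar path.
  if (!webasm) return "scalar-impl";
  // Native webasm present: branch per feature, like an x86 app probing SSE2.
  return webasm.featureSupported("native-simd") ? "vector-impl" : "scalar-impl";
}
```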

When a native implementation of webasm is available, but is missing essential application features, we need a way to fall back onto a polyfill instead of failing. At a bare minimum, this means that the JS needs a way to get the entire body of the application's webasm executable, or any essential sections of it (code section, constant table, etc.), from the native decoder. Without this, the application would be downloaded twice, and that's a complete non-starter. The cost of decoding the whole application twice is also undesirable, which is why the above feature detection proposal is important. I'd suggest we have something simple here, like navigator.webasm.getModuleSection(module, 'sectionName'), that returns a given named section of a module (assuming we use a format like ELF that has named sections), probably as a Uint8Array (something like Blob would be pretty gross from a performance perspective).

Exposing a way to get at module sections as raw bytes will also support common usage scenarios for sections, like embedding essential resources (shaders, textures, cursors, icons) into executables.
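
For example, pulling an embedded shader out of a named section might look like the following sketch (the getModuleSection shape and the section name are hypothetical):

```ts
// Hypothetical section-access API from the proposal above.
interface WebAsmSections {
  getModuleSection(module: object, sectionName: string): Uint8Array;
}

function loadEmbeddedShader(api: WebAsmSections, module: object): string {
  const bytes = api.getModuleSection(module, "shaders"); // raw section bytes
  return new TextDecoder("utf-8").decode(bytes);         // e.g. shader source text
}
```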

Exposing feature detection and module section access in the DOM ensures that a JS polyfill can implement these APIs and expose them, so a webasm module can successfully use those APIs regardless of the presence of a native implementation.

How this would interact with CORS is a complicated question. I'm not sure there's any way to make CORS work right for polyfilled webasm, and with native webasm the question of how to secure and authenticate module access could be tricky.

Another question is what happens if multiple webasm executables are loaded in a single environment (page, interpreter, etc) - do we enforce that there's only one polyfill that's used to run all the modules? How do we ensure that happens correctly if multiple modules trigger a native-to-JS fallback due to missing features? Can we enforce a one-polyfill restriction if some of our modules are being loaded via CORS (in which case we want to prevent a compromised polyfill from accessing the remote module)?

EDIT: One alternate way to do feature detection would be instead to have a way to query the runtime about whether an entire function was successfully compiled. Then you could simply query the availability of a series of functions and pick the first one that's available, instead of needing to check a long series of features.

Indirect calls: how do they work?

The current AST semantics document states:

Indirect calls may be made to a value of function-pointer type. A function-pointer value may be obtained for a given function as specified by its index in the function table.

  • CallIndirect - call function indirectly
  • AddressOf - obtain a function pointer value for a given function
    Function-pointer values are comparable for equality and the AddressOf operator is monomorphic. Function-pointer values can be explicitly coerced to and from integers (which, in particular, is necessary when loading/storing to the heap since the heap only provides integer types). For security and safety reasons, the integer value of a coerced function-pointer value is an abstract index and does not reveal the actual machine code address of the target function.

In v.1 function pointer values are local to a single module. The dynamic linking feature is necessary for two modules to pass function pointers back and forth.

IIUC this basically is what Emscripten does.
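
For intuition, here is a sketch of the "function pointer = abstract table index" model described above; the table contents are illustrative.

```ts
type Fn = (x: number) => number;

// The module's function table. AddressOf(f) yields f's index in this table.
const functionTable: Fn[] = [
  (x) => x + 1, // AddressOf yields 0
  (x) => x * 2, // AddressOf yields 1
];

// CallIndirect: bounds-check the abstract index and dispatch through the
// table; no machine-code address is ever exposed.
function callIndirect(index: number, arg: number): number {
  if ((index >>> 0) >= functionTable.length) {
    throw new RangeError("bad function pointer");
  }
  return functionTable[index](arg);
}
```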

I'd like us to discuss this a bit more to make sure we consider alternatives before choosing a specific approach:

  • Are different implementations of Web Assembly allowed to return different abstract integers for a function pointer?
  • What's the performance cost?
  • Does this have caveats with C++ UB on function pointers that happen to work on most implementations?
  • Does this support C++ pointer to member function sufficiently?
  • How does this interact with dynamic linking, late binding, and relocations?
  • Is this sufficient for non-C++ languages?
    • Does ObjectiveC work properly / efficiently?
    • Multimethods?
  • Can sanitizers (such as control-flow sanitizer) be implemented efficiently (without Web Assembly runtime involvement)?
  • Can the Web Assembly implementation use a sandboxing approach that doesn't rely on a language VM for security?
    • Can this target NaCl efficiently?
    • Can this target MinSFI efficiently?
    • Are we hindering future sandboxing research?

Anything else?

Which forms of extending loads and truncating stores do we want?

#135 adds a Load[Int64] and Store[Int64], which presumably load from and store to a 64-bit memory location with no extension or truncation. If we do nothing else, we should clarify that this is the case, but we may also consider the following:

Split Load into Int32Load, Int64Load, Float32Load, Float64Load. They'd still support memory types (though the float versions would only support float memory types, and integer versions would only support integer types). This would:

  • make a very clean way to have 8-bit and 16-bit loads which extend directly to Int64, so you don't have to do a 32-bit extending load and then separately extend to Int64
  • add support for a single-operation 32-bit Load that extends to Int64
  • make a nice space for Float16 loads that directly promote to Float32 or Float64 in the future
  • minimize unnecessary differences between 32-bit code and 64-bit code

And analogous changes for Store.

I imagine the main argument against this will be redundancy in functionality. Why have operations that can be built out of other operations and pattern-matched in implementations (since it's often a single instruction)? My answers are that these operators will be very commonly used, and so they will be appreciated by humans reading the code, and they'll be easier for very-simple implementations to emit good code for.
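
To illustrate the difference, here is a sketch of the 8-bit-to-Int64 case, using bigint as a stand-in for Int64 values; the operation names are illustrative, not proposed opcodes.

```ts
const mem = new DataView(new ArrayBuffer(64));

// With split-typed loads: one operation loads 8 bits and sign-extends
// straight to Int64.
const int64Load8S = (addr: number): bigint => BigInt(mem.getInt8(addr));

// Without it: a 32-bit extending load followed by a separate Int32 -> Int64
// extension, which implementations would have to pattern-match back together.
const int32Load8S = (addr: number): number => mem.getInt8(addr);
const extendI32ToI64 = (x: number): bigint => BigInt(x | 0);
const int64Load8SComposed = (addr: number): bigint =>
  extendI32ToI64(int32Load8S(addr));
```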

Split up feature list from JavaScript implementation details

The docs currently mix the feature list with details of how things are implemented in asm.js and the polyfill. This is great for informing prioritization and our general direction, but it is distracting. We should split them.

We need a section detailing how this is implemented in JavaScript, both for implementors and to convince web enthusiasts that this makes sense, but this should be separate. Mixing JavaScript with features and goals will turn off other developers that we want to attract: we don't want to systematically compromise the design of wasm for JavaScript compatibility's sake. The wasm spec should stand on its own as a sensible spec for folks who don't know about the web's innards and history.

How should 64-bit integers look?

64-bit integers are an "essential post-v1 feature", so we don't need to figure them out right now, but there are some areas where it would help to anticipate them. While thinking about one of @jfbastien's comments in #79, I thought of two major categories of possibilities.

Option A: Explicit 64-bit versus 32-bit WebAssembly

Make WebAssembly code either explicitly 64-bit or 32-bit. In 64-bit wasm, you'd only get 64-bit integers. In 32-bit wasm, you'd only get 32-bit integers. This would simplify a bunch of things.

It's unfortunate to lose "just one WebAssembly", but things were going to get a little weird with 32-bit indices into >4GiB heaps anyway. And C/C++ code is always going to be incompatible between 32-bit and 64-bit. So perhaps this loss isn't as big as it might seem.

32-bit code still wants access to 64-bit integers, but an int32-pair approach with special intrinsics to perform add, mul, etc. would be simple and would recover some, though not all, of the benefit.
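
For instance, a 64-bit add under the int32-pair approach might look like the following sketch (the pair representation and function name are illustrative):

```ts
// An i64 modeled as two 32-bit halves.
interface I64Pair { lo: number; hi: number; }

function i64Add(a: I64Pair, b: I64Pair): I64Pair {
  const lo = (a.lo + b.lo) | 0;
  // Unsigned comparison detects carry out of the low word.
  const carry = (lo >>> 0) < (a.lo >>> 0) ? 1 : 0;
  const hi = (a.hi + b.hi + carry) | 0;
  return { lo, hi };
}
```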

On x86 at least, 32-bit operations are sometimes smaller and sometimes faster than their 64-bit counterparts, so simple execution engines would lose those optimizations. Execution engines willing to construct SSA form and run a pass could probably recover them in many cases, though. Alternatively, we could introduce explicitly 32-bit operations that operate on 64-bit registers, much like x64 and AArch64 have, and then we wouldn't need any cleverness.

This option would also make it harder, though not impossible, to support 64-bit code on 32-bit platforms.

If we end up using ELF containers (#74), this would fit pretty naturally into the division between 32-bit ELF and 64-bit ELF.

Option B: New local type(s)

Introduce i64 as a new local type. We'd need instructions to convert between integer types; load and store would need to accept either 32-bit or 64-bit indices (when we have >4GiB heaps) and to extend to and truncate from either 32-bit or 64-bit; execution engines would need to support all the arithmetic operations on both 32-bit and 64-bit; and so on. It also means we can't do register coloring between i32 variables and i64 variables at the wasm level.

If we do this, then some of the reasons for not doing i8 and i16 local types would be diminished as well. Those types are still not highly valuable for arithmetic, so we still may not want them, but if we're going to have multiple integer types, having a few more isn't as much of a burden. And if we added them, it'd likely mean we could get rid of extending loads and truncating stores, because we could just use type conversions instead (at the cost of making execution engines pattern-match these, since extending loads and truncating stores are usually single-instruction operations).

The loss of coloring between i8, i16, and i32 variables would probably be more significant than between i32 and i64 though.
