Should we use ELF as a container format? about design HOT 21 CLOSED

webassembly commented on June 3, 2024

Should we use ELF as a container format?

from design.

Comments (21)

qwertie commented on June 3, 2024

Supposing that a custom format were chosen... why not base it on protocol buffers? At least for the outer portions of the container that are not performance critical. This would make it relatively easy for third parties (non-browsers) to ensure they are reading and writing the format correctly.

sunfishcode commented on June 3, 2024

ELF has some useful properties, like the ability to have arbitrary optional sections, to split up text and data into multiple sections for various purposes, and pretty broad existing tooling support.

Downsides of ELF for WebAssembly include:

it embeds concepts like virtual addresses and section/segment offsets in numerous places, which aren't going to map well to WebAssembly
it carries a fair amount of semantic baggage with respect to things like weak symbols and library semantics (although perhaps to some, accurate emulation of ELF semantics is an advantage)

titzer commented on June 3, 2024

On Tue, May 19, 2015 at 8:11 PM, Dan Gohman wrote:

> ELF has some useful properties, like the ability to have arbitrary
> optional sections, to split up text and data into multiple sections for
> various purposes, and pretty broad existing tooling support.

+0.45

I like ELF, but it will take some more consideration before I'm convinced
it's the right fit for this use case. One big advantage is that there are a
lot of existing tools that can work with ELF and it's essentially a known
quantity. Also, the notion of an initialized .bss section that is loaded
into a particular address space makes sense for wasm.

> Downsides of ELF for WebAssembly include:
>
> it embeds concepts like virtual addresses and section/segment
> offsets in numerous places, which aren't going to map well to WebAssembly

ELF basically assumes a single linear address space for all entities,
which doesn't map well to the function table concept, but we could, e.g.
map function table numbers into part of the address space. For
architectures like AVR that have separate code and data spaces, they
typically divide the ELF linear space this way.

> it carries a fair amount of semantic baggage with respect to things
> like weak symbols and library semantics (although perhaps to some, accurate
> emulation of ELF semantics is an advantage)

We could settle on a subset of ELF that would allow simplifying decoders.
E.g. we could just mandate the sections that should be ignored (i.e. we
decree they cannot possibly have semantic meaning), that everything is
32-bit little endian, etc. Then we just get a CPU type constant to indicate
webasm ELF binaries.
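
To make the subset idea concrete, here is a minimal sketch of a decoder-side check that accepts only 32-bit little-endian files carrying one specific machine type. The field offsets and the `ELFCLASS32`/`ELFDATA2LSB` values come from the ELF specification; the machine constant `EM_WASM_HYPOTHETICAL` and the helper names are made up for illustration, since no machine type was registered in this discussion.

```python
import struct

# Identification constants defined by the ELF specification.
ELF_MAGIC = b"\x7fELF"
ELFCLASS32 = 1    # e_ident[EI_CLASS]: 32-bit objects
ELFDATA2LSB = 1   # e_ident[EI_DATA]: little-endian encoding

# Hypothetical machine constant, chosen only for this sketch.
EM_WASM_HYPOTHETICAL = 0xFFFF


def is_in_subset(header: bytes) -> bool:
    """Accept only 32-bit little-endian files with the hypothetical machine type."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return False
    ei_class, ei_data = header[4], header[5]
    if ei_class != ELFCLASS32 or ei_data != ELFDATA2LSB:
        return False
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    _e_type, e_machine = struct.unpack_from("<HH", header, 16)
    return e_machine == EM_WASM_HYPOTHETICAL


def make_header(ei_class: int, ei_data: int, e_machine: int) -> bytes:
    """Build the first 20 bytes of an ELF header for experimenting with the check."""
    ident = ELF_MAGIC + bytes([ei_class, ei_data, 1]) + bytes(9)  # 16 bytes total
    return ident + struct.pack("<HH", 2, e_machine)  # e_type 2 is ET_EXEC


if __name__ == "__main__":
    print(is_in_subset(make_header(ELFCLASS32, ELFDATA2LSB, EM_WASM_HYPOTHETICAL)))
    print(is_in_subset(make_header(2, ELFDATA2LSB, EM_WASM_HYPOTHETICAL)))  # 64-bit class
```

A decoder written this way can reject everything outside the subset before looking at any section or segment table.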

lukewagner commented on June 3, 2024

@titzer specifying a subset of ELF sounds like a good idea. Want to make a PR to add a note on this in BinaryEncoding.md?

Also, maybe I'm missing an angle, but I don't see any conflicts with wasm function pointers. Basically, in the most recent iteration implied in AstSemantics.md#calls, there isn't an explicit function table (the polyfill would synthesize the tables) or a priori indices: you just take the AddressOf(function-index). I do think it would be useful to have, in the up-front function list (the one which gives you all the signatures of functions and their offsets so it's easy to do parallel decoding), a flag that must be set for any function which is passed to AddressOf, since this is a priori knowledge.

jfbastien commented on June 3, 2024

Would using ELF also be useful for dynamic linking, especially for the function table (instead of __pso_root that we discussed, and/or the callbacks)? What about PLT and GOT?

We can mangle types and versioning in the symbol names, a la C++. I'd be afraid of using C's approach to mangling though!

lukewagner commented on June 3, 2024

I think we'll be able to generate slightly faster code if we directly support linking functions and globals in WebAssembly. For one thing, an engine might choose to avoid a runtime PLT/GOT altogether by directly patching code. Even w/o direct patching, though, since indirect calls in wasm entail an extra load, a hand-rolled PLT call will be slower than a builtin PLT call; the builtin PLT call is basically benefiting from being able to store the raw function address directly in the PLT and knowing it can't be corrupted.

sunfishcode commented on June 3, 2024

One of the characteristic features of ELF is the concept of having two views over the file contents.

Sections organize the file contents, and segments specify how the contents are to be mapped into memory. Typically, a segment covers multiple sections, but they really are two independent views over the same data. One of the purposes of this abstraction is so that the OS execve code which loads ELF programs into memory can ignore all the details of sections and just focus on making a few bulk segment mappings into the address space instead of digging into the details of each of the sections individually.

There are ways we can shoehorn WebAssembly into this, perhaps by having a processor-specific (PT_LOPROC+x) segment type which says "here is a byte range of all the WebAssembly functions to be compiled", but this abstraction isn't obviously adding much value, since we already know we want a way to specify where all the functions are, where they each begin and end, up front.
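
As a rough illustration of the two views, the sketch below models sections and segments as independent byte ranges over the same file and reports which sections each segment happens to cover. The `Section` and `Segment` classes, the `sections_in` helper, and all offsets are invented for this example; a real ELF file stores this information in its section header and program header tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Linking/organizational view: a named byte range in the file."""
    name: str
    offset: int
    size: int


@dataclass(frozen=True)
class Segment:
    """Loading view: a byte range the loader maps in bulk."""
    kind: str
    offset: int
    size: int


def sections_in(segment: Segment, sections: list[Section]) -> list[str]:
    """Names of the sections that lie entirely inside the segment's byte range."""
    end = segment.offset + segment.size
    return [
        s.name
        for s in sections
        if s.offset >= segment.offset and s.offset + s.size <= end
    ]


# Made-up layout: two loadable segments and one section no segment covers.
sections = [
    Section(".text", 0x100, 0x200),
    Section(".rodata", 0x300, 0x80),
    Section(".data", 0x400, 0x40),
    Section(".comment", 0x440, 0x20),
]
segments = [
    Segment("PT_LOAD (r-x)", 0x100, 0x280),
    Segment("PT_LOAD (rw-)", 0x400, 0x40),
]

if __name__ == "__main__":
    for seg in segments:
        print(seg.kind, "covers", sections_in(seg, sections))
```

The loader only needs the two `Segment` entries; the four `Section` entries matter to linkers and inspection tools, which is the split that does not obviously buy much when functions must be located individually anyway.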

luser commented on June 3, 2024

The big benefits of using ELF vs. inventing our own would be tooling, I think. As mentioned above LLVM already has ELF support, and we'd be able to use tools like objdump for inspecting .wasm files without much trouble. Even without adding any special code to objdump you'd be able to inspect the data and get raw AST bytes out etc, which is pretty useful stuff. If we registered an ELF machine type and added support for that to binutils we could make things very useful.

There are definitely some ELF concepts that aren't going to map well into wasm, but I think defining a subset should constrain the problem a bit. I think it's worth trying, at least, to avoid reinventing wheels.

flagxor commented on June 3, 2024

Using ELF as a container has practical advantages in terms of porting. I've encountered a number of projects whose builds assume all binaries are ELF binaries. PNaCl not using ELF has required some special casing. A few programs also assume they can use extra sections to embed data in binaries.

sunfishcode commented on June 3, 2024

One big question here is how we want symbol resolution to work with dynamic libraries. While ELF is theoretically flexible, all the ELF ecosystems I'm aware of use a single-level namespace. Symbol imports can be resolved from any library or executable that defines a symbol with the same name. This is in contrast to Mach-O and PE-COFF, which have two-level namespace schemes, where each symbol import specifies which library it's to be resolved by.
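
The difference between the two schemes can be sketched in a few lines. This is a toy model, not how any real dynamic linker is implemented: the library names, the dictionary layout, and both resolver functions are assumptions made up for illustration.

```python
# Toy model: each library maps exported symbol names to a definition.
# The library and symbol names are hypothetical.
LIBRARIES = {
    "libfoo": {"init": "libfoo:init", "log": "libfoo:log"},
    "libbar": {"init": "libbar:init"},
}


def resolve_flat(name: str, search_order: list[str]) -> str:
    """Single-level namespace: the first library in search order that defines the name wins."""
    for lib in search_order:
        if name in LIBRARIES[lib]:
            return LIBRARIES[lib][name]
    raise LookupError(f"undefined symbol: {name}")


def resolve_two_level(lib: str, name: str) -> str:
    """Two-level namespace: the import itself says which library must provide the symbol."""
    try:
        return LIBRARIES[lib][name]
    except KeyError:
        raise LookupError(f"undefined symbol: {lib}:{name}") from None


if __name__ == "__main__":
    # The same import resolves differently depending on load order...
    print(resolve_flat("init", ["libfoo", "libbar"]))
    print(resolve_flat("init", ["libbar", "libfoo"]))
    # ...while a two-level import is unaffected by it.
    print(resolve_two_level("libbar", "init"))
```

In the flat scheme the result of looking up `init` depends on the search order, whereas the two-level import pins it to one library regardless of what else is loaded.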

Other strengths or weaknesses of Mach-O and/or PE-COFF aside, everyone I've talked to so far has expressed a preference for WebAssembly to use a two-level namespace (e.g. here). There are open questions, such as how best to "name" libraries to allow desirable flexibility, but I've not yet heard from anyone who thinks that these are unsolvable.

In this light, I propose WebAssembly use a two-level namespace. And if we do, I propose that ELF is therefore not a good fit for WebAssembly. Does anyone disagree? Does anyone have other concerns to raise?

jfbastien commented on June 3, 2024

Agreed on two-level namespace. Does that necessarily block out all of ELF, though?

Are there other attributes of ELF that we do like?

Which container should we use instead?

dschuff commented on June 3, 2024

+1 for two-level namespaces... I've heard the same preferences, although we've probably talked to mostly the same people :)
As you say, having that as a starting point doesn't necessarily preclude use of ELF as a container or metadata format, but the mismatch means that we'd have to come up with a way to name imports that would be unfamiliar to tools that are used to dealing with ELF. I'd say this wouldn't necessarily be a dealbreaker if there were otherwise compelling reasons to use ELF.

sunfishcode commented on June 3, 2024

The main attributes of ELF that we like are that it exists, it works reasonably well for the things it's designed for, a lot of people are familiar with it, it's extensible, and it has a lot of existing tooling support.

ELF also has attributes we don't like, including clutter from historical artifacts and a lot of encoding redundancy. And, some aspects of ELF's extensibility can be a disadvantage too, because it means there could be quite a lot of gratuitous variety that WebAssembly consumers would have to support, for example putting the section or segment headers at the end (or middle!) of the file instead of the beginning, having overlapping segments, or having multiple sections with the same name. We could prohibit things like those, but the more constraints we add the less value we get from the existing ecosystem.

And for WebAssembly, we may have to use custom segment and symbol types to cope with the fact that we have a virtual ISA which isn't a direct encoding of executable text in memory, custom symbol types to represent wasm's global variables, and custom section types to hold various bits of wasm metadata. ELF's extensibility means these things are all possible, but the more special things we add, the less value we get out of leveraging the existing ecosystem.

If we agree on two-level namespacing, that's a significant change from the ELF ecosystem, and in my mind, that combined with the other concerns is enough to justify choosing something else.

I'm not aware of any other existing container formats that are plausible to consider here, so effectively I'm proposing we invent our own container format.

kg commented on June 3, 2024

> there could be quite a lot of gratuitous variety that WebAssembly consumers would have to support, for example putting the section or segment headers at the end (or middle!) of the file instead of the beginning, having overlapping segments, or having multiple sections with the same name

These all seem like they would dramatically increase our security attack surface, and potentially prevent important optimizations like streaming decode/compilation.

dschuff commented on June 3, 2024

That's why we'd have to specify a subset. Doing that wouldn't negate the benefits of using existing tooling (which would presumably handle the subset just as well as the more commonly-used set), but going to a 2-level namespace would be more of a departure.
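
For a sense of what specifying a subset could mean in practice, here is a sketch of a checker for the three kinds of variety raised in this thread: header tables away from the start of the file, overlapping segments, and duplicate section names. The rules, the function name `find_violations`, and its input shape are assumptions for illustration only; nothing here was agreed on in the discussion. The default of 52 bytes is the size of a 32-bit ELF header.

```python
from collections import Counter


def find_violations(phoff, segments, section_names, ehdr_size=52):
    """Return human-readable violations of a hypothetical restricted ELF subset.

    phoff         -- file offset of the program header table
    segments      -- list of (file_offset, size) tuples
    section_names -- list of section name strings
    """
    problems = []

    # Hypothetical rule 1: program headers must directly follow the ELF header.
    if phoff != ehdr_size:
        problems.append("program headers do not immediately follow the ELF header")

    # Hypothetical rule 2: segments must not overlap. After sorting by offset,
    # comparing neighbours reports at least one overlap whenever any exists.
    ordered = sorted(segments)
    for (a_off, a_size), (b_off, _b_size) in zip(ordered, ordered[1:]):
        if a_off + a_size > b_off:
            problems.append(f"segments at {a_off:#x} and {b_off:#x} overlap")

    # Hypothetical rule 3: section names must be unique.
    for name, count in Counter(section_names).items():
        if count > 1:
            problems.append(f"section name {name!r} appears {count} times")

    return problems


if __name__ == "__main__":
    print(find_violations(52, [(0x100, 0x100), (0x200, 0x80)], [".text", ".data"]))
    print(find_violations(0x1000, [(0x100, 0x200), (0x200, 0x80)], [".text", ".text"]))
```

Existing tools would read files that pass such a check without changes, but every file they produce would also need to be run through it before a WebAssembly consumer could rely on the restrictions.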

jfbastien commented on June 3, 2024

I'm not sure I understand what the advantages of protobuf would be here. ELF (and other container formats) have much higher-level semantics than protobuf defines. If we define a new format then we want to also have high-level semantics.

Huge protobufs (as this would be) are also not something I'd recommend going for!

qwertie commented on June 3, 2024

@jfbastien I'm just saying that if ELF weren't used because it's decided not to be a good fit, the custom format could, rather than being entirely ad hoc, start from an existing general-purpose binary encoding. It doesn't have to be protocol buffers; it could be something else... the only other candidate I know of is Cap'n Proto, which isn't optimized for space, but I'm only suggesting this for outer layers, which I assume aren't size-critical.

The advantage of protobufs, of course, is that they are well-understood and have built-in extensibility. I was not aware that protobufs were unsuitable at large sizes (is this an inherent problem or an issue with the libraries that read or write them?).

sunfishcode commented on June 3, 2024

Let's move discussion of what a new custom format might look like to new issues. This issue is about whether we should use ELF. I am now proposing above that we say no, but it's still open for discussion at this point.

sunfishcode commented on June 3, 2024

@flagxor mentioned above that ELF makes porting easier in some cases. I think it's worth considering, but I hypothesize that such code is relatively rare, and already not portable, and therefore not more important than the other concerns outlined above.

MikeHolman commented on June 3, 2024

@sunfishcode I agree with the sentiment (someone somewhere said) that we go forward designing our desired format, and then if ELF fits our criteria, awesome! If not, well that's ok too.

sunfishcode commented on June 3, 2024

It is looking unlikely that ELF will fit our criteria for the binary format to ship over the wire. A significant advantage of ELF is reusing existing tools, but existing tools don't support two-level namespaces, which we want. ELF can theoretically be extended to do this, but we'd have to add and maintain this in many tools ourselves. Also, our need to handle code specially means we can't use PT_LOAD for code, which means (among other reasons) that we can't use conventional ELF linker scripts or runtime loading logic. And ELF has a lot of redundancy (shdrs vs phdrs, ehdr has generality that ELF in practice doesn't utilize, etc.) and obscure features (STV_INTERNAL, STT_FILE, the ordering of symbols in the symbol table, etc.), making it intimidating to work with from tools that aren't already committed to the ELF ecosystem.

Unless other considerations come to light, let's close this issue so that we can focus on evolving the current emerging custom container format to fit our needs.
