
uvm's Introduction

UVM

NOTE: this project is very much a work in progress. You're likely to run into bugs and missing features. I'm looking for collaborators who share the vision and want to help me make it happen.

   

A minimalistic virtual machine designed to run self-contained applications. UVM is intended as a platform to distribute programs that will not break and to combat code rot. It also aims to be conceptually simple, easy to understand, easy to target, fun to work with and approachable to newcomers. It may also be valuable as a teaching tool. There is a short 4-minute overview of UVM on YouTube if you'd like to see a quick survey.

If you think that UVM is cool, you can support my work via GitHub Sponsors ❤️

Features

Current features:

  • Stack-based bytecode interpreter
  • Variable-length instructions for compactness
  • Untyped design for simplicity
  • Little-endian byte ordering (like x86, ARM & RISC-V)
  • 32-bit and 64-bit integer ops, 32-bit floating-point support
  • Separate flat, linear address spaces for code and data
  • Built-in, easy to use assembler with a simple syntax
  • Event-driven execution model compatible with async operations
  • Easy to use frame buffer to draw RGB graphics with no boilerplate
  • Easy to use audio output API with no boilerplate
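
To make the first few bullets more concrete, here is a minimal sketch of what a stack-based interpreter with variable-length instructions and little-endian immediates can look like. The opcode names and encodings (OP_EXIT, OP_PUSH_U8, OP_PUSH_U32, OP_ADD) and the run function are made up for illustration and are not UVM's actual instruction set or implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; not UVM's real encoding. */
enum { OP_EXIT = 0, OP_PUSH_U8 = 1, OP_PUSH_U32 = 2, OP_ADD = 3 };

/* Read a 32-bit little-endian immediate, least significant byte first. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Run bytecode until OP_EXIT and return the value on top of the stack. */
uint64_t run(const uint8_t *code, size_t code_len)
{
    uint64_t stack[64];   /* untyped 64-bit stack slots */
    size_t sp = 0;        /* number of values on the stack */
    size_t pc = 0;        /* offset of the next byte to decode */

    while (pc < code_len) {
        uint8_t op = code[pc++];
        switch (op) {
        case OP_PUSH_U8:    /* 2 bytes: opcode + 1-byte immediate */
            assert(pc + 1 <= code_len && sp < 64);
            stack[sp++] = code[pc];
            pc += 1;
            break;
        case OP_PUSH_U32:   /* 5 bytes: opcode + 4-byte immediate */
            assert(pc + 4 <= code_len && sp < 64);
            stack[sp++] = read_u32_le(&code[pc]);
            pc += 4;
            break;
        case OP_ADD:        /* 1 byte: pops two values, pushes their sum */
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_EXIT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
        }
    }
    assert(!"fell off the end of the code");
    return 0;
}
```

Small constants take two bytes and larger ones take five, which is the compactness benefit that variable-length encoding is after.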

Planned future features:

  • Async file and network I/O with callbacks
    • Synchronous I/O possible as well
  • Fast JIT compiler based on dynamic binary translation and basic block versioning
    • Expected performance ~80% of native speed (maybe more?)
    • Near-instant warmup
  • Permission system to safely sandbox apps without granting access to entire computer
  • Ability to compile without SDL and without graphics/audio for headless server-side use
  • Ability to encode metadata such as author name and app icon into app image files
  • Ability to suspend running programs and save them to a new app image file

Build Instructions

Dependencies: the Rust toolchain and the SDL2 library. Platform-specific installation steps follow.

Installing Rust and SDL2 on macOS

Install the SDL2 package:

brew install sdl2

Add this to your ~/.zprofile:

export LIBRARY_PATH="$LIBRARY_PATH:$(brew --prefix)/lib"

Install the Rust toolchain:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Installing Rust and SDL2 on Debian/Ubuntu

Install the SDL2 package:

sudo apt-get install libsdl2-dev

Install the Rust toolchain:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Installing Rust and SDL2 on Windows

Follow the Windows-specific instructions to install the Rust toolchain.

Get SDL2.dll from one of SDL2 Releases.

Copy SDL2.dll (unzip) to the vm/ folder.

Compiling the Project

cd vm
cargo build

To run an asm file with UVM:

cargo run examples/fizzbuzz.asm

There is also a toy C compiler in the ncc directory, along with many example C programs that run on UVM:

cd ncc
./build_and_run.sh examples/snake.c

Running the Test Suite

Run cargo test from the vm and ncc directories.

Codebase Organization

The repository is organized into three subprojects, each of which is a Rust codebase that can be compiled with cargo:

  • /vm : The implementation of the UVM virtual machine itself
  • /ncc: An implementation of a toy C compiler that outputs UVM assembly
  • /api: A system to document and automatically export bindings for UVM system calls and constants.
    • /api/syscalls.json: Declarative list of system calls exposed by UVM.

The ncc compiler is, at the time of this writing, incomplete in that it lacks some C features and the error messages need improvement. This compiler was implemented to serve as an example of how to write a compiler that targets UVM, and to write some library code to be used by other programs. Over time, the ncc compiler will be improved. Despite its limitations, it is still usable to write small programs. Contributions to it are welcome.

The api directory contains JSON files that represent a declarative list of system calls, constants and the permission system that UVM exposes to programs running on it. This is helpful for documentation purposes, or if you want to build a compiler that targets UVM. The directory also contains code that automatically generates markdown documentation, Rust constants and C definitions for system calls.

Open Source License

The code for UVM, NCC and associated tools is shared under the Apache-2.0 license.

The examples under the vm/examples and ncc/examples directories are shared under the Creative Commons CC0 license.

Contributing

There is a lot of work to be done to get this project going and contributions are welcome.

A good first step is to look at open issues and read the available documentation. Another easy way to contribute is to create new example programs showcasing cool things you can do with UVM, or to open issues to report bugs. If you do report bugs, please provide as much context as possible, and the smallest reproduction you can come up with.

You can also search the codebase for TODO or FIXME notes:

grep -IRi "todo" .

In general, smaller pull requests are easier to review and have a much higher chance of getting merged than large pull requests. If you would like to add a new, complex feature or refactor the design of UVM, I recommend opening an issue or starting a discussion about your proposed change first.

Also please keep in mind that one of the core principles of UVM is to minimize dependencies to keep the VM easy to install and easy to port. Opening a PR that adds dependencies to multiple new packages and libraries is unlikely to get merged. Again, if you have a valid argument in favor of doing so, please open a discussion to share your point of view.

uvm's People

Contributors

abdulbahajaj, calroc, costava, maximecb, neauoire, olimpiu, rudxain, tkchia


uvm's Issues

"array element types do not match" error with certain literals.

I've been busy the last couple of days, but I sat down this evening to fiddle with fixed point 3D math and discovered something odd while building a table of sin values.

Contents of t.c:

u64 a[2] = {0x7fff8beb, 0x8000e82a};

Result of attempting to compile:

uvm/ncc % cargo run ./examples/t.c
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/ncc ./examples/t.c`
Error: array element types do not match

I tried with 0xffffffff and 0x100000000 on the theory that it might have something to do with having literal values of both 31 and 32 bits, respectively, in the same initializer. But with that pair of initial values the program compiles without the error message.
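
For reference, here is a small sketch of how standard C types these two literals, using C11 _Generic. It assumes a host compiler with a 32-bit int and says nothing certain about ncc's own literal typing rules; TYPE_NAME and initializers_have_same_type are hypothetical helpers written for this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static_assert(sizeof(int) == 4, "this sketch assumes a 32-bit int");

/* Map an expression to the name of its standard C type (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    unsigned long: "unsigned long", \
    long long: "long long", \
    unsigned long long: "unsigned long long", \
    default: "other")

/* Do the two initializers from t.c have the same type in standard C? */
bool initializers_have_same_type(void)
{
    return strcmp(TYPE_NAME(0x7fff8beb), TYPE_NAME(0x8000e82a)) == 0;
}
```

Under standard rules, 0x7fff8beb fits in int while 0x8000e82a does not and becomes unsigned int. The second pair, 0xffffffff and 0x100000000, would also get two different types in standard C, so these rules alone do not explain why that pair compiles.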

Unable to recover from misinput to read_i64

The read_i64 syscall has no way to recover from invalid input.
The program crashes.
This would likely be frustrating to users of long-running programs where mistyping "4r" instead of "4" will bring the whole program down.

Recovery would mean some action like using a default value instead and/or being able to retry.

The syscalls doc states that each syscall outputs either no value or a single value onto the stack.

In the event of invalid input:

  1. read_i64 could push nothing onto the stack. However, each syscall is fixed as either returning a value or returning nothing, and even if that were relaxed, the program would still need a way to check whether the stack has grown; otherwise it would be comparing against a sentinel value it had placed on the stack beforehand.

  2. read_i64 could return some sentinel value in the event of invalid input.

  3. read_i64 could take a single argument, a "default" value, that is returned in the event of invalid input.

It would be nice if sentinel values can be avoided, and if invalid input could be known without having to assume that the user has not entered the default value.

Option 1 is not possible with the current constraints (not that the constraints should be changed), and 2 and 3 have shortcomings. I'm sure someone can determine a better setup.
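
To illustrate the trade-off, here is a sketch in plain C of option 3 next to a variant that reports success separately. The names parse_i64 and parse_i64_or are hypothetical, and the code parses a string with the C standard library rather than calling any UVM syscall.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a whole line as a decimal i64. Returns true only on success. */
bool parse_i64(const char *text, int64_t *out)
{
    assert(text != NULL && out != NULL);
    char *end = NULL;
    errno = 0;
    long long value = strtoll(text, &end, 10);
    if (end == text || errno == ERANGE)
        return false;            /* no digits at all, or out of range */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;                   /* allow trailing whitespace */
    if (*end != '\0')
        return false;            /* trailing junk such as the "r" in "4r" */
    *out = (int64_t)value;
    return true;
}

/* Option 3: fall back to a caller-supplied default on invalid input. */
int64_t parse_i64_or(const char *text, int64_t default_value)
{
    int64_t value;
    return parse_i64(text, &value) ? value : default_value;
}
```

With a default of -1, the inputs "4r" and "-1" both produce -1, which is exactly the ambiguity described above. The two-result form avoids it, but it does not fit the rule that a syscall pushes at most one value.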

Should a VM even have a notion of 'syscall'?

I'm curious if you considered making each syscall a first-class opcode.

What I've observed is that every new namespace gives you more room to add features. If the goal is stability, it might be useful to eliminate namespaces as far as possible to minimize temptations to add features.

A single namespace of opcodes might be easier for people to learn.

Real processors sometimes introduce a 'syscall' instruction for ease of implementation, but that seems irrelevant for a VM.

I'm curious if there are other considerations here I haven't thought of.

Async TCP networking API sketch

I'm in the process of designing a simple TCP networking API that would make it possible to create servers running in UVM. This API works with callbacks that tell you when a new connection request is incoming, or when data is available to read. You can then call the accept or read functions to accept a connection or read incoming data. If there is no incoming connection or data, these functions will block, so the API can also be used in a synchronous mode. Having to call read functions also means we don't have to preallocate global buffers to receive data; we can simply pass pointers to those buffers when calling the functions to read data.

// Syscall to create a TCP listening socket to accept incoming connections
u64 socket_id = net_listen_tcp(
  u16 port_no,
  ip_space, // IPV4 / IPV6
  const char* net_iface, // Network interface address to listen on, null for any address
  callback on_new_connection, // Called on new incoming connection
  u64 flags // optional flags, default 0
)

The callback to be notified of an incoming connection request has the form:

void on_new_connection(u64 socket_id)

To accept new connections:

// Syscall to accept a new connection
// Gives you the client address in the buffer you define
// Will block if there is no incoming connection request
u64 socket_id = net_accept(u64 socket_id, client_addr_t *client_addr, callback on_incoming_data)

When there is incoming data to be read, we can be notified by a callback:

// Callback to notify you that incoming data is available to read
void on_incoming_data(u64 socket_id, u64 num_bytes)

To read and write data:

// Syscall to read data from a given socket into a buffer you specify
u64 num_bytes_read = net_read(u64 socket_id, void* buffer, u64 buf_len)

// Syscall to write data on a given socket
void net_write(u64 socket_id, void* buffer, u64 buf_len);

Finally, to close the socket:

// Syscall to close a socket
net_close(socket_id)
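
Here is a rough sketch of how an echo handler might sit on top of this API. The net_read and net_write functions below are local stand-ins backed by in-memory buffers so the example can run anywhere; only their signatures follow the sketch above, and simulate_incoming is a hypothetical helper that plays the role of the VM delivering the callback.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;

/* Stand-ins for the proposed syscalls, backed by in-memory buffers. */
char rx_data[64];   /* bytes "received" from the client */
u64  rx_len;
char tx_data[64];   /* bytes "sent" back to the client */
u64  tx_len;

u64 net_read(u64 socket_id, void *buffer, u64 buf_len)
{
    (void)socket_id;
    u64 n = rx_len < buf_len ? rx_len : buf_len;
    memcpy(buffer, rx_data, n);
    memmove(rx_data, rx_data + n, rx_len - n);  /* drop consumed bytes */
    rx_len -= n;
    return n;
}

void net_write(u64 socket_id, void *buffer, u64 buf_len)
{
    (void)socket_id;
    assert(tx_len + buf_len <= sizeof(tx_data));
    memcpy(tx_data + tx_len, buffer, buf_len);
    tx_len += buf_len;
}

/* Callback: echo everything that is available back to the sender. */
void on_incoming_data(u64 socket_id, u64 num_bytes)
{
    char buf[16];   /* local buffer, no global preallocation needed */
    while (num_bytes > 0) {
        u64 want = num_bytes < sizeof(buf) ? num_bytes : sizeof(buf);
        u64 got = net_read(socket_id, buf, want);
        if (got == 0)
            break;
        net_write(socket_id, buf, got);
        num_bytes -= got;
    }
}

/* Hypothetical helper: pretend that len bytes arrived on the socket. */
void simulate_incoming(u64 socket_id, const char *data, u64 len)
{
    assert(len <= sizeof(rx_data));
    memcpy(rx_data, data, len);
    rx_len = len;
    on_incoming_data(socket_id, len);
}
```

Because the callback passes num_bytes, the handler can read in chunks into a small stack buffer, which is the "no preallocated global buffers" property mentioned above.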

We need an API to read/write to files/streams

UVM needs an API to read/write to files/streams. Ideally the API should be simple and easy to use. It should ideally also be usable for standard input/output, files and network sockets if possible. I'm opening this issue to solicit feedback.

It might make sense to have both a synchronous API and an asynchronous API with callbacks. UVM has no threads, so it makes more sense for it to process network traffic asynchronously rather than by polling. However, when it comes to reading files, it might be fine to read all the data synchronously, as this can be very fast and is simpler to work with.

We might need different syscalls for opening sockets vs opening files, but the functions for knowing how much data is available to read and to read the actual data could be the same.

Input/feedback/help welcome.

UVM Wishlist

This is a list of potentially fun things to work on that would be useful for UVM and its software ecosystem. An easy way that you can contribute to UVM is to write example programs, or useful utility code to be shared with the community, and report any bugs or difficulties you run into along the way.

Some ideas:

  • C code for line drawing
  • Text/font rendering code
  • Basic printf implementation in ncc/include/stdio.h
  • A simple text editor running inside UVM
  • Retro-style BASIC interpreter/REPL (read-eval-print loop)
  • Rotating 3D wireframe cube example.
  • Demoscene-style fire effect
  • Demoscene-style plasma effect
  • 2.5D raycasting example
  • More demoscene-style effects
    • Tunnel effect, rotozoom, etc?
  • Function to fill anti-aliased circles (/ncc/include/uvm/graphics.h)
  • Fun examples making use of the audio output API
  • Simple 2D platformer game
  • Port implementation of BLAKE3 or better crypto-safe hashing function
    • This will be used in a networking context
  • C code for flat-shaded polygon (triangle) rendering
  • Create a UVM logo, drawn using UVM
  • Recreation of the amiga boing ball demo/example ❤️
  • Get the Doom shareware code running on UVM?

My intent is to share all example code under the CC0 1.0 license (a public domain dedication) so that people can get inspired by it and do with it as they please.

And generally speaking, if you write any kind of simple game for UVM, the code is readable, and you're willing to add it to this repo as part of the examples, that could be a useful contribution to the ecosystem :)

Smaller pull requests are easier to review and more likely to get merged quickly.

Should misaligned loads/stores panic?

Currently, load and store instructions can read/write from any address. My understanding is that this may be inefficient on x86, but it won't cause a crash. However, it can cause a fault on ARM AFAIK.

There's a question as to what UVM should do under the hood. If we want to allow unaligned loads/stores, we may have to test for alignment, and use different instructions depending on whether the load/store is aligned or not.

If we disallow unaligned loads/stores, then we still have to test whether the address is aligned or not, and fault if it's not. A JIT compiler may be able to optimize away some or most of these tests.

I'm leaning towards probably disallowing misaligned loads/stores because although this is more strict, it has the advantage that it will force people to write more performant code. Also, if loads and stores are expected to be aligned, then even if we have to test for alignment, we can assume that the address will in fact be aligned (branch not taken), whereas if we allow misaligned loads/stores, we can make no such assumption.
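
As a sketch of what the strict policy could look like in an interpreter, the check below tests alignment with a mask and refuses the load if it fails. The function names and the "return false instead of faulting" convention are assumptions made for this example, not UVM's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of size; size must be a power of two. */
bool is_aligned(uint64_t addr, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* Strict policy: refuse (return false) instead of loading from a
 * misaligned or out-of-bounds address. */
bool load_u32_strict(const uint8_t *heap, size_t heap_len,
                     uint64_t addr, uint32_t *out)
{
    assert(heap != NULL && out != NULL);
    if (!is_aligned(addr, sizeof(uint32_t)))
        return false;                    /* the VM would fault here */
    if (addr > heap_len || heap_len - addr < sizeof(uint32_t))
        return false;                    /* out of bounds */
    const uint8_t *p = heap + addr;
    /* Assemble the value in little-endian order, independent of host. */
    *out = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return true;
}
```

The alignment test is a single AND and compare, and under the strict policy the failing branch is the rare one, which matches the "branch not taken" assumption above.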

Seeing "end of input inside string"

I am seeing an "end of input inside string" error when I put a ' character in a comment on a #define line:

#define k 1 // '

int main() {
}

Note: this is low priority as far as I am concerned.

Encoding metadata with programs

This is not the highest priority, but something to think about. Eventually, I think that programs should be able to store metadata. This should probably just be a third segment aside from the code and data segments. We can then leverage the assembler's logic to parse it.

Each field should have a name string (null-terminated?), followed by u32 length, and then some data. The data could be string for string fields, or binary data in the case of something like an app icon.
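
A minimal sketch of that field layout, assuming a null-terminated name and a little-endian u32 length (consistent with UVM's byte ordering). The function name write_meta_field is hypothetical, and the real format is still undecided.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one metadata field to buf: name, NUL, u32 length (LE), data.
 * Returns the number of bytes written, or 0 if the field does not fit. */
size_t write_meta_field(uint8_t *buf, size_t buf_len, const char *name,
                        const void *data, uint32_t data_len)
{
    assert(buf != NULL && name != NULL);
    assert(data != NULL || data_len == 0);

    size_t name_len = strlen(name) + 1;          /* include the NUL */
    size_t total = name_len + 4 + (size_t)data_len;
    if (total > buf_len)
        return 0;

    memcpy(buf, name, name_len);
    uint8_t *p = buf + name_len;
    p[0] = (uint8_t)(data_len & 0xff);           /* length, LSB first */
    p[1] = (uint8_t)((data_len >> 8) & 0xff);
    p[2] = (uint8_t)((data_len >> 16) & 0xff);
    p[3] = (uint8_t)((data_len >> 24) & 0xff);
    if (data_len > 0)
        memcpy(p + 4, data, data_len);
    return total;
}
```

The explicit length lets the same layout carry both strings and binary data such as an icon, and a reader can skip fields it does not recognize.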

What kind of info do we want to store in the metadata?

  • program name (string)
  • version? (string)
  • short description (string, up to some number of characters, e.g. 255 chars)
  • Flags, such as specifying whether JIT compilation support is enabled, jit_enabled
    • This allows the program to write to its own code segment while running (implement a JIT)
    • May want to specify an address range or min address?
  • An icon to be used in a GUI, e.g. icon_32x32, RGB24 data
  • A list of required syscalls for the program to run

Help us collect nice screenshots!

To help promote UVM, it would be helpful to gather a collection of nice screenshots of the different apps we create. We could include those in the /media directory, in PNG format (or jpeg if PNG doesn't compress well).

Will gladly accept pull requests.

Global pointer assignment doesn't work

A pointer defined globally does not get its value when it is assigned later in the code.
Code to reproduce:

char* dummy;
void main()
{
    char* hello = "Hello String";
    puts(hello); //works
    char* test = hello;
    puts(test); //works
    dummy = test;
    puts(dummy); //doesn't work, prints 0
}

"undeclared identifier" that doesn't appear in the source code.

When compiling d.c:

#include <uvm/graphics.h>

u32
HSVA_to_RGBA(u8 hue, u8 saturation, u8 value, u8 alpha)
{
    u8 chroma = 1;
    u8 m = 1;
    u8 X = 23;
    return rgba32(chroma, m, X, alpha);
}

The compiler generates an error message for an undeclared identifier that doesn't appear in the source:

uvm/ncc % cargo run ./examples/d.c
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/ncc ./examples/d.c`
Error: reference to undeclared identifier "chromalpha"

That's the smallest repro yet, but I haven't tried that hard (yet); e.g. the exact values of the variables don't seem to matter. I'm in the middle of making breakfast, but I wanted to give you what I had so far. I'll fiddle with it later today, if you haven't already fixed it! :)

Should the VM provide FP functions such as sin, cos, tan, log, etc?

There's an open question in my mind as to whether UVM should provide the "standard" set of floating-point functions and other functions found in C's math.h as part of the VM itself.

The sin and cos functions are very often used in graphics, for example. The argument in favor of this would be that it makes UVM an easier platform to target. The argument against would be that it's possible to implement all of those functions using Taylor series expansions, and CPU-specific cosine instructions on Intel CPUs, for example, aren't particularly fast.

An argument for implementing functions like sin/cos in userspace would be that they run the same on every platform. That being said, this argument might be moot because AFAIK, basic floating-point operations such as add/sub/mul don't produce the same result on every platform due to hardware differences. For instance, Intel hardware can use 80 bits of precision internally whereas other platforms use 64 bits. It's also the case that it's easier for the host system to accelerate an operation like sin/cos with SIMD instructions than it is for code written inside UVM.

At the moment I'm leaning towards having the host provide at least a set of core FP math functions such as sin, cos, sqrt, but I'm open to discussing this. It's always possible to add more of these functions later. The functions would probably be added as syscalls under a math subsystem rather than core instructions.
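
For comparison, this is roughly what the userspace option looks like: a sketch of sin computed from its Taylor series using only 32-bit float add and multiply. The name sin_taylor is hypothetical, the input is assumed to be already reduced to the range -pi to pi, and no claim is made about its speed relative to a host-provided function.

```c
#include <assert.h>

/* sin(x) via its Taylor series around 0: x - x^3/3! + x^5/5! - ...
 * Intended for |x| <= pi; callers are expected to reduce larger
 * angles into that range first. */
float sin_taylor(float x)
{
    assert(x >= -3.1415927f && x <= 3.1415927f);
    float term = x;     /* current term, starts at x^1 / 1! */
    float sum = x;
    for (int n = 1; n <= 8; n++) {
        /* next term = -term * x^2 / ((2n) * (2n + 1)) */
        term *= -(x * x) / (float)((2 * n) * (2 * n + 1));
        sum += term;
    }
    return sum;
}
```

Each term is derived from the previous one, so the loop needs no factorial or power function, only multiplication, division and addition.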

global var is being overridden

#include <uvm/syscalls.h>
#include <stdlib.h>
#include <stdio.h>
size_t vm_interned_symbols_capacity = 1024;
char** vm_interned_symbols;

int main() {
  vm_interned_symbols = (char**)(malloc(sizeof(char*)*vm_interned_symbols_capacity));
  memset(vm_interned_symbols, 10, 10);
  
  if(vm_interned_symbols_capacity == 1024) {
    puts("vm_interned_symbols_capacity == 1024\n");
  } else {
    puts("error!!!\n");
  }
}

This prints:

error!!!

If I change the name of vm_interned_symbols_capacity to something like "prefix_vm_interned_symbols_capacity", the problem goes away.

Postfix increment and decrement operator not working

The prefix increment/decrement operators seem to work fine, but the postfix increment/decrement operators throw an error.
Code to reproduce:

int i =0;
while(i < 5) {
    i++;
}

This behaviour is not limited to any type. I was able to reproduce it for pointers as well in a similar fashion.

Headless Version

Hi,

I love the vision of this project and I think it could be something really special.

Is there a timeline for the headless version? I would love to integrate my own SDF-based 2D rendering backend. Personally, I think a headless version which works on an internal bitmap and triggers external actions (audio) would be great, as one could embed it into any kind of framework (Xcode, pixels etc).

Thanks

Audio output API

I'd like to have an API to output audio data so people can make games and audio software. I think that audio output is generally more important than input, so we could start with output, and add input later.

We'll need a syscall to open an audio device, with a number of channels and a sample rate. This will interface with SDL audio internally, but the details of SDL APIs should be hidden from software running on UVM.

It might also be useful to have a function that just says "play these samples right now". This could be used for playing back something like a simple sound effect when someone clicks a button, without having to worry about buffers or things like that. This would be useful for simple games as well.

For more advanced usage, we're going to want a way to register a callback to generate audio samples as needed. The SDL API for generating samples calls a function from a separate thread. In order to keep working with a single-threaded event loop model in UVM, we may need to take a VM lock from the audio thread in order to call the user-registered callback inside UVM.

Sketch of syscalls we would minimally need:

  • audio_open_output(sample_rate, num_channels, flags: u32) -> device_id
  • audio_output_cb(out_device_id, callback_ptr) register a callback to generate output samples
  • audio_close_output(device_id)
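
As an illustration of the kind of work a sample-generating callback would do, here is a sketch that fills a buffer with a mono square wave. The function name, the signed 16-bit sample format and the phase parameter are all assumptions made for this example; the callback signature itself is not specified above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf with a mono square wave. phase counts the samples generated
 * so far and is carried across calls so consecutive buffers join up. */
void fill_square_wave(int16_t *buf, size_t num_samples,
                      uint32_t sample_rate, uint32_t freq_hz,
                      uint64_t *phase)
{
    assert(buf != NULL && phase != NULL);
    assert(freq_hz > 0 && sample_rate >= 2 * freq_hz);
    uint32_t period = sample_rate / freq_hz;   /* samples per cycle */
    for (size_t i = 0; i < num_samples; i++) {
        uint64_t pos = (*phase)++ % period;
        /* First half of each cycle is high, second half is low. */
        buf[i] = (pos < period / 2) ? 8000 : -8000;
    }
}
```

Keeping the phase outside the function means the generator holds no hidden state, which would suit a callback that gets invoked repeatedly from the event loop.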

Input/feedback/help welcome.
