maximecb / uvm

Fun, portable, minimalistic virtual machine.

License: Apache License 2.0

Rust 48.99% C 45.84% Assembly 1.62% Shell 0.03% Python 3.51%

emulation virtual-machine bytecode interpreter jit-compiler containerization bytecode-interpreter permacomputing sandboxing emulator

uvm's People

Stargazers

Watchers

Forkers

load1n9 anima-os-dev olimpiu gmh5225 calroc hd-coo7 cyberflamego abdulbahajaj costava tbsp bezerrathm cagix hparker hallucino standardgalactic vasucp1207 miguelvelasquezdev tanmaysachan maaarcocr

uvm's Issues

UVM Wishlist

This is a list of potentially fun things to work on that would be useful for UVM and its software ecosystem. An easy way that you can contribute to UVM is to write example programs, or useful utility code to be shared with the community, and report any bugs or difficulties you run into along the way.

Some ideas:

My intent is to share all example code under the CC0 1.0 license (a public domain dedication) so that people can get inspired from it and do with it as they please.

And generally speaking, if you write any kind of simple game for UVM, the code is readable, and you're willing to add it to this repo as part of the examples, that could be a useful contribution to the ecosystem :)

Smaller pull requests are easier to review and more likely to get merged quickly.

Global pointer assignment doesn't work

A globally defined pointer is not assigned a value later in the code.
Code to reproduce

char* dummy;
void main()
{
    char* hello = "Hello String";
    puts(hello); //works
    char* test = hello;
    puts(test); //works
    dummy = test;
    puts(dummy); //doesn't work, prints 0
}

Help us collect nice screenshots!

To help promote UVM, it would be helpful to gather a collection of nice screenshots of the different apps we create. We could include those in the /media directory, in PNG format (or jpeg if PNG doesn't compress well).

Will gladly accept pull requests.

Audio output API

I'd like to have an API to output audio data so people can make games and audio software. I think that audio output is generally more important than input, so we could start with output, and add input later.

We'll need a syscall to open an audio device, with a number of channels and a sample rate. This will interface with SDL audio internally, but the details of SDL APIs should be hidden from software running on UVM.

It might also be useful to have a function that just says "play these samples right now". This could be used for playing back something like a simple sound effect when someone clicks a button, without having to worry about buffers or things like that. This would be useful for simple games as well.

For more advanced usage, we're going to want a way to register a callback to generate audio samples as needed. The SDL API for generating samples calls a function from a separate thread. In order to keep working with a single-threaded event loop model in UVM, we may need to take a VM lock from the audio thread in order to call the user-registered callback inside UVM.

Sketch of syscalls we would minimally need:

audio_open_output(sample_rate, num_channels, flags: u32) -> device_id
audio_output_cb(out_device_id, callback_ptr) register a callback to generate output samples
audio_close_output(device_id)

Input/feedback/help welcome.

Calling memcpy with a negative value segfaults

.data;
A: .zero 10;
B: .zero 10;

.code;

push A;
push B;
push -1;
syscall memcpy;

Postfix increment and decrement operators not working

The prefix increment/decrement operators seem to work fine, but the postfix increment/decrement operators throw an error.
Code to reproduce

int i =0;
while(i < 5) {
    i++;
}

This behaviour is not limited to any one type. I was able to reproduce it for pointers as well, in a similar fashion.

Async TCP networking API sketch

I'm in the process of designing a simple TCP networking API that would make it possible to create servers running in UVM. This API works with callbacks that tell you when a new connection request is incoming, or when data is available to read. You can then call the accept or read functions to accept a connection or read incoming data. If there is no incoming connection or data, these functions will block, so the API can also be used in a synchronous mode. Having to call read functions also means we don't have to preallocate global buffers to receive data: we can simply pass pointers to those buffers when calling the functions to read data.

// Syscall to create a TCP listening socket to accept incoming connections
u64 socket_id = net_listen_tcp(
  u16 port_no,
  ip_space, // IPV4 / IPV6
  const char* net_iface, // Network interface address to listen on, null for any address
  callback on_new_connection, // Called on new incoming connection
  u64 flags // optional flags, default 0
)

The callback to be notified of an incoming connection request has the form:

void on_new_connection(u64 socket_id)

To accept new connections:

// Syscall to accept a new connection
// Gives you the client address in the buffer you define
// Will block if there is no incoming connection request
u64 socket_id = net_accept(u64 socket_id, client_addr_t *client_addr, callback on_incoming_data)

When there is incoming data to be read, we can be notified by a callback:

// Callback to notify you that incoming data is available to read
void on_incoming_data(u64 socket_id, u64 num_bytes)

To read and write data:

// Syscall to read data from a given socket into a buffer you specify
u64 num_bytes_read = net_read(u64 socket_id, void* buffer, u64 buf_len);

// Syscall to write data on a given socket
void net_write(u64 socket_id, void* buffer, u64 buf_len);

Finally, to close the socket:

// Syscall to close a socket
void net_close(u64 socket_id);

Headless Version

Hi,

I love the vision of this project and I think it could be something really special.

Is there a timeline for the headless version? I would love to integrate my own SDF-based 2D rendering backend. Personally, I think a headless version that works on an internal bitmap and triggers external actions (audio) would be great, as one could embed it into any kind of framework (Xcode, pixels, etc.).

Thanks

Seeing "end of input inside string"

I am seeing an "end of input inside string" error when I put a ' inside a comment on a macro definition line:

#define k 1 // '

int main() {
}

Note: this is low priority as far as I am concerned.

We need an API to read/write to files/streams

UVM needs an API to read/write to files/streams. Ideally, the API should be simple and easy to use, and usable for standard input/output, files and network sockets if possible. I'm opening this issue to solicit feedback.

It might make sense to have both a synchronous API and an asynchronous API with callbacks. UVM has no threads, so it makes more sense for it to process network traffic asynchronously rather than by polling. However, when it comes to reading files, it might be fine to read all the data synchronously, as this can be very fast and is simpler to work with.

We might need different syscalls for opening sockets vs opening files, but the functions for knowing how much data is available to read and to read the actual data could be the same.

Input/feedback/help welcome.

Should a VM even have a notion of 'syscall'?

I'm curious if you considered making each syscall a first-class opcode.

What I've observed is that every new namespace gives you more room to add features. If the goal is stability, it might be useful to eliminate namespaces as far as possible to minimize temptations to add features.

A single namespace of opcodes might be easier for people to learn.

Real processors sometimes introduce a 'syscall' instruction for ease of implementation, but that seems irrelevant for a VM.

I'm curious if there are other considerations here I haven't thought of.

Should the VM provide FP functions such as sin, cos, tan, log, etc?

There's an open question in my mind as to whether UVM should provide the "standard" set of floating-point functions and other functions found in C's math.h as part of the VM itself.

The sin and cos functions are very often used in graphics, for example. The argument in favor of this would be that it makes UVM an easier platform to target. The argument against would be that it's possible to implement all of those functions using Taylor series expansions, and CPU-specific cosine instructions on Intel CPUs, for example, aren't particularly fast.

An argument for implementing functions like sin/cos in userspace would be that they run the same on every platform. That being said, this argument might be moot because AFAIK, basic floating-point operations such as add/sub/mul don't produce the same result on every platform due to hardware differences. For instance, Intel hardware can use 80 bits of precision internally whereas other platforms use 64 bits. It's also the case that it's easier for the host system to accelerate an operation like sin/cos with SIMD instructions than it is for code written inside UVM.

At the moment, I'm leaning towards having the host provide at least a core set of FP math functions such as sin, cos and sqrt, but I'm open to discussing this. It's always possible to add more of these functions later. The functions would probably be added as syscalls under a math subsystem rather than as core instructions.

"undeclared identifier" that doesn't appear in the source code.

When compiling d.c:

#include <uvm/graphics.h>

u32
HSVA_to_RGBA(u8 hue, u8 saturation, u8 value, u8 alpha)
{
    u8 chroma = 1;
    u8 m = 1;
    u8 X = 23;
    return rgba32(chroma, m, X, alpha);
}

The compiler generates an error message for an undeclared identifier that doesn't appear in the source:

uvm/ncc % cargo run ./examples/d.c
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/ncc ./examples/d.c`
Error: reference to undeclared identifier "chromalpha"

That's the smallest repro yet, but I haven't tried that hard (yet); for example, the exact values of the variables don't seem to matter. I'm in the middle of making breakfast, but I wanted to give you what I had so far. I'll fiddle with it later today, if you haven't already fixed it! :)

Encoding metadata with programs

This is not the highest priority, but something to think about. Eventually, I think that programs should be able to store metadata. This should probably just be a third segment aside from the code and data segments. We can then leverage the assembler's logic to parse it.

Each field should have a name string (null-terminated?), followed by u32 length, and then some data. The data could be string for string fields, or binary data in the case of something like an app icon.

What kind of info do we want to store in the metadata?

- program name (string)
- version? (string)
- short description (string, up to some number of characters, e.g. 255 chars)
- Flags, such as specifying whether JIT compilation support is enabled (jit_enabled)
  - This allows the program to write to its own code segment while running (implement a JIT)
  - May want to specify an address range or min address?
- An icon to be used in a GUI, e.g. icon_32x32, RGB24 data
- A list of required syscalls for the program to run

Should misaligned loads/stores panic?

Currently, load and store instructions can read/write from any address. My understanding is that this may be inefficient on x86, but it won't cause a crash. However, it can cause a fault on ARM AFAIK.

There's a question as to what UVM should do under the hood. If we want to allow unaligned loads/stores, we may have to test for alignment and use different instructions depending on whether the load/store is aligned or not.

If we disallow unaligned loads/stores, then we still have to test whether the address is aligned or not, and fault if it's not. A JIT compiler may be able to optimize away some or most of these tests.

I'm leaning towards probably disallowing misaligned loads/stores because although this is more strict, it has the advantage that it will force people to write more performant code. Also, if loads and stores are expected to be aligned, then even if we have to test for alignment, we can assume that the address will in fact be aligned (branch not taken), whereas if we allow misaligned loads/stores, we can make no such assumption.

unresolved import `std::os::fd`

global var is being overwritten

#include <uvm/syscalls.h>
#include <stdlib.h>
#include <stdio.h>
size_t vm_interned_symbols_capacity = 1024;
char** vm_interned_symbols;

int main() {
  vm_interned_symbols = (char**)(malloc(sizeof(char*)*vm_interned_symbols_capacity));
  memset(vm_interned_symbols, 10, 10);
  
  if(vm_interned_symbols_capacity == 1024) {
    puts("vm_interned_symbols_capacity == 1024\n");
  } else {
    puts("error!!!\n");
  }
}

this prints

error!!!

If I change the name of vm_interned_symbols_capacity to something like prefix_vm_interned_symbols_capacity, the problem goes away.

"array element types do not match" error with certain literals.

I've been busy the last couple of days, but I sat down this evening to fiddle with fixed point 3D math and discovered something odd while building a table of sin values.

Contents of t.c:

u64 a[2] = {0x7fff8beb, 0x8000e82a};

Result of attempting to compile:

uvm/ncc % cargo run ./examples/t.c
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/ncc ./examples/t.c`
Error: array element types do not match

I tried with 0xffffffff and 0x100000000 on the theory that it might have something to do with having literal values of both 31 and 32 bits, respectively, in the same initializer. But with that pair of initial values the program compiles without the error message.

Unable to recover from misinput to read_i64

The read_i64 syscall has no way to recover from invalid input.
The program crashes.
This would likely be frustrating to users of long-running programs, where mistyping "4r" instead of "4" brings the whole program down.

Recovery would mean some action like using a default value instead and/or being able to retry.

The syscalls doc states that each syscall outputs either no value or a single value onto the stack.

In the event of invalid input:

1. read_i64 could push nothing onto the stack. However, a syscall either always returns a value or never does, and even if that weren't the case, there would need to be a way to check whether the stack has grown; otherwise we would be checking against a sentinel value put onto the stack.
2. read_i64 could return some sentinel value in the event of invalid input.
3. read_i64 could take a single argument, a "default" value, that is returned in the event of invalid input.

It would be nice if sentinel values can be avoided, and if invalid input could be known without having to assume that the user has not entered the default value.

Option 1 is not possible with the current constraints (not that the constraints should be changed), and 2 and 3 have shortcomings. I'm sure someone can determine a better setup.
