Code Monkey home page Code Monkey logo

go.vm's Introduction

Go Report Card license Release

go.vm

This project is a golang based compiler and interpreter for a simple virtual machine. It is a port of the existing project:

You can get a feel for what it looks like by referring to either the parent project, or the examples contained in this repository.

This particular virtual machine is intentionally simple, but despite that it is hopefully implemented in a readable fashion. "Simplicity" here means that we support only a small number of instructions, and the 16-registers the virtual CPU possesses can store strings and integers, but not floating-point values.

If you want to see a real virtual machine, interpreting a scripting language, which you can embed inside your Golang applications:

Installation

Installation of this tool from source is as simple as:

$ git clone https://github.com/skx/go.vm
$ cd go.vm
$ go install .

If you prefer you can download the latest version directly like so:

$ go install github.com/skx/go.vm@latest

If you don't have the golang compiler/tools installed you can instead fetch binaries from our release page.

Usage

Once installed there are three sub-commands of interest:

  • go.vm compile $file.in
    • Compiles the given program into bytecode.
  • go.vm execute $file.raw
    • Given the path to a file of bytecode, then interpret it.
  • go.vm run $file.in
    • Compiles the specified program, then directly executes it.

So to compile the input-file examples/hello.in into bytecode:

 $ go.vm compile examples/hello.in

Then to execute the resulting bytecode:

 $ go.vm execute examples/hello.raw

Or you can handle both steps at once:

 $ go.vm run examples/hello.in

Opcodes

The virtual machine has 16 registers, each of which can store an integer or a string. For example to set the first two registers you might write:

 store #0, "This is a string"
 store #1, 0xFFFF

In addition to this there are several mathematical operations which have the general form:

 $operation $result, $src1, $src2

For example to add the contents of register #1 and register #2, storing the result in register #0 you would write:

 add #0, #1, #2

Strings and integers may be displayed to STDOUT via:

 print_str #1
 print_int #3

Control-flow is supported via call, ret (for subroutines) and jmp for absolute jumps. You can also use the Z-flag which is set by comparisons and the inc/dec instructions and make conditional jumps:

    store #1, 0x42
    cmp #1, 0x42
    jmpz ok

    store #1, "Something weird happened!\n"
    print_str #1
    exit
  :ok
    store #1, "Comparing register #01 to 0x42 succeeded!\n"
    print_str #1
    exit

Further instructions are available and can be viewed beneath examples/. The instruction-set is pretty limited, for example there is no notion of reading from STDIN - however this is supported via the use of traps, as documented below.

Notes

Some brief notes on parts of the code / operation:

The compiler

The compiler is built in a traditional fashion:

  • Input is split into tokens via lexer.go
    • This uses the token.go for the definition of constants.
  • The stream of tokens is iterated over by compiler.go
    • This uses the constants in opcode.go for the bytecode generation.

The approach to labels is the same as in the inspiring-project: Every time we come across a label we output a pair of temporary bytes in our bytecode. Later, once we've read the whole program and assume we've found all existing labels, we go back up and fix the generated addresses.

You can use the dump command to see the structure the lexer generates:

 $ go.vm dump ./examples/hello.in
 {STORE store}
 {IDENT #1}
 {, ,}
 {STRING Hello, World!
 }
 {PRINT_STR print_str}
 {IDENT #1}
 {EXIT exit}

The interpreter

The core of the interpreter is located in the file cpu.go and is as simple and naive as you would expect. There are some supporting files in the same directory:

Changes

Compared to the original project there are two main changes:

  • The DB/DATA operation allows storing string data directly in the generated bytecode.
  • There is a notion of traps.
    • Rather than defining opcodes for complex tasks it is now possible to callback into the CPU-emulator to do work.

DB/DATA Changes

For example in simple.vm project this is possible:

DB 0x01, 0x02,

But this is not:

 DB "This is a string, with terminator to follow"
 DB 0x00

go.vm supports this, and it is demonstrated in examples/peek-strlen.in.

Traps

The instruction int can be used to call back to the emulator to do some work on behalf of a program. The following traps are currently defined & available:

  • int 0x00
    • Set the contents of the register #0 with the length of the string in register #0.
  • int 0x01
  • int 0x02
    • Update the (string) contents of register #0 to remove any trailing newline.
    • See examples/trap.box.in.

Adding your own trap-functions should be as simple as editing cpu/traps.go.

Fuzzing

Fuzz-testing is a powerful technique to discover bugs, in brief it consists of running a program with numerous random inputs and waiting for it to die.

The CPU in this repository has been fuzzed repeatedly, using the new native fuzz-testing available within go 1.18+. To run tests:

$ cd cpu
$ go test  -parallel=1 -fuzz=FuzzCPU -v
..
=== FUZZ  FuzzCPU
fuzz: elapsed: 0s, gathering baseline coverage: 0/124 completed
fuzz: elapsed: 3s, gathering baseline coverage: 124/124 completed, now fuzzing with 1 workers
fuzz: elapsed: 3s, execs: 124 (41/sec), new interesting: 0 (total: 124)
fuzz: elapsed: 6s, execs: 542 (139/sec), new interesting: 0 (total: 124)
fuzz: elapsed: 9s, execs: 1080 (179/sec), new interesting: 0 (total: 124)
fuzz: elapsed: 12s, execs: 1080 (0/sec), new interesting: 0 (total: 124)
fuzz: elapsed: 15s, execs: 1080 (0/sec), new interesting: 0 (total: 124)
fuzz: elapsed: 18s, execs: 1080 (0/sec), new interesting: 0 (total: 124)
fuzz: elapsed: 21s, execs: 1080 (0/sec), new interesting: 0 (total: 124)
fuzz: elapsed: 24s, execs: 1080 (0/sec), new interesting: 0 (total: 124)
...

See-Also

The original virtual-machine compiler and interpreter is available from the following repository:

In addition to that you can find a real virtual-machine inside the embedded scripting engine I wrote, also for golang. In that case a scripting language is parsed and converted into a series of bytecode instructions, which are executed by a virtual machine. Similar to this project, but the bytecode operations are more complex and high-level:

If you're interested in compilers, and interpreters, generally you might enjoy these other projects too:

Github Setup

This repository is configured to run tests upon every commit, and when pull-requests are created/updated. The testing is carried out via .github/run-tests.sh which is used by the github-action-tester action.

Releases are automated in a similar fashion via .github/build, and the github-action-publish-binaries action.

Steve

go.vm's People

Contributors

skx avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

go.vm's Issues

Registers overflow - two possible solutions!

This is supposed to be a 16-bit system, as our address-space is capped at 64k, and our registers hold (integer) values in the range 0x0000-0xffff only.

However our registers overflow:

 store #0, 0xFEFE
 store #1, 0xFEFE
 add #3, #0, #1
 print_int #3

The result of this program is 0x1FDFC. Instead we should be % 0xFFFF. We handled the case of inc and dec wrapping correctly in #2 but we need to cap registers more generally.

A simple solution would be to update SetInt to add & 0xFFFF which would limit the value. However a better solution might be to test for this overflow in all the math-operations and set a (new) [C]arry flag.

To do this we'd need to:

  • Add the new flag.
  • Set it on all suitable math-operations.
  • Add JMP_C and JMP_NC operations.

I could go either way, since this machine is posted for learning it could make sense to either:

  • Simplify because it isn't real.
  • Add the opcodes because they demonstrate something useful.

Feedback welcome...

I should investigage fuzzing.

The original c intepreter was fuzzed with AFL, which resulted in a couple of bug-fixes.

It would be worth fuzzing this application with a go-based fuzzer. I'm not yet familiar with any, but no doubt they exist.

We should have pusha / popa

We should have two opcodes for pushing all registers and popping all registers - this would make writing routines to be called useful.

inc + dec don't wrap

Since our registers hold integers in the range 0x0000 0xFFFF they should wrap when they reach the limits. This example:

    store #1,0
    dec #1 

Should result in register #1 containing 0xFFFF.

stack implementation

I'm just curious, that is,
why is the stack implemented as fisrt-in-first-out style? is there any special consideration?

If this were a real 80s CPU ..

We have a lot of special purpose instructions/opcodes which are fun to play with but which wouldn't exist in a real processor, for example:

  • MEMCPY
    • Could be implemented via PEEK/POKE and a loop
  • SYSTEM
    • Can't be implemented by hand, but isn't so terribly useful
    • At least in part because the results are output to the console and not stored in a register.

We're also missing the ability to carry out some operations, either in a simple fashion, or at all:

  • We can't convert a string to uppercase easily
    • Though we could iterate over some memory and cmp >= 'a' && <= 'z'
    • We can't iterate over the characters in a string stored in a register though, or modify them in-place.
  • We can't input string from the user into a register/RAM.
  • We can't draw to the console.
    • Meaning we can't write "tetris" or other simple games.

If this were a real CPU we'd call into fixed addresses into ROM to get support for things, or we'd have an interrupt function. For example we could define the INT 0xNN instruction. Then using that we could provide facilities

  • INT 0x00
    • Read a string from the user, store in register #0.
  • INT 0x01
    • Output a single character, as stored in #0
  • INT 0x02
    • Return the length of the string stored in #0
  • INT 0x03
    • Draw the character stored in #0 at coordinates #1,#2
  • ..

In short we could add higher-level useful utilities in a way that didn't require the addition of more single-purpose instructions. I don't have a strong view of what things are missing, because I've never tried to write "complex" programs for this CPU, but if we're all about the learning then it should be possible to do so!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.