Code Monkey home page Code Monkey logo

lovm2's Issues

Allow optimization of bytecode

Constructs that require labels

  • expressions for logical short-circuit (And, Or)
  • branches
  • loops (break, continue)

Optimization cases

  • constant expressions like 1 + 2 are not optimized:
10:    Pushc(0)
11:    Pushc(1)
12:    Add
  • generator could merge these expressions in Branch conditions into Jt:
10:    ...
11:    Not
12:    Jf
13:    ...
  • the generator often emits code like this at the end of Branch (doesn't happen anymore after introduction of LIR?):
25:    ...
26:    Jmp(27)
27:    ....
  • dead code elimination. everything after offset 12 cannot be reached #9:
func:
11:    Pushc(0)
12:    Ret
13:    Pushc(1)
14:    ...

Steps for implementation

  • HIR is a confusing name. change it to HirUnit (?)
  • update code to use module_builder.entry() instead of module_builder.add(ENTRY_POINT)
  • update docs
  • create new type Label
  • create new type LirElement
  • rename current LoweringRuntime to HirLoweringRuntime that transforms HIR -> LIR
    • on each emit try to optimize the last instructions (depending on instruction)
  • create a new runtime LirLoweringRuntime that transforms LIR -> Bytecode

Eliminate dead-code

optimization issue: #8

branch
    .add_condition(Expr::not(Value::Bool(false)))
    .push(Return::value(Value::Int(1)));
branch
    .default_condition()
    .push(Return::value(Value::Int(2)));

generates the following module. everything after offset 1 is unreachable. 2 constants can be avoided as well.

- consts: [Int(1), Int(2), Nil]
- idents: [main]
- code:
main:
	   0. Pushc(0)        1
	   1. Ret             
	   2. Jmp(5)          
	   3. Pushc(1)        2
	   4. Ret             
	   5. Pushc(2)        Nil
	   6. Ret 

idea

  • a valid path contains offsets that are required by the program. the structure is able to determine whether an instruction at an offset is needed or not
  • iterate over all LirElements
  • return from scanning function if current LirElement is already accepted by a valid path.
  • as every Entry(_) can be reached by external calls, start adding following offsets to the valid path.
  • if a Jmp is detected, continue adding offsets at the jumps target location
  • if a conditional jump Jt/Jf is detected
    • recursively scan at jumps target location. remember to check if the elements inside are already valid (danger of endless loop) - return if a valid instruction was found
    • continue scanning forward
  • if a Ret is detected, stop adding offsets

steps for implementation

  • add new method to Optimization trait: postprocess(&mut self, _: &mut Vec<LirElement>)
    • implement validation algorithm from idea

Boxing should be a variant of Expr

Value { boxed, .. } is weird as other expressions (e.g. function calls) could be boxed as well.

  • Add new Expr variant Box
  • Add new Expr variant Unbox

Goals for 0.4.8

  • handles: new value type specifically designed for data inside shared objects
  • stdlib: create collection of common functions compilable as shared object modules
    • rough layout
    • more testing
  • add new instructions
    • XOr
    • Shl
    • Shr
  • iterators
    • implemented
    • more testing
  • move define_code from exported macros to tests
  • change representation of Call, LCall instructions from (argn, ident_idx) to (ident_idx, argn)
  • rename Cast to Conv
  • update source code documentation
    • maybe expose lovm2_error, lovm2_internals, pylovm2 as extern crates in documentation too?
  • add lovm2_extend usage documentation
  • update pylovm2 on pypi.org
    • currently not possible, because lovm2 requires a nightly compiler. is possible by changing default compiler to nightly.
  • bump lol version

Redesign code layout

Current

The current Module looks like this:

  • CodeObject main
    • globals
    • locals
    • consts
    • code
  • CodeObject add
    • globals
    • locals
    • consts
    • code
  • ...

New Layout

  • idents: globals + locals
  • consts
  • entries: HashMap<Variable, usize> mapping the function name to the offset inside code
  • code which is the concatenated list of bytecode instructions from main and add
main:
    ...
    ret
add:
    ...
    ret

Pros

  • Size of Module is expected to decrease by a lot
    • CodeObject can reuse identifiers and constants
  • The new layout is way simpler than the previous one

Steps for redesign

  • remove current struct Module
  • redefine CodeObject according to new layout
    • wait until #6 is resolved
    • refactor ModuleProtocol trait -> remove slot method
    • implement ModuleProtocol
  • create new type CodeObjectFunction wip name
    • tuple containing a reference to the CodeObject and an offset
    • implement CallProtocol

Redesign Values

Problems:

  • Two different modules (co_value and ru_value) with basically the same operations defined on them
  • No way of mutating a Value other than Dict and List in place/by reference. This makes the implementation of python like slices more difficult.

Steps for redesign:

  • Rename CoValue -> Value
  • RuValue contains either a Value or a Ref(Rc<RefCell<Value>>)
  • Add a new instruction Box for turning the top of stack into a Ref
  • needed? Add a new instruction Unbox for turning the top of stack into a Value (basically a clone)

Pros:

  • Operations like Add are defined for compile time values as well
  • Cleanup of codebase

Cons:

  • Indirection checks are now required when working with RuValue

ExprBranch for conditional values

Add a new HIR structure for conditional values. Currently Branch is only available but it is a statement requiring assignments.

  • Add a new structure with these methods:
    • add_condition(cond: Expr, expr: Expr)
    • default_condition(expr: Expr) - required. Every condition in ExprBranch must always return at least some value
  • Change the name? Alternatives: Conditional

Add optional typing

Todo

  • allow emitting more than one error when lowering the hir
  • introduce a feature-flag for typing support
  • give each Expr an optional type
  • check that there is no type violation when lowering
  • respect type changes when casting

IGet instruction

Implement new instruction IGet(u16) for retrieving items by index .get_by_index(usize).

  • This would save space when working with frequent indices like list[0] and list[1]
  • Also helpful when unboxing iterations over Dict:
              | stack
CPush         | [dict]
IterCreate    | [iter]
...
IterNext      | [list]
Duplicate     | [list, list]
IGet(0)       | [list, key]
LMove         | [list]
IGet(1)       | [value]
LMove         | []

Allow loading custom modules

Add a function fn(name: &str) -> Option<Module> to lovm2 that is dispatched whenever the interpreter needs to lookup modules for the Load instruction.

  • If the user commits to utilizing this feature, do this lookup first
  • If the lookup leads to a None value, continue with regular load path search

This functionality is needed because pylovm2 can generate modules from python code, but importing them into the vm is not suitable as it would require two mutable references to the vm itself. The new method narrows the mutation down to one reference only.

Support for threads?

After the Context has been reconsidered #11 maybe multiple threads can be supported in future versions?

Negation optimization

The optimizer currently merges

Not, Jt => Jf

and vice versa but does not cover

Eq, Not => Ne

adapt pylovm2 api

  • add new method add_with_args to module builder -> avoid with_args on hir
  • avoid calling .code() on hir

Change Variable Scoping

Current Assign::global() and Assign::local() API is cumbersome.

  • Add new method Assign::var(variable, expr)
  • Add new HirElements ScopeGlobal and ScopeLocal
  • Add new methods local, global to Block
  • Add new attribute globals to LirLoweringRuntime (type: Vec<Variable>)
  • Add new LirElements ScopeLocal and ScopeGlobal
  • Deprecate Assign::local and Assign::global
  • Remove AssignType (?)

Redesign module loading

Loading SharedObjectModule and lovm2 Module require special initialization procedures yet the current process enforces too generic loading by only having acces to a GenericModule object in load_and_import_filter.

We never really have access to the underlying types SharedObjectModule or Module because this behavior only returns a GenericModule

if let Ok(so_module) = SharedObjectModule::load_from_file(&path) {

Pending Problems

  • global variables
    constant data cannot be set on modules.
  • no namespaces
    this is not really a problem as long as you can make sure that functions can locally import certain modules into their call scope. currently, including a module influences the whole context. can we add a tiny custom scope on Frame and cache modules in the background?
  • error handling
    there is no standardized way of propagating erroneous states.

Change Hir Creation API

// old
hir.step(Assign::var(n, 2));
// new
hir.assign(n, 2);
  • Move all HirElement constructors as methods on HasBlock
    • Assign: assign, increment, decrement
    • Repeat: break_repeat, continue_repeat
    • Import: import, import_from
    • Interrupt: trigger
    • Return: return_nil, return_value
  • Improve Expr builder
    • Conv: to_bool, to_float, to_integer, to_str
    • Iter: to_iter, create_ranged, has_next, next, reverse

Reducing Context size

The Context structure contains a lot of attributes that are not directly related to the programs state but rather the virtual machines configuration.

pub struct Context {

The load_hook and import_hook callbacks are examples for such attributes. But also the interrupt table shouldn't really be necessary here.

Prefix Hir Building Names

Importing common names such as Expr creates confusions when used as a library. Solution: Expr -> LV2Expr, Call -> LV2Call etc.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.