nitrogen-lang / nitrogen

An interpreted programming language written in Go.

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.23% Go 86.43% Inform 7 13.21% Shell 0.13%

nitrogen interpreter interpreted-programming-language golang go programming-language

nitrogen's Introduction

Nitrogen

Nitrogen is a dynamically typed, interpreted programming language written in Go. Nitrogen draws inspiration from Go, C, and several other languages. It's meant to be a simple, easy-to-use language for writing quick scripts and utilities.

Building the Interpreter

Clone the repo: git clone https://github.com/nitrogen-lang/nitrogen
Run make: cd nitrogen && make
Run the interpreter: ./bin/nitrogen [file]

Documentation

Documentation for the standard library and language is available in the docs directory.

Examples

Example programs can be found in the examples directory as well as the tests directory.

Running the Interpreter

Interactive Mode

Nitrogen can run in interactive mode much like other interpreted languages. Run Nitrogen with the -i flag to start the REPL.

Scripts

Run Nitrogen like so: nitrogen filename.ni. The file extension for Nitrogen source files is .ni. The extension for compiled scripts is .nib.

SCGI Server

Nitrogen can run as an SCGI server using multiple workers and the embedded interpreter for performance. Use the -scgi flag to start the server. See the SCGI docs for more details.

Command Line Flags

Usage: nitrogen [options] SCRIPT

-i: Run an interactive REPL prompt.
-ast: Print a representation of the abstract syntax tree and then exit. (Internal debugging)
-version: Print version information.
-debug: Print debug information during execution. (Very verbose)
-cpuprofile profile.out: Make a CPU profile. (Internal debugging)
-memprofile profile.out: Make a memory profile. (Internal debugging)
-o file.nib: Output a compiled script to file then exit.
-M /module/path: Directory to search for imported modules. This flag can be used multiple times.
-al module.so: Autoload a module from the search path. This flag can be used multiple times. Autoloaded modules are loaded before any script is executed.
-info file.nib: Print information about a compiled Nitrogen file.
-c: Parse and compile the script, print any errors, and exit.

Contributing

Issues and pull requests are welcome. Once I write a contributors guide, please read it ;) Until then, please open an issue for anything you want to work on so we can discuss it first, especially for major design changes.

All Go code should be run through go fmt. Any pull request containing files that haven't been through gofmt will be denied until they're fixed. For code written in Nitrogen, use a 4-space indent, keep lines relatively short, and use camelCase for function names and PascalCase for class names.

All contributions must be licensed under the 3-Clause BSD license or a more permissive license such as MIT or CC0. Any other license will be rejected.

License

Both the language specification and this reference interpreter are released under the 3-Clause BSD License which can be found in the LICENSE file.

Inspiration

Writing an Interpreter in Go


nitrogen's Issues

Write formal grammar

Because one doesn't exist.

Packaging distribution/download/management

How to do packaging? Right now if a package is needed for an application, the file(s) must be manually downloaded and added to the source tree. They can be placed under a vendor folder of some kind since the search path can be modified.

Start off with a proof of concept: a manifest file like package.json or elm.json holding author information and a list of dependencies.

Central repository? Most other languages have one. Maybe later; start by downloading directly from an upstream repo over HTTP.

Version resolution, minimum versions like Go modules or "normal" resolution like most languages? Would need to be able to parse semantic versioning constraints and write a dependency resolver.
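For comparison, Go modules use minimal version selection (MVS): every module resolves to the highest of the minimum versions required anywhere in the reachable graph, so no constraint solver is needed. A minimal sketch of that approach in Go, assuming a hypothetical in-memory graph keyed by "pkg@version" and plain MAJOR.MINOR.PATCH versions with no pre-release tags:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a parsed MAJOR.MINOR.PATCH triple (pre-release tags not handled).
type version [3]int

func parseVersion(s string) (version, error) {
	var v version
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// less compares numerically, so 1.9.0 < 1.10.0.
func (a version) less(b version) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// requirement means "needs at least version min of dep" (hypothetical format).
type requirement struct{ dep, min string }

// resolve visits every reachable release and keeps, per package,
// the highest minimum version seen: minimal version selection.
func resolve(root string, graph map[string][]requirement) (map[string]string, error) {
	selected := map[string]string{}
	visited := map[string]bool{}
	var walk func(node string) error
	walk = func(node string) error {
		if visited[node] {
			return nil
		}
		visited[node] = true
		for _, r := range graph[node] {
			want, err := parseVersion(r.min)
			if err != nil {
				return err
			}
			cur, ok := selected[r.dep]
			if !ok {
				selected[r.dep] = r.min
			} else if have, _ := parseVersion(cur); have.less(want) {
				selected[r.dep] = r.min
			}
			if err := walk(r.dep + "@" + r.min); err != nil {
				return err
			}
		}
		return nil
	}
	err := walk(root)
	return selected, err
}

func main() {
	graph := map[string][]requirement{
		"app":     {{"a", "1.0.0"}, {"b", "1.0.0"}},
		"a@1.0.0": {{"c", "1.2.0"}},
		"b@1.0.0": {{"c", "1.3.0"}},
	}
	sel, err := resolve("app", graph)
	if err != nil {
		panic(err)
	}
	fmt.Println(sel) // map[a:1.0.0 b:1.0.0 c:1.3.0]
}
```

The appeal of MVS is that builds are reproducible without a lock file; the trade-off is that a package never gets a newer version unless someone explicitly requires it.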

How are packages structured? Everything under a src directory? Is the repo root the root of the package?

Remove arrow notation

Use dot and index notation for all collection lookups. This includes classes, instances, modules, hashmaps, and arrays. Arrays would only use index notation. Hashmaps can use dot notation in place of the current arrow notation, and so can modules.

Implement JSON module

Simple encode/decode.

for..in loops

For loops that work over "iterators". Arrays and hashmaps will implement the needed methods. Classes can implement a method to generate an iterator.

for key, value in list {
    # ...do stuff
}

for value in list {
    # ...do stuff
}

Implement classes in the VM

Title.

Simplify try/catch

Take Python's example and roll with it. Try/catch blocks won't execute in a new scope. Any variable declared is available outside the blocks. At the end of a catch block with the exception bound to a variable, the declaration needs to be deleted. A new DELETE_FAST opcode will need to be added to remove assignments.

This change will simplify the compiler and runtime quite a bit. It will also iron out many bugs with the current implementation.

Inject script arguments

Like #3, also inject the arguments to the script. This will require removing support for multiple scripts as arguments to the interpreter. This would be solved with #2 with the addition of include/require. Use _ARGV in the global scope.

Static class methods and variables

These methods and variables live on the class itself rather than on any single instance. Static methods are called through the class. Accessing a static method or static variable through an instance raises an exception.

Type Annotations

Allow functions to specify types for parameters and return values. Types will be checked at compile time as much as possible, and anything that can't be checked then will be checked at runtime. This keeps runtime overhead to a minimum. The type checking step can be done with the AST, as it specifies internal object types. Any parameter or return value not specified will default to the Any type.

The following types can be supported:

Primitives (ints, strings, floats, etc.)
Classes
Interfaces

Syntax

fn example(s: string, i: int): bool {}

interface SomeInterface {
  fn method(s: string): string
}

fn example2(obj: SomeInterface): void {}
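For checks the compiler cannot prove, a runtime guard at call time could look like the sketch below. The Param descriptor, the type-name strings, and checkArgs are hypothetical; unannotated parameters default to Any as proposed above, and classes and interfaces are left out for brevity:

```go
package main

import "fmt"

// Param is a hypothetical compiled parameter descriptor; Type "" means Any.
type Param struct {
	Name string
	Type string
}

// typeOf maps a Go runtime value to a Nitrogen-style type name.
func typeOf(v any) string {
	switch v.(type) {
	case int64:
		return "int"
	case float64:
		return "float"
	case string:
		return "string"
	case bool:
		return "bool"
	case nil:
		return "nil"
	default:
		return "unknown"
	}
}

// checkArgs would run only for parameters not verified at compile time.
func checkArgs(fn string, params []Param, args []any) error {
	if len(args) != len(params) {
		return fmt.Errorf("%s: expected %d arguments, got %d", fn, len(params), len(args))
	}
	for i, p := range params {
		if p.Type == "" || p.Type == "Any" {
			continue
		}
		if got := typeOf(args[i]); got != p.Type {
			return fmt.Errorf("%s: parameter %s must be %s, got %s", fn, p.Name, p.Type, got)
		}
	}
	return nil
}

func main() {
	params := []Param{{"s", "string"}, {"i", "int"}}
	fmt.Println(checkArgs("example", params, []any{"hi", int64(3)}))
	fmt.Println(checkArgs("example", params, []any{"hi", 3.5}))
}
```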

Mark functions as native

Allowing a function header to be marked native would let the stdlib and other libraries be written in Nitrogen while delegating some functionality to compiled Go. Right now this is accomplished by exporting stdlib/native, which is not documented and is not designed or supported for use outside the standard library. A standard library package imports its native counterpart and then uses it. Resolving fully-qualified functions to a native implementation at runtime would remove the need to expose a "native" package collection, and would keep function declarations, with their parameter names, available in Nitrogen for analysis tools later.

# stdlib/somepkg

# doThing is declared in Nitrogen but implemented in Go
func native doThing(param1, param2)

func doThingWrapper(param1) {
    # ...do pre-call stuff
    const res = doThing(param1, "const")
    # ...do post-call stuff
    return res
}

Hash-based .nib files

Right now, .nib files are always checked to see if they match their source file, but only the modification time is checked. This may cause problems if a file is edited quickly, or if a file is renamed but has the same timestamp. Hash-based nibs would detect even the smallest change, at the expense of speed. Calculating even a SHA-1 hash takes time, though it's probably negligible. This could be an optional mode of nib generation.

The current nib format should be updated to contain the size of a source file as well as the timestamp. The size would typically detect changes when files are moved around with the same timestamps.

Make stdlib/opbuf concurrency safe

opbuf maintains a single, global reference to an old buffer. A buffer needs to be kept with a specific VM as each VM can only run in a single goroutine. I'm not sure if this should be something native to the VM or something else. This particularly affects the SCGI server as it runs multiple workers each with a new VM per request. Maintaining a map of VM references to io.Writers would not be great because there's no way to clean up the map and the references would pile up over time.

Expose execution environment

Go exposes the environment as a slice of strings. This can easily be converted to a Nitrogen array and injected into the global scope as _ENV. Variables beginning with an underscore can be reserved for interpreter use.
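A sketch of the injection step. os.Environ is the real Go call that returns the "KEY=value" slice; the Object, String, and Array types and the globals map are simplified stand-ins for the runtime, and checkUserName is one hypothetical way to enforce the underscore reservation:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

type Object interface{}
type String struct{ Value string }
type Array struct{ Elements []Object }

// injectEnv stores each "KEY=value" string from env as an element of _ENV.
func injectEnv(globals map[string]Object, env []string) {
	elems := make([]Object, len(env))
	for i, kv := range env {
		elems[i] = &String{Value: kv}
	}
	globals["_ENV"] = &Array{Elements: elems}
}

// checkUserName would run when a script declares a global.
func checkUserName(name string) error {
	if strings.HasPrefix(name, "_") {
		return fmt.Errorf("%s: names beginning with an underscore are reserved for the interpreter", name)
	}
	return nil
}

func main() {
	globals := map[string]Object{}
	injectEnv(globals, os.Environ())
	fmt.Println(len(globals["_ENV"].(*Array).Elements), "environment entries")
}
```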

Include/require

Add language constructs to include source code from other Nitrogen files. I like PHP's approach here: include and require, plus an _once variant of each. That allows flexibility in how strict or loose code needs to be.

Implementation details: Can be implemented as a collection of functions or as keywords and constructs directly in the language. Either method would use the same underlying process. A map would be used to store pathnames of source code as it's included. Each path will map to an *ast.Program struct. For the _once variants, nil can be used instead as the parsed tree will never be used again. If a file couldn't be included and it's not required, an Error would be generated and returned. If the script is required, an Exception is returned causing execution to halt.

Implementing it as a language keyword with a dedicated AST node would probably be more efficient, since no function call needs to be set up and invoked. Even builtins need to do a bit of type casting and argument checking. With an AST node, the evaluator is guaranteed the arguments and options are correct and can use the node directly without much extra checking. The evalInclude function would check whether the path has already been included. If yes and _once, do nothing. Otherwise, lex and parse the file, store the resulting *ast.Program in the map, and execute it. If _once, store nil in the map instead, as the tree is never used again.

This map is shared across the entire execution. The map will need to be prepopulated with the main script so it can't cause circular imports.

If a file has the extension .so, load it as a plugin module. These can only be loaded once. Perhaps use a module keyword instead? Or maybe use?

Bitwise and shift operators

&    bitwise AND            integers
|    bitwise OR             integers
^    bitwise XOR            integers (bitwise complement as prefix operator)
&^   bit clear (AND NOT)    integers

<<   left shift             integer << positive integer
>>   right shift            integer >> positive integer
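These match Go's own integer operators, so the VM could delegate to them directly. A sketch assuming Nitrogen integers are int64 and that a negative shift count should raise a Nitrogen error rather than crash (Go panics at runtime on a negative shift count). A shift count of zero is allowed here, which is an assumption:

```go
package main

import "fmt"

// binaryIntOp evaluates a bitwise or shift operator on two Nitrogen ints.
func binaryIntOp(op string, l, r int64) (int64, error) {
	switch op {
	case "&":
		return l & r, nil
	case "|":
		return l | r, nil
	case "^":
		return l ^ r, nil
	case "&^":
		return l &^ r, nil
	case "<<", ">>":
		if r < 0 {
			return 0, fmt.Errorf("negative shift count %d", r)
		}
		if op == "<<" {
			return l << uint64(r), nil
		}
		return l >> uint64(r), nil // arithmetic (sign-extending) shift for signed l
	}
	return 0, fmt.Errorf("unknown operator %q", op)
}

// prefixComplement implements unary ^x (bitwise NOT), as in Go.
func prefixComplement(x int64) int64 { return ^x }

func main() {
	for _, op := range []string{"&", "|", "^", "&^", "<<", ">>"} {
		v, _ := binaryIntOp(op, 12, 2)
		fmt.Println("12", op, "2 =", v)
	}
	fmt.Println("^5 =", prefixComplement(5))
}
```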

Implement Reflect module

Something to allow introspection of Nitrogen values. Right now the main thing I can think of is getting the methods and properties of an instance.

Byte Strings

Strings are implemented as a slice of Unicode runes. This means arbitrary byte sequences are not allowed, or at the very least are not guaranteed to be preserved. There needs to be a way to manipulate arbitrary byte data.

Other languages handle this a little differently. Some, like Python 3 and Rust, have separate text strings and byte strings. Others, like PHP, have a single string type that is just bytes.

The runtime could be modified to store a String as a byte slice instead of rune slice. This would require conversions for indexing and string manipulation functions. However this would allow a single type to serve both purposes.

However, there's value in distinguishing between the two string types, as they serve different purposes. A normal string is guaranteed to be a valid UTF-8 string, while a byte string would be nothing more than bytes that may or may not mean anything. Keeping them separate would also ensure a byte string is never accidentally used in place of a normal string. Conversion functions between the two would be provided where needed.

Syntax Notes

As for syntax, maybe borrow Python's use of a b prefix to denote a byte string. That should be easy to parse. Byte strings would not be allowed where valid strings are needed in the existing syntax, for example import statements, the isDefined function, etc.

Will also need to add support for hex literals inside quotes.

b"\xDE\xAD\xBE\xEF"

Implementation Notes

A new token to distinguish a byte string from a regular string. A new AST node using a byte slice instead of a rune slice, and a new Object type with the same change. A toBytes function to convert a UTF-8 string to a byte string; toString would be modified to allow the reverse. Byte strings can be concatenated as well as indexed.

I'm not sure about any utility functions like the string ones. Byte strings have a particular usage where replace, find, etc. wouldn't be all that useful. Perhaps start without them and add them later if needed. Maybe allow toBytes to take an array of numbers and convert them to a byte string. That can make generation a bit easier for the programmer.
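The conversions described above map closely onto Go's own string and byte handling. A sketch with hypothetical String and ByteString object types, including the optional array-of-numbers form of toBytes; toString rejects invalid UTF-8 so normal strings keep their guarantee:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

type String struct{ Value []rune }     // current representation: rune slice
type ByteString struct{ Value []byte } // proposed: raw bytes

// toBytes encodes a String as UTF-8.
func toBytes(s *String) *ByteString {
	return &ByteString{Value: []byte(string(s.Value))}
}

// toBytesFromInts builds a ByteString from an array of numbers 0-255.
func toBytesFromInts(nums []int64) (*ByteString, error) {
	b := make([]byte, len(nums))
	for i, n := range nums {
		if n < 0 || n > 255 {
			return nil, fmt.Errorf("toBytes: element %d (%d) is not a byte", i, n)
		}
		b[i] = byte(n)
	}
	return &ByteString{Value: b}, nil
}

// toString decodes a ByteString, refusing invalid UTF-8.
func toString(b *ByteString) (*String, error) {
	if !utf8.Valid(b.Value) {
		return nil, fmt.Errorf("toString: byte string is not valid UTF-8")
	}
	return &String{Value: []rune(string(b.Value))}, nil
}

// concat returns a new ByteString; neither operand is modified.
func concat(a, b *ByteString) *ByteString {
	v := make([]byte, 0, len(a.Value)+len(b.Value))
	return &ByteString{Value: append(append(v, a.Value...), b.Value...)}
}

// index returns the byte at i as a Nitrogen int.
func index(b *ByteString, i int) (int64, error) {
	if i < 0 || i >= len(b.Value) {
		return 0, fmt.Errorf("index %d out of range", i)
	}
	return int64(b.Value[i]), nil
}

func main() {
	b := toBytes(&String{Value: []rune("h\u00e9")})
	fmt.Println(len(b.Value)) // 3: "h" is one byte, "é" is two
	s, err := toString(b)
	fmt.Println(string(s.Value), err)
}
```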

Use _SEARCH_PATHS from environment

To allow modification at runtime.

TODO List

This is simply a brain dump of things I would like to see in Nitrogen. Some will be trivial to implement; others will take more time and thought. They are in no specific order.

Functions

Implement printf and sprintf.

Language

Write a formal grammar
Implicitly convert ints to floats for division. Other arithmetic operations will convert to float if at least one operand is a float, and remain int otherwise.
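A sketch of that promotion rule, assuming int64 and float64 as the underlying Go types: division always yields a float, and the other operators yield a float only when at least one operand is a float. Division by zero falls through to IEEE 754 behavior (an infinity) here, which is an assumption rather than a decided semantics:

```go
package main

import "fmt"

// arith applies +, -, *, / to two numbers that are each int64 or float64.
func arith(op string, l, r any) (any, error) {
	li, lInt := l.(int64)
	ri, rInt := r.(int64)
	if lInt && rInt && op != "/" {
		// Both ints and not division: stay in integer arithmetic.
		switch op {
		case "+":
			return li + ri, nil
		case "-":
			return li - ri, nil
		case "*":
			return li * ri, nil
		}
		return nil, fmt.Errorf("unknown operator %q", op)
	}
	lf, err := toFloat(l)
	if err != nil {
		return nil, err
	}
	rf, err := toFloat(r)
	if err != nil {
		return nil, err
	}
	switch op {
	case "+":
		return lf + rf, nil
	case "-":
		return lf - rf, nil
	case "*":
		return lf * rf, nil
	case "/":
		return lf / rf, nil
	}
	return nil, fmt.Errorf("unknown operator %q", op)
}

func toFloat(v any) (float64, error) {
	switch n := v.(type) {
	case int64:
		return float64(n), nil
	case float64:
		return n, nil
	}
	return 0, fmt.Errorf("%v is not a number", v)
}

func main() {
	fmt.Println(arith("/", int64(7), int64(2))) // 3.5 <nil>
	fmt.Println(arith("+", int64(1), int64(2))) // 3 <nil>
}
```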

Modules

Files
- ReadLine

Interpreter

Only die if truly unrecoverable error
Better error messaging (parser enters panic mode)

Replace compiler

Replace the compiler with a more versatile one that makes it easier to manipulate the instructions before fully compiling to bytes. One idea is to use an array of Opcode structs each representing a single instruction with any operands instead of going straight to a byte slice. This array can then go through optimization phases to remove redundant or noop instructions such as a POP immediately after a LOAD. Or even fancier stuff like determining if an assignment is used immediately and then never again thereby removing the store instruction altogether. As more optimizations can be made, some can be placed behind a runtime flag for extra optimization like constant propagation.
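A minimal sketch of that intermediate form and one peephole pass, assuming a hypothetical Instr struct, opcode names modeled on those used in this document, and jump operands that are instruction indexes. It drops a side-effect-free LOAD immediately followed by a POP, keeps any pair whose POP is a jump target, and remaps the remaining jumps:

```go
package main

import "fmt"

type Op int

// Opcode names mirror the VM's naming style; values are arbitrary.
const (
	LOAD_FAST Op = iota
	LOAD_CONST
	POP
	JUMP
	RETURN
)

var names = map[Op]string{LOAD_FAST: "LOAD_FAST", LOAD_CONST: "LOAD_CONST", POP: "POP", JUMP: "JUMP", RETURN: "RETURN"}

// Instr is one instruction with its operands, before encoding to bytes.
type Instr struct {
	Op   Op
	Args []int
}

func (i Instr) String() string { return fmt.Sprintf("%s %v", names[i.Op], i.Args) }

// isPureLoad reports loads with no side effects (a global lookup could throw).
func isPureLoad(op Op) bool { return op == LOAD_FAST || op == LOAD_CONST }

func removeLoadPop(code []Instr) []Instr {
	targets := map[int]bool{}
	for _, in := range code {
		if in.Op == JUMP {
			targets[in.Args[0]] = true
		}
	}
	out := make([]Instr, 0, len(code))
	newIndex := make([]int, len(code)+1) // old instruction index -> new index
	for i := 0; i < len(code); i++ {
		newIndex[i] = len(out)
		if i+1 < len(code) && isPureLoad(code[i].Op) && code[i+1].Op == POP && !targets[i+1] {
			newIndex[i+1] = len(out)
			i++ // skip the LOAD and the POP
			continue
		}
		out = append(out, code[i])
	}
	newIndex[len(code)] = len(out)
	for j, in := range out {
		if in.Op == JUMP {
			out[j] = Instr{Op: JUMP, Args: []int{newIndex[in.Args[0]]}}
		}
	}
	return out
}

func main() {
	code := []Instr{
		{JUMP, []int{3}},
		{LOAD_CONST, []int{0}},
		{POP, nil},
		{LOAD_FAST, []int{1}},
		{RETURN, nil},
	}
	for _, in := range removeLoadPop(code) {
		fmt.Println(in)
	}
}
```

Keeping instructions as structs until the final encoding step is what makes the jump remapping above a simple index lookup instead of byte-offset surgery.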

REPL is buggy

The REPL can be buggy for some variable lookups due to incorrect opcode generation. Also need to look at implementing a basic readline-like interface for arrow keys, home, end, and backspace.

Incorrect if statement compilation

If statement compilation appears to be broken for all but the simplest case (which is all I've used so far). Need to rewrite the if statement compiler function with the new techniques from time spent with the loops and such.

Need to watch out for:

When an implicit nil needs to be returned
When a return statement is used (check jump index)
Empty bodies/no alternative block
If statements have their own scope (need better opcodes for general block creation, done)
Double pops (POP_JUMP_IF_FALSE followed by POP)
More than two comparisons in a single condition jumping to the wrong location

Generators

Not sure if making generators a part of the language would be good or not. Something to toy with.

Better error messages from parse stage

Right now the parse errors are quite cryptic or even completely wrong. It doesn't need to be as fancy as Rust's, but something a bit more friendly would do nicely.

Improvements to JSON package

Need to fully implement number types (signs and exponent forms). Maybe need to verify strings?
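RFC 8259 defines the complete JSON number grammar: an optional minus sign (no leading plus), an integer part with no leading zeros, an optional fraction, and an optional exponent with its own optional sign. A hand-written validator for that grammar in Go, which a decoder could run before converting with strconv. Keeping integers as int64 when they fit is an assumption about how the module should map numbers:

```go
package main

import (
	"fmt"
	"strconv"
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// validJSONNumber reports whether s matches
// -? (0 | [1-9][0-9]*) (\.[0-9]+)? ([eE][+-]?[0-9]+)?
func validJSONNumber(s string) bool {
	i := 0
	if i < len(s) && s[i] == '-' {
		i++
	}
	switch {
	case i < len(s) && s[i] == '0':
		i++
	case i < len(s) && s[i] >= '1' && s[i] <= '9':
		for i < len(s) && isDigit(s[i]) {
			i++
		}
	default:
		return false
	}
	if i < len(s) && s[i] == '.' {
		i++
		start := i
		for i < len(s) && isDigit(s[i]) {
			i++
		}
		if i == start {
			return false // "1." is not allowed
		}
	}
	if i < len(s) && (s[i] == 'e' || s[i] == 'E') {
		i++
		if i < len(s) && (s[i] == '+' || s[i] == '-') {
			i++
		}
		start := i
		for i < len(s) && isDigit(s[i]) {
			i++
		}
		if i == start {
			return false // exponent needs at least one digit
		}
	}
	return i == len(s)
}

// decodeNumber returns an int64 when the text is an integer that fits, else a float64.
func decodeNumber(s string) (any, error) {
	if !validJSONNumber(s) {
		return nil, fmt.Errorf("invalid JSON number %q", s)
	}
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return n, nil
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return nil, err
	}
	return f, nil
}

func main() {
	for _, s := range []string{"-12", "6.02e+23", "01", "1."} {
		v, err := decodeNumber(s)
		fmt.Println(s, v, err)
	}
}
```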

Implement HTTP module

It doesn't have to be fancy, just something to send data to and request data from APIs. The JSON module needs to come first, as a lot of APIs rely on it.

Cache parsed trees and code blocks in SCGI server

The server parses a script every time it's called. It would be better to have a cache so subsequent requests wouldn't have to parse the file. There would need to be a mod time check so a cached script can be invalidated when it's been changed. This cache could be used application wide, even in the interpreter.

Current SCGI benchmark: ab -n 50000 -c 8 http://localhost:8081/cgi/index.ni: 2700 requests/sec.

Single line if statement

For example, if x > 3 return x, or use a colon as a separator: if x > 3: return x? Just something a bit nicer than if x > 3 { return x }. The curly braces feel unneeded.

Generic scope and block opcodes

Change PREPARE_BLOCK to OPEN_SCOPE and END_BLOCK to CLOSE_SCOPE. CLOSE_SCOPE will only remove one layer of environment. PopBlock will be moved to a POP_BLOCK opcode.

Summary:

+OPEN_SCOPE (encapsulate env)
+CLOSE_SCOPE (remove one layer of env)
-PREPARE_BLOCK (removed opcode)
~END_BLOCK (changed to only manipulate the block stack)
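A sketch of how the two new opcodes could manipulate a chain of environments, using simplified, hypothetical Environment and Frame types. The key property is that CLOSE_SCOPE removes exactly one layer, so nested blocks pair up cleanly:

```go
package main

import "fmt"

type Environment struct {
	vars   map[string]any
	parent *Environment
}

func NewEnv(parent *Environment) *Environment {
	return &Environment{vars: map[string]any{}, parent: parent}
}

// Get looks a name up through the chain of enclosing scopes.
func (e *Environment) Get(name string) (any, bool) {
	for env := e; env != nil; env = env.parent {
		if v, ok := env.vars[name]; ok {
			return v, true
		}
	}
	return nil, false
}

type Frame struct{ env *Environment }

// openScope is OPEN_SCOPE: wrap the current environment in a new child.
func (f *Frame) openScope() { f.env = NewEnv(f.env) }

// closeScope is CLOSE_SCOPE: remove exactly one layer.
func (f *Frame) closeScope() error {
	if f.env.parent == nil {
		return fmt.Errorf("CLOSE_SCOPE with no open scope")
	}
	f.env = f.env.parent
	return nil
}

func main() {
	f := &Frame{env: NewEnv(nil)}
	f.env.vars["x"] = 1
	f.openScope()
	f.env.vars["y"] = 2
	_, seeX := f.env.Get("x")
	f.closeScope()
	_, seeY := f.env.Get("y")
	fmt.Println(seeX, seeY) // true false
}
```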

elseif syntax

Whether it's } else if () { or } elif () { it doesn't matter. But something needs to be done. The syntax tree can be a tree of if statements. Not the most efficient, but it will get the job done and can be optimized later.
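Either spelling can produce the same tree if the parser treats an else-if as an else branch holding a single nested if. A sketch with hypothetical, simplified AST types where conditions and bodies are just source text; buildIfChain expects at least one arm:

```go
package main

import "fmt"

type Node interface{}

type IfExpr struct {
	Cond, Then string // simplified: condition and body as source text
	Else       Node   // nil, *Block, or another *IfExpr for an else-if
}

type Block struct{ Body string }

// branch is one if/elif arm as collected by the parser.
type branch struct{ cond, body string }

// buildIfChain turns if/elif/.../else arms into nested IfExpr nodes,
// building from the last arm backwards. arms must be non-empty.
func buildIfChain(arms []branch, elseBody *Block) *IfExpr {
	var tail Node // stays a nil interface when there is no else block
	if elseBody != nil {
		tail = elseBody
	}
	for i := len(arms) - 1; i >= 0; i-- {
		tail = &IfExpr{Cond: arms[i].cond, Then: arms[i].body, Else: tail}
	}
	return tail.(*IfExpr)
}

func main() {
	n := Node(buildIfChain([]branch{{"x > 3", "a"}, {"x > 1", "b"}}, &Block{Body: "c"}))
	for n != nil {
		switch v := n.(type) {
		case *IfExpr:
			fmt.Println("if", v.Cond, "->", v.Then)
			n = v.Else
		case *Block:
			fmt.Println("else ->", v.Body)
			n = nil
		}
	}
}
```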

Update docs

A lot of changes have been made and many parts of the docs are now outdated. Here's a list of errors I've found so far:

/readme.md

Building the Interpreter: Update instructions to use the Makefile. go get won't embed version information or compile the modules.
Move references to the docs and examples near the top.

/docs/language.md

Keywords: Table needs to be sorted alphabetically.
Semicolons: Change But the following is ok: to But the following is syntactically valid:.
Semicolons: Update what keywords get a semicolon. Use the needSemicolon function for reference.
Numbers: Make the references toInt() and toFloat() in backticks for code formatting.
Operator Precedence: && and || need to be replaced with and and or.
Looping over arrays/maps: Mention new forEach() function in standard library.
Try Catch/Exceptions: Fix grammar of first sentence.
Try Catch/Exceptions: Update references to the old import functions.
Try Catch Examples: Update examples to use new import functions.

Class inheritance

The syntax is already available: class name ^ parent { } or let name = class ^ parent { }. Classes can have a single parent and may override methods or fields of that parent. The variable parent will be bound to the parent class to access the parent's methods or fields directly. An instance will execute field definition statements from the oldest ancestor down to the class itself. Methods can be retrieved as needed using the method tables.
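The lookup and initialization order described above could look like this in Go. Class, Instance, and the per-class field-initializer callbacks are hypothetical simplifications of the runtime's class objects, and methods here just return strings:

```go
package main

import "fmt"

type Class struct {
	Name    string
	Parent  *Class
	Methods map[string]func(*Instance) string
	Fields  func(*Instance) // runs this class's own field definitions
}

type Instance struct {
	Class  *Class
	Fields map[string]any
}

// lookupMethod walks up the parent chain, so a child's override wins.
func (c *Class) lookupMethod(name string) (func(*Instance) string, bool) {
	for cls := c; cls != nil; cls = cls.Parent {
		if m, ok := cls.Methods[name]; ok {
			return m, true
		}
	}
	return nil, false
}

// newInstance runs field definitions from the oldest ancestor down to c,
// so a child's definitions overwrite its parent's.
func newInstance(c *Class) *Instance {
	var chain []*Class
	for cls := c; cls != nil; cls = cls.Parent {
		chain = append(chain, cls)
	}
	inst := &Instance{Class: c, Fields: map[string]any{}}
	for i := len(chain) - 1; i >= 0; i-- {
		if chain[i].Fields != nil {
			chain[i].Fields(inst)
		}
	}
	return inst
}

func main() {
	base := &Class{Name: "Base",
		Methods: map[string]func(*Instance) string{"greet": func(*Instance) string { return "base" }},
		Fields:  func(i *Instance) { i.Fields["name"] = "base" }}
	child := &Class{Name: "Child", Parent: base,
		Fields: func(i *Instance) { i.Fields["name"] = "child" }}
	inst := newInstance(child)
	greet, _ := child.lookupMethod("greet") // inherited from Base
	fmt.Println(inst.Fields["name"], greet(inst))
}
```

Calling through the parent binding corresponds to starting the lookup at c.Parent instead of c.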

Improvements to import

Required changes:

New tokens: IMPORT, AS
New syntax: IMPORT_STMT := import STRING[ as IDENT];
New AST node: struct{ importPath, name }
New opcode: IMPORT takes two arguments off the stack (path and name), imports the module into the local scope, and binds the module's return value to name.

The current import function can be reused and placed in the vm package. The import() function will be removed.

Exceptions during class init not propagating correctly

Reproducible sample:

class MyClass {
    const init = fn() {
        println(thing)
    }
}

const main = fn() {
    try {
        const things = new MyClass()
    } catch e {
        println(e)
    }
}

main()

Expected the main function to return nil; instead it returns an instance of MyClass. The exception created by attempting to look up thing, which doesn't exist, is not propagated to the makeInstance method in the VM runtime. The throw handler unwinds the call stack as it should, the function returns, and then the makeInstance method pushes the invalid instance onto the calling frame's stack. This makes it look like the calling function is returning the instance when in fact the instance should never exist.

I'm not sure how to fix this at the moment. It will take some thinking to figure out.
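One possible direction, offered only as a sketch and not as the project's fix: have the call that runs init report whether it completed normally or unwound because of an exception, and have makeInstance push the instance only on normal completion. The VM, Frame, Exception, and callInit types below are hypothetical miniatures of the real runtime:

```go
package main

import "fmt"

type Exception struct{ Msg string }

type Frame struct{ stack []any }

type VM struct{ frames []*Frame }

// callInit runs init in its own frame and returns a non-nil *Exception
// if an exception escaped init.
func (vm *VM) callInit(init func() *Exception) *Exception {
	vm.frames = append(vm.frames, &Frame{})
	defer func() { vm.frames = vm.frames[:len(vm.frames)-1] }() // pop init's frame
	return init()
}

// makeInstance pushes the new instance only if init finished normally;
// otherwise the exception goes back to the throw handler.
func (vm *VM) makeInstance(class string, init func() *Exception) *Exception {
	inst := map[string]any{"class": class}
	if exc := vm.callInit(init); exc != nil {
		return exc // the instance is discarded and never reaches the caller's stack
	}
	caller := vm.frames[len(vm.frames)-1]
	caller.stack = append(caller.stack, inst)
	return nil
}

func main() {
	vm := &VM{frames: []*Frame{{}}}
	exc := vm.makeInstance("MyClass", func() *Exception { return &Exception{Msg: "thing is not defined"} })
	fmt.Println(exc.Msg, len(vm.frames[0].stack)) // thing is not defined 0
}
```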

Reduce module, include, and require to a single statement

Reduce all the above into a single import statement. The mechanics would be similar to Node where a full or relative path will import a script file and anything else will search for an appropriate module or script in the search path. The search path will be configurable at runtime and/or execution time. If a compiled module is found on a platform that doesn't support them, the match should be skipped and the next search result used.

For loops don't work with *_FAST opcodes

A for loop has its own block scope. However, the compiler doesn't create a new code block for the loop body. So instead of generating *_GLOBAL opcodes, it generates the FAST family for variables that are declared in the outer function body. The loop compilation function needs to act like the function compiler and create a new code block. The code block isn't kept; its constants and scope need to be merged with the outer code block, which will be interesting... Maybe add an offset for each table, like with the code offset?

For the time being, the FAST opcodes aren't limited to the current scope. The functionality should still be the same, but lookups will recursively go through parent scopes if there are any.
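The offset idea for the loop body can be sketched as follows: when the body is compiled as its own code block and folded into the enclosing one, constant indexes shift by the size of the outer constant table and jump targets by the size of the outer code. CodeBlock, Instr, the opcode values, and absolute jump targets are all assumptions; a real merge would treat the names/locals table the same way:

```go
package main

import "fmt"

// Hypothetical opcode numbers.
const (
	LOAD_CONST = iota + 1
	JUMP
)

type Instr struct {
	Op  int
	Arg int
}

type CodeBlock struct {
	Constants []any
	Code      []Instr
}

// mergeBlock appends inner's constants and code to outer, rebasing indexes.
func mergeBlock(outer, inner *CodeBlock) {
	constOffset := len(outer.Constants)
	codeOffset := len(outer.Code)
	outer.Constants = append(outer.Constants, inner.Constants...)
	for _, in := range inner.Code { // in is a copy; inner is left untouched
		switch in.Op {
		case LOAD_CONST:
			in.Arg += constOffset
		case JUMP:
			in.Arg += codeOffset // assumes absolute instruction-index targets
		}
		outer.Code = append(outer.Code, in)
	}
}

func main() {
	outer := &CodeBlock{Constants: []any{"a", "b"}, Code: []Instr{{LOAD_CONST, 1}}}
	inner := &CodeBlock{Constants: []any{42}, Code: []Instr{{LOAD_CONST, 0}, {JUMP, 0}}}
	mergeBlock(outer, inner)
	fmt.Println(outer.Constants, outer.Code) // [a b 42] [{1 1} {1 2} {2 1}]
}
```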

Interpolated strings

Similar to double-quoted strings in PHP, f-strings in Python 3, and template literals in JavaScript.

Syntax Notes

"This string contains ${some_var}, cool huh?" or
`This string contains ${some_var}, cool huh?`

Do like PHP or JavaScript, where double quotes or backticks denote an interpolated string while single quotes are not interpreted? Or like Python, where there's a marker before the string? Backticks aren't being used for anything, so they could denote this type of string. Perhaps make it as flexible as possible, allowing newlines and escape sequences.

Curly braces are required.

Are the expressions allowed inside the braces or only identifiers? Can more elaborate expressions be used such as math? What about array/map indexing? Is that allowed?

Implementation Notes

Interpolated strings begin as an AST node with text and variable parts. The parts are reduced at runtime to a single string when the string is first encountered. Any non-string variables are converted to a string. Class instances will use a toString method if available; other values will have their String() methods called in Go.

VM could be implemented a couple ways:

Use standard string concatenation. Each variable is passed to toString for conversion and concatenated to the part before it. This method would require no new opcodes but would be less performant due to multiple concatenations and the repeated calls from Nitrogen into the runtime for string conversion.
Create a new runtime object and opcode that takes that object to generate a string. The runtime object would contain the parts from the syntax tree. The opcode would take the object, and only that object, from the stack and "execute" it to create a flat string. This method would be the cleanest and most performant since the entire conversion and concatenation step would happen in the runtime.
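A sketch of the second approach: the parser splits the literal into parts once, and a single runtime step flattens them with one builder. It assumes only identifiers are allowed inside ${...}, which is still an open question above, and uses a hypothetical toStringer interface to stand in for instances that define toString:

```go
package main

import (
	"fmt"
	"strings"
)

// part is either literal text or a variable name to look up.
type part struct {
	text  string
	isVar bool
}

// parseTemplate splits "a ${x} b" into [text "a ", var x, text " b"].
func parseTemplate(s string) ([]part, error) {
	var parts []part
	for {
		start := strings.Index(s, "${")
		if start < 0 {
			break
		}
		end := strings.IndexByte(s[start:], '}') // relative to start
		if end < 0 {
			return nil, fmt.Errorf("unterminated ${ in interpolated string")
		}
		if start > 0 {
			parts = append(parts, part{text: s[:start]})
		}
		name := strings.TrimSpace(s[start+2 : start+end])
		if name == "" {
			return nil, fmt.Errorf("empty ${} in interpolated string")
		}
		parts = append(parts, part{text: name, isVar: true})
		s = s[start+end+1:]
	}
	if s != "" {
		parts = append(parts, part{text: s})
	}
	return parts, nil
}

// toStringer stands in for an instance with a toString method.
type toStringer interface{ ToString() string }

// render is what the single new opcode would do: look up, convert, concatenate.
func render(parts []part, lookup func(string) (any, bool)) (string, error) {
	var b strings.Builder
	for _, p := range parts {
		if !p.isVar {
			b.WriteString(p.text)
			continue
		}
		v, ok := lookup(p.text)
		if !ok {
			return "", fmt.Errorf("%s is not defined", p.text)
		}
		switch val := v.(type) {
		case toStringer:
			b.WriteString(val.ToString())
		case string:
			b.WriteString(val)
		default:
			b.WriteString(fmt.Sprint(val)) // uses a Go String() method if present
		}
	}
	return b.String(), nil
}

func main() {
	parts, _ := parseTemplate("This string contains ${some_var}, cool huh?")
	vars := map[string]any{"some_var": 42}
	s, err := render(parts, func(n string) (any, bool) { v, ok := vars[n]; return v, ok })
	fmt.Println(s, err)
}
```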

Optimize VM environment

Right now, environment manipulation is the weak point of efficiency in the VM. Allocations take up quite a bit of memory, and execution hotspots are at allocation points around environments. The current environment setup simply reuses what was already available for the interpreter. I need to do a bit of research and testing to see what can be done. Environment/frame reuse may be helpful.
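One experiment for environment reuse is a sync.Pool of environments that are cleared and returned when a call frame exits. This sketch assumes nothing, such as a closure, still references an environment after its frame returns; captured environments must never be recycled. Environment is a simplified stand-in, and whether this actually helps would need the benchmarking mentioned above:

```go
package main

import (
	"fmt"
	"sync"
)

type Environment struct {
	vars   map[string]any
	parent *Environment
}

var envPool = sync.Pool{
	New: func() any { return &Environment{vars: make(map[string]any, 8)} },
}

// acquireEnv gets a recycled environment, or allocates one if the pool is empty.
func acquireEnv(parent *Environment) *Environment {
	e := envPool.Get().(*Environment)
	e.parent = parent
	return e
}

// releaseEnv clears e and returns it to the pool. Only call it when no
// closure or other object still holds a reference to e.
func releaseEnv(e *Environment) {
	for k := range e.vars {
		delete(e.vars, k) // reuse the existing map instead of allocating a new one
	}
	e.parent = nil
	envPool.Put(e)
}

func main() {
	root := &Environment{vars: map[string]any{}}
	for i := 0; i < 3; i++ {
		env := acquireEnv(root) // one environment per simulated call
		env.vars["i"] = i
		releaseEnv(env)
	}
	fmt.Println("done")
}
```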
