onyx-lang's Issues

Macros and Templates

This is the only (?!) missing feature in Onyx atm, and the culprit is syntax (and a whoooole lot of coding).

Syntax

While indent-significant layout is ultimately apt for coding, it's not so for meta-coding contexts. Meta-code causes indentations that have to be compensated for in verbatim-paste code, and it quickly becomes unmanageable.

The only conclusion I can arrive at is that in template macros explicit end-tokens will be required, and indentation is not significant. It's not really a normal coding context.

Concept

  • Template Macros are what is referred to as "macro" in Crystal. They act and look like html/whatever-templates.
  • Template Def Macros - the same, but for creating defs.
  • AST Macros are run macros where you work on the AST and return the result instead of working with a template; I'd like to integrate these more seamlessly into Onyx.

Suggestions

template blabla(x, y) =
   ...

template def blabla(x, y) ->
   ...

macro blabla(x, y) ->
   ...
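
To make the shape concrete, here's a hedged sketch of how a template macro with an explicit end-token (as argued above) might read. The interpolation markers are borrowed from Crystal's macros purely for illustration - Onyx's template interpolation syntax is not settled - and the end-template token is an assumption in the style of end-type.

template define-getters(...names) =
   {% for name in names %}
   -- each generated def just returns the member of the same name
   {{name}}() -> @{{name}}
   {% end %}
end-template  -- explicit end token; indentation is not significant inside templates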

Add splat and discard to multiple assign

See #17.

[x, y, _, ...z, u, v] = [1, 2, 3, 4, 5, 6, 7, 8, 9]
-- => x = 1, y = 2, z = [4, 5, 6, 7], u = 8, v = 9

The proposition is that the splat variable, if used, takes the type of the right-hand side.
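
A small sketch of the intended semantics (assumed, not implemented):

strings = ["a", "b", "c", "d", "e"]
[first, _, ...middle, last] = strings
-- => first = "a", middle = ["c", "d"], last = "e"
-- `middle` takes the type of the right hand side: List[Str] here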

Anonymous types

For many cases where callbacks are used in some languages, it's common in baselib to require visitor(-like) instances instead.

A Visitor instance has several advantages over simple callbacks: "pattern matching" types via overloaded handler functions for different types, and maintaining state in a formal way (as opposed to informally closured vars in closured callbacks).

However, there's no way of passing an anonymous visitor instance, as it is possible with an anonymous lambda.

Thus, it would be handy to be able to do so.

Perhaps this should be "mindstate" connected to "soft lambdas" (requiring a better name for reasonable association), and then syntax likeness too?

some-thing.accept (self) <~ Visitor
   @some-state = 0
   init(@base-something) ->
   visit(node SomeType) -> do-stuff
   visit(node OtherType) -> do-stuff

Corresponds to:

some-thing.accept MyOneOffVisitor(self)

type MyOneOffVisitor < Visitor
   @some-state = 0
   init(@base-something) ->
   visit(node SomeType) -> do-stuff
   visit(node OtherType) -> do-stuff
  • The (arguments) and <~ look like the reverse of a "soft lambda"
  • The reverse < pays homage to the inherit symbol in the type declaration
  • The rest to the right of the back-arrow simply follows normal type syntax

Instead of a type name, the type-definition is "thrown" directly onto the initializer arguments (empty if not applicable of course) and immediately instantiated.

The use case is of course for when the type will only be used in a call and nowhere else, which would make naming the type utterly pointless.

Function / Method / Lambda Definition

Functions / Methods

Note: See #11 (comment) for latest RFC suggestion!

This is a staple construct of any language, and as such I believe there will be some opinions on it.

I like it to be simple and straightforward to define functions and therefore prefer as little formalia as possible while still making them clearly distinguishable and giving them a clear beacon (the spatial perspective).
At the moment, there are two different ways of defining them, so as not to lock in to my personal preference alone, keeping it open for discussion.

Note, I use lisp-style separators in the examples; all the below could of course (as already pointed out) be written with snake or camel.

[edit: I forgot about generics - see separate issue about it instead]

  1. Function/Method name first (minimal formalia)
this-is-a-func(a-param) -> a-param + 1

this-is-another(a-param) ->
   a-param + 1

yet-one(a-typed-param Real, foo SomeType) ->
   foo.blargh = 47
   a-typed-param + 1.0
end

type MyType
   my-member I32 = 47

   a-method(a-param) -> abstract

   another-method(a-param) ->
      @my-member + a-param

   third-method() -> @my-member
end

  2. Prefixed with a keyword (most common in other current languages; a few different keywords are allowed atm and they're all equal: fn, def, etc.) - support for this syntax has since been dropped - it provides no benefit!
fn this-is-a-func(a-param) -> a-param + 1

fn this-is-another(a-param) ->
   a-param + 1

def and-a-routine(a-typed-param Real, foo SomeType) ->
   foo.blargh = 47
   a-typed-param + 1.0
end

type MyType
   my-member Int32 = 47

   def a-method(a-param) -> abstract

   def another-method(a-param) ->
      @my-member + a-param

   def third-method() ->
      @my-member
end

Thoughts concerning both

  • Parentheses around the parameter list are always required - even if there are no parameters. This makes it easier to identify definitions at a glance, even without highlighting.
  • The arrow contributes to recognition and serves as nest-start-token at the same time, when one-line functions are written
  • Always having the same count of chars before identifier (zero for (1), and two|three for (2)) makes spatial position of identifier stable and reliable to find at a glance.

Thoughts about (1):

  • Much cleaner for short functions
  • Cleaner for methods
  • The voluntary end keyword, when used, may look a bit unbalanced towards 'identifier'
  • When specifying requirements on the func, say "pure method", "pure function", etc. this must be done with a suffix / pre-row annotation.
  • More complex to parse (must parse a good bit of code before being sure what is meant - like a human) - but that is the compiler's concern, not the user's.
  • In style of Erlang, LiveScript/CoffeeScript, C++'ish

Thoughts about (2):

Note: As mentioned above: this syntax has now been removed and deemed non-beneficial.

  • Clumsier, more noise, for short terse functions
  • Feels redundant for method-declarations
  • Gives a better balance when end keyword is used
  • Easier for the parser
  • The arrow is unnecessary for this syntax; instead any nest-start-token (or implicit, when indented) could be used. Currently it uses the suffix-arrow too.
  • In style of Python, Ruby, JavaScript, Rust, Golang, Lua...

The main reason the keyword style is so common is not because "it's better" - it's because it's easier to write a parser for it!

More Details, Common to Both

Using (1)-syntax in examples.

Sugar for "Functions" Not Returning Any Value

Since this is an imperative language, non (usable) value returning functions are not too uncommon (most often as part of types, where they modify member data of self only).
In order to avoid accidentally leaking internal state through implicit returns, and to avoid tediously and repeatedly typing nil as the final expression, an exclamation mark can be suffixed to the 'function-arrow', implying "action"/"command"/"procedure"/"routine". This makes sure it ends with nil and sets the return type to Nil. Inspired by LS.

my-mutating-only-routine(a Foo, flag) ->!
   a.some-flag = flag

Formal Parameters

more-formal-params(a Int32 = 47, b String = "Cool!"; c = Foo()) String|Int32 ->
   say "c is {c}"
   if b is "Dude" then a else b
  • Parameters are separated with , (comma) or ; (semicolon), these separators can be mixed freely to make the signature as clear as possible.
  • Params can optionally be typed, by following the identifier with space and then a type, type union or other type formulation, and optionally qualifiers.
  • Default values are allowed, simply follow the identifier, or the type, if used, with = and an expression.
  • Any parameter with default value can also be used as a named arg at call site (see #16 for specific RFC on named args)
  • By prefixing parameter-name with @ it is automatically assigned to the member-var of that name. Very useful in constructors and setters of different kinds.
  • Return type can be given by putting a type immediately after the parameters ending parentheses - before the arrow.
  • All parameters and return type are inferred if not explicitly typed, unique funcs are instantiated per need at compile time.
  • Variadic funcs are formed using a splat param: simply prefix the param name with ... (inspired from C++, Java, JS-ES6, CS/LS) - it will be a tuple with all the matched args, it can be placed anywhere in the parameter list.
  • "Soft lambdas" Fragments can only be taken as last parameter, with an identifier prefixed with &, atm.

Visibility

After much pondering, I came to the conclusion that the best way to mark visibility is per function (as opposed to grouped), and as sleekly as possible: suffix the name with an asterisk for protected, two asterisks for private. No asterisk means public. Public is the default because of the "openness and patchability" philosophy of Onyx.

GOTCHA: This is the opposite of Modula and Nim, where an asterisk designates public visibility and defaults to private.

type Foo
   i-am-public(x) -> stuff
   i-am-protected*(x) -> stuff
   i-am-private**(x) -> stuff

The asterisks are only typed at definition, not in calls (they're not part of the name).

It sort of looks like a footnote, sort of "there's a gotcha about this one", I like it.

There is one unresolved issue with this, which I can't imagine ever coming up in practise, but I have a solution for it, which for natural reasons is not at the top of the list atm. Guess what it is. :-)

Pragmas

Some pragmas usable with functions:
'inline, 'no-inline, 'returns-twice, 'no-return, 'naked, 'raises.

LLVM is very capable of inlining the right stuff for optimum speed (or size), so this should rarely have to be used.

There are also some pragmas for changing semantics:
'pure - this is not implemented yet though ;-) And I'm still thinking about better ways to express pure functions, in order to promote writing them.

Additional Notes

  • Functions always return last expression.
    • That being said, if your function "looks rather imperative", I recommend you use an explicit 'return' statement for the final expression too. If your function is neater, more like a pure mathematical function, implicit is better.
  • Return type is inferred from all return points (unless return type has been explicitly declared).

Lambdas

  • The syntax is pretty much the same as for a function, less the name. All parameters must be typed for lambdas (since they can be passed around as values, as callbacks etc.).
  • Any outer scope variables referred to are captured in closure by reference.
  • Note that, unlike functions and methods, which are primarily calls when referenced, lambdas are primarily values. So to call a lambda without arguments you must use parentheses, otherwise it just represents the lambda-function as a value: my-lambda() -- call lambda without args.
my-lambda = (x Int, y Int) -> x + y

x = my-lambda 3, 5
-- x = 8

my-fun-fun(f (Int, Int)->, a, b) ->
   f a, b -- call the lambda we got as arg `f` with args `a` and `b`

u = my-fun-fun my-lambda, 3, 4
-- u = 7

z = 47
my-closuring-lambda = (x Int) ->
   z += x
   z

y = my-closuring-lambda -5
-- y == 42
-- z == 42

Soft Lambdas Fragments

This is a special beast - see them in their own issue: #14

Aaaaand, as always: remind me of what I've forgotten or should clarify.

Fragments - The Goofy Kid on the Block

Soft-Lambdas Fragments - The Goofy Kid on the Block [pun not...]

And pleeeease - anyone - a better term than 'soft-lambda' would be nice!
Further alternatives to the current term "fragment" are appreciated!

"Soft Lambdas" "Fragments", called simply "Blocks" in Crystal lingua (which is way too generic and ambiguous a term), was not a primary idea for Onyx, but they do serve well.
They act basically like a kind of macros. It's a deeply used part of Crystal stdlib, and it also solves iterators in a good way - so it must be embraced in Onyx in a good fashion.

A "fragment" is like a lambda in that it can take arguments. They are however never typed, because the code is literally pasted into the using context code, and so it is type checked in that context.
Since it is evaluated in the context it is used, it also means that statements like return, continue, break and next operate in the final context! This means you can stop a map operation half way through by using break, or returning from the function where the soft lambdas is declared using return, which is not possible when using a standard lambda (where those control actions take place in the context of the lambda).
As is obvious from all this, a primary purpose for them are for iterating, but they're not limited to that.

Syntax

Much like lambda/function definition, but type restrictions cannot be given to parameters.
If there are no parameters or if auto-parametrization is used, ~> alone begins the soft lambda.

The number of params doesn't have to match the number of arguments, much like in JavaScript. Extraneous params are set to nil and extraneous args are simply thrown away.

list = [1, 2, 3, 4]
mapped-list = list.map (v) ~> v + 1     --> [2, 3, 4, 5]
mapped-list = list.map ~> _1 + 1        --> [2, 3, 4, 5]

-- Example of using `break`
mapped-list.each-with-index (v, i) ~>
   break if i is 2
   say "val at {i} = '{v}'

say "All done!"

--> "val at 0 = '1'"
--> "val at 1 = '4'"
--> "All done!"

Note. Initially I tried a variation of Crystal syntax for this:

list = [1, 2, 3, 4]
mapped-list = list.map |v| v + 1        --> [2, 3, 4, 5]
-- same thing, just with an additional explicit nest-start symbol:
mapped-list = list.map |v| => v + 1     --> [2, 3, 4, 5]

But, I felt it better to go with the style of functions and lambdas, distinguishing with the ~> (because of the fuzzy nature of "operating in the target context")

Automagic parametrization

Inspired by KDB/Q, I implemented auto-parametrization through using _1..._x arg names in the code: ~> (_1 + _2) / _3 - this would be the same as (a, b, c) ~> (a + b) / c

Shorthand Fragments

Invoking a method directly on a single argument can be done very tersely, since it is rather common.
Since operators in Onyx map to methods internally, the methods can be invoked explicitly when wanted. This can be exploited here (a = x + 5 is the same as a = x.+ 5 - note that the + operator is used as a method in the second case).

list = [1, 2, 3, 4]
mapped-list = list.map ~.* 2       --> [2, 4, 6, 8]

Making fragments as terse as possible is a good target. Any ideas on syntax improvement?

Project status?

This looks interesting, but doesn't seem to get many updates. Is the project still alive?

The option of an alternative syntax for Crystal appeals to me - uppercase/lowercase restrictions for instance are particularly irritating. I hope the project isn't dead.

Delimiter Syntax for Generic Parameters / Arguments

Delimiter Syntax for Generic Parameters / Arguments

I've implemented both <> and [] as generic parametrization delimiters.
The reason for this at first was to make it possible to try out both to see which gets most use and gets to stay.

However, another idea popped up: What if both are allowed always? This way long nested types can be made clearer by altering the symbols.
Granted - this is one of those things that does not occur often in most code - so in that respect it should not get favoured treatment.

Ideas

    1. [/] only Type[GenType, OtherType[SomeType]]
    2. </> only Type<GenType, OtherType<SomeType>>
    3. Allow both: Type[GenType, OtherType<SomeType>] requiring one or the other for 1st level, for increased consistency.
    4. Allow both: Type<GenType, OtherType[SomeType<MoreType>]> allowing them completely interchangeably at any depth level.

A "Realistic" Use Example

my-ptr-list = List<Ptr[Map<Str,Str>]>()

Possible Conflicts and Ambiguities

[/] are also used for array/sequence literal sugar, and as subscript operator sugar.
'<'/'>' are also used for comparisons.

There's not a great risk of ambiguity or clashes.

If both are allowed, the possibility of ambiguities of course increases. That is bad in itself, since it makes it harder to figure out the likely cause of an error when analyzing errors heuristically for "human messages". It also locks down other possible uses of those brackets in combination with "const-like literals".

Prior Art

  • </> is dominant in many common languages currently, notably C++, Java, etc.
  • [/] is used in Scala, Nim, etc.

Preference and Motivation

Undecided. Though I tend to use Ty[...] more than <...>, and often use the mixed approach.

Comments and Doc Comments Syntax

-- here's a comment
say "ok!" -- and here

--| This function blows up this universe!
--| Will be better documented after Ragnarök
some-func(x Int) -> x / 0

-- _[ed: added 17-04-05]:_

--/ A Multi Line Doc Comment
    that ends when
    dedent is reached
some-func(x --[an inline comment here]-- Int) -> x

--\ Or Maybe Backslash
    or maybe both, that
       could work
    too

Prior Art

  • -- and additional variations are used by Ada, Eiffel, Haskell, Lua, SQL, etc..
  • // is a more awkward form (albeit very familiar to most of us). I suspect it was chosen back in the day to allow the -- decrement operator without additional contextual rules for syntax. In Onyx you write x -= 1 for that (deliberately not allowing inc/dec op; it's not a "side effect").
  • # is a very good symbol to have in syntax; wasting it on comments isn't preferred! It is currently used as the prefix for Tag-values (aka "Symbols" in Crystal and Ruby) and in some hash / key-value scenarios for sugar-constructs. It's otherwise pretty universal in the different doc-syntaxes where it's used; and it's only one char — good points for it. But, still, it doesn't have a "natural" (typographic / writing rules) reason for it.

Propositions

  1. -- and (--| or --:) (double hyphens only) for comments and doc comments
  2. As per the above and also: — and (—| or —:) (EMDASH [\u2014]) interchangeably for comments and doc comments respectively
  3. [ed: added 17-04-05]: Different forms of symmetric and multiline comments also: --[ comment ]--, --/ begin comments that close with dedent, etc.
    • Also here it is probably good to designate classes of intent to different comment-markers: simply code put out of action, dev-to-dev comment on impl-details, etc. vs. officially documenting comments (for doc-extraction).

Motivation

  • The reason I chose the notation, before realizing a horde of languages had it, was because that's how I write comments in pseudo code on paper (well more of an EMDASH that is).
  • The pipe | was then picked up from Haskell to specifically point out doc comments - discriminating between them is good. The : is a candidate for doc comment usage also.
  • [ed: added 17-04-05]: finding a new editor (https://github.com/howl-editor/howl - highly recommended!!!) I came into contact with Lua and Moonscript. I think Lua's symmetric comment syntax (--[ comment ]--) would be good to steal from. This would be for "regular" commenting. In addition '--', or '--/' for starting an indent-based multi line comment (ends on dedent), which is considered "eligible for documentation generation".

Allowing EMDASH (—, \u2014) could be nice. It looks more like natural writing and stands out a bit. Getting unicode into a de facto position is desirable, but relying on it is still a bit of fringe work.

Allowing them both makes it simple for those who:

  • don't have EMDASH easily accessible on the keyboard (seems unlikely—it's a basic typographic feature), and
  • can't easily configure it (xmodmap in linux, etc), or
  • can't get their editor to replace -- with — upon typing, or
  • simply just prefer double-hyphens.

Even though most of us prefer monospace while editing (hell, I know some people who prefer monospace for everything, even book reading!), EMDASH still looks a great deal wider than a hyphen in most fonts, so it is clear (even without highlighting).

One downside is that it might not be available in all monospace fonts.

Style uniformity in a project is solved via the formatter. Everyone should decide on a style and configure the formatter for it: one for checking in to the project (project style), and one for personal editing. This way the project can have -- as the mandatory style, while you use — during your editing sessions, if you prefer it.

NOTE

Currently the -- style is implemented. The semantic distinction between comments and doc comments is not yet decided by the notation, but currently by proximity.

[ed: added 17-04-05]: The suggestions above added with this edit/date marker are just ideas and not implemented yet.

Feature suggestion: suffix while

Strongly suggest a value-collecting suffix while:

text = "abcdef"
results = text.shift while !text.empty?
puts results            -- ["a", "b", "c", "d", "e", "f"]
results = text.shift while !text.empty?
puts results            -- [ ]

Also a suffix until, which would run at least once.
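
A hedged sketch of how the until variant would differ (assumed semantics: the body runs once before the condition is first checked):

n = 0
collected = (n += 1) until n >= 3
puts collected          -- [1, 2, 3]: the body ran at least once, then repeated until the condition held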

Feature request: Heterogeneous arrays

In Crystal, tuples are heterogeneous but cannot be extended or modified at runtime, while Arrays are extensible but homogeneous. Ideally:

my-array = ["", 2, nil]
my-array << false
puts my-array.class             -- should NOT be String | Int | Nil | Bool,
                                -- but rather [String, Int, Nil, Bool], like a tuple

Come to think of it, it might be better to turn this on its head: keep arrays homogeneous, but make tuples editable:

my-tuple = {"", 2, nil}
my-tuple << false
puts my-tuple.class             -- {String, Int, Nil, Bool}

Named Arguments in Callable Invocation

Named Arguments in Callable Invocation

Alternatives

1: Current: Use a Tag-literal to denote arg-name assignment: foo #named-arg = 47, #other-arg = "Ok!", same with parentheses: foo(#named-arg = 47, #other-arg = "Ok!")

Shining light on the complications requiring some thought as to syntax:

foo(x = 47, y = 42, color = 0xffaa00) ->
   say "x = {x}, y = {y}, color = {color}"

foo 100, #color = 0xff0000
foo #color = tmp = 0x00ff00   -- Set a named arg, and assign a variable(!)
if v = some-obj.foo           -- assign variable in condition
   foo 100, v, 0xcccccc

2: Use = with the names, without Tag-literal notation, and require a more explicit notation for assigning variables in the context of call arguments, say :=. This would not be a "first assign" designator - it just means "assignment is actually intended in this context". In the case where "explicit first assigns" are demanded, it could perhaps double for that role in these contexts - so that you don't have to whip out something like my-call(and-assign ':= 47).

For this case, the assignment syntax should also apply to control structures like if, which would also make things a bit clearer (especially since it's a common pattern in Onyx because of Nil-elimination branching).

foo(x = 47, y = 42, color = 0xffaa00) ->
   say "x = {x}, y = {y}, color = {color}"

foo 100, color = 0xff0000
foo color = tmp := 0x00ff00   -- Set a named arg, and assign a variable
if v := some-obj.foo          -- assign variable in condition
   foo 100, v, 0xcccccc

3: Use colons, like in map notation. This however clashes with the "python-style" nesting control symbol :. So it requires a core syntax overhaul: if some-foo x: 13, y: 56, verify: true then do-shit. The then here could have been a :, which messes things up, unless a parenthesized call is used, which would be an inconsistent requirement.
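
For symmetry with the other alternatives, a sketch of what alternative 3 would look like in use (not implemented; the clash with : as nest-start is exactly the concern raised above):

foo(x = 47, y = 42, color = 0xffaa00) ->
   say "x = {x}, y = {y}, color = {color}"

foo 100, color: 0xff0000
foo(x: 1, y: 2, color: 0x00ff00)   -- a parenthesized call side-steps the `:` nest-start clash
if some-foo(x: 13, y: 56, verify: true) then do-stuff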

Feature request: named and multiple blocks

The language would be much more powerful if constructs like this could be done:

my-if(&condition, &trueblock, else: &falseblock) ->

my-if { ... }
   { ... }
else
   { ... }

We'd need to allow braces to delimit parameterless blocks.

It would be really useful, to allow new control structures to be written in user space. For example, Go's channel select.

Gotchas For Crystalers and Rubyists

Gotchas For Crystalers and Rubyists

It's reasonable that some of you who find interest in Onyx know a bit of Crystal.

There are some differences that can be confusing, likewise if you come from a Ruby background.

  • Indentation matters!

  • end keywords are optional, but significant when used, and can also be more explicit: end-type

  • Comments are written -- comment here

  • Symbols are called Tags and are written #symbol_name, not :symbol_name (think hash-tag)

  • Char-literals are written %"a" not 'a' - they're not too common except in special code bases.

  • Don't use :: for namespacing / module-hierarchy-digging; simply use dot: MyMod.IsMuch.More.Natural.ToUse.like-this() rather than MyMod::IsMuch::More::Natural::ToUse.like-this()

  • Type annotating variables, instance-vars, parameters etc. is done simply by writing the type after the identifier, spaced. No colon or double colons. There are also some type modifier/qualifier symbols, but they're in early workings. foo SomeType = 47 -- foo typed to SomeType and assigned 47

  • It's recommended to use endash or hyphen as word-delimiter; however, the archaic underscore-style is allowed interchangeably, and even camel-case can be used (but don't!).

  • a module is a module, a "type-partial" is declared trait MyTrait, not module MyTrait, and mixed in to a type with mixin MyTrait - not include

  • do is a generic "nesting starts" keyword, not a "crystal/ruby-block" starter. do, then, :, => are exactly equivalent. Example: if foo then "ok", if foo => "ok", if foo: "ok", etc.

  • A "rubyish block" is called "soft-lambda" (please, help with a better term!), because frankly "block" is too generic, ambiguous and confusing as a term. Soft-lambdas begin with (params, here) ~> or ~> only. Instead of &.do_stuff, you would use the visually similar ~.do_stuff (to keep with the style of the other soft-lambda notations.)

  • A "soft-lambda" is part of the arguments, not an "add-on". That means it goes inside parentheses if they're used in a parenthesized call, and is separated with a comma if there are arguments before it. It still only goes last among the arguments.

  • with '~>', auto-parametrization is available (KDB/Q inspired). Simply use names _x where x is the position in parameters it should represent. Count begins with 1.

  • This means that in Onyx you could write (a few variations here, result unused):

    list.map (v) ~>
      v * v
    
    list.map (v) ~> v * v
    
    list.map((v) ~> v * v)
    
    list.map (v) ~>
      v * v
    end  -- explicit 'end' is voluntary
    
    list.map ~> _1 * _1  -- implicit parametrization
    
  • Bitwise operations are not commonplace and therefore they have been demoted to the Haskellish form: .|., .&., .^., .~. - that is, simply surrounded with dots (think bits...). Note that the same goes for any type overloading those operators: List "unique join" would also be [1, 2] .|. [2, 3] - this will change back in a push not too soon...

  • Generic types / type constructors. In Crystal you use Type(Type2(Type3)), in Onyx you can currently use </> and [/], like so: Type[Type2<Type3>].

  • Instantiation of types has the short form foo = Foo("Param to init"), which is the same as calling .new. Parentheses have to be used if there are no arguments: foo = Foo(), otherwise it just represents the type (unlike first-hand callables, which are calls primarily and values only when specified). Another example with generics: list = List[String](). The short form is preferred instead of calling new.

  • Lambdas are defined as funcs, but without name, or rephrased: like soft-lambdas but with straight arrow, so also here it might look a bit backward to Crystalers: foo = (x I32, y I32) -> x + y

  • Lambdas and any instances having 'call'-methods defined can be called Functor-style, that is, used as functions directly. They primarily represent their value, so if arg-less call is wanted, empty parentheses are needed. my-lambda = (x Str) -> say "x: {x}"; end; my-lambda "47"

  • String interpolation is simply written "{interpolate-me}", not "#{...}"

  • and, or, is, isnt, not is available in addition to &&, ||, ==, !=, ! - they behave exactly the same as their symbolic counterparts, it's a mere matter of lexical choice.

  • for-loops are available - they're pure sugar for .each*-iteration, and so scoping and behaviour exactly mimic method-based soft-lambda iteration (see the sketch after this list). A multitude of syntactical variations are currently available; whatever you find natural to use will likely work. The abundance of choices will be reduced by consensus.

  • In type-defs:

    • initializer is simply called init
    • self does not refer to the current type, you use Self or Type (capitalized). One of the terms may be ditched.
    • "class-methods" are declared as Type.my_method(params) -> (or Self...) and are currently referred to as Type Level Functions.
    • class-variables can be accessed either as @@class–var or Type.class–var. These also, are called Type Level Variables. I'm inclined to ditch the @@-notation.
    • instance-vars do not need @ prefix at declaration site.
    • instance-vars can be typed, default assigned and getter/setter flagged on the same line.
  • Integer literals are of type StdInt (platform width), not Int32 by default.

  • The type of int- and real literals can be changed through parser-pragma, and can therefore be retyped per scope as seen beneficial for clear code.

  • There's no ternary conditional in the common form, it is prefixed with if. Or, better phrased: if has an alternative nest-start/else notation: foo = if bar is 47 ? 1 : 2. You could as well write foo = if bar is 47 then 1 else 2

  • There are many ways to write case constructs currently, try them out and don't get stuck with what you're currently used to - you might find something you like more. You can currently write them exactly as in Crystal. The huge amount of styles now available will be reduced based on RFC input.

  • func/method defs can currently be written prefixed with def or fn, however the idiomatic way is just my–def(params) -> body. It's uncertain whether the keyword style will remain.

  • splat parameter is written ...par–name - not *par–name

  • a routine / func that only mutates can be written with an exclamation mark after the arrow to automatically set return type Nil, and value, to not leak internal state by mistake: my-evil-func(x) ->! x.bar = 47 -- returns nil

  • begin/rescue/else/ensure constructs are named try/rescue/fulfil/ensure.

  • fulfil is the same as else in Crystal but can be used even when there's no begin/rescue defined - to ensure that something always runs on successful return of a function, but not if it raises and is caught at some other spot up the call chain.

  • "Attributes" are called "Pragmas" - simply because they are pragmatic constructs that differ in semantics from use-case to use-case. The syntax currently is an apostrophe followed by pragma: 'link("some_lib"). The names differ from Crystal too. See issue on pragmas.

Probably some more things - help out by commenting on what confuses you!

Feature request: pipe operator

Nicer syntax for calling free functions in a chain. Useful to avoid nested calls. Accepts local free functions, proc objects and blocks.

fn(value) -> value + 1
puts 1 |> fn |> fn |> fn    -- outputs 4. Equivalent to fn(fn(fn(1))), and much nicer

Might be redundant if UFCS is implemented. Then again, maybe this is preferable to UFCS. After all, a free function isn't a method, so it might be a good idea to differentiate in this manner. Alternatively, implement both. The |> operator would have a lower precedence than ., so perhaps each one has its merit.

Explicit Start Tokens for Expression Blocks of Control Structures

[RFC] Explicit Start Tokens for Expression Blocks of Control Structures

For Crystalers: read 'Gotchas for Crystalers' first.

A block of expressions is simply defined through indentation. But sometimes it looks clearer with specific block start notation. Also, if you want to write a list of expressions on one line - it's a must.

Currently an abundance of notation is available:

  1. => - Seen in LiveScript, maths etc.
  2. : - Seen in Python
  3. do - Common in several languages, old to new, with varying connotations
  4. then - Common mostly in if control constructs
  5. begins - This is a special one, starts a block without requiring indent - it will close along with the enclosing nesting. Seen with varying connotations in different languages.

In Onyx these are all equal and interchangeable at the moment, except begins, which closes one dedent further out (along with the enclosing nesting).
begins is mainly intended for use with a module that spans the entire file, to avoid unnecessary indentation. The idea, as with most things in Onyx, is to reduce the number of constructs by consensus.

That said, I feel they all work well in different contexts. For cond/match the colon feels just right, while in other cases the => feels more right.
I also included the wordish styles, in case it would turn out more people prefer them (I haven't found any scientific studies supporting any of these aspects yet).

Deliberately Messy Example

module MyMod begins

my-fun(x) ->
   if x is 3 then say "Yo - x is three!" else say "I'm clueless!"
   if x < 1 => say "Yo - x is less than one!"
   for i in 1..x do say "Iteration {i}"
   y = case x
      1: "It's one"
      2: "It is two"
      *: "Seems we got {x}"
   return "{y}, ok?"  -- last expression is implicitly returned, but explicit
                      -- use of `return` is recommended as soon as the code
                      -- starts looking 'very imperative'

Try all variations in code, and see what you think!

If/when something's changed, it will be possible to "upgrade" the code automatically to rewrite any deprecated constructs, so no worries, use the variations freely!

Feature request: typeless variables

Essentially variants. Obviously they would incur a performance hit, but sometimes ease-of-use trumps execution speed. Use of a variant would never trigger a compilation error:

type T
    a-fn() ->

v = T()
puts v.some-method     -- run-time error, not compilation error

Note that I am not talking about unioned types. I mean true typeless variables. Essentially Ruby variables.

Callable- / Function- / Method Invocation

Callable- / Function- / Method Invocation

This is just a brief overview. When it comes to named arguments - see its own separate issue: #16.

-- Methods and Functions --

-- are always _called_ when referenced - the function as value has to be
-- explicitly requested by turning it into a lambda (`f = ->my-func`)
this–is–a–call
this–is–a–call()
other-fn 1, 2, 3
other-fn(1, 2, 3)

-- an instance of a type having a `call` method can be called immediately as a
-- "functor". In line with other operator overloads, it would seem
-- appropriate to define it `()(x) -> x + 1` - but it looks slightly confusing.
type MyFunctor
   add-val = 1

   call() -> @add-val
   call(x) -> x + @add-val
   foo() -> @add-val * 2
end

my-fu = MyFunctor()
-- Just call it like a function
x = my-fu 2    --> x = 3 (maps to .call(x))
x = my-fu(2)   --> x = 3 (-""-)
x = my-fu.foo  --> x = 2

-- Except - when it has no arguments, then parentheses must be used to
-- distinguish between call and value
x = my-fu()    --> x = 1 (maps to .call())
x = my-fu      --> x = #<MyFunctor:0x1e14f90> (assigns the value of my-fu to x)

-- Lambdas have a call method - so same goes here
add-val = 2
my-lambda = (x I32) -> x + add-val
y = my-lambda 1   --> y = 3
y = my-lambda(1)  --> y = 3

-- Since lambdas rely on the same mapping to call, the same thing goes for
-- param-less lambdas as functors.

Something that may be desirable is UFCS, short for Uniform Function Call Syntax.
Incorporating UFCS means calling position / syntax is interchangeable between "method style call" and "function style call". See discussion in specific issue: #20

Let's showcase the concept briefly here:

val.method–call
val.method–call()
method–call val
method–call(val)
(method–call val)

fn–call val
fn–call(val)
(fn–call val)
val.fn–call
val.fn–call()

Feature request: auto coercion of some kind

This came out of #27 by @stugol; I take the liberty of re-formulating the needs here:

do-something(value String) ->

a = ["fred", 3, true]
do-something a.0           -- compile error

The wish for the above is to be able to make the compiler choose do-something(String) and of course coerce the argument to String (in this particular case).

Standard Type Namings

Standard Types - Namings

NOTE - this is only partially implemented currently, and so, it's mostly a discussion of how/what to actually implement.

Since Onyx uses the Crystal stdlib, there are already de facto names for common types. However, I feel there's a need for some cleanup from the perspective of Onyx.

I'd like to favour terse names for the common types.

Proposed Type Names in Onyx

Type Name   Description
Nil         The ubiquitous Nil-type, which plays an important role in Onyx
Any         Abstract type, mother of all types except Nil
Num         Abstract type, mother of all number types (corresponds to Number in Crystal)
Int         Abstract type, mother of all Int'ish (corresponds to Int in Crystal)
Intd        Int default. Defaults to ArchInt, unless specified otherwise for a specific program
Real        Defaults to ArchReal, unless specified otherwise for a specific program
Nat         Natural - non-negative Int that is not binary unsigned (this would require some additional internal changes to types to be enforced)
List        Dynamically growable list (like vector, array, sequence in some languages; called Array in Crystal)
Map         Map, currently HashMap implementation-wise
Tup         Tuple
TTup        TaggedTuple, aka NamedTuple
Set         Set (duh!)
Tag         Called Symbol in Crystal and Ruby
Str         String
Bool        Boolean
Ptr         Pointer - a lethal "raw pointer"
ArchInt     "Architecture Int" data type: pointer bit width for most platforms
ArchReal    "Architecture Real" data type: simply Float64 on most platforms

Have I forgotten some obvious one?
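
To give a feel for the terse names in code, a small sketch (the type annotation and literal styles follow the rules described elsewhere in these notes; the exact stdlib mapping is assumed):

count Intd = 47               -- integer literals default to the platform-width type
ratio Real = 0.5
words List[Str] = ["foo", "bar"]
pair = {1, "one"}             -- a Tup
state = #ready                -- a Tag (Symbol in Crystal/Ruby)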

"Machine Level" Data Types

Keeping these slick could be good, and it also telegraphs their "machine-closeness" (do use "cleaner" types like Intd, Real, etc. for most things! These are for type-defs / performance / C-lib interfacing code).

Type Name   Description
F32         32 bits wide floating point
F64         64 bits wide floating point
I8          8 bits wide signed integer
I16         16 bits wide signed integer
I32         32 bits wide signed integer
I64         64 bits wide signed integer
U8          8 bits wide unsigned integer
U16         16 bits wide unsigned integer
U32         32 bits wide unsigned integer
U64         64 bits wide unsigned integer

Suggested Definition of Arch* types

As you can see, this is heavily x86*-centric atm and has to be extended when other architectures are added.

Note that this pseudo-code is just to showcase the definition; in reality it will be specified only as "bit-width for Int and Real, respectively", and not as aliases.

ifdef x86_64
    type ArchInt    = I64
    type ArchUInt   = U64
    type ArchReal   = F64
else
    type ArchInt    = I32
    type ArchUInt   = U32
    type ArchReal   = F64
end

Thoughts?

Defining Types

Defining Types

Before anything else, let's showcase what some code currently looks like, to test your sense of whether it's intuitive without formal explanation. Then we'll take it from the top, going through the details.

type Foo
   @foo-name Str 'get 'set

   init(@foo-name = "Foo Doe") ->

   say-name() ->
      say "My name is {@foo-name}."
end

type CompactFacts < flags Int8
   AppleLover
   PearLover
   Dude
   Hater
end

trait SomeFacts
   say-some-facts() ->
      say "These are some of the things I am: {@some-facts}"
end

type Bar < Foo
   mixin SomeFacts

   @some-facts = CompactFacts.flags Dude, PearLover, AppleLover

   say-name() ->
      say "Hey dude! I'm {@foo-name}!"
      say-some-facts
end

type AlsoBar = Bar

type BarOrInt = Bar | Int

type PointKind < enum
   Regular
   Special
end

type Point[T] < value
   @x T = T(0) 'get 'set
   @y T = T(0) 'get 'set
   @kind = PointKind.Regular 'get

   init(@x, @y, point-kind PointKind?) ->
      if point-kind is nil
         @kind = if @x > T(9999) ? PointKind.Special : PointKind.Regular
      else
         @kind = point-kind
   end
end

my-bar = Bar()
my-bar.say-name

if my-bar.of? Bar  => say "Well of course it is!"
if my-bar.of? Foo  => say "Yup, parents are parents, what are you gonna do?"
if my-bar.of? AlsoBar  => say "Yep - you guessed it!"
if my-bar.of? BarOrInt  => say "That too!"

other-bar = Bar "The Dude"
other-bar.say-name

my-point = Point 47.3, 42.1
my-point.x = 18.0

say "Kind of Point: {my-point.kind}"
-- my-point.kind = PointKind.Special -- no can do: no setter defined

other-point = Point 4018, 2096
yet-a-point = Point 10, 10, PointKind.Special

Motivation Out of Onyx Development Philosophy

  • The keyword type begins all type-definitions: consistent.
  • The type identifier is in the same place every time: spatial beacons.
  • The "type-builder-name" decides how the type-definition is interpreted: formally described consistency of the inconsistent. ;-)
  • The sane defaults let you write cleaner code, excluding the "type builders" almost all of the time: brevity.

The syntax, concisely described (pseudo-formally), is:
"type" TypeName ( ("<" | "=") (builder_name AncestorTypeName) | builder_name | AncestorTypeName )? type_definition_block?

Wow that looked cryptic! Well, it's not. It's easy!

Apart from inheritance you can also mixin any trait you want. Traits are declared much the same as a type, but are simply pieces re-used in types at will. We'll get into that too.

Type definitions have sane defaults that make life simple and beautiful. Let's start with an example to show that it really is simple, before explaining further.

Most often you'll declare what is commonly called a class in other languages: a reference type, or object type. The ellipsis in the example below would of course be the type definition (the member data, methods, etc.):

-- This is how simple it is, and what you will commonly write. (`object` type-
-- builder and `Reference` ancestor default added):
type Foo
   ...
end

-- Bar inherits Foo, type-builder is defaulted to `object`
type Bar < Foo
   ...
end

That is all you need to know to write even an advanced program!

But just to show off the pattern, let's look at some more explicit ways of writing the above (which you shouldn't use) - simply to paint the picture and hopefully help your understanding:

-- All the following result in the exact same thing as the `Foo` declaration
-- in the previous example. (they're just annoyingly explicit)

-- more explicit as to type-builder. (here ancestor is defaulted to Reference):
type Foo < object
   ...
end

-- more explicit as to ancestor. (here type-builder is defaulted to object):
type Foo < Reference
   ...
end

-- even more explicit - stating all the obvious:
type Foo < object Reference
   ...
end

So why write it like that? Well, it only becomes important when you don't want the default (a reference/object type, or using the specific type-builder's default ancestor/underlying type) - read more about it in the "Type Builders" section. But first: generics.

Generics

Working with generic types is dead simple.
Look at the first example, type Point has a generic, T.

Generics follow the pattern of "Duck Lexing" like most things in Onyx: a generic type variable consists of one capital letter, optionally followed by a digit 1 - 9. Thanks to this, we get away with not needing any special markers to designate a free type variable in parameters, for instance.
We may change this to allow type-style names; if you think that should be the case, let your voice be heard, and say why.

Recap: to define a type that takes generic arguments:

type Point[T]  -- in this case, it's a reference obj instead of value as above
   x T
   y T

   init(@x T, @y T) ->
   get-scaled-x(s T2) T2 ->
      s * T2(@x)
end

type IntPoint = Point[Int]

type OtherPoint[T] < Point[T]
   do-more-stuff() -> "Ok"
end

p = Point 47.3, 1.41
xs = p.get-scaled-x 2_f32

Note 1: you can use Type[OtherType, Bla] and Type<OtherType, Bla> interchangeably atm.
Note 2: Above we didn't use @ before member data names in the definition. Both styles are allowed atm. But, it might be better to require @name?

Above an instance of Point is created, assigned to p.
But, what? No generic arguments? No! See how the constructor is called with two Real literal arguments: T is inferred to Real, which makes the resultant type of p Point[Real] automatically.
If you want to state the obvious, you can, but that's a load of bollocks Java can keep pushing.

Now, as to the get-scaled-x method. The parameter s has a type variable, T2, and the signature states that the return type is restricted to T2 also.

When we call get-scaled-x with a Float32 literal, T2 is inferred to Float32; @x is converted to Float32, as seen in the function body, and then multiplied with s. The result is returned and matches the return type restriction (Float32).

As for inheritance and type aliases, the two stray examples above should cover it.

Now you know everything about generics. Basically.

For more about the syntax of generics, see issue #19

The "Type Builders"

This decides how your type definition is parsed and affects features of the resulting type. There are also "type builder modifiers".

Type builders are optionally followed by a type to base it on (ancestor for object/value or underlying type for enum/flags) and then, with the exception of 'alias' and 'sum' type-builder, the type definition block (which can be empty if no additional features are to be in the type).

Before you get all scared - the only type-builders you will ever explicitly state in code in practise are value, enum and flags. The rest are figured out automatically.

Type Builders

  • object -- a classic reference object, is implicit default so should not be used in writing in practice unless in some very esoteric program for clarity.
    • Default ancestor type: Reference
  • value -- exactly the same syntax for definition, but the instance is always treated as a value, when passed and allocated (on the stack that is).
    • Default ancestor type: Value
  • enum -- listed constant names in the syntax are automatically enumerated and Enum inherited by default.
    • Default underlying type: Int32
    • Automatically defines methods named after the constants for checking for specific value.
    • A host of methods suitable for working with enums inherited from type Enum
  • flags -- listed constant names in the syntax are automatically enumerated in powers of two and Enum inherited by default.
    • Default underlying type: Int32
    • Automatically defines None (= 0) and All (= all flags binary or'ed)
    • Automatically defines methods named after the constants for checking for specific value.
    • A host of methods suitable for working with flags inherited from type Enum
  • alias -- a pure alias, you shouldn't write this either, just using = followed by a type defaults to "alias type-builder"
    • No default - would defeat the purpose
  • sum -- a sum type, this is implicitly the type-builder when you use = followed by sum-type syntax (multiple types separated with |) for the type definition, so is not explicitly written in practise either.
    • No default - would defeat the purpose

Type Builder Modifiers

There's only one currently:

  • abstract -- you know it: makes the type abstract, and thus it cannot be instantiated. Abstract only works with type-builders object and value, none of the others.

There are some more builder and modifier ideas for the future; they'll be mentioned at the end of this issue.

Examples

At this point, I think the initial show-case code can serve as example? Otherwise: request what needs to be clarified.

RFC: Alternative Ideas for Syntax

Note: Above described syntax is the one currently implemented - this below is just an idea.

Since the concept of type-builders is mostly formal, and only "comes in contact with reality" for flags, enums and value, another way of expressing the above could be better.
Perhaps consider Value, Enum and Flags as "pseudo types" at definition time, basing the parsing rules on which type-name is used as "ancestor". Inheriting any descendant of Value will then of course use the "value" builder.

type Point < Value
   ...
end

type MyEnum < Enum[Int8]   -- as a "pseudo type" the generic arg here is
                           -- instead "underlying type", even though Enum
                           -- proper is inherited
   ...
end

type MyFlags < Flags[Int8] -- as a "pseudo type" the generic arg here is
                           -- "underlying type", and instead Enum is inherited
                           -- and underlying type set to Int8
   ...
end

It does look better, but there are some magic non standard inheritance rules at play distinguished only by the name of the "inherited" type.

So, hmmmmmmm....

Type Description - The Guts

Most parts of it are exactly the same for all type builders, with only small variations (automatic enumeration, etc.). Let's first go through the parts they have in common.

Mixins

In the very beginning of the type description you can use mixins.

   mixin NameOfTrait, OtherTrait
   mixin YetAnotherTrait

Member Data / Variables

Currently it's possible to write them by name only, or prefixed with @ (the way they are used in direct accesses).
They can be followed immediately by a specific type restriction.
They can be given default assignments that are used on instantiation if the initializer invoked doesn't address them.
There are pragmas to help with some common tasks: get, set (I decided to drop property in favour of simply typing both get + set). This could easily be accomplished via a user-land macro also - but that corrupts spatial proximity. It's also so common that a core implementation is worthwhile.
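
A small consolidating sketch of the member-data forms described above (both the bare and the @-prefixed styles, since both are currently allowed):

type Person
   @name Str 'get 'set        -- typed, getter + setter flagged
   age I32 = 0 'get           -- bare style, default assigned, getter only
   @nick-name Str?            -- typed only, nilable, no default

   init(@name, @nick-name = nil) ->
end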

Type-Scoped (global) Vars

These are really global variables, namespaced and with visibility tied to the specific type. Thus, they are shared by all instances.
They are prefixed with Type. or Self. currently, and follow the syntax pattern of member data.

type Foo
   Type.shared-var = 42
   Type.another-shared-var Int = 47
   Type.yet-a-shared-var Int
end

Member Funcs / Methods

Can be "public", "private" or "protected".

Read about the syntax for defining functions/methods in their own issue: #11.

init

The init function is special. It's used in instance construction.
For each type, new functions are automatically created on the type level. These allocate space for an instance, and then call the corresponding init method to initialize the instance.
You can of course define your own new functions for the type also, if you want to do some specific allocation magic.

deinit

The deinit method is invoked when the instance is culled by memory management. Called destructor in some languages.
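
A minimal sketch of the init/deinit pairing (assuming deinit takes no parameters; exactly when it runs is up to memory management, as noted):

type Session
   init(@user Str) ->
      say "session started for {@user}"

   deinit() ->
      say "session for {@user} cleaned up"   -- runs when the instance is culled
end

s = Session "somebody"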

Type-Scoped Functions

Just like type level variables, these are simply prefixed with Type. or Self..

type Foo
   Type.my-instance-count = 0

   Type.get-count() ->
      Type.my-instance-count

   Type.my-new() ->
      say "My own new function"
      Self.my-instance-count += 1
      return new()
   end
end

Foo.get-count     --> 0
foo = Foo.my-new
bar = Foo.my-new
Foo.get-count     --> 2

Other Constructions

  • ifdef is of course valid in a type description.
  • macros are allowed. (note: not implemented atm, soon!)
  • pragmas relating to: parsing, methods and data.

Other Notes

Types are re-openable at all times. This means you can expand upon a type, or override methods on a type, anywhere in the source. You can spread a type out over several files. You can modify types from 3rd-party libraries. Etc.

type Foo
   talk() -> "I'm foo"
end

-- re-open
type Foo
   my-number Int32

   init(@my-number) ->
   talk() -> "I'm foo, my number is {@my-number}"
end

foo = Foo 47
foo.talk        --> "I'm foo, my number is 47"

Possible Future Additions

  • builder actor or modifier sync - this depends a lot on what route concurrency takes in Crystal and Onyx.
  • modifier uniq / unique: for making a unique type from another (it copies all traits of the type, but inherits from the appropriate "root" type (Reference or Value)). Reasonable example of usage:
EURSEK = 9.5  -- fun to watch the code a year later ;-)

type Kronor = unique FixedPoint<2>
type Euro = unique FixedPoint<2>

x = Kronor 100
y = Euro 10
-- z = x + y               -- not allowed - different currencies ;-)
z = x + Kronor y * EURSEK  -- explicitly made Kronor, now it's ok.

Language Development Support Apps

Language Development Support Apps

There are some applications surrounding the project that would make the design process much more reliable and would ease practical difficulties like GitHub not accepting highlighter for volatile young projects, etc. Here are the primary ideas briefly.

Online "Syntax Preferences" Study Research App

An interesting tool that wouldn't be too time-consuming to implement, in order to actively do research on humans' syntactic responses (participants won't even require coding skills; a questionnaire for each participant will establish self-reported capabilities and language skills, for reference).

The principle I've thought up is dead simple, and the tool can also help teams and individuals find the best style for them - one they might not have thought of without being exposed to the study.

  1. For participation, simply authenticate via GitHub, Google, FB, or other market wide common ID verification.
  2. If first login, answer some questions on programming skills, favoured languages etc.
  3. Test begins:
    3.a. The same source code is displayed with different formatting styles, one to the left and one to the right.
    3.b. Participant clicks "prefer left more", "prefer right more" or "equally (dis)satisfying" (no distinction for disliking or liking both)
    3.c. The preference, along with all formatting settings used on both sides, which side the preferred style was on (in order to distribute likely favoured compositions between left and right, so no latent left/right spatial preference affects results), and the source snippet used, is stored in a record. A new pair is generated from the "genes" available. The more up-votes some traits have, the more often they come up; the more seldom "wild cards" are shown, and always then, only on one side (one "stable" reference at all times). A range of different source material is used, which could look better with different stylings. So preference may change for different code scenarios.
  4. After enough repetitions (one can quit any time, but the data is less reliable), a preferred formatting is stabilized. Contradictions and left/right bias, etc. can be used to evaluate the data statistically. The self-reported questionnaire can help spot correlations between preference and former language experience, etc. This way a lot of intel on human perception of code can be collected, which might be of value in modifying the syntax of Onyx.

The main effort here is the configurable stylizer which is a priority of mine after basic language foundation is in place.

Online "Spot the Error" Research App

This one is harder conceptually. This is to study the actual ease of bug-finding. There are such flame wars on the merits and pit-falls of explicit blocks, braces and off-side indentation structuring, so why not simply test it. A range of source snippets with erroneous structure is displayed, one by one. The participant simply has to click at "which spot" the erroneous "event" takes place (missing brace, unintended dedent, etc.) - in different programming languages! If the user can't comprehend a source in a particular language at all - there's a button for that too, "don't get it - next please".
After a couple of these, the same sources, or preferably just similar ones, will be gone over again - this time with the compilers' error output. The best error-reporting compiler for each language will be chosen (i.e. clang for C++).

This gives us two pieces of information

  • Which styles are easiest to spot errors in by eye?
  • Which compilers give the best help in locating actual error site?

Both are important. One for the syntactic structure of the language itself, the other for the compiler implementation's error analysis.

Problems

The erroneous code must be realistic, and have an even spread of realistic occurrence between languages to give as true a picture as possible. This is not a competition - it's a tool to choose the most appropriate syntax for Onyx.

For block-langs it's simple even to synthesize "problem sources". For indent based it's trickier, because the arguments against are that you've indented or dedented entire blocks wrong. So how/when/why (cut n paste etc.) do this happen, how do we form a base of samples? Honest real errors, self reporting by users is probably ideal. But, it's that "honest unbiased" part that makes it tricky ;-)

Online Onyx Highlighting Image Generator

GitHub doesn't accept highlighting of non-established languages (or any, until "preferably hundreds of repos are using it"), and also, because of rapid change, older examples in issues will render wrong after changes. Therefore an online source highlighter that generates PNGs would be a good help. The image can be linked into the post in the issue to display the code highlighted. (The same path, changed from .png to .ox, could also link to an active page with the source, allowing participants to play with it quickly.)
(Tool could generate GHF MarkDown for link and image in a go, for easy cut n paste.)
Provided the Sublime highlighter can be converted automatically to some format that can be used by command line tools to generate styled HTML, this should be fairly quick to code.

Online Play with Onyx Sandbox

This is a must. When the compiler is in constant flux, only the bravest will have the energy to keep up with the repo and re-compile. For others it would be of great benefit to be able to continually try code online, to evaluate new syntax and constructs and give feedback. This is what the above generated source image should preferably link to. Being able to choose previous versions may also help in quickly finding in what version a bug was introduced, etc.

The only hard part of this is to implement a safe enough sandbox - otherwise it's a piece of cake.

Feature suggestion: brittle tuples

I'm not sure what this should be called, but I think it's important.

Consider the following code:

the-function ->
    try
        do-something
        {true, [""]}
    catch e
        {false, e}

success, data = the-function()  -- great :)
success2 = the-function()       -- not so great :(

if the-function()               -- this is really bad!
    ...

In this example, success2 would contain a tuple, of type {Bool, Array|Exception}. But that's not very useful. Ideally, we want some way to return a tuple from a function, such that any values not assigned would be lost:

success = the-function          -- second value is lost
success, data = the-function    -- second value is kept

In both cases, the type of success would be Bool, which is what we want.

Maybe some kind of "brittle tuple" class, that inherits from Tuple, and cannot be assigned wholesale to a variable or passed to another function or control structure. And a special syntax:

the-function -> return {{ true, "" }}       -- returns a Tuple that must be destructured
thing = the-function()                      -- automatic destructuring: only gets first value

[edit / ozra: latest distilled conclusion of issue in https://github.com//issues/24#issuecomment-250832621]

Concurrency

Crystal's concurrency is truly awful. Onyx should support true threading.

Please? ;)

Feature request: cascade operator

Allows calling methods of an object without assigning it to a temporary, and without affecting its eventual lexical value. I know .. already has a meaning, but in this case it would be located at the start of a line.

return MyObject()       -- would return an instance of MyObject, no matter the result of the other calls
   .. some-function()
   .. some-other-function()

Equivalent to:

a = MyObject()
a.some-function()
a.some-other-function()
return a

Maybe also allow it in-line, provided it was separated by a space, to disambiguate it from a range:

o = MyObject() .. some-function .. some-other-function
puts o.class            -- "MyObject"

Feature request: Ruby-style heredocs

Ruby allows methods to be called on the heredoc opener, rather than on the closer:

puts <<-END.downcase
    ...
END

Seems the Crystal devs refuse to support this.

Onyx Language Design Work Flow Philosophy

[RFC] Onyx Language Design Work Flow Philosophy

Onyx - the Name

The specs of the language began somewhere in Cython, then later on went under the project code name Jytron (when JS-centric), then Qi, then Fixell, then Qwoe (compiling to C), and in between a host of other names I'm lucky enough to have forgotten. When finally finding the great semantic AST-core that is Crystal (whose syntax layer is inspired by Ruby), I thought it apt to derive from the name concept even though Onyx syntax has "nothing" to do with Ruby.

Or in the words of Bender: "The X makes it sound cool."

Onyx Development Methodology

The essay "The Programming Language Wars" by Stefik and Hanenberg expanded upon some ideas I've carried, urging the responsibility of language designers, and made me decide to at least try a more scientific approach to designing a language. The focus being us human beings' behaviours and cognitive capabilities in relation to reasoning about, understanding and expressing a solution to a problem in code.
Now, I'm not a scientist per se, and so the approach can hardly be compared to that of a theoretical physicist. I'm practically oriented in solving the problem of problem solving via programming, so I rely on you to help out if claims I make are lacking in proof or methodology.

There's no clear cut solution. Many aspects of human behaviour and cognition come in to play, sociological aspects too. We must think out of the box - but cannot forget the box - because many shy away from anything that looks too alien. Therefore, a new language must build on prior concepts and have a familiarity coders can relate to, while still adding value in cleaning and leaning up, reducing confusion, removing legacy constructs and notation, clarifying concepts. So there's a balance between innovation and familiarity. And the target should be somewhere around "what will work now, and still work in 15 years, while being quickly picked up by a decent coder?".

What aspects should be stricter, and what areas should allow more individual expression?

Current First Actions Performed

I've implemented things I believe are good things (TM), then I've added constructs that fit the criteria of the current "scientific approach", even when I haven't thought much of them myself, in order to not overly bias it to my opinion. This way more constructs are available for you to try out and give feedback on. As such there's currently a multitude of keywords and syntax to express the same thing for some constructs. This is meant to be reduced by trial and discussion.

Once the very basic functionality of the first draft is in place, I will move focus to the configurable formatter, this is the basis for developing the "Syntax Preference Study Research App" (see #4) which I believe will be a great help guiding the development in the right direction. Granted it only touches upon the syntactic aspect, and not on semantic ones, but that is one portion of it that can be addressed at least.

This combined with qualitative (and some inevitable flame wars I presume) discussions in the RFC's hosted in the issues in the GitHub repo will be the main guiding tools for the direction of the language.

Apart from that, studies carried out on actual semantic constructs are needed, and if you find such, please chip in, and also help out with your interpretations of the data.

Human Relations - Cooperative, Catalytic, Synergistic Collaboration

  • Try all possible styles - even ones you don't like - to understand it from more perspectives. You might even discover a new even better way.
  • Expect a better solution to come up from discussions than your original / personal proposition.
  • Don't "fight" for something. Argue and reason for it!
  • Onyx's original developer always has the last word - trust it or don't. The aim is for good stuff (TM) - not mob-rules majority-by-a-hair "democracy".

Nuts and Bolts - Reasoning Approach on Constructs

  • Constructs (depending on code mode [see #***XXX]) should fit both (or either [code mode]) scripting / one-off style and huge structural applications.
  • Human readability and comprehension of code first.
  • Human writability of code second.
  • Compilation speed third
  • Compiler "parse-ability" of code fourth.
  • Tool-ability of code (highlighting, diffing, grepping when no code-aware editor is available, etc.), fifth.

Favouring all at once is a double-double-win!

Above rephrased

  • Does it feel natural to write? What would you write in pseudo code?
  • Is it visually clear, and intent clear?
  • How common is the construct? Does it warrant syntax sugar?
  • Are there any studies or empirical material to go on, suggesting it is preferred / better / superior / good?
  • Will the notation create more ambiguity?
    • The less ambiguity the better - but it's a balance act with
      the pro of terseness.
    • Less ambiguity makes it easier to analyze code after an error and suggest most likely causes to the user, this is good to do heuristically when the regular parsing comes to a halt (parse code, fail -> parse heuristically for likely intent -> message).
  • Could it be solved just as simply with an existing construct or a macro (did you cross the river to get water)?

Onyx Compiler Implementation Philosophy

Complexity is Best Solved by Simplicity

As much as possible of everything that happens should be implemented in "user space" as callables (and the core of that is already implemented in the onyxbase "stdlib" [from Crystal]).
To the extent possible, the language itself should "only" be sugar, down to the level of calls, types and basic structural constructs. This means almost anything is possible for a developer to override and change if needed, simply by implementing or overriding methods.

Feature request: pipelining operator

Like the pipe operator, but works on elements of collections. Accepts local free functions, proc objects and blocks. Also accepts methods of the items.

fn(value) -> value + 1
puts [1,2,3] |>> fn |>> fn  -- outputs [3,4,5]
puts [1,2,3] |>> .to_s      -- calls a method on each object, outputs ["1", "2", "3"]

Essentially sugar for map. But nice sugar.

We could also have ?> for select, !> for reject, and so on.
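
A hedged sketch of how those variants could read, if they desugared to map / select / reject. None of this exists; the operator spellings are only the ones proposed above, and the method names (to_s, even?) are just stdlib-style examples:

nums = [1, 2, 3, 4]
strs  = nums |>> .to_s     -- => ["1", "2", "3", "4"]  (map)
evens = nums ?> .even?     -- => [2, 4]                (select)
odds  = nums !> .even?     -- => [1, 3]                (reject)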

Feature request: Different syntax for array and hash types

Swift and Objective C allow for the following syntactic sugar for array types. I suggest also a variant for hashes:

  • Array(T) can be written [T]
  • Hash(K,V) can be written {K => V}

Example:

a-function(messages [String], pairs {String => String}) ->

messages [String] = ["hi"]
pairs {String => String}
a-function messages, pairs

Syntax Structure

[RFC] Syntax Structure

"offside syntax": significant indent and optional end tokens

Yes ,you do know how to write proper sentences ,right ? It' s
  quiteannoying  to read Improperly Formatted sentences
   .Even when
 the content is good . Here 's a run_down with about forty_seven
  Reasons
   :
     1 . You convey a message , getting the  words right is hard enough
 1 .1 : Our reasoning minds ( especially programmers) are spatially orientated
   1-2-  Not structuring information so that it conveys intent ,spatially, is a bad mistake.
1.3.
   2.  Using non_spatial constructs to convey it : {2.1 doesn't help that much does it ?;2.2 At least I don 't think so;}3. My eyes are already bleeding from this exercise— maybe someone more clear_headed likes this ?;}

Structure, indentation and white space is important. If the above didn't convince you... Yes, yes, no-one worth their salt writes with such bad style -- granted -- though it did get the point across. I hope!

Onyx is structured via indentation and has optional end tokens to make block ends explicit when wanted.
There's experimental pattern matching support for the end-tokens also, making it possible to match up intended block starts even more explicitly. This will likely be dropped from the syntax though.

Details:

  • Both spaces and tabs should be allowed - but never mixed! Currently three spaces (yes you read that correctly) is recommended style.
    • Old-school-eight "costs" way too much column estate when sticking to the 80 chars limit (which still is a good practice: edit three files side by side, diffs, etc.).
    • Two-places is too unclear - code bases where I've seen indent-fix-only check-ins have been two-places indented (and non offset-ruled)
    • After lots of experimenting over the years I've found the odd number of three to be the over all most clearest indent size. Discussions on this regarding some standard style guides is welcome.
  • White space is significant in expressions to some extent.
    • Some operators have one meaning at the end of an expressive unit, an identifier, a type, and another meaning if spaced. It's obvious when seen, really. This benefits us with terser, more concise ways of expressing more things.
    • From the top of my head: (my-perfectly-legal-identifier-for-a-fn? 47) and (my-perfectly-legal-identifier-for-a-fn! 47, which is not the same as my-perfectly-regular-identifier-for-a-fn !47)
    • It is very clear even without syntax highlighting (I often read up on code bases via thick piles of print outs - perfect in the garden). And with highlighting - the separation is as clear as day and night.
    • Onyx allows juxtaposition calls, and parenthesized calls: a-func arg1, arg2 is the same as a-func(arg1, arg2), but watch out for: a-func (arg1, arg2) - that called the method through juxtaposition rule - the parentheses instead expressed a grouping (which would syntax error in this case). A reason why the ident SPACE ... precedes: a-func (foo == true), (if bar == 47 ? 1 : -1).
  • Hyphens (dash), underscores, ENDASH or camelCase can be used interchangeably in identifiers - do yourself a favour and stick to one style though! That means, if there's a method defined as the-method in a lib, you can use it as the_method - if you're hell bent on avoiding fluidly readable identifiers. And vice versa.
    • a study comparing snake_case and camelCase concludes that camelCase is much slower to parse for the human brain.
    • lisp-case is much more fluent to read than snake_case, requiring less eye-relocation-movements, and it's becoming more and more popular in URI's, filenames, etc. not to mention that it's commonplace in (esp. germanic) natural languages. It is definitely preferred from what I can tell. However, as mentioned, currently all styles are allowed interchangeably. And studies relating to this are still very few, and not of highest quality. So there's room for discussion here.

Here's an example showing off one of the many ways below code can be written and formatted (especially cond/case syntax currently has a shitload of variations to choose from - waiting to be reduced!), including the experimental pattern matching end token extension.

type Foo
    mood = #happy
    bar = "Hey"

    speak(word, mood) ->
        @mood = mood
        speak word

    speak(@word) ->
        speak

    speak() ->
        cond @mood
            #happy
                say "{@bar} - Yeay!"
            #sad
                say "{@bar}. Sob."
            *
                say "{@bar}."
        end
    end~speak
end-type  -- could have done `end-type~Foo`, `end` or nothing.

-- don't do the below at home kids!
my-foo = Foo()
my_foo.speak "Hello", mood: #workaholistic
myFoo.speak "Hi again"

Prior Art

For offside / indentation significance: ISWIM, Python, Haskell, Nim, Cytoc-Cython ;-), LiveScript, CoffeeScript, SugarCpp, F#, etc.

The voluntary end tokens look more like Ruby, Lua, Pascal, etc., and - dare I say it - Visual Basic, for the extended semantic matching.

Preference and Motivation

I've coded in C, then C++, for a loooong time (well, aside from hordes of other languages and a lot of assembler of course). But already in '99 I had enough with the braces-semicolon syntax. I coded a simple heuristic transpiler called Cython (no relation whatsoever to the project with the same name that came 9 years later), obviously inspired by Python lexicography.

I find the indent based structuring superior, after having used both, side by side, for about 15 years. "If it looks right - it is right". Not to mention that studies on the programmer's mind seem to hint at it being a better choice. It lacks only one thing in most languages that leverage it, one very important thing: the possibility of ending blocks explicitly (and doing it for long/complex blocks).

When scopes grow over more than a few lines, or nesting depth grows, those dangling ends become very... loose.

The perfect solution is of course: both!

The recommendation, as I see it, is to absolutely not use end-tokens in constructs that are just a few lines, with no complex nesting - it's just visual clutter. Beyond that it's much clearer to use explicit end-tokens, and you can catch and contain scoping and localize syntax errors much better.

Some References

Iterators and Loops

First, a friendly message: most often you can solve things in a functional way, using map, fold, filter, etc.; resort to those when possible, before reaching for "imperative style" loops.
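
For example, a hedged sketch using stdlib iteration methods (map / select / reduce, as found in Crystal's stdlib which Onyx builds on) together with soft lambdas as shown further down:

nums = [1, 2, 3, 4]

doubled = nums.map |v|
   v * 2
-- => [2, 4, 6, 8]

evens = nums.select |v|
   v % 2 == 0
-- => [2, 4]

total = nums.reduce |acc, v|
   acc + v
-- => 10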

The Basics - The While Loop

The foundational structural imperative construct while likely needs no introduction. There is no do .. while; use an explicit break condition in the loop instead.

x = 0
while x < 47
    do-stuff-with x
    x += 1

Func / Method and Soft Lambda Based Loops

All more complex iterators are implemented as funcs / methods taking a soft lambda (read #14 if you haven't already) as argument. The for-construct is syntactic sugar mapping to these methods.

list = [1, "foo", 3.14]

list.each |val|
   say val

list.each-with-index |val, ix|
   say "{val}, {ix}"

10.times |i|
   say i

(1..10).each |i|
   say i

(0...10).each |i|
   say i

For Loops

For-loops in Onyx are just sugar for calls to the de facto named methods each, each-index or each-with-index, depending on which values are used.
If the for-loop is kept in the language, it will likely be extended to handle more advanced iterations later on like SIMD-chunking etc.

[RFC] An abundance of notational styles is initially available - let's find the best fitting one and remove the others. Or - should it be kept at all? Simply stick to callables + soft lambdas and remove the for-construct completely?

list = [1, "foo", 3.14]

-- common variant, shown with two different nest starter tokens

for val in list => say val
for val, ix in list: say "{val}, {ix}"

for val in list
   say val

for val, ix in list
   say "{val}, {ix}"

for ,ix in list
   say ix

for n in 1..10
    say n

for n in 0...10
    say n

-- more esoteric variants, will likely be ditched!

for ix:val in list
   p "{val}, {ix}"

for ix: in list
   say ix

for val[ix] in list
   say "{val}, {ix}"

for [ix] in list
   say ix
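
As noted above, the for-loop is sugar for the de facto iteration methods; a hedged sketch of the mapping, using the list defined above:

-- this for-loop...
for val, ix in list
   say "{val}, {ix}"

-- ...is assumed to desugar to roughly:
list.each-with-index |val, ix|
   say "{val}, {ix}"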

Feature request: unnecessary whitespace lowers operator precedence

This would tidy up certain expressions that currently involve ugly parentheses.

puts 2 * 4+1            -- equivalent to 2 * (4 + 1)
puts 2*4 + 1            -- equivalent to (2 * 4) + 1
puts 2 * 4.increment    -- equivalent to 2 * (4.increment)
puts 2 * 4 .increment   -- equivalent to (2 * 4).increment

Conditional Branching Constructs

If / elif / elsif / else

The familiar branching construct needs no introduction, except that it can be used as an expression too (as most things in Onyx).
elsif and elif can be used interchangeably at the moment. After using both a lot, elif is preferred imho, and better yet: if you have an 'elif' case, prefer using a cond-list instead!

if something
   do-stuff
elif other-thing
   do-other
elsif further-thing     -- elif and elsif are usable interchangeably for now
   do-further
else
   do-last-resort-stuff

There's an alternative nest-block-begin syntax for if, "mimicking" ternary conditional, while still being prefixed with if:

a = if x ? 1 : 2

-- following are the generic block-nest-starters:

-- The same as:
a = if x then 1 else 2

-- or
a = if x => 1 else => 2

-- or
a = if x: 1 else: 2

-- or
a = if x
       1
    else
       2

-- or...

You can also use suffix-if. It is best for early return patterns; use with discretion.

my-fun(x) ->
   return if x is 1
   say "x is forty seven" if x is 47
   do-something x

There's also unless, which is the inverse of if and can be used the same way as if.
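
A small hedged example of unless, mirroring the if forms above (logged-in? and redirect-to-login are made-up names):

unless logged-in?
   redirect-to-login

say "x is not forty seven" unless x is 47   -- suffix form, just like suffix-if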

Likeliness Hinting Keywords

likely and unlikely can be used in conjunction with the branching construct. This should preferably only be used if it helps clarify intent to the code reader, but it is also of help to the optimizer to re-order code sections to increase cache hits, which can be significant in tight loops.
Discussion on pros and cons of these is welcome.

if unlikely some-stuff == weird-thing
   seldomly-done-stuff
elif foo == 47
   other-stuff
else likely
   common-stuff

This is currently what I call a "ghost implementation" - you can use the notation, but until it's decided a keeper, it is not implemented all the way to the LLVM optimizer.

[RFC] This might very well be ditched.

Cond / Case / Switch Branching

There is currently a shit load of ways to express this construct. Most of it is just multiple keywords to initiate the expression. This should definitely be reduced, so try them out and give your opinion.

The keywords are currently case, match, branch and cond. And yes, the constructs can be used as expressions - of course! If you prefer a more elaborate description of the notation - comment on it.

Here are some different ways you can express them (taken directly from a mash-up spec-test file, so focus on what matters ;) ):


-- crystal style 1 `case ref`
n = 593

case n
when 1, 2
   say "NOP: 1||2"
when 3
   say "NOP: is 3"
else
   say "YES: {n}"
end

-- crystal style 1 `case`
case
when n == 1
   say "NOP: 1"
when n == 47, n == 593
   say "YES"
else
   say "NOP {n}"
end

-- crystal style 1B `case ref`
case n
when 1, 2
   say "NOP: is 1||2"
when 3
   say "NOP: is 3"
else
   say "17.1: {n}"

-- crystal style 1B `case`
case
when n == 1
   say "NOP 1"
when n == 47, n == 593
   say "17.2"
else
   say "NOP {n}"

-- crystal style 2 `case ref`
case n
   when 1, 2
      say "NOP: is 1||2"
   when 3
      say "NOP: is 3"
   else
      say "17.3: {n}"
end

-- crystal style 2 `case`
case
   when n == 1
      say "NOP 1"
   when n == 47, n == 593
      say "17.4"
   else
      say "NOP {n}"
end

-- onyx style 1 `case ref`
match n
   593
      say "18"
   2 =>
      say "NO is 2"
   *
      say "NO {n}"
end

-- onyx style 1 `case`
branch
   n is 1 =>
      say "NO is 1"
   n is 593 =>
      say "19"
   * =>
      say "NO {n}"
end–case

-- onyx style 2 `case ref`
cond n
   593
      say "19.1"
   2 =>
      say "NO is 2"
   *
      say "NO {n}"

-- onyx style 2 `case`
case
   n is 1
      say "NO is 1"
   n is 593
      if false
      else
         say "19.2"
   *
      say "NO {n}"

-- onyx style 3 `case ref`
match n
   1 => say "is 1"
   2 => say "is 2"
   * => if false => say "NO" else say "20: {n}"
end–case

-- onyx style 3 `case`
branch
   n is 593   => say "21"
   n is 2     => say "NO is 2"
   *          => say "NO {n}"

-- onyx style 4 `case ref`
case n
   1 do say "NO is 1"
   2 do say "NO is 2"
   * do if false then say "NO" else say "22: {n}"

-- onyx style 4 `case`
branch
   n is 593   then say "23"
   n is 2     then say "is 2"
   *          then say "{n}"

-- onyx style 5 `case ref`
match n
| 593
   say "23.1"
| 2
   say "NO is 2"
| *
   say "NO {n}"

-- onyx style 5 `case`
branch
| n == 1
   say "NO is 1"
| n == 593
   if false
   else
      say "23.2"
| *
   say "NO {n}"

-- onyx style 5 `case` one liners
branch
| n is 1    => say "NO is 1"
| n is 593  => say "23.2"
| *         => say "NO {n}"

-- onyx style 6 `case ref`
match n
   1: say "is 1"
   2: say "is 2"
   *: if false => say "NO" else say "20: {n}"
end–case

-- onyx style 6 `case`
branch
   n == 593   : say ": 23.3a"
   n == 2     : say "is 2"
   *          : say "{n}"

-- onyx style 6b `case`
cond
   n == 593:   say ": 23.3b"
   n == 2:     say "is 2"
   *:          say "{n}"

Used as an expression, for completeness. Here with indent-cases and pythonish nest-start:

x = 42
y = 5
msg = case
   x is 47: "Yeah! x is forty seven!"
   y is 47: "Nice! y is forty seven!"
   x + y is 47: "Phew, at least x+y is forty seven!"
   *: "Found no forty seven in my universe"

say msg

Language concerns

I notice that braces are used for string interpolation:

value = 1
puts "{value}"

The omission of the Ruby/Crystal # prefix makes unintentional interpolation far more likely, to my mind.

What is the Onyx syntax for a block or proc?

I like how you can instantiate a class without the new keyword: instance = MyClass().

Docs

Can you provide a link to the docs in the README?

How usable is Onyx currently?

I realise the language is still in flux, and subject to change. But how usable is it currently? Can I write a program in Onyx instead of Crystal and expect it to work?

(Also, does Onyx suffer from the ridiculous signed/unsigned int problem I recently reported on the Crystal forums; and which has apparently remained unfixed for TEN MONTHS?)

Exception Handling

It's rather straightforward. Keywords vary a bit between languages. I'm more used to "catch" than "rescue", but when I saw it, it quickly grew on me.

If you come from Crystal, Ruby, Python, etc. there's else; that term is slightly confusing, especially on its own, which is why it is not allowed in some of those languages. Having that option on its own has use cases though. In Onyx it's called fulfil. When things go well, fulfil is always called.

This way it can be used for a code block at the end of functions that is always called no matter where the exit point is (as long as it's an exception-less exit).

Here's an example that should make everything clear enough (I hope).

fun-with-exception-action(x) ->!
   try
      a = 1 / 0

   rescue e IndexError | ArgumentError
      say "In fun: Rescued {e}"

   rescue DivisionByZero
      say "In fun: Rescued division by zero"

   rescue e
      say "Rescued some kind of shit"

   fulfil
      say "In fun: Nothing to rescue - yippie!"

   ensure
      say "In fun: Oblivious to what happened!"

   a = 1 / x
   nil

fulfil
   say "e of fun-with-exception-action - ONLY on SUCCESS!"

ensure
   say "e of fun-with-exception-action - EVEN on RAISE!"


say "fun-with-exception-action(1)"

try
   fun-with-exception-action 1
   say "after call fun-with-exception-action"
rescue
   say "rescued fun-with-exception-action in Program"

say "after try/rescue call fun-with-exception-action"

say ""
say "fun-with-exception-action(0)"

try
   fun-with-exception-action 0
   say "after call fun-with-exception-action"
rescue
   say "rescued fun-with-exception-action in Program"

say "after try/rescue call fun-with-exception-action"

=>

fun-with-exception-action(1)
In fun: Rescued division by zero
In fun: Oblivious to what happened!
e of fun-with-exception-action - ONLY on SUCCESS!
e of fun-with-exception-action - EVEN on RAISE!
after call fun-with-exception-action
after try/rescue call fun-with-exception-action

fun-with-exception-action(0)
In fun: Rescued division by zero
In fun: Oblivious to what happened!
e of fun-with-exception-action - EVEN on RAISE!
rescued fun-with-exception-action in Program
after try/rescue call fun-with-exception-action

Automatic Expansion of Unqualified Enum Consts

When making a lookup for method matching arguments, if none is found, and there are candidates with explicitly typed args that are enums, the args will be looked up in the enum-consts to look for matches. This way cleaner code can be written:

type Colors < enum
   Red
   Blue
   Ubiquitous

say-color(c Colors) -> say "Color is: {c}"

say-color Colors.Red  -- the "old" explicit way
say-color Red  -- the new freshness way

Nil Handling Sugar

This is strongly wanted. I was inspired by LiveScript, Swift has it too, etc.

The gotcha is Crystal compatibility, which is desirable to maintain, because of funcs ending with ! and ?.

The sugar is when an expression is ended with ? or ! and followed by an identifier, or dot+identifier (alternatives): callable?only-if-non-nil or callable?.only-if-non-nil.

As usual, the approach is that methods are used "behind the syntax", and therefore the construct can be exploited by the programmer if wished.

x = a?b?c.d becomes x = a.try(~.b.try(~.c.d)) internally (canonical form), that is, the try-method is invoked, with the arg being a "soft-lambda". For Nil, try is defined as a no-op, which is why it works.

The Caveats (or Features...)

Crystal stdlib has a de facto standard of suffixing method-names with ? if they return Type|Nil or Bool. This is in line with the behaviour, but clashes with the notation.
This is not necessarily a problem, but can instead be utilized as a feature!
For a?b (or a?.b), we would first look for a func named a? (unless it's a variable of course); if that is not found, we look for a. By following the convention of primarily putting nil-check functionality in identifier?, that variant is prioritized for this pattern.

So, to the other end: a!b!c.d or a!.b!.c.d.
This of course means "a is not nil - if it is: throw exception! ...", and so on for the others.
It translates to canonical form a.not_nil!.b.not_nil!.c.d.

Here, the naming scheme usage in Crystal stdlib is a bit more varied. It generally means "beware, dangerous method". For instance, in many cases it means "value or throw if nil", which is what we want, but in some cases it means "mutates stuff", and in other cases "value or throw"-methods are named without suffix !. Surprising behaviours inherited from Ruby (!)

Since this feature must rely on de facto convention of func/method naming, it needs some additional thought and examination of actually used patterns.

I would definitely favour its implementation. The reason is that for asserting a value is not nil to the type inferer you generally assign a tested return value to a local var (which can't change from the outside) - that's a design decision in the code. But if you do want to get the latest value no matter how or where it might have changed, from a certain instance, you would want to test it at every reference. Also, when making a lookup deep in a tree of instances, it's very clear and concise.
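
A hedged illustration with made-up names (order, customer and address are hypothetical variables/members, used only to show the chaining):

city = order?customer?address.city   -- nil as soon as any link in the chain is nil
city = order!customer!address.city   -- raises instead, if any link in the chain is nil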

Syntax Alternatives

Show casing only the "try chain" side of it:

x = a?b
x = a?.b
x = a??b
x = a??.b

Personally I prefer idfr?other, with no additional dot. Since Onyx is space sensitive to some extent, this is one of the places where that sensitivity can be put to use.

Basic Lexical Elements and Value Literals of Onyx

Note: this issue only covers the surface of constructs - the lexical aspects; for type definitions etc. there are separate "Doc / RFC" issues.

There are some [RFC] markers in this text, those are for lexical elements that are very much up to debate. You can question any of them. But those are ones in need.

Identifiers

Variable and Function Identifiers

my–identifier = "47"  -- here using ENDASH in the identifier

if my_identifier == my–identifier => say "Yep - snake case is interchangeable!"
if my-identifier == my–identifier => say "Yep - hyphens (lisp case) also!"
if myIdentifier == my–identifier => say "Yep - even camelCase works!"

my-fun-with-qmark?(foo) ->
   foo == "Say what?"

bar = my-fun-with-qmark?("Say what?")  -- => true

Internally the separators are all represented the same way and therefore comparable.

  • I decided to transparently support (–|-|_) (ENDASH (\u2013), hyphen, underscore) interchangeably as "word-delimiters".
    • For the choice of ENDASH - of all available unicode characters - this came about through a long heated debate when I was hacking on the Nim compiler. If you want some motivations you can probably find that discussion with a search
  • Since there is a big crowd favouring humpNotation, I've now added experimental support for that too, though not being a fan, but for completeness in this sandbox-try-everything-out-phase of Onyx.
  • The rule for camelCase is: a capital Latin alpha is translated to '_' + lcase(char) - keep that in mind!
  • This behaviour was inspired from Nim. There is a major difference though: Nim throws away the delimiter-information (case insensitive past the initial character), in Onyx a delimitation is a delimitation, it can simply be expressed differently.
  • To maintain Crystal-compatibility: Callable identifiers may end with ? and !.
  • The (?|!) ending characters are definitely up to debate in entirety - especially since nil-handling sugar using identifier?method-to-call-if-not-nil() and identifier!method-to-call-if-not-nil-otherwise-throw() will be implemented (if no better idea turns up?). The favoured idea in my head atm is that the nil-handling notation favours callables ending with ?, and if not defined, looks for one without it.

Quick (Biased) Comments

  • There are some studies that taken together point to dash-notation being better for comprehension, snake_notation second and humpNotation a far down third.
    • There are some studies pointing the other way, but examining them, they were rather flawed in their design.
    • Current studies must be seen as rather inconclusive.
    • As with all things programming there is a big gap in the scientific community - any more studies that might apply would be welcome!
  • I personally warmly recommend the ENDASH or lisp-case notation. After 28 years of coding, with a "always looking for the better way"-mindset and trying all kinds of styles, that's the one I keep favouring more and more. Much more fluent. The ENDASH "grabs" the neighbouring chars more, and might look more fluent than hyphens in many (monotype) fonts. This is however entirely up to your project's style specification. Personally, in practice, I end up using hyphen.

Types

type MyType
   some–member Int32 = 47
end

my-type-instance = MyType()

Type names are always initial capital

Pros

  • Visual disambiguation - good for human
  • Less ambiguities already at parsing stage - good for compiler speed

Cons

  • For a big team / company with a mother tongue written in a non-Latin script this means that if you'd like to write the code in your own
    language, types must begin with, say, a Latin T or a similar ad hoc scheme (we can't practically consider capitals in every possible script, and not all scripts even have the notion of a capitals distinction).
    • It could be argued that English is the lingua franca, especially in programming contexts, and as such it should be favoured as the basis for symbols in source code. (I'm not a native English speaker myself, although my alphabet is a Latin-alphabet superset - which makes me biased.)
    • If it would be of great benefit and regarding a major language (Japanese, Chinese, Russian, etc.) this could be reconsidered in the future, if not too intrusive - also: when to stop adding? Pandora's box...
    • I hold the position for now that pseudo-english should be the lingua-franca of programming, and thus the scheme holds water.

Prior Art, Preference and Motivation

This idea has followed my language design ideas for about a year now. When I stumbled upon Crystal, I saw it used the scheme also. Click. As Crystal now is in the family from AST-level down to LLVM - this is set in stone.

This, along with constants, also forms the notion "capital initial letter = compile time fixed symbol".

Constants

MyConstant = 47

type Foo
   MyFooConstant = 47
end

  • Constants, just like types, are a compile time construct, constant at run time. Therefore they share the lexical notion of capitalization.
  • This lets us get rid of having to specify const constantly [hehe], keeping the source cleaner and more focused on logic terms than on formal word noise.

Constants aren't "dangerous", they're the "safest part" of code, so why should they have a "shout out look"?
Well, they have a more "formal" importance. If you see code comparing x to thingie - what is thingie now? But if x is compared to Thingie, you know that Thingie is a formalized, important concept. So it does hold higher system-wide importance. I believe this justifies capitalization apart from its status as a compile time constant, which in that regard is a less important factor. In addition it helps speed up compilation.

If you're hell bent on having some constant lower case, you could wrap it in a function - compiled in release mode this will be the exact same machine code and exact same speed:

CRUDE_PI = 3.14
crude-pi() -> CRUDE_PI

say "Hey, my lowercase 'constant': {crude-pi * 2}"

Global Variables

-- currently:
$my-global = 47
$my-thread-bound-global = 42 'thread-bound

  • Globals are prefixed with $ - because it looks ugly (and makes them pop out clearly)

Namespacing Paths

You simply use dots: SomeModule.SomeType.a-func()

Literal Values

Numbers

my-int-number = 47
my-real-number = 3.1415
my-hex-int = 0x2f

my-literal-typed-int = 47u64

-- likely future idiomatic way (no longer typed at literal)
-- my-literal-typed-int U64 = 47

my-big-number = 1_000_000_000  -- underscores can be used to clarify

[RFC] The literal typings will be removed. Currently a literal int is typed StdInt by default, and then if assigned to a var that is typed as, say UInt8, it fails because of type mismatch - which is ridiculous from a human being's perspective.
The type inference will be improved for this - just have to figure out the "right way" to implement it conceptually.

The data type is StdInt* for integer literals by default.
The data type is Real for real literals by default.

(*) Note StdInt will be changed to be called simply Int, provided coordination with Crystal team holds.

The data type used for the literals can be changed, either explicitly as above, or through parse-pragma:
'int-literal=BigInt - this would cause any literal integers to produce BigInts instead.
'real-literal=FixedPoint[4] - you get the picture.

The variables in the above examples are inferred to the type of the literal - they're not dynamic.
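
A hedged sketch of what using such a parse-pragma could look like (the pragma spelling is taken from the lines above; exact semantics are not finalized):

'int-literal=BigInt
huge = 123456789012345678901234567890  -- with the pragma in effect, this literal would be a BigInt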

Tags (aka Symbols)

my-tag = #some-tag

my-fun(foo Tag) ->  -- note, you don't have to specify the type - inferred!
   case foo
   when #some-tag    => say "It was some tag"
   when #other-tag   => say "It was other tag"
   else              => say "It was {foo} - which I don't recognise"

my-fun #funky-tag  --> "It was funky_tag - which I don't recognise"

Tags (think "hash-tags"...) are unique program-wide, they get a unique Int32 number internally, and so are very efficient. Preferably you should use enums, but in some cases, just having ad hoc tags is very convenient.
As easy as using strings as identifying tokens; but with the performance of an Int32.

Strings

my-string = "A simple string"
my-interpolated-string = "Interpolation: {my-string} with sugar on top!"
-- any kind of expressions can go in the interpolation brackets of course!

the–str = "111kjhgkjh" \
   "222dfghdfhgd"

--> "111kjhgkjh222dfghdfhgd"

yet-a–str = "111kjhgkjh
   222dfghdfhgd
   333asdfdf
"
-- above preserves the white space and newlines

my-straight-str = %s<no {interpolation\t here}\n\tOk!>
-- for the %* string notations, you can pick your delimiter chars yourself,
-- which ever makes the particular string clearer: `<...>`, `(...)`, `{...}`
-- or `[...]`:
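-- e.g. (purely illustrative - the same string with other delimiter choices):
also-straight-str = %s(no {interpolation\t here}\n\tOk!)
more-straight-str = %s[no {interpolation\t here}\n\tOk!]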

The data type is Str / String

Chars

my-char = _"X"

[RFC] Chars are no way near common enough to warrant wasting a unique symbol on (like single quote for instance, which has several other, more important, functions in Onyx).

It was first c"X", then changed to %c"X", which follows the pattern of the other "special string literal notations", but I decided to at least give chars some special treatment, going %"X"; after some use that looks noisy, so the underscore variant is being tested now.

Regular Expressions

my-regex = /^match-stuff.*$/i
match = my-regex =~ "does this match?"

The =~ above is of course a generic operator that can be implemented for other purposes for other types.

A consideration could be to change the syntax to prefixed-string, like Char:

my-regex = r"^match-stuff.*$"i
match = my-regex =~ "does this match?"

However, in much network programming, which is quite common, regexes serve a steady role, so explicit sugar syntax for them seems warranted.

The resulting type is Regex.

List - a dynamically growing list (Vector, Array, Seq, Sequence, etc. in other languages)

my-list = [items, go, here]
other-list = [
    "a string"      -- commas not necessary if newlined
    47,             -- but are allowed
    1.41
    ["nested list", "ok, duh!"]
]

-- type of above is List[Str | StdInt | Real | List[Str]]

an-empty-list = [] of Int   -- empty list has to be typed (since there are no
                            -- values to infer type from)
another-empty-list = List[Int]()  -- same result as previous line

For details on List vs Array see issue on basic data types: #***XXX.

You can make Listish literals with arbitrary type also, see Set for notation.

As is obvious by now: the resulting type is List[T], where T can be a sum type.

Tuple

my–tuple = {"foo", 1, #bar}

[RFC] It is desirable to use (items, here) notation for tuples, because braces are never used for tuples in mathematical notation. It does however make syntax a lot messier, since both expression grouping and lambda-parameters use parentheses. The current tuple notation would be much better of used for set-notation!

Set

my–set = Set{"foo", 1, #bar}

Any type can be used as prefix as long as it implements the [](ix) method; this is therefore a generic "listish" syntax.
[RFC] Set unfortunately doesn't have its own literal for now (compare tuple above).

Map - Hash Map

string-keyed–hash = {"foo": 47, "bar": "bar value", "qwo": ["a", "list", "here"]}
tag-keyed-hash = {
    #foo: 47    -- commas not necessary when newlined
    #bar: "bar value"
    #qwo: [
        "a", "list"
        "here"
    ]
}

string-keyed-hash-js-style = {
  var_name: "a value"
}

some-var = #a-key
other-var = "another key"

variable-keys = {
  some_var => "some value"
  other_var => "other value"
}

-- type of above would be {Str|Tag => Str}

[RFC] Note, I will change the syntax for: [ed: this is changed now / 2016-03-25]
{key_here: value_here} - it currently parses it the same as key => val notation. I will change this to follow Javascript JSON variation: key_here considered a literal string. This might facilitate network coding working with JSON's a lot, since you've then essentially got JSON-syntax in Onyx (but strongly typed!).
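
To make the planned distinction concrete, a hedged sketch (post-change semantics as described above):

js-style = {foo: 1, bar: 2}   -- keys are the literal strings "foo" and "bar"
key = "foo"
dynamic  = {key => 1}         -- key is evaluated: same as {"foo" => 1}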

I've probably forgotten something - just tell me.

Multiple Assignment (and A Little Destructuring)

Multiple Assignment

Currently multiple assignment is implemented in Onyx like this:

[a, b, c, d] = [1, 2, 3, 4]  -- => a=1,b=2,c=3,d=4
[a, c, b, d] = [a+1, b, c, d+1]  -- => a=2,b=3,c=2,d=5

x = 1
y = 2
[x, y] = [y, x] -- => x = 2, y = 1

foo(x) ->
   a = 3
   {x, 2, a}

[a,b,c] = foo(1)  -- => a=1,b=2,c=3

  • For the literal source case (array style literal), it's broken out to direct
    assigns immediately.
  • The source can be any indexable (sequence, array, listish, tuple, etc.)

Thoughts

  • It would arguably be better to do full blown deconstruction. The pro of [a,b...] = notation is that you don't have to care about what's on the right, as long as it is indexable. Perhaps at least both [...] = and {...} = should be allowed, to enable expressing intent more clearly.
  • Splats and discard should be added. This is no biggie, just a matter of getting time prioritized to code it: [a, _, ...some-b, c] = [1,2,3,4,5,6] => a=1, some-b=[3,4,5], c=6 (2 is thrown away).
    • This is where destructuring choice of notation comes into play, splat will be a sequence here, maybe we wanted array or tuple, or... Perhaps it should just default to the type on the right.

Destructuring

  • Destructuring hash(-trees) is another interesting thing to tackle.
    • {foo: a, bar: [_, b]} = {foo:"hey", bar: ["world", "baby"]} => a="hey", b="baby"
    • But then again - at some level, sugary notations like these becomes diabetes instead, and perhaps it's better for readability to just break it down in to individual accesses. - So hash-tree destructuring, has a thumbs down from me currently, even if "cool in concept".
  • Destructuring notation is important for types also, if a full blown pattern matching construct is wanted (it seems like a good idea). So this is tied in to the match case-switch-what-you-prefer-to-call-it construct too (a hypothetical sketch follows after this list).
    • Type-destructuring differs in that there are less dynamics in place (known at compile time)
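
A purely hypothetical sketch of type destructuring in a match - nothing here is designed or implemented, and Circle / Rect are made-up types:

match shape
   Circle(radius)      => say "circle, r = {radius}"
   Rect(width, height) => say "rect, {width} x {height}"
   *                   => say "something else"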

To UFCS, Do a Novel Approach, or Stay Pure OOP

UFCS (Uniform Function Call Syntax)

I'm eager to do this, but it requires lengthy discussion and then quite some work, turning some things inside and out and up and down in the compiler, so I think it's best to get a lot of input on this before (likely) implementing.

Briefly, UFCS lets you use methods as functions - syntactically - and vice versa.

type Foo
   some-val = 0  'get 'set  -- add getter and setter methods automatically

   my-method(v) ->
      @some-val = v  -- assign value to member var directly
end

my-routine(obj Foo, v) ->
   obj.some-val = v  -- set value via Foo's 'some-val=' setter method.

foo = Foo()
foo.my-method 47
my-method foo, 47
my-routine foo, 47
foo.my-routine 47

See, they all do the same thing - mutates some-val on an instance of Foo to the passed value.
The func does it via a method on Foo, because the semantic distinction is that only member funcs (methods) can reach private state in the value instance - there are no exceptions to this in Onyx.
Fret not! This is optimized away at machine code generation, and they end up with the exact same direct access machine code. Blazing!

Problems

The lookup order is the matter of concern. An instant, "being nice to legacy" idea is:

  1. method/free func first
    • method-style call? Look for best matching method, if none, look for best matching func.
    • func-style call? Vice versa

However, this means that different callables may be chosen by the compiler depending on syntax chosen. This has a bad smell to it. A stricter typed func will be unused if there's even just a slackly typed ("broadly defined") method when doing a method-style call. And vice versa.
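
A hedged illustration of that concern, with made-up definitions:

type Foo
   greet(whom) -> say "method: hi {whom}"             -- loosely typed method
end

greet(f Foo, whom Str) -> say "free func: hi {whom}"  -- strictly typed free func

foo = Foo()
foo.greet "world"    -- option 1 always picks the method here, even though the free func matches more strictly
greet foo, "world"   -- ...and always the free func here - same intent, different callables chosen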

So, the better option is:

  1. Look up methods and functions interleaved and only use overloading prioritization rules to choose which is the best candidate. If two callables have the same signature match priority, then the method is preferred for a method-style call and vice versa. Their style of implementation is completely ignored, other than that last check, since the only difference is internal to them (accessing private state, or accessing state through methods), and as such an implementation detail beyond the interface.

There's only one thing then to consider, and that is that it doesn't break Crystal stdlib compatibility. Have to check that out. But ignoring that for a second, consider these through the "no compromise, the right thing to do (TM)" lens.

Further questions

How does instance construction sugar, functor and lambda call sugar come into the picture?

  • Should 5.Int be legal? (which would transform as such: "5.Int => Int(5) => Int.new(5)" [and in the end in machine code, probably just 5 as part of a store-instruction])
  • Should lambdas and functors in scope be taken into account!? (see the sketch below)

These questions can't just be ignored.
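
As a concrete, made-up illustration of the second question (the lambda literal syntax here is only indicative):

to-s = (x) -> "a local lambda"   -- hypothetical local lambda named like a common method
5.to-s                           -- Int's to-s method, or the local lambda, via UFCS-style dispatch?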

Some Pros

  • Implement funcs (well, procedures really) instead of methods to further separate logic from state, good for future changes of code involving more concurrency.
  • An optimal pattern, it seems without deeper analysis, would be to implement the minimum amount of methods on a type to work with state, and define other logic as free funcs. There are no compiler enforcements of any kind. Implement everything as methods or almost everything (sans data accessors) as functions - your choice in the end.
  • One single func def can cover several types horizontally, as long as the methods used match those available in type, if not strictly typing "receiver" parameter (first param).
  • More ways of expressing code, whichever looks clearer in a given situation.
  • No more guessing if a certain piece of functionality is a method or func - just go on instinct (this is more of a problem in, say Python, though).

Other cons (than "the" problem)

  • Some current method names might clash with language keywords - they won't be possible to call function style, but only method-style. This creates a small inconsistency, but a drawback that might be not too tough to swallow.

Prior Art

Currently it can be found in D and Rust. More?

As already mentioned there are different approaches to resolving, here are some papers and articles worth considering, all concerning the discussion of incorporating it in C++ (which must consider some of the same things as Onyx: legacy):

What are your thoughts on this?
