
language-suggestions's Issues

Expression problem

Introduction

The expression problem is a topic that hopefully does not require much introduction, but for completeness' sake I'll still give a short description of it. If you want to read about the problem in more detail, there are some beautifully written articles with various language examples and approaches.

The expression problem is about the difficulty of adding a new data type or a new operation, depending on the approach or paradigm you are working with. If you are writing an AST and implementing the nodes in an Object-Oriented style, you might write something like:

abstract record Expr
{
    public abstract string PrettyPrint();
    public abstract int Eval();
}

record LitExpr(int Value) : Expr
{
    public override string PrettyPrint() => Value.ToString();
    public override int Eval() => Value;
}

record AddExpr(Expr Left, Expr Right) : Expr
{
    public override string PrettyPrint() => $"{Left.PrettyPrint()} + {Right.PrettyPrint()}";
    public override int Eval() => Left.Eval() + Right.Eval();
}

This design makes adding a new type fairly easy: you just inherit from Expr. Adding a new operation, however, is harder, requiring a modification to every existing type. Let's model the problem in a more functional way with OCaml:

type expr = Lit of int
          | Add of expr * expr ;;

let rec pretty_print = function
  | Lit v -> string_of_int v
  | Add (x, y) -> (pretty_print x) ^ " + " ^ (pretty_print y) ;;

let rec eval = function
  | Lit v -> v
  | Add (x, y) -> (eval x) + (eval y) ;;

We are using sum types - also known as Discriminated Unions to F# people - to model the expressions instead of inheritance. The operations are implemented simply as functions that pattern-match over all variants. This makes adding a new operation super easy: we just write a function and match over all variants. Adding a new type, however, is hard, because we need to go through every operation in the codebase and handle the new alternative. It looks like we have just flipped the problem. While there is no single best solution for this, there are attempts at making both kinds of extension less painful.

Already proposed solutions

Taken from the Wikipedia article, which lists many more solutions than the two presented here. I believe the two solutions listed here are the most viable and practical. Things like type classes don't actually solve the problem, but go around it and hit a wall later in the design.

Multimethods

Multimethods - also called multiple dispatch - are essentially a mechanism to pick a method overload based on the runtime type/variant of the arguments. This is different from most mainstream languages, where the called overload is statically resolved based on the declared parameter types.

Common Lisp has multiple dispatch by default:

(defmethod eval ((a lit)) (value a))
(defmethod eval ((a add)) (+ (eval (left a)) (eval (right a))))

Surprisingly enough, C# has multimethods through dynamic arguments:

abstract record Expr;
record Lit(int Value) : Expr;
record Add(Expr Left, Expr Right) : Expr;

static int Eval(Expr e) => Eval(e as dynamic);
static int Eval(Lit lit) => lit.Value;
static int Eval(Add add) => Eval(add.Left) + Eval(add.Right);

static void Main(string[] args) =>
    Console.WriteLine(Eval(new Add(new Add(new Lit(2), new Lit(3)), new Lit(4))));
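
To see how this addresses the expression problem: a new variant needs only a new record and a new overload next to the existing ones, with no edits to existing code. A sketch; note that the overload must be declared where the Eval(e as dynamic) call site can see it, since the runtime binder only considers overloads visible there:

// Hypothetical extension of the sample above
record Mul(Expr Left, Expr Right) : Expr;

static int Eval(Mul mul) => Eval(mul.Left) * Eval(mul.Right);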

Open classes

Tackling the problem from the other side, arbitrarily allowing class extensions can also work. This would make the Object-Oriented approach feasible: adding a new operation would simply mean defining it for the class externally.

Ruby supports this and it's usually referred to as monkey patching:

# Base AST definitions

class Expr
end

class Lit < Expr
  def initialize(v)
    @value = v
  end
end

class Add < Expr
  def initialize(left, right)
    @left = left
    @right = right
  end
end

# Monkey patching a new operation

class Lit
  def eval()
    return @value
  end
end

class Add
  def eval()
    return @left.eval() + @right.eval()
  end
end

This is very similar to C# partial classes, except that Ruby allows patching non-owned classes, not just ones from the same assembly.
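
For comparison, the closest C# itself gets for non-owned types is extension methods, which can add new operations (though not new state or virtual dispatch) from the outside. A minimal sketch, with Person as a hypothetical non-owned type exposing a Name property:

static class PersonExtensions
{
    // Adds a new operation to Person without touching its definition
    public static string Greet(this Person person) => $"Hello, {person.Name}!";
}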

What we might do

This really depends on what kind of subtyping we go with, so what I propose here builds on some assumptions about that as well. I'd say that open classes mostly work in dynamic languages (assuming we don't want crazy runtime magic), so I'd go with explicit multimethods.

Assuming we have discriminated unions, we could explicitly mark them open for extension. Any method dealing with an open DU would essentially become a multimethod. We could do things like always enforce a default handler for them. An example:

// A non-open type handling all cases is enough
type Address = IpV4(byte, byte, byte, byte)
             | IpV6(/* no idea what ipv6 is */)
             ;

// We can write case functions to essentially pattern match on the variants
case func print((a, b, c, d): IpV4) = $"{a}.{b}.{c}.{d}";
case func print(...: IpV6) = "I still have no idea about IPv6";
// That's it, all cases handled

// Now we define an open type
open type Expr = Lit(int)
               | Add(Expr, Expr)
               ;

// Defining a case func for it means a multimethod
case func eval((v): Lit) = v;
case func eval((l, r): Add) = eval(l) + eval(r);

// A default handler is required in the module defining the multimethod for an open type
case func eval(_) = throw InvalidOperationException();

// We can extend the type externally (syntax is just made up)
extend Expr with Mul(Expr, Expr);

// Define the handler
case func eval((l, r): Mul) = eval(l) * eval(r);
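
One conceivable lowering of these case functions to C#, reusing the dynamic-dispatch trick from the multimethods section (a sketch only, all names hypothetical):

abstract record Expr;
record Lit(int Value) : Expr;
record Add(Expr Left, Expr Right) : Expr;
record Mul(Expr Left, Expr Right) : Expr; // the external extension becomes just another inheritor

static class ExprOps
{
    // The multimethod entry point: dispatches on the runtime type
    public static int Eval(Expr e) => EvalCase(e as dynamic);

    static int EvalCase(Lit lit) => lit.Value;
    static int EvalCase(Add add) => Eval(add.Left) + Eval(add.Right);
    static int EvalCase(Mul mul) => Eval(mul.Left) * Eval(mul.Right);

    // The required default handler
    static int EvalCase(object o) => throw new InvalidOperationException();
}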

[WIP] Pattern matching

So let's discuss what existing languages have, and what we want.

Potential features

1. Pattern matching over constants

This is the simplest one. For example:

match a
| 5 -> ...
| 10 -> ...

2. Pattern matching over inheritors

Checking whether a given instance is of a particular subtype:

match animal
| Dog d -> d.Bark()
| Cat c -> c.Meow()

3. Deconstruction

We may allow our types to be deconstructed for pattern matching, e.g. if the type is a record or simply defines a deconstruction method:

type Expr
type Constant = { Value: Float }
type Div = { Left: Expr, Right: Expr }
type Sqr = { Arg: Expr }

Then we could do

match expr
| Div(Sqr(a), Sqr(b)) -> Sqr(Div(a, b))
| other -> other
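
For comparison, C# 9 records with recursive positional patterns can already express this (a sketch):

abstract record Expr;
record Constant(double Value) : Expr;
record Div(Expr Left, Expr Right) : Expr;
record Sqr(Expr Arg) : Expr;

static Expr Simplify(Expr expr) => expr switch
{
    Div(Sqr(var a), Sqr(var b)) => new Sqr(new Div(a, b)),
    var other => other,
};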

4. Arbitrary condition clause

If we can't express something with the pattern matching rules alone:

match expr
| Sum(Sqr(Sin(x)), Sqr(Cos(y))) when x = y -> Constant 1
| other -> other

5. Branch merge

We might have different cases, but in all of them we need the same value:

match expr
| Sum(0, x) | Sum(x, 0) -> x
| other -> other

6. Active patterns

This is the least compile-time-verified but the most flexible option, because it is fully customizable:

match email
| ValidEmail(name, domain) -> "Hello, ${name} from ${domain}!"
| other -> "bad email"

7. Operators on patterns

So that we could combine our conditions:

match animal
| IBark and IFeed and not IBites goodboi -> "Good boy"
| not IBark -> "Ok-ish"
| _ -> "evil"

8. Conditional deconstruction

We may have a deconstruction method which returns a boolean. It is less powerful than active patterns, but tied to a type rather than a separate function:

type MyStateMachine<T> =
    Deconstruction(out T res) : boolean = if valid then res = _current; return true else return false

and

moveNext stateMachine =
    match stateMachine
    | MyStateMachine(res) -> "Valid and equals ${res}"
    | MyStateMachine -> "Invalid"
    | _ -> "other subtype"

9. Equality check

Matching against non-constant instances:

type Student = { Age: int, Name: string }
let john = Student(10, "aaa")
let hehe = Student(12, "bbb")

match student
| john -> "something"
| hehe or Student(30, "aaa") -> "else"

10. Mixed active patterns/deconstructions and constant matching

So, we partly deconstruct, and partly match

match animal
| cow(color: white, name) -> "this white cow's name is ${name}"
| _ -> "hmmm"

11. Specialised operators over primitives

E. g.

match a with
| > 5 and < 10 -> ...
| >= 6 -> ...
| _ -> ...

12. Property matching

E. g.

match frog with
| { SkinColor: green } -> "regular"
| _ -> "...."

What language implements what

Here we consider three languages, as well as the features proposed for Fresh.

Feature C# Kotlin F# Fresh proposed
1. Pattern matching over constants ✔️ ✔️ ✔️ ✔️
2. Pattern matching over inheritors ✔️ ✔️ ✔️
3. Deconstruction ✔️ ✔️
4. Arbitrary condition clause ✔️ ✔️ ✔️
5. Branch merge ✔️ ✔️
6. Active patterns ✔️ ✔️
7. Operators on patterns ✔️
8. Conditional deconstruction
9. Equality check ✔️
10. Mixed active patterns/deconstructions and constant matching ✔️ ✔️ ✔️
11. Specialised operators over primitives ✔️ ✔️ ✔️
12. Property matching ✔️ ✔️

Syntax overview

C# way

a switch
{
    ... => ...,
    ... => ...,
}

C# uses when for the conditional clause and var to deconstruct. For the latter, var makes it clear that a variable is newly declared, but it is also a bit wordy.

I don't like that C# needs a , after each case. But I like that you're not forced to put () around your expression.
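
A concrete example of the points above - the trailing commas, a when guard, and var declaring new variables (a sketch):

var point = (2, 2);
var label = point switch
{
    (0, 0) => "origin",
    (var x, var y) when x == y => "diagonal",  // `when` guard, `var` declares x and y
    _ => "elsewhere",
};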

Kotlin

when (a) {
    ... -> ...
    ... -> ...
    else -> ...
}

else is great. Other than that, not much to discuss since it lacks most major features of pattern matching.

F#

match a with
| ... -> ...
| ... -> ...

I like | instead of surrounding curly braces. But I don't like how wordy this match a with is.

What language has what

Again, these three languages + proposal for Fresh

| Feature | C# | Kotlin | F# | Fresh proposed |
| --- | --- | --- | --- | --- |
| Keyword | switch | when | match ... with | when |
| Position of keyword | After | Before | Around | After |
| () needed? | No | Yes | No | No |
| Surrounding braces | Yes | Yes | No | Depends* |
| Branch separator | , (comma) | \n (new line) | \| (pipe) | Depends* |
| Arrow | => | -> | -> | -> |
| Default case | _ | else | _ | else |

Depends*: here I suggest different syntax for the single-line and multi-line forms:

val color = animal when Frog -> green | Dog -> black | Cat -> meow | else -> nothingElseMatters

val color = animal when {
    Frog -> green
    Dog -> black
    Cat -> meow
    else -> nothingElseMatters
}

Debates

Separator

If we don't want newline-significant syntax, we will have to make the cases unambiguous, so we need a separator. Adding a pipe makes it a bit wordy:

val color = animal when {
    | Frog -> green
    | Dog -> black
    | Cat -> meow
    | else -> nothingElseMatters
}

And, in my opinion, it is inconsistent with the rest of the syntax. It looks like we mixed ML and Kotlin into something weird.

Should when be the same as other control flow constructs?

if ... {
}
while ... {
}
... when {
}

may feel a bit inconsistent too.

[WIP] Record types

Goal of the document

This document aims to describe the primary user-defined datatype for the language (what class/struct/record is in C#). While traits/typeclasses or DUs will not be outlined here, they will be briefly mentioned, as they still affect the design of the datatype we want to end up with.

What we want from the datatype

I believe that we can (or should) get away with a single construct for datatypes, even if we decide to add syntax sugar/metadata/annotations to help some of the uses. The two main ideas that should be merged are classes/structs and records.

A possible design and syntax

A few points I'd like to enforce (despite not laying out all components here, these are important for later design choices):

  • Discourage inheritance, only have a single layer of abstraction with DUs. Despite that, inheritance will have to be supported because of C# interop.
  • Discourage - or straight up disable - any logic in the constructor. The constructor should only be for setting up a consistent state for the object, nothing more. No exceptions, no networking, ... If the construction requires logic, we should encourage factory functions for those.
  • Encourage composition and a trait-based extension model.
  • Encourage the externalization of behavior from the data definition.

For an initial design we could straight up grab what Kotlin has with its data classes:

type Person(val Name: string, var Age: int);

Which would be equivalent to the following in C# (I'm intentionally not using C# records, so all behavior is shown explicitly):

sealed class Person : IEquatable<Person>
{
    public string Name { get; }     // val
    public int Age { get; set; }    // var

    public Person(string name, int age)
    {
        this.Name = name;
        this.Age = age;
    }

    public override string ToString() =>
        $"Person({this.Name}, {this.Age})";

    public override bool Equals(object? other) =>
        this.Equals(other as Person);

    public bool Equals(Person? other) =>
           other is not null
        && this.Name.Equals(other.Name)
        && this.Age.Equals(other.Age);

    public override int GetHashCode() =>
        HashCode.Combine(this.Name, this.Age);
}

Which means that by default a Fresh record would:

  • Have read-only/read-write properties based on the listed members (whether they are properties should be an implementation detail, but I'd say properties would be a decent choice)
  • Have a constructor taking the listed fields and assigning them appropriately
  • Have a ToString that produces a somewhat human-readable format
  • Implement value equality instead of reference equality
  • Be sealed by default; opening up for extensibility is explicit

Constructors

The members listed after the type name form the parameter list of the primary constructor. The idea is that the primary constructor is the only actual constructor for the type; any other constructor would be implemented as a static factory function. The rationale for having a single "true" constructor is that it simplifies semantics for both the language and its users: every other constructor would have to call the primary constructor anyway (a sensible rule borrowed from other languages), and factory functions have no way to produce an instance without calling into the primary constructor. They also serve as an important step in factoring error handling out of constructors and into functions.

For example, implementing a factory function that constructs a Person from a JSON file could look something like:

impl Person {
    func from_json_file(path: string): Person {
        // ...
    }
}

Since there are compatibility reasons to still have multiple constructors defined at the CIL level - like for serializers - these factory functions could later be marked with an attribute telling the compiler to generate a CIL constructor from the function. Attributes and metadata are not defined yet, but they could look something like:

impl Person {
    #[Constructor]
    func from_json_file(path: string): Person {
        // ...
    }
}

Which would be equivalent to this in C#:

class Person
{
    // ...

    public Person(string path) { /* ... */ }
}

Calling the constructor

Calling the primary constructor would be like calling a function:

func main() {
    val person = Person("Anne", 27);
}

Calling a factory-function is the same as calling a static function, it's scoped to a type:

func main() {
    val person = Person.from_json_file("person.json");
}

Defining additional members

Additional members go into an implementation block for the type. For example, if we want a function that wishes a happy birthday to our Person type, we would write:

impl Person {
    func birthday(this) {
        this.Age += 1;
        Console.WriteLine("Happy birthday to " + this.Name + " who is " + this.Age + " years old!");
    }
}

A few things to point out:

  • birthday takes this explicitly, marking it as a member function rather than a static one. Static functions simply do not take this as their first parameter.
  • this needs no type specification; it is known from the implementation block.
  • The members of this are not implicitly in scope inside the function; everything is accessed through this..
  • There can be multiple impl blocks defined for a type, which is important for later.

Inheritance (mainly for external types)

While I'm not a fan of having inheritance among types defined in Fresh, we need to support it for types coming from external packages to have a better shot at C# compatibility. Inheritance would be similar to how I imagine trait implementation, which would be another kind of impl block: impl <Trait/Base> for <Target-Type> { ... }. Additionally, the type would be marked open (as opposed to marking it sealed, when not wanting inheritance).

For example, let's say we want to extend Foo with our own type Bar and override nothing:

impl Foo for Bar {
    // We override nothing, nothing to put here
}

Any overridden member should go in that impl block, and only inheritance-related members can go in there. Any unrelated operation should go into another impl block.

For example, if we want our Person type to inherit from Entity and override int GetId(), the Person type would have the following code (keeping the old functionality):

open type Person(val Name: string, var Age: int);

impl Person {
    func birthday(this) {
        this.Age += 1;
        Console.WriteLine("Happy birthday to " + this.Name + " who is " + this.Age + " years old!");
    }
}

impl Entity for Person {
    func GetId(this): int = this.Age * 123; // Why not?
}

Love/Hate/Miss list for C#

As an initial step, I thought a love/hate/miss list for C# would not be too bad. Plenty of subjectivity here, but oh well. Love/hate should not be taken too seriously - it could just as well be called like/dislike; even nitpicks went into the hate category.

Love in C#

  • Reflection: The ability to inspect types makes writing serializers and other inspection tools super easy. And .NET has one of the most powerful reflection systems I've seen so far.
  • Good BCL: .NET languages ship with batteries included. It's nowhere near as lacking as the C++ standard library.
  • No header files: Coming mainly from the C and C++ world, it's a breath of fresh air not having to maintain 2 files per module.
  • Source Generators: They make metaprogramming much more powerful than in C++. We can generate complex constructs from simple, declarative notations, all compile-time.
  • Inheritance-based type/pattern matching: In most languages discriminated unions are implemented as tagged unions, which is fine, as long as you don't want inheritance. Once you introduce some kind of subtyping, the two features become somewhat redundant.
  • Amazing tooling support: This is a huge one. VS and Rider are extremely helpful. Great static analysis, suggestions, code-fixes, refactorings. You can include extra analyzers just as you would include a package, not even mentioning having an API to implement custom analyzers.
  • Switch expression: After the dreadful switch statement, the switch expression is a breath of fresh air.

Hate in C#

  • Lots of legacy quirks that make the language feel kind of dirty in some places. And they are not going away, because of backwards compatibility. Some of these are:
    • Covariant arrays
    • 0 literal implicitly converts to other integral types
    • Query-based LINQ syntax
  • The amount of new constructs is getting way out of hand, and adding new things to fix old ones seems to be becoming the trend (which really didn't work out for C++).
    • An entirely new syntax for classes (records) that adds equality, hashing and printing. Why not add the placement-syntax to classes and allow classes to auto-implement equality and hashing? Both derive constructs and tag types exist for this, and I can't believe it would have been too much work for the compiler. The new record structs will make this even worse.
    • There is readonly struct but no readonly class. These modifiers seem to fly all around randomly, applicable to some things, while not to others, also making it very inconsistent.
    • Properties are completely separated from methods for some reason.
    • If the closed hierarchies proposal hits the scene, sealed and closed will enforce the same thing but on different scopes, making the feature redundant.
  • No variadic generics: Things like a type-safe OneOf for an arbitrary number of types are kind of doomed.
  • No const generics: Implementing an N dimensional vector or an NxM matrix means either letting the dimensions be dynamic in a runtime value - type-unsafe - or writing/generating each case - not flexible, cumbersome.
  • Interfaces really only prescribe member-level constraints: There is no way to enforce implementing an addition operator for a type, or even adding type as a required member. This often means externalizing factory functions into factory objects, using reflection, or having some janky post-construction initializer.
  • There are plenty of problems with Source Generators:
    • They can't modify existing code: As much as I love the power SGs bring to the table, not allowing modification is a major drawback. They can't, for example, rewrite LINQ for you or act as proper decorators. They also make you litter partial all around. There is also no plan to remove this constraint, because modifications would pose a security concern in the way SGs were implemented.
    • They work on strings: This is a really, really odd solution, and the experience is very poor compared to systems like Rust's quote.
    • Source Generators and dependencies suck together. Having a Source Generator include either a generation-time or runtime dependency is insanity, all because they decided to piggyback on the existing analyzer framework. Look at the hoops you need to jump through for just a single case. Want to include a local project? Forget it, just include the sources recursively.
    • Because of all this, a nice auto-implementation for INPC without a weaver is still unlikely.
  • There is plenty of crud when defining a type:
    • Namespaces waste indentation (C# 10 resolves this)
    • Implementing Equals/GetHashCode is painful, and there is boilerplate the compiler won't do for you (like overriding bool object.Equals(object? other)). Admittedly, this doesn't have to be done, but I've never written code where it wasn't desired, so this is likely a holdback for a 0.1% case or something.
    • A similar pattern: when implementing IComparable, I usually also want IEquatable (which is common), and again there is boilerplate. Again, this isn't desired for partial orderings, but most cases are not partial orderings.
  • There is covariance for overrides of base members, but none for interface implementations (AFAIK this is getting partially resolved).
  • Default interface implementations are really bad. They almost feel like compositional elements or traits, where you can externally define behavior - but no, they only apply when you reference the type as the interface.
  • The new keyword feels like it was just brought in because Java had it. Why not just type out the constructor name?
  • There is no type-parameter inference, resulting in things like the snippet below, despite having both var and new() for inference. Imagine only having to type new Dictionary<_, _>() (or something even shorter) there.
public IDictionary<string, int> MyDict { get; } = new Dictionary<string, int>();
  • Implementing the dispose pattern is horrific and there is no way to enforce disposal.
  • The switch statement. It's horrible. I usually avoid it at all cost.

Miss from C#

  • Closed type hierarchies, which would bring in DUs without the warning or useless default case in switch-cases.
  • A derive macro, which would auto-implement the common patterns correctly, in place. Records could just become classes that derive equality and debug-print or something.
  • User-defined, AST-based decorators. Instead of SGs that can act anywhere as long as it's new code, it could go the other way: only let the decorator act on the code it annotates, but allow to modify it. This could be a compile-time function, very much like Rust procedural macros.
  • A better construct instead of interfaces (traits or mixins):
    • Nonmember constraints (static and type constraints), which are also usable as generic constraints.
    • Proper default implementations, maybe unchangeable ones. For example, you'd only have to implement T1+T2, and T2+T1 would be automatically implemented, and there would be no way to override that. (just an example, not necessary in this exact case)
    • Externally implementable, maybe as long as the implementer type or the implemented trait definition is owned by the user.
  • Higher-kinded-types: I'm really not sure or sold on this yet, maybe type fields in traits would resolve most needs and this feature wouldn't be justified anymore. Rust only has type fields in traits and still manages to provide a fully typed LINQ-like API.
  • Way better type inference, mainly for generic parameters and constructors.
  • Constant computations: Since SGs require compile-time code execution anyway, compile-time evaluation of some expressions would be beneficial in some places - maybe speeding up serialization of known types, allowing a wider range of arguments for attributes, ...

F# Computation Expressions expressed as higher-order macros

In a recent discussion, F# Computation Expressions came up as a possibly similar feature or inspiration for the language. While the feature is powerful, it does not feel like a very coherent one in itself. I think I've found a way to bring CEs in using not only existing (right now meaning planned) features, but also to make them more powerful in the process. Since CEs are kind of a niche feature, I'll try to give an overview of them before anything else. Note that I'm not an F# programmer, so I might make mistakes; I just want to get the general idea across.

What are F# Computation Expressions?

Computation Expressions are a feature for modeling abstractions that span multiple expressions/statements and change aspects of certain language elements. According to the official documentation, they can model:

  • Non-deterministic computations
  • Asynchronous computations
  • Effectful computations
  • Generative computations

An example asynchronous computation in F#:

let fetchAndDownload url =
    async {
        let! data = downloadData url
        let processedData = processData data
        return processedData
    }

An example sequence:

let squares =
    seq {
        for i in 1..10 do
            yield i * i
    }

They boil down to the exact same feature, with the general syntax:

builder { block }

What happens in the background?

There is really no magic here. All that happens is that some language elements (like let!, return and yield) are transformed into calls on the builder object (in our previous two examples async or seq), essentially letting us intercept them. For example, the sequence example turns into something like:

let squares =
    seq.For({1..10}, (fun i ->
        seq.Yield(i * i)))

That's all there is to it. Computation expressions just take some of your statements/expressions and turn them into something else, usually a call on the builder object. The complete list of what turns into what can be found in this section of the docs. For completeness' sake, here are a few examples (assuming our builder is called builder):

  • let a in cexpr -> let a in { cexpr }
  • let! a = value in cexpr -> builder.Bind(value, (fun a -> { cexpr }))
  • yield expr -> builder.Yield(expr)
  • return expr -> builder.Return(expr)

What is this really?

CEs are nothing but a somewhat limited tool for Aspect Oriented Programming. The feature jumps on certain syntax and transforms it slightly into an interception call on a custom listener object (called a builder here). To make the most common operations/structures nicer to write, some extra keywords are introduced (let and let! are different, return, return!, ...), but each just maps to a different syntactical transformation, resulting in a different method call on the builder.

Since this is a purely syntactical feature - there are no semantic rules involved - we can turn to an existing feature that happens to work on syntax: macros.

Declarative macros in a bit more depth

I've touched on declarative macros in #16, but now I'd like to illustrate how it could lead us to free Computation Expressions. For this I'll initially use Rust macros, but then will have to use pseudo-notation.

Step 1: Identifying structures

Our first step is to create a macro that can identify what kind of syntactic structure it encountered. This is essentially the mechanism that decides what the found structure needs to be transformed into. Writing a macro that can match all the different control-flow structures is a bit hard with declarative Rust macros because of their limitations, so for now we will only differentiate let-bindings and expressions; anything else will be ignored. The macro will insert a print statement before the matched structure, but will not modify the original source in any other way:

macro_rules! identify_structure {
    // A simple let-recognizer
    (let $p:pat = $value:expr) => {
        println!("let binding with name '{}'", stringify!($p));
        let $p = $value;
    };

    // Any expression
    ($e:expr) => {
        // To keep this a valid expression, we use the fact that blocks are completely valid expressions in Rust
        {
            println!("expression '{}'", stringify!($e));
            $e
        }
    };

    // Anything else is just kept as is
    ($i:item) => {
        $i
    };
}

Note: Rust let bindings are a bit more complex than this, but our goal is not to implement a standard-compliant Rust grammar recognizer. The macro system sadly doesn't expose enough tools to let us do what we want to do.

Now if we try out this code:

fn main() {
    identify_structure!(let foo = 123);
    identify_structure!(if 2 > 1 { println!("2 > 1"); });
    identify_structure!(fn bar() {});
    bar();
}

This will print:

let binding with name 'foo'
expression 'if 2 > 1 { println!("2 > 1"); }'
2 > 1

You might rightfully say that this is not entirely correct: the right side of a let binding - here it happens to be 123 - is also an expression, so why did it not get logged? We can solve this by making the macro recurse! Here is the modified let-handling:

    (let $p:pat = $value:expr) => {
        println!("let binding with name '{}'", stringify!($p));
        let $p = identify_structure!($value);
    };

Now the previous program prints:

let binding with name 'foo'
expression '123'
expression 'if 2 > 1 { println!("2 > 1"); }'
2 > 1

This almost covers all our needs. Sadly, Rust doesn't give us tools to recurse arbitrarily into a syntax tree in declarative macros; it only allows recognizing a few syntax fragments. This is why we'll have to diverge from declarative Rust macros a bit. I will not write procedural macros here, as those are really verbose.

Step 2: Applying a macro to multiple parameters

Applying another macro to multiple arguments is also fairly easy in Rust. For example, we can create a forall! macro that simply applies the macro received as the first parameter to each of the remaining parameters. Example:

macro_rules! forall {
    // Match a macro name, a semicolon separator and then an arbitrary amount of statements
    ($mac:ident; $( $s:stmt );*) => {
        // Unpack all matched statements...
        $(
            // Apply the macro
            $mac!($s);
        )*
    };
}

We can combine this with a simple log! macro to log all statements:

macro_rules! log {
    ($s:stmt) => {
        println!("{}", stringify!($s));
        $s;
    };
}

fn main() {
    forall!{
        log;
        let foo = 123;
        if 2 > 1 { println!("2 > 1"); }
    };
}

Output:

let foo = 123;
if 2 > 1 { println!("2 > 1"); }
2 > 1

Step 3: Recursion!

The only thing really missing is recursion in forall!, to visit each child. This is sadly not doable in Rust declarative macros. To illustrate, it could look something like:

macro_rules! forall {
    // Match a macro name, a semicolon separator and then an arbitrary amount of nodes
    ($mac:ident; $( $n:node );*) => {
        // Unpack all matched nodes...
        $(
            // Apply the macro
            $mac!($n);
            // Recursion for each child, very much pseudocode
            forall!($mac; children!($n));
        )*
    };
}

If this worked, Rust could implement CEs right now!

Proposed feature

I hope it became pretty clear that even Rust declarative macros are almost there to give us a POC mechanism suitable for CEs. I propose that we design our macro system so that CEs can be implemented to their full potential. This would mean that CEs wouldn't just be in the language: they could be slightly more powerful - because they would allow syntactic observations - and be a library feature, not a language feature! In my opinion this would be:

  • Pretty awesome
  • A great indicator that our metaprogramming capabilities are pretty good

Some pseudo-code how we could define our CE macro in a library:

// Recursively transforms the AST using a transformer macro
macro transform(node: AstNode, transformer: macro(AstNode)): AstNode
{
    let transformed = transformer(node);
    return transformed.WithChildren(transformed.Children.Select(c => transform(c, transformer)));
}

// Let's say the user wants to log each let binding, they'd do something like
// Yes, we are essentially writing a higher-order macro and pass in a lambda-macro parameter!
macro logged(node: AstNode) = transform(node, macro(n: AstNode) = match n
{
    // Make let statements logged by sequencing a call to print and the let itself
    LetStatement l => new SequenceStatement(
        new CallStatement(/* TODO: Call */),
        l
    ),
    // Don't care
    _ => n,
});

Then we can use logged! like so:

fn main() {
    logged!{
        let a = 3;
        foo();
        let b = 4;
        bar();
    }
}

And the output would be something like:

Assigned 3 to 'a'
Assigned 4 to 'b'

Note: You might have noticed that my proposed solution falls into an infinite recursive transformation, as the LetStatement node will be nested infinitely. This can be solved by making the transformer macro a bit smarter, but I think it still demonstrates the point.

Pretty keywords

There might be a fear that the pretty keywords (like await, yield, ...) would be lost if we decide to roll with this solution: since many features could be implemented using CEs, there would be no reason for those keywords to be part of the syntax.

We could exploit the fact that semantic syntax highlighting is a thing! The CE macros would simply attach metadata telling the highlighter to highlight certain keywords in the given block. This has the advantage that only the usable extra keywords get highlighted, unlike in F#, where all CE keywords are always highlighted, even if the given CE doesn't allow them.

[WIP] Properties

In this issue I'll discuss the basics of C# properties (for completeness' sake) and present what different languages went with.

What are properties?

Properties are - in essence - syntax sugar for calling a function when retrieving a member with syntax obj.Member, and calling another function when assigning to that member with syntax obj.Member = value. The former is called a getter, and the latter one is called a setter. While the code they execute can be arbitrary, it usually makes sense that they manipulate the object state in some sensible manner.

For example, if a Person stores their birthday, an Age property could compute the age in years from the current time and the birthday when using the getter, and recompute the birthday when using the setter.

It can also make sense to omit the setter (get-only properties) or the getter (set-only properties). Set-only properties are relatively rare; it's more common to omit the setter. Such get-only properties are sometimes called derived or computed values, and are a good tool for reducing data dependency while keeping the syntax natural to the "field notation".

Important to remember: Properties are really just syntax sugar for method calls with two kinds of signatures. Even when they look like fields, they can be declared in interfaces/traits!
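
In C#, for instance, a property compiles down to a pair of accessor methods on the type; conceptually, this is roughly what the compiler emits for public int Age { get; set; } (a sketch):

private int _age;                                // compiler-generated backing field
public int get_Age() => _age;                    // the getter accessor method
public void set_Age(int value) => _age = value;  // the setter accessor method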

Properties in different languages

Below I'll go through how different languages integrated properties and discuss how we could adapt it to Fresh.

C#

C# has the general syntax

Type PropertyName
{
    get { /* ... */ }
    set { /* ... */ }
}

The get block has to return a value of type Type, and the setter receives an implicit parameter called value of type Type. Omitting one makes the property get- or set-only.

Going back to the birthday example, this could be:

class Person
{
    private DateTime birthday;
    public int Age
    {
        get
        {
            // TimeSpan has no Years property, so we approximate from days
            return (int)((DateTime.Now - this.birthday).TotalDays / 365.25);
        }
        set
        {
            this.birthday = DateTime.Now.AddYears(-value);
        }
    }
}

C# evolved and simplified a lot on the general syntax over the years. For example, one-liners can be written with =>, just like arrow-methods:

public int Age
{
    get => (int)((DateTime.Now - this.birthday).TotalDays / 365.25);
    set => this.birthday = DateTime.Now.AddYears(-value);
}

If the property is get-only, it can be written in an even shorter form:

public int Age => (int)((DateTime.Now - this.birthday).TotalDays / 365.25);

C# also has so-called auto-properties, where the accessed and modified backing field is automatically generated. Examples:

public int Foo { get; } // An int field is automatically generated and returned in the getter
public string Bar { get; set; } // A string field is automatically generated and returned in the getter, assigned in the setter

The latter is not too different from simply writing a public field, but it can come from an interface signature - and a field can't. C# generally prefers auto-properties over public fields, at least for classes.

For some special cases - mainly UI frameworks - C# is planning to introduce the field keyword, which can be used in non-auto-implemented properties and results in a backing field being generated automatically, accessed with the field keyword (proposal). This proposal simply eliminates the need to write the backing field ourselves.
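
Under that proposal, a property with custom accessor logic but a generated backing field might look like this (a sketch of the proposed syntax, not shipped C# at the time of writing):

public int Age
{
    get => field;                       // `field` refers to the generated backing field
    set => field = Math.Max(0, value);  // custom logic, still no manual field declaration
}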

Personal thoughts

C# properties are nice, but a bit too "special" for what they are. The grouped and special-cased syntax is fine, but it has some pain-points:

  • Documentation of the elements needs a new form instead of documenting it like a method.
  • The value parameter being implicit, attaching attributes onto it is either impossible, or requires special casing.

It makes sense why they went with this, as it ensures that the property truly has a unified purpose and that the two methods are related (at least by the fact that there is only one piece of documentation for the getter and setter).

Integrating something like this into the language would be possible with some minor modifications. As of right now, the difference between static and non-static functions is that non-static functions take this as their first parameter. We'd either have to introduce some keyword to annotate staticness, or modify the syntax slightly to include the parameters.

Python

Python properties are really close to their roots: they are simply methods annotated with an attribute marking them as properties (the attribute is a Python decorator, which is already discussed in the metaprogramming issue):

from datetime import date

class Person:
    def __init__(self, name, birthday):
        self.name = name
        self.birthday = birthday

    # This is the getter
    @property
    def age(self):
        # timedelta has no year component; approximate from days
        return (date.today() - self.birthday).days // 365

    # This is the setter
    @age.setter
    def age(self, a):
        # I don't know the Python date lib, I made this up
        self.birthday = date.today().replace(year=date.today().year - a)

Personal thoughts

I believe this is very simple and elegant. Properties are nothing more than markers on methods, and since Python methods are relatively lightweight, it's not a hassle to type them out. One huge advantage of this syntax is that everything that works for methods automatically works for properties too, since they are not a separate language element, but metadata.

Integrating something like this into the language would be pretty trivial without any nontrivial syntax modifications/considerations.

D

Properties in D before version 2 are as bare-bones as they can get. They are just getter and setter methods that you can simply use as properties.

class Person {
    // I don't know the D standard lib, made up something

    private DateTime birthday;

    public int age() {
        return (DateTime.today() - this.birthday).years();
    }

    public void age(int v) {
        // ...
    }
}

Properties after D version 2 have to be marked with an attribute:

class Person {
    // I don't know the D standard lib, made up something

    private DateTime birthday;

    @property public int age() { ... }
    @property public void age(int v) { ... }
}

Personal thoughts

The pre-version-2 alternative is a bit too implicit and not very clear. The second version is virtually identical to the Python one; all its advantages apply. Both versions would be relatively easy to include in the language.

Delphi

Delphi separates the declaration of the property from the methods implementing the getter and setter. A property simply names its getter and setter functions, which are written separately.

type Person = class
// Again, sorry, I don't know Delphi standard lib
private
    birthday: Date;

    function GetAge(): Integer;
    procedure SetAge(const v: Integer);

public
    property Age: Integer read GetAge write SetAge;
end;

// TODO: Implement GetAge and SetAge

Personal thoughts

A bit too verbose, but it neatly connects properties back to plain functions/methods. The advantage is that the functions are usable by themselves; they can be passed around and such.

Some syntax would have to be made up, but otherwise it's pretty simple to implement.

F#

Unsurprisingly, F# ML-ifies the property syntax of C#:

type Person() = class
    // ...

    member this.Age
        with get () = (DateTime.Today - this.birthday).Days / 365   // TimeSpan has no Years property
        and set value = this.birthday <- DateTime.Now.AddYears(-value)
end

Personal thoughts

By now you know how I feel about ML syntax, but this is a relatively positive surprise. The setter has an explicit value argument, and both accessors are method-like declarations in their shape. Something like this could easily be added to the language.

Proposed syntaxes

Below I'll propose some possible syntaxes for the language without any additional commentary (my thoughts are already expressed in the language-specific sections above).

Python-style

impl Person {
    #[getter]
    func age(this): int = ...;

    #[setter]
    func age(this, v: int) = ...;
}

Functions with a different keyword (D-style)

impl Person {
    prop age(this): int = ...;
    prop age(this, v: int) = ...;
}

F#-style V1

impl Person {
    prop Age: int
    {
        get(this) = ...;
        set(this, v: int) = ...;
    }

    // Getter-only
    prop Age(this): int = ...;

    // Auto-prop
    prop Age: int { get(this); set(this); }
}

F#-style V2

(Placement of this changes)

impl Person {
    prop Age(this): int
    {
        get() = ...;
        set(v: int) = ...;
    }

    // Getter-only
    prop Age(this): int = ...;

    // Auto-prop
    prop Age(this): int { get; set; }
}

Delphi-style

impl Person {
    prop Age(get: get_age, set: set_age): int;

    // TODO: Implement get_age and set_age
}

[WIP] Tracking issue for the feature-set

This issue attempts to collect all aspects of a language (by some opinionated partitioning). The number of issues is becoming significant, and we need to know what we need to work on to actually get somewhere. Feel free to add aspects that are currently left out.

General features

  • Variables
  • Functions/methods
  • Control structures
    • If-else
    • While-loop
    • For-loop
    • Match
  • Exceptions, throwing, catching

Type-system

  • Record types
  • Traits
  • Properties
  • Type-inference
  • Generics

Special features

  • Asynchronous programming
  • Generators

Metaprogramming

  • Attributes
  • Derive-mechanism
  • Macro-system

Unified currying, partial application, methods and functions

Assume we have

func(1, 2, 3)

Then we could make those valid:

1.func(2, 3)
(1, 2).func(3)
(1, 2, 3).func()
1 func (2, 3)
(1, 2) func 3
func(1, 2)(3)
func(1)(2, 3)
func(1)(2)(3)
1.func(2)(3)
(1, 2).func()(3)
(1, 2).func()()()()()(3)

This is just an idea, but it might add something to the final design.
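
For reference, the curried forms can already be emulated in C# with closures returning closures (a sketch):

Func<int, Func<int, Func<int, int>>> func = a => b => c => a + b + c;

var all = func(1)(2)(3);   // fully applied: 6
var partial = func(1)(2);  // partial application yields a function
var rest = partial(3);     // 6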

Help the user with "undefined identifier"

Problem

In scripting, in interactive environments (including notebooks), and in editors without IntelliSense (e.g. when you quickly edit a file without opening it from an IDE), people often misspell an identifier or forget to include a namespace. When they try to compile, they get a very unhelpful error about an undefined identifier.

Solution

The idea here is to help the user with suggesting what should be there instead. Two ways:

  1. First, let's check if there's a type or function with a similar identifier in the opened namespaces and suggest that
  2. Then, crawl the not-yet-opened namespaces to check if the type needs another namespace imported

Details

To determine close identifiers, we could use something like Levenshtein distance (see the sketch after the list below); that doesn't look like a problem. The latter case, however, is more complicated:

  1. The compiler would need to store all BCL type names in RAM, so it can quickly search over them
  2. It should also crawl third-party packages. There is no namespace-to-assembly mapping, so the most we can do is load those types too and search over them.
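
A minimal sketch of the similarity check in C#, using the classic dynamic-programming formulation of Levenshtein distance:

static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (var i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (var j = 0; j <= b.Length; j++) d[0, j] = j;
    for (var i = 1; i <= a.Length; i++)
    {
        for (var j = 1; j <= b.Length; j++)
        {
            var cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1,    // deletion
                         d[i, j - 1] + 1),   // insertion
                d[i - 1, j - 1] + cost);     // substitution
        }
    }
    return d[a.Length, b.Length];
}

// Levenshtein("Listt", "List") == 1, so List<T> ranks as a close suggestion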

Examples

open System.Collections.Generic
val list = new Listt<T>()
- No "Listt<T>" found. Maybe you meant List<T>?
open System.Collections
val list = new List<T>()
- No "List<T>" found. Maybe forgot to open System.Collections.Generic?

[WIP] What I like and dislike about F#

While I'm still not an expert, I'll try to express my current thoughts

I dislike

  • rec. Recursion is the core of FP; it's a necessity, not an addition
  • fun a b -> ... is too verbose for lambdas
  • Verbosity of member constraints
  • How weirdly classes are defined in F#. They're obviously not first-class (no pun intended), but still: guessing between interface implementation and type inheritance, logic in the constructor (the mess with do), get/set properties, ehhh
  • Low-level features are inconvenient: byref, pointers, function pointers (not implemented at the time of writing), fixed buffers

I like

  • That there's no need to put ; between statements / elements of a list/array/etc if there's a linebreak, e. g.
let a = [ 1; 2; 3 ] 

is <=>

let a = [
    1
    2
    3
]

Which is super-convenient for building trees of anything in custom DSLs.

  • Computation expressions. It's a huge topic, but they're... just awesome. Seq (C#'s hard-coded yield generator), Async, Task, my showcase, list/array comprehensions, custom SQL builders - it's all CEs.
  • Signature of functions (e. g. int -> int -> int)
  • No parens hell (often you don't need them at all)
  • (single/double/triple) piping, forward/backward composition of functions
  • Curried functions, so that I could create a function by providing just one argument, e. g.
let tag name attrs = ...

let html = tag "html" // html is a function
  • SRTP. Member constraints are ugly, but better than nothing. And it's zero-cost, as the compiler inlines them

Explicit declaration of types

Sometimes you want to specify

  • Type of local variable
  • Type of argument
  • Return type

Consider three existing ways, plus a new proposal.

C's declaration

Type, identifier:

Type localVar
Type arg
Type function(...)

ML, partially Pascal

Entity, colon, type

let localVar : Type
arg : Type
function(...) : Type

VB.NET's As

Dim localVar As Type
arg As Type
function(...) As Type

Of

val localVar of Type
arg of Type
function(...) of Type

That could also unify generics, like

type List of T

Language proposal 0.1

Edit: By the suggestion of @WhiteBlackGoose, we are reducing the scope of this proposal even further, notably we are taking out modules and visibility. For now we assume a single-file system.

This is a proposal, not a specification. Changes can still happen to almost any extent, constructive discussion is encouraged.

Goal of the document

The goal of this document is to establish features for the first prototype compiler. The reason to make such an early design document is to end the era of "design by random cloud of ideas popping up" and actually start going and expanding in a given direction. I believe this will induce more directed and more fruitful discussions down the line.

Scope of the features

The feature set does not have to be big, but it has to be stable enough not to change too much in the future. The features listed here are not the full planned feature set of the language; they just serve as a bare-bones skeleton to fill in with further proposals.

The proposal implicitly defines the initial syntax.

A note on semicolons

I know that some are not huge fans of semicolon terminators. For now we use them in the syntax, because they make many grammatical rules simpler. This doesn't mean that the final language will have to use semicolons; there are languages that got rid of them down the line without any breaking change or issue.

Primitive types

The following primitive types are supported:

  • int, which is equivalent to int32
  • uint, which is equivalent to uint32
  • int8
  • uint8
  • int16
  • uint16
  • int32
  • uint32
  • int64
  • uint64
  • bool
  • float32
  • float64
  • unit, which is roughly equivalent to the void type in C#, but is a true type in the type system, not a marker for no-return (meaning that you can for example use it as a generic parameter, or create a variable of type unit)

The naming of these types gets rid of the C heritage, which is very inconsistent across the C family. The explicit sizes make sure we don't have to look up docs to know integer sizes. The convenience aliases int and uint are there for "casual" use, when the exact size is mostly irrelevant to the developer.

Comments

Single line comments are supported with the usual starting sequence //, ending at the end of the line.

Note: Originally, a comment right above a construct was considered a documentation comment. For now we have removed that, as we started to feel that it could cause an "accidental" information/doc leak. Documentation comments will be added later and will have a different syntax (probably something like ///).

Functions

The language supports free functions defined on top level. The general syntax is func <name>(<arg1>: <type1>, <arg2>: <type2>, ...): <return type> { ... }, for example:

func fib(n: int): int {
    // ...
}

For one-liner functions returning a single expression, the syntax can be shortened:

func times_two(n: int): int = n * 2;

Functions can return a value using the conventional return statement. Since blocks can evaluate to a value (see later), the following is also valid and very similar to Rust's implicit returns:

func foo() = {
    bar();
    1 + 2
};

We have decided that for now this will suffice, we can make function blocks do implicit returns later, if we decide to.

Operators and precedence

The following is the precedence table for the supported operators:

| Operator | Description | Associativity | Notes |
| --- | --- | --- | --- |
| expr(args...), expr[indices...] | Function call, Indexing | - | |
| expr.member | Member access | Left-to-right | |
| +expr, -expr | Positive, Negative | - | |
| expr * expr, expr / expr, expr mod expr, expr rem expr | Multiplication, Division, Modulo, Remainder | Left-to-right | Hopefully the keywords instead of the made-up % help disambiguate and avoid bikeshedding syntax arguments in the future. |
| expr + expr, expr - expr | Addition, Subtraction | Left-to-right | |
| expr in expr, expr not in expr, expr < expr, expr > expr, expr <= expr, expr >= expr, expr == expr, expr != expr | Containment, Does not contain, Less-than, Greater-than, Less or equal, Greater or equal, Equals, Not equals | Left-to-right | These operators can be chained arbitrarily, like in Python. x < y >= z in foo is equivalent to x < y and y >= z and z in foo, all expressions evaluated at most once, short-circuiting on the first falsy value. The elements in the chain can not be parenthesized: (x < y) == (y < x) is not equivalent to x < y == y < x! |
| not expr | Logical not | - | The placement has changed from the usual C way. |
| expr and expr | Logical and | Left-to-right | |
| expr or expr | Logical or | Left-to-right | |
| =, @= | Assignment, Any compound assignment | Right-to-left | In @= the @ stands for the usual symbols allowed for compound assignment. |

TODO: Missing all bitwise and shift operators.

A small addition to the relational operators: (x < y) == (z < w) evaluates as "is the result of x < y the same as the result of z < w", while x < y == z < w means "x < y and y == z and z < w".

in and not in would translate to a .Contains or .ContainsKey call, depending on the argument type.
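
A sketch of how a chain such as x() < y() >= z() in foo could lower to C#, evaluating each operand at most once and short-circuiting as described above (all names hypothetical):

static bool Chain(Func<int> x, Func<int> y, Func<int> z, List<int> foo)
{
    var a = x();
    var b = y();
    if (!(a < b)) return false;  // short-circuit: z() is never evaluated
    var c = z();
    return b >= c && foo.Contains(c);
}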

The allowed list of compound operators: +, -, *, /.

The not operator's precedence has changed, compared to what C does for example. Note that the relative precedence of the logical operators has not changed: not still has the highest precedence among and, or and not. The rationale for this change is that since not isn't sticky like ! anymore, we might as well put it alongside the rest of the logical operators and simplify expressions like !(start <= point && point < end) - a bounds check - into something like not start <= point < end.

Scoping rules

Lexical scoping is followed, meaning that variables defined in a scope are visible only in that scope and in scopes nested inside it. By #13, arbitrary variable shadowing is allowed, meaning that function-local variables can shadow each other, within the rules of lexical scoping.

Control flow structures

By default all control-flow structures are expressions, meaning they evaluate to a value. The most basic such structure is a nested block. Blocks evaluate to the last non-semicolon-terminated expression in them. If there is no such expression, the result is unit.

// Evaluates to 3
{
    foo();
    bar();
    1 + 2
}

// Evaluates to unit
{
    foo();
    bar();
}

The two branches of the if-else expression have to evaluate to the same type, and the condition has to be of type bool. If the else branch is missing, an empty one is assumed, evaluating to unit.

if (foo()) bar() else baz()

While loops always return unit type.

while (foo()) bar()

Variables

The variable declaration syntax is var|val <name>: <type> = <value>;. The keyword var defines a mutable variable and val defines an immutable one. For var, both the type specification : <type> and the value assignment = <value> are optional; for val the value assignment is required. This gives 4 possibilities:

  • Type specified, value specified: The specified value's type has to be assignable to the specified type
  • Only type specified: No extra checks
  • Only value specified: The type is immediately inferred from the specified value
  • Nothing specified: The type will be inferred from the first use

[WIP] Literal values

Important: Parts of this proposal depend on what we end up with in the type inference issue (#42). If we decide that literals always have a fixed type, then we can introduce the usual suffixes for literals. I'm personally not a fan of those, so for now this proposal assumes that we can agree on literal types being determined during inference.

Integer literals

  • Decimal integers would match the regex [0-9]+. Examples: 0, 123, 9625
  • Hexadecimal integers would match the regex 0x[0-9a-fA-F]+. Examples: 0x0, 0xbadc0fee, 0x2f5a
  • Binary integers would match the regex 0b[01]+. Examples: 0b0, 0b011101

We could introduce a separator character for large constants to make them more readable. Some languages use _ for this. The only rule would be that _ can't come before the first digit. Examples: 12_000_000_000, 0xffff_0000, 0b1100_0000_0101_1110

Boolean literals

The keywords true and false.

Floating-point literals

They would have two forms, the normal decimal-separated form and a scientific form.

  • Decimal separated form would match the regex [0-9]+\.[0-9]+. Examples: 0.0, 0.123, 25.0, 62.73. Note that omitting either side entirely is deliberately not allowed.
  • Scientific notation form would match the regex [0-9]+(\.[0-9]+)?[eE][+-]?[0-9]+. Examples: 10E3, 0.1e+4, 123.345E-12

Escape sequences

Escaping would use the usual \. The escape sequences would be:

  • \': Just a '. It does not have to be escaped in a string literal, but simplifies code-generation for the users. Since it's otherwise meaningless, it's essentially no effort to allow it in string literals. (inspired by C#)
  • \": Just a ". It does not have to be escaped in a character literal, but simplifies code-generation for the users. Since it's otherwise meaningless, it's essentially no effort to allow it in character literals. (inspired by C#)
  • \\: Escapes the \ to literally mean a \.
  • \[0abfnrtv]: Same as in every C-like programming language (reference)
  • TODO: How do we want Unicode escape sequences?

Character literals

They are enclosed in single-quotes ('), like in C#. Any visible character can be inside (no control characters), or an escape sequence.

String literals

They are enclosed in double-quotes ("), like in C#. Any visible character can be inside (no control characters), or an escape sequence.

Verbatim strings and string interpolation is not yet specified, that will come in a later issue. For now, I believe the default strings should allow for string interpolation, there should be no need for a separate annotation.

Issue for string interpolation is #53 .

Inequality sign syntax

We have three options.

Option 1: !=

The most popular operator in mainstream languages, so there is no need to familiarize anyone with it.

Becomes inconsistent with the not operator if that one isn't !.

Option 2: <>

Pascal and the ML family use it. Looks a bit more mathy.

It's not formally correct though: <> reads as "less or greater", which implies that the operands have a total ordering defined.

Option 3: =/=

The closest equivalent to ≠, but also the most verbose and least familiar.

Vote

👍 for !=
🚀 for <>
🎉 for =/=

Feel free to suggest other options

Variable shadowing

Introduction

Shadowing happens when two variables with the same name are visible within the same scope, but one is inaccessible because the other's definition hides it. For example:

int x = 0;
{
    string x = "hello";
    // Here only the string typed x is available
}
// Here we can access the integer x again

Shadowing in C#

C# only allows shadowing between field members and locals, but not between locals themselves. Meaning that the following is valid:

class Foo
{
    int x;
    public void Bar()
    {
        string x; // int x; shadowed
    }
}

But this is not:

class Foo
{
    public void Bar()
    {
        int x;
        {
            string x; // ERROR
        }
    }
}

I believe this is a very annoying constraint. It disables the useful cases for shadowing, but allows for the age-old typo:

class Foo
{
    int x;
    
    public Foo(int x)
    {
        x = x; // Oops, assigns the parameter to itself
    }
}

Note that this can be solved by always requiring a this./self. prefix - like Python, Rust, ... - but that should be a design document on its own.

Shadowing in C++

C++ is a bit less restricted, allowing nested scopes to shadow variables in the outer scope:

int x = 0;
{
    std::string x = "Hello";
}

This is sometimes useful, but can lead to confusion: when you leave the inner scope, you have to keep in mind that the name refers to the old variable again.

A case for shadowing same-scoped variables (how Rust does it)

I believe that if shadowing is allowed, it should allow shadowing same-scoped variables as well. I'll bring up two use-cases for it.

Type conversions

In many cases I have a pattern like this in C#:

public static Pattern LoadPattern(string fileName)
{
    string patternText = File.ReadAllText(fileName); // Need the 'Text' suffix, even though it's the same thing, but in a different type/state
    Pattern pattern = ParsePattern(patternText);
    // ...
}

After I'm done with the conversion from text to the object model, I'd probably not want to ever refer to the text. I should be able to name them the same, since they describe the same thing in a different shape:

public static Pattern LoadPattern(string fileName)
{
    string pattern = File.ReadAllText(fileName);
    Pattern pattern = ParsePattern(pattern); // Refers to the old pattern variable
    // From here it's Pattern type
    // ...
}

After the conversion, pattern is only visible as a Pattern type, hiding the string version forever.

Mutability

There are algorithms, where I calculate some helper variable (like a LUT), but then I expect it to never change again. Allowing shadowing would allow us to re-bind the value as an immutable (using the notation from #12):

var helper = ...;
// mutating helper
val helper = helper;
// from here on 'helper' is immutable

This ensures that after a certain point helper can not be mutated. Note that again, there's no scoping confusion, the mutable helper is hidden forever.

To be decided

I believe we should either completely disallow shadowing, or allow shadowing in the same scope. Both have their benefits and annoyances. I can't whole-heartedly put my vote for either, but I'm kind of leaning towards allowing arbitrary shadowing.

An alternative

If we find enough use for the same-scoped shadowing, but not the nested ones, we could even think of some unique constraints, like only allowing the same-scoped case. We could even put further restrictions, like only allowing to re-bind to make a variable immutable, but not change the shadowed type. I don't know of any language that does this. Allowing re-binding an immutable to a mutable variable might be dangerous.

[Idea] Use conventions to reduce boilerplate and number of keywords

Conventions build into the language itself could reduce boilerplate, number of keywords and make language more readable.

Supporting many conventions may sound nice, but in reality it doesn't make the language more capable, and different conventions can make code harder to read. While no specific style will please everyone, a unified style makes different codebases more familiar to everyone and makes it possible to replace some keywords with conventions, consequently reducing boilerplate.

An example from Dart: A leading underscore character ( _ ) indicates that a member is private. So, there is no need for a private keyword.

Language philosophy

The idea of what this language aims to be is scattered around old conversations/issues. I'd like to summarize it in a short list here. It begins with the README bullets, but expands further.

  • A generally high-level language with the aim to replace C# (do to C#, what Kotlin did to Java)
  • Keeping the good aspects of C#
    • Strongly and statically typed
    • Not an academic/research language, something that's usable by the average .NET developer
    • Readable by people coming from other languages
    • Great debugging
    • Great editor support
    • Starting to embrace a more functional-style way of programming (LINQ, pattern matching, ...)
  • It prefers generalized approaches over sugared special cases
  • It prefers adding a bigger, more worked-out feature early over an underspecified one that will require extra features later to cover the missed use cases
  • Keeping C# interop as long as we are not sacrificing too much for it. If we need to give up either a great feature or complicate our interop layer, possibly do the latter. If we need to, we can help the interop layer from the C# side with analyzers + metadata.
  • Metaprogramming is a first-class citizen

Erasable nullables

The key here is to keep C# interop both backward and forward (that is, use C# API from Fresh and use Fresh API from C#) but make sure to keep the code safe and consistent.

C#'s approach

C# has a separate type for value type nullables, but has annotations for reference types. It creates some inconsistent behaviour, to be precise:

Inconsistent member access

object? a = ...
if (a is not null)
    a.Method(); // works

int? a = ...
if (a is not null)
    a.Method(); // doesn't work, Nullable<int> has no method that int has
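
For contrast, a small snippet of what the value-type case forces you to write in C# today:

int? a = 42;
if (a is not null)
{
    // Members of int (e.g. CompareTo) are not available on Nullable<int>,
    // so the value has to be unwrapped first:
    Console.WriteLine(a.Value.CompareTo(3));
    if (a is int v) Console.WriteLine(v); // or extracted with a pattern
}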

Inconsistent generics

Consider two methods:

void Method1<T>(T? a);
void Method2<T>(T? a) where T : struct;

Let's substitute T = int:

void Method1(int a);
void Method2(Nullable<int> a);

See sharplab.

Erasable nullables

Universal type

Instead of having annotations and a type for value types, I suggest having a single type. To avoid confusion, let's call it Null for now:

[Erased]
type Null<T> = T | None

C# API from Fresh

C#:

void Method1(object a);
void Method2(object? a);
void Method3(int a);
void Method4(int? a);

Fresh:

void Method1(object a);
void Method2(Null<object> a);
void Method3(int a);
void Method4(Null<int> a);

Fresh API from C#

It's exactly the same in the opposite direction. That's how it will be aliased runtime-wise: present NRTs as Null<>, present Nullable<> as Null<>.

Null exists only at compile time

Null<string> s = GetString();
if (s is not null)
    return s.Split().Length;

Null<int> i = GetInt();
if (i is not null)
    return i;

Runtime-wise it looks like

string s = GetString();
if (s != null)
    return s.Split().Length;

Nullable<int> i = GetInt();
if (i.HasValue)
    return i.Value;

Unresolved

As type argument

What about cases when there's no way to erase it? E. g. when used as a type argument:

void Method<T>(T a)
{
    SomeMethod<Null<T>>(a);
}

Or,

val list = List<Null<int>>(); // is it List<Nullable<int>> or List<Null<int>>
val list = List<Null<object>>(); // is it List<object?> or List<Null<object>>

Should we unconditionally materialize Null<> when used as a type argument?

Then typeof(Null<>) will work as expected.
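
For grounding, the two kinds of C# nullables already behave differently as type arguments today, which is why the question comes up:

var a = new List<int?>();    // List<Nullable<int>>, a real constructed runtime type
var b = new List<string?>(); // just List<string> at runtime, the annotation is erased

Console.WriteLine(a.GetType()); // prints ...List`1[System.Nullable`1[System.Int32]]
Console.WriteLine(b.GetType()); // prints ...List`1[System.String]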

Nested nullables

E. g.

Null<Null<int>> a = null;

Should all but the innermost materialize into Null? Should we prohibit this behaviour?
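
For reference, C# rejects the value-type version of this outright (error text paraphrased):

int? inner = null;
// Nullable<int?> outer = inner; // error CS0453: 'int?' must be a non-nullable value type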

Relations with Options

Should we have explicit Option type if we manage to design a consistent nullable?

Few ideas for an unusual language

These are some of my ideas for a language with an emphasis on direct, low-level control while also offering a lot of expressiveness. All the ideas were conceptualised with a native language in mind, but I'm posting here by @LPeter1997's request.

1. No reference types
All data types are value types, to pass something by reference one must take a pointer to a value type.

2. Safe pointers
Pointer arithmetic is illegal outside of unsafe blocks, pointers are also split into two categories: Nullable pointers and non-nullable pointers. (What would be the syntax to distinguish them?)

3. No visibility modifiers on fields
For abstraction, you'd write against traits instead of concrete data types, as soon as you want complete control over everything you can use the concrete type and access all its data.

4. Immutable by default
Immutability is handy, but you can't expect people to tag things as immutable; making it the default avoids this problem and instead forces people to tag things as mutable when they really need mutability.

5. Traits and generics
All types can be extended from anywhere, allowing you to write generic code against traits then adding support for those traits to types you care about.
Variadic generics included, with the ability to index them, avoiding the C++ recursive mess.

6. Compile-time reflection and code-gen capability
Just steal JAI's fancy compile-time functionality. That's pretty much it.

7. Iterator APIs, similar to functional languages/LINQ
They're too handy not to have them, especially if you can afford to transform them into code similar to what a for loop would produce. Also includes the good old for (T x in Y) loop.

8. Discriminated unions and pattern matching
Pattern matching (switch expressions) would work against anything implementing a matching trait with the standard library providing a type implementing it.

9. Expression blocks
Allow wrapping blocks of code in an expression block, reducing the need for separate one-of-a-kind functions and ternaries where they're not actually needed.

An imaginary example of how such a block could look like:

bool someCondition = {
    i32 responseCode = inScopeLocal.DoSomething();
    if (responseCode != 0)
        return true;
    else
    {
        Log("Operation failed");
        return false;
    }   
};

(I can't come up with a better example, will replace it with one if I find it)

I'm probably gonna be adding more to this list and I'll be making a list of issues that will need figuring out.

A look at C# version history

I thought it would make sense to visit each version and addition in C# and collect my thoughts/personal experience on them. It might not be a very fine listing, mainly going through this version history page.

C# 1.0

Features

  • Classes
  • Structs
  • Interfaces
  • Events
  • Properties
  • Delegates
  • Operators and expressions
  • Statements
  • Attributes

My thoughts

Since this is version 1.0 and it's really bare-bones, there's not a lot to say.

Classes and structs are already differentiated, which I'm not a big fan of (of course it's really important to have stack-stored and heap-stored types, but I'm wondering if this is the best one can do here). I almost never use class inheritance; when I do, it's mostly to share functionality, not to override or customize anything.

I've always had mixed feelings about events. On one side, it's nice sugar, on the other it's just an observer pattern so hardcoding it into the language feels like a bit of an overkill.

I really don't like how properties are conceptually differentiated from functions instead of being purely syntactic sugar for them.

I almost never wrote an attribute until Source Generators hit the scene. The rare occasion was when I actually had to write a custom serializer. But until SGs, they felt very weak, at least for the end-user (as the compiler team used them for plenty of things AFAIK).

The single control-flow structure I really, truly dislike is the switch statement. It brought over the C-style switch without trying to improve what everyone already hated there.

C# 2.0

Features

  • Generics
  • Partial types
  • Anonymous methods
  • Nullable value types
  • Iterators
  • Covariance and contravariance

Improvements:

  • Getter/setter separate accessibility
  • Method group conversions (delegates)
  • Static classes
  • Delegate inference

My thoughts

I honestly can't think of a good reason to delay generics for a 2.0 version. This meant that the ecosystem lived in type-unsafety for 3 years when working with collections and, more importantly, caused a few really bad design decisions for 1.0. If anything, generics should have been there since 1.0.

I don't really care for partial types, other than the fact that they are the only way to properly work with Source Generators as of right now. I know that Forms and friends generate the UI code in a partial class, but again, I'm wondering if the best way is to make a class you can essentially paste code into from anywhere within the assembly.

The anonymous methods listed here are the old "delegate operator" style lambdas. I honestly never used them, since lambdas have been in the language for a long time now.

I hate how nullables came to be in this language. And it just shows how the compiler essentially has to double the work to differentiate nullable values from nullable references. Defining a generic nullable field in a class that can be either a reference or a value type still doesn't work properly without introducing an extra bool flag. Note that I think explicitly nullable types are neat, both for reference and value types. I just really dislike the limitations present in the language (generics mainly).
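
A minimal sketch of that limitation (the type and its members are made up for illustration):

class Cache<T>
{
    private T? value;      // with unconstrained T this is just "default-allowed T":
                           // for T = int it is plain int, not Nullable<int>
    private bool hasValue; // so an extra flag is needed to represent "empty"

    public void Set(T v) { value = v; hasValue = true; }
    public bool TryGet(out T? v) { v = value; return hasValue; }
}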

Iterators - I'd call them generators - are awesome. They truly show how iteration of a data-structure should be defined in every language. An absolutely beautiful feature, especially when paired with LINQ.

The introduction of static classes really shows that all languages just need free-functions in one way or another.

C# 3.0

Features

  • Auto-implemented properties
  • Anonymous types
  • Query expressions
  • Lambda expressions
  • Expression trees
  • Extension methods
  • Implicitly typed local variables
  • Partial methods
  • Object and collection initializers

My thoughts

Before auto-properties, properties must have been a real pain to use, so given that they already added properties, this made a lot of sense.

Anonymous types are again, something I maybe only ever used once, before tuples hit the scene. To me they really feel redundant with the introduction of tuples.

LINQ is truly awesome, despite the overhead that I'd really like to eliminate. I really like the declarative nature of it. The operations are chosen really nicely for the most part. Query expressions on the other hand only make LINQ unreadable, and in my experience most people avoid the query syntax.

Expression trees show that some syntax-tree based metaprogramming is really needed in major languages. My opinion is that the way expression trees work in C# is not the best or nicest way at all. I'm looking at Rust and its shiny proc macros again!

I believe extension methods were mainly added because of LINQ, and because the type system was too weak to express what they wanted from LINQ. HKTs might ring a bell for some.

Again, partial methods seemed like they needed it for better or easier codegen for the UI frameworks. Major dislike from me.

C# 4.0

Features

  • Dynamic binding
  • Named/optional arguments
  • Generic covariant and contravariant
  • Embedded interop types

My thoughts

dynamic is both cool and painful at the same time. My problem with it is that many people coming from dynamically typed languages immediately shove everything into it, without considering that C# could maybe model the thing they are doing statically.

Generic covariance and contravariance is something I rarely even consider. And to be honest it can be really useful. My pain point with it is the damn keywords. Really? in and out? How am I supposed to remember which is which?

C# 5.0

Features

  • Asynchronous members
  • Caller info attributes

My thoughts

The only exciting thing here is async/await, which is an important feature in general. I have no strong feelings about it like I do with LINQ, but I know it's really an essential part of IO operations now.

C# 6.0

Features

  • Static imports
  • Exception filters
  • Auto-property initializers
  • Expression bodied members
  • Null propagator
  • String interpolation
  • nameof operator
  • Index initializers
  • Await in catch/finally blocks
  • Default values for getter-only properties

My thoughts

Exception filters really feel like an overkill feature to me. This might be because of the domain I work in, but I personally never used them.

Auto-property initializers and expression bodied members finally get rid of a lot of the crud that old C# made us write. I'm a big fan.

The null-propagator is the poor man's mbind for Maybe types. I wonder if there's a better way to do them.

String interpolation is simply awesome. After coming from languages without it, it finally makes writing debug-code not as much of a chore (like dumping a DOT graph file).

nameof is a really nice addition. I wish there was a way to enforce it correctly for things like ArgumentOutOfRangeException.

C# 7.0

Features

  • Out variables
  • Tuples and deconstruction
  • Pattern matching
  • Local functions
  • Expanded expression bodied members
  • Ref locals and returns
  • Discards
  • Binary Literals and Digit Separators
  • Throw expressions

My thoughts

Out variables are kind of controversial, but in my opinion the Try... pattern is not too bad, works very well with control flow. Things like if (!x.TryGetValue(...)) return false;, while (stk.TryPop(...)) ... are all around my code.

Tuples, deconstruction and pattern matching are very much welcomed. Finally some decent product-types that you don't need to write a class or struct for.

I happen to use local functions a lot, especially when dealing with recursion. Since many recursive functions require parameterization you don't usually want to expose for the initial call, this is really handy.

C# 7.1-7.3

These feel like patches for existing features, there's not much to talk about.

C# 8

Features

  • Readonly members
  • Default interface methods
  • Pattern matching enhancements:
  • Switch-expressions
  • Using declarations
  • Static local functions
  • Disposable ref structs
  • Nullable reference types
  • Asynchronous streams
  • Indices and ranges
  • Null-coalescing assignment
  • Unmanaged constructed types
  • Stackalloc in nested expressions
  • Enhancement of interpolated verbatim strings

My thoughts

Default interface methods feel very weak. The default implementation is only usable if you see the type as the interface type. For me this renders the feature almost completely useless.

Enhanced pattern matching finally allowed things like matching on a tuple of values instead of having to match on the tuple itself. The original pattern matching felt like only a teaser compared to what we got now.

Switch-expressions combined with the new pattern matching is simply my favorite control-flow structure in the entire language. Using them with expression bodied method feels like writing functional code!

Nullable reference types are really important and I have the feature enabled since they arrived. Along with the attributes [MaybeNullWhen(...)] and friends I essentially never have to make the analyzer shut up with !, but plenty of mistakes are caught for me.

Indices and ranges feel kind of weak, I wish ranges weren't constrained to integers. Still, a welcomed addition, many of my x[x.Count - 1] snippets are now gone.

I've personally never used the ??= operator. I didn't see it justified, but again, this can be because of the domain I work with. No idea.

C# 9

Features

  • Records
  • Init only setters
  • Top-level statements
  • Pattern matching enhancements
  • Support for code generators

My thoughts

As much as I love how I can use records, I hate them as an addition. Classes could have gotten the extra syntax or functionality instead.

Init-only setters are neat, but in my opinion they are kind of showing that people just prefer the brace-styled construction of objects over the constructor syntax.

Top-level statements are... weird. They don't really make sense to me.

Source Generators are one of the best additions, but 2 things really ruin it: No code modification and the dependency management is awful.

Expected debugging experience of decorators

Problem

Decorators modify the source code. That means hiding some potentially important behaviour from the user, which they might want to verify.

Idea

The idea is simple. In addition to regular points where the debugger can stop, it will also go over decorators: first come the argument decorators, then outer decorators (decorators of the method).

Consider the following method:

[Memoization]
fun Method([CheckNotNull] value : string) -> string
    val f = value + " a"
    f + "b"

(written in pseudo-code)

Now assume we're at this line:

Method("abc")

and we step in. Then here's what happens:

First, we may or may not debug the argument decorators. If we step in, we get into the helper method generated for this method and this parameter. If we step over, or we got into Method some other way, the argument decorators are skipped and we arrive at the outer decorator.

Next, we debug the outer decorator. If we step in, we get into something similar to what we saw with the argument decorators. If we step over instead, we finally land in the method body itself.
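
To make the stepping order concrete, here is a hypothetical C#-level expansion of the decorated method (all helper names are made up for illustration):

static string Method(string value)
{
    // 1. argument decorators run first, one generated helper per decorated parameter
    CheckNotNull__Method__value(value);
    // 2. the outer decorator then wraps the original body
    return Memoization__Method(value);
}

static void CheckNotNull__Method__value(string value) { /* generated null-check */ }
static string Memoization__Method(string value) { /* cache lookup, then the original body */ return ""; }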

[WIP] Parameter type inference

As per #10 the current idea is to disable inference for public functions. So we're going to talk about non-public ones here.

First - permit, constrain later

Whenever a type is to be inferred, it is by default an unconstrained generic. All constraints try to constrain as little as possible. For example, using a member on an instance would by default add a member constraint.

Automatic member constraint inference

This, in my opinion, would be quite a unique language feature. E. g.

let getName f = f.Name

is equivalent to F#'s

let inline getName (f : ^T when ^T : (member Name: 'a)) : 'a =
    (^T : (member Name: 'a) f)

(see sharplab)

The same logic goes for methods:

let computeThingy a =
    a.Method(a, 16) + a.AnotherMethod()

is close to F#'s

let inline computeThingy (a : ^T when ^T : (member Method: ^T * int -> 'a) and ^T : (member AnotherMethod: unit -> 'b)) (when 'a * 'b : (member op_Add: () -> 'c)) : 'c  =
    ('a * 'b : (member op_Add: () -> 'c) (^T : (member Method: ^T * int -> 'a) (a, a, 16)), (^T : (member AnotherMethod: unit -> 'b) a ))

Which is a bit wordy.

Strict modifiers by default

I believe that many older languages (C++, Java and even C#) fell into the trap of thinking people would slap on the proper, strictest modifiers every time. For example, sealed, final and const are safety-related modifiers that make the code better. But since the code runs just fine without them, people tend not to care that much about them.

In my opinion Rust did a lot better with the modifiers. A variable is only mutable, if it's marked mut, same with references.

Right now in a C#-like language, the following could be done (a C# contrast of today's defaults is sketched after this list):

  • Change explicit sealed to explicit open (sealed by default, explicitly allow inheritance)
  • Immutable variables by default (no such concept for locals yet), explicit mutability
  • Static functions by default, member functions explicitly (like taking a this parameter)
  • Probably a few more I forgot
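
For contrast, each keyword below is exactly the C# opt-in that tends to be forgotten:

sealed class Widget { }   // classes stay open to inheritance unless marked 'sealed'

static class Helpers      // functions are instance members unless marked 'static'
{
    public static void Demo()
    {
        const int max = 10; // 'const' only covers compile-time constants;
                            // there is no general immutable local in C#
        int counter = 0;    // locals are mutable by default
        counter += max;
    }
}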

[WIP] Language proposal 0.2

Goal of the document

The goal of this document is to expand on the ideas laid out in the 0.1 proposal, to slowly work towards a complete language. Programs in the 0.1 proposal could already do simple calculations. While this proposal won't extend the capabilities much, it's working towards a language that is able to build up abstractions with user-defined types and generics.

Scope of the features

The primary topic of this proposal will be the type system:

  • Generics
  • Overloading
  • Type inference
  • User defined record types
  • For loops

Just like last time, the proposal implicitly defines the initial syntax.

Generic functions

Generic type parameters can be introduced after the function name between [...]. For example:

func second[T, U](x: T, y: U): U = y;

The rationale for leaving out <> is that < and > are binary operators, which can really complicate the compiler in very undesirable ways: see what C++ goes through while parsing, or the syntax Rust had to introduce to make it less painful. The simplest option is to take a token pair that already comes in pairs, and [] is already used by languages like Scala.

Function overloading

While functions were already proposed in 0.1, function overloading was unspecified. I see no reason to disallow it, overloading should stay. A concrete function signature should bind stronger than a generic one. For example:

func foo[T](v: T): T = v; // (1)

func foo(v: int): int = v; // (2)

func main() {
    var a = foo(true); // (1) is called
    var b = foo(1); // (2) is called
}

This should be simple enough, but once subtyping comes into play the rules might get complicated, so we should keep that in mind. With the current rules we can already produce ambiguous calls, meaning it's not too hard to overcomplicate this system:

func foo[T](x: T, y: int) : int = 0; // (1)
func foo[T](x: bool, y: T): int = 0; // (2)

func main() {
    var a = foo(true, 1); // Both (1) and (2) match the 'same amount'
}

Type inference

One of the main strengths of the language should be a way stronger type-inference than what C# allows. A good example on a permissive - but not ML-level - type-inference system can be found in Rust: signatures must be fully typed, but inference can work freely in the function-local scope.

Return type inference

Functions with inline expressions should allow return-type inference:

func f1(x: int) = x; // OK, inferred to be (int) -> int
func f2() = Console.WriteLine(""); // OK, inferred to be () -> unit
func f3() { // ERROR: functions with a body need explicit return type,
            // assumed to be unit otherwise
    return 1;
}

Variable type inference

Variables can be declared without type, even when they do not get assigned a value immediately:

var x: int = 4; // OK, explicitly typed, value matches
var y = 4; // OK, inferred immediately to be int from the value
var z: int; // OK, explicitly typed
var w;
w = 1; // OK, inferred to be int from usage
var q; // ERROR: Could not infer type of the variable

Generic type argument inference

When a generic function is the best match for a call, the generic type arguments are inferred, no need to specify them explicitly, just like in C#:

func foo[T](v: T): T = v;

func main() {
    foo(3); // T = int
}

The generic arguments can be explicitly specified too, in case it can not be inferred (or for explicitness):

func foo[T](v: T): T = v;

func main() {
    foo[int](3);
}

Type placeholder

The _ can be used as a placeholder type, which can be useful when working with generics, only wanting to specify some of the type arguments:

func foo[T, U](a: T, b: U): T = a;

func main() {
    foo[_, bool](3, true);
}

The _ essentially means to create a type variable that will be inferred by the compiler. It can be used in any context, just like any other type.

Incomplete inference

An incomplete inference will result in an error. For example:

var q; // Without any usage of q

Any type that contains type variables is considered incomplete.

User-defined record types

See #41 for the design documentation.

For loops

I believe a single for-each should be fine, if we can ease the range-creation a bit. Something like:

for (i in range(0, 10)) {
    WriteLine("Hello, " + i);
}

The type could be specified after the variable name, like for (i: int in range(0, 10)) .... The variable would be a val, meaning it can't be reassigned.

Under the hood, it would be desugared into a while-loop, similarly to C#:

for (i in range(0, 10)) {
    WriteLine("Hello, " + i);
}
// Becomes
var enumerator = range(0, 10).GetEnumerator();
while (enumerator.MoveNext()) {
    val i = enumerator.Current;
    WriteLine("Hello, " + i);
}

Future ideas

Ideas that came up while writing this proposal:

  • named arguments for function calls
  • named explicit generic arguments
  • optional arguments (maybe even with non-constant expressions?)
  • optional generic type arguments

[WIP] Internals of trait implementations and generic constraints

Problem

The CLR currently does not support implementing interfaces for unowned types (see roles and extensions).

The basic idea is to use attributes instead of core IL features like interface implementation and generic constraints.

Generic constraints

Here I consider only constraints to traits.

type TraitConstraint inherits Attribute
    new : (toConstrain: string,  traitName: string, traitTypeArgs: string[])

For example, assume we want to write a function which adds two generic values. For that, I implement the trait CanAdd<T, T, T>, which has operator + defined. Now I want to create the function add whose type argument is constrained to this trait:

add<T>: (a, b) : T x T -> T
where T : CanAdd<T, T, T>
= a + b

Then internally (or from C#) it looks like:

[TraitConstraint(toConstrain: "T", traitName: "CanAdd", traitTypeArgs: new [] {"T", "T", "T"})]
static T Add<T>(T a, T b) => /* ... */

Implementing traits for types we don't own

The attribute:

type TraitImplementation inherits Attribute
    new : (target: Type, implementation: Type)

Assume we are defining a trait and we want to implement it for types we do not have control over. Then each of our traits should have a list associated with it, where each element is a pair: (type, implementation of the trait for that type).

Assume we have a trait, its implementation, and use:

trait CanQuack<T>
{
    Quack : T -> ()
}

impl CanQuack<T> for Foo
{
    Quack a = ()
}

quack<T> (a : T)
where T : CanQuack<T>
= a.Quack()

quack ( new Foo() )

Then in .NET it will look like

[FreshTrait]
[TraitImplementation(target: typeof(Foo), implementation: typeof(CanQuack_Foo_Generated<T>))]
interface CanQuack<T>
{
    public void Quack();
}

[CompilerGeneratedType]
public static class CanQuack_Foo_Generated<T>
{
    public static void Quack(Foo foo) { }
}

static void quack<T>(T a)
{
    // if a has no method Quack, throw;
    // if it does, make sure that this type exists in CanQuack's list of implemented types,
    // OR that the list of implemented traits contains the needed one
}

new Foo().Quack() // inlined when used within Fresh

Implementing traits we don't own for types we own

The attribute:

type TypeImplementation inherits Attribute
    new : (targetTrait: Type, typeArgs: string[])

Assume this time that we own Person, but not CanQuack<T>:

type Person = {
    name: string;
    age: int;
}

impl CanQuack<string> for Person {
    Quack(s : string) => print(s + name);
}

So in .NET it looks like

[TypeImplementation(targetTrait: typeof(CanQuack<>), typeArgs: new [] { "string" })]
record Person(string name, int age)
{
    Quack(string s) => WriteLine(s + name);
}

Interop

So far my current thought is that within Fresh we inline methods, where we use complicated generic constraints (similar to F#'s SRTP with inline).

However, we cannot force compilers of other .NET languages to do the same. So when a Fresh-written method with trait constraints is invoked from, say, C#, it will somehow need to execute correctly without inlining.

To do it, from the .NET perspective these methods will have looser constraints, so that any type argument which passes in Fresh would also pass when the method is invoked from C#.

Then we will need to use reflection to determine the right method to execute (by storing a table of trait/type correspondence, though we will have to refine the details later). If there's no such method (or there is, but the provided type does not implement the trait), then we throw an exception at runtime.
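
A rough sketch of that fallback dispatch; the TraitRegistry table and its lookup are hypothetical, named here only to illustrate the flow:

static void Quack<T>(T a)
{
    // Look up the implementation recorded for (trait, type) in the
    // hypothetical correspondence table mentioned above
    var impl = TraitRegistry.FindImplementation(typeof(CanQuack<>), typeof(T));
    if (impl is null)
        throw new InvalidOperationException($"{typeof(T)} does not implement CanQuack");
    impl.GetMethod("Quack")!.Invoke(null, new object?[] { a });
}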

In the long run we can implement static analyzers for C# and F# which could prevent some cases of providing a bad type to a Fresh-written method.

Let ... in construct

Motivation

When discussing the syntax for the matching construct in #44, the position of the keyword came up. Should it be when (expr) { ... } or expr when { ... }? The former would be consistent with the rest of the constructs. It came up that the latter is useful when matching on long expressions, as when (this is a very long expression) { ... } is not too fluent.

This discussion gave me an idea for proposing the let ... in construct for the language, as I believe it would resolve the problem of long expressions while keeping the keyword position consistent, make the language a bit more fluent with expressions, and help other constructs with long expressions, not just the matching construct.

The feature

The general syntax would be (keywords could change):

let name = boundExpr in evalExpr

which would be equivalent to:

{
    val name = boundExpr;
    evalExpr
}

So in essence, this would bind a value to a name, local to the expression after the in keyword.

Examples of use

Match constructs

The problem initiating this proposal about long expressions in the match construct.

func foo(x: T): U =
    let matched = here comes the extremely long expression in
    when (matched) {
        ...
    };

Shortening other constructs

You couldn't factor out the discriminant without a block otherwise (ignore that I'm working with ints).

func solve_quadratic(a: int, b: int, c: int): (int, int) =
    let disc = b * b - 4 * a * c in
    if (disc >= 0) 
        (
            (-b + sqrt(disc)) / (2 * a),
            (-b - sqrt(disc)) / (2 * a)
        )
    else (0, 0);

The former construct can be rewritten with pattern matching, but this demonstrates that a let ... in could be reused in more places than just that "special" case for when.

To be decided

Since we already have var and val, we could decide to extend the syntax for val to support this. I wouldn't consider var for this, as mutating the bound value here feels very wrong.

We can also just decide to go with let.

Separator for conditions, loop-members, blocks

There are a few different styles we could pick when designing the syntax for the basic control flow structures. The options are in no particular order.

Option 1: Required parentheses for the condition element

This is how C does it. Allows the parser to distinguish the condition from the block, if the block braces are omitted.

if (x) y();
if (x) y(); else z();
if (x) { y(); z(); }
for (i in list) print(i);
for (i in list) { print(i); log(i + 1); }

Option 2: Required braces for the block element

This "frees up" the condition, but for separability the block always has to be explicit.

if x { y(); }
if x { y(); } else { z(); }
if x { y(); z(); }
for i in list { print(i); }
for i in list { print(i); log(i + 1); }

Option 3: Separator keywords

Separator keywords like then and do reduce the number of separator tokens, as they don't need to be symmetric like parentheses.

if x then y();
if x then y(); else z();
if x then { y(); z(); }
for i in list do print(i);
for i in list do { print(i); log(i + 1); }

Note that the block syntax and semicolon usage are still up for discussion. Personally I really like option 3 so far, even with the current syntax. { } can be thought of as the block composing operator or something.

Metaprogramming and decorators

Introduction

One of the most fundamental features we should plan out is metaprogramming. Instead of incrementally introducing small features and then giving stronger tools that can implement those features, metaprogramming could be the sandbox for new features, which can later get their own built-ins in the language. Notice that this is the exact opposite of how languages usually progress.

In my opinion most languages will eventually need to support some kind of metaprogramming, or force their users to work around it. C# for example has Fody for IL weaving, and only introduced metaprogramming in the form of Source Generators in .NET 5 and tied it to Roslyn instead of making it a language feature.

My key point: Metaprogramming should be an early, first-class citizen that embraces the creation of new language features. It should not be an answer to feature creep, where designers tried to plug the endless holes in the language and then got tired of it.

Uses of metaprogramming

Generally, metaprogramming gives us the ability to give abstractions outside the classic functional boundaries. This usually manifests itself as some kind of code generation, where we can inspect syntactic/semantic parts of the program and inject/modify the source code.

Below I'd like to show a few real-world examples that could have been solved with metaprogramming instead of a language feature.

GetHashCode and Equals

Implementing GetHashCode and Equals on a C# class is tedious and repetitive. Surely the IDE can generate the implementation, but who will make sure it stays up to date when someone adds a field? In my opinion, the new records feature is a good example where an earlier metaprogramming system could have eliminated the need for the feature entirely.

Null-checks

In C# we often want to null-check reference-type parameters. It's a very repetitive process:

public static void Foo(T1 a1, T2 a2, ...)
{
    if (a1 is null) throw new ArgumentNullException(nameof(a1));
    if (a2 is null) throw new ArgumentNullException(nameof(a2));
    // ...
}
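
For reference, .NET 6 added a helper that shortens each check, though it still has to be written once per parameter:

public static void Foo(object a1, object a2)
{
    ArgumentNullException.ThrowIfNull(a1);
    ArgumentNullException.ThrowIfNull(a2);
    // ...
}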

A decorator could simply inject the null-checks at the start of the method. It would only need to know the parameter names and which parameters are reference types. Again, C# has decided to introduce a new operator (!!) for it. I believe this would be completely unnecessary with a proper metaprogramming solution.

Memoization

When we are doing a heavy computation and the computation itself is side-effect free, we can simply memoize the results in a dictionary like so:

private static readonly Dictionary<(T1, T2, T3), Foo> fooResults = new();
public static Foo CalculateFoo(T1 a, T2 b, T3 c)
{
    // If the value is already computed, return that
    if (fooResults.TryGetValue((a, b, c), out var result)) return result;
    // Do some heavy computation
    ...
    // Save the result for later
    fooResults.Add((a, b, c), result);
    return result;
}

Again, this is really repetitive and could be done with decorators. In fact, Python has a really beautiful solution for this. Currently with C# Source Generators we'd require partial types and an ugly API that would have to generate a proxy function with a different name passed in by the user as a string.

Others

There are a bunch of other uses for metaprogramming, like implementing Aspect Oriented Programming designs or a parser driven by a grammar written in an attribute, like rply.

Existing metaprogramming solutions

Here I'd like to summarize the different styles of metaprogramming existing out there. Knowing the existing solutions could help us settle with one for our language. The list is partially inspired by this Wikipedia article.

Macro systems

There are various macro systems out there, ranging from primitive to very sophisticated. The gist of macros is that, based on some pattern, the input syntax is mapped to some output syntax, so they are very much like functions operating directly on syntax.

Text-based macros

Text-based macros are a very primitive concept, and can be found in old languages like C. C macros have no semantic knowledge of the language, all they do is paste text based on their definition. For example, you can define the mathematical max function with macros like so:

#define max(a, b) ((a) > (b) ? (a) : (b))

These macros are not hygienic, meaning that you can't even use them to reliably define a variable for some intermediate value.

Hygienic macros

Hygienic macros usually describe how they match their parameters on the syntax-tree level and map the output to another syntax tree description. The "hygienic" word puts the emphasis on only describing the shape, not worrying about things like name collisions. With hygienic macros you can safely introduce helper variables without the need for a complex mechanism to generate unique names.

For example, here is a macro for Rust that creates a Vec<T> (analogous to List<T> in .NET) from a sequence of values:

// macro_rules! <name> { ... } defines a new macro
macro_rules! vec {
    // This is a match arm, this means match 0 or more comma-separated expressions, binding the expression to the name 'x'
    // A macro could have multiple of these match arms, some use it to do macro-recursion for example
    ( $( $x:expr ),* ) => {
        // Open a block, as Rust blocks are expressions, and it will result in the constructed Vec<T>
        // This is part of the output!
        {
            // This is also part of the output, just creates a temporary vector
            let mut temp_vec = Vec::new();
            // This with the matching )* at the end expands the expressions bound to the name 'x'
            $(
                // For each expression $x we write a temp_vec.push(...) to the output
                temp_vec.push($x);
            )*
            // Writing the variable here means the block will evaluate to it
            temp_vec
        }
    };
}

// Usage
let my_vec = vec![1, 2, 3];

// Which expands into
let my_vec = {
    let mut temp_vec = Vec::new();
    temp_vec.push(1);
    temp_vec.push(2);
    temp_vec.push(3);
    temp_vec
};

Nim has decided to include macros in a more template-engine-y way, essentially allowing to paste in parts of the syntax tree, like string-interpolation. This is almost like Rust procedural-macros (see below), but with batteries included. An example:

# Repeats the passed in statement twice
macro twice(arg: untyped): untyped =
  result = quote do:
    `arg`
    `arg`

# Usage
twice echo "Hello world!"

# Expands into
echo "Hello world!"
echo "Hello world!"

These kinds of macros are a huge improvement over the text-based ones. Still, declarative Rust macros can get really ugly and they still don't solve many cases.

Procedural macros

Procedural macros essentially hand over the input for a function, and let that function spit out some other token stream as a substitution. Any computation can happen in between. They allow for a lot of flexibility, but they are usually cumbersome to develop. Rust supports them, but almost all procedural macros import the syn and quote crates to help them out. Procedural macros receive and hand back a token stream, so for syntax tree parsing users have to include syn, and for code templating they use quote.

There are a lot of variations in Rust, but I'll include just one derive-style macro, that helps the user implement a custom trait. The custom trait to implement:

trait TypeName {
    fn type_name(&self) -> String;
}

The derive-macro implementation:

#[proc_macro_derive(TypeName)]
pub fn derive(input: TokenStream) -> TokenStream {
    let DeriveInput { ident, .. } = parse_macro_input!(input);
    let name_str = ident.to_string();
    let output = quote! {
        impl TypeName for #ident {
            fn type_name(&self) -> String { #name_str.to_string() }
        }
    };
    output.into()
}

Usage:

#[derive(TypeName)]
struct Foo { ... }

I think that these are the most flexible solutions as far as macro systems without semantic information go. They can be a bit cumbersome to write, but we can aid them with proper utilities (for example, shipping something like syn and quote with the feature). Rust managed to demystify things like how derive works under the hood, which meant that the compiler had less magic to do, and allowed users to add their own extensions.

Metaclasses

Metaclasses step into the territory of multi-level modeling. Metaclasses are to classes as classes are to instances. Hence, the instances of metaclasses are classes. Sadly I haven't worked with them enough to justify writing an example about them, but this page shows a really neat example in Python. They seem like an interesting concept, but using something like this would imply that we'd want to heavily build on classes or some similar language feature.

Template metaprogramming

Template metaprogramming - or TMP for short - is essentially a higher level substitution mechanism in the compiler. One of the most notable languages with TMP is C++. C++ templates can be used to do a lot of fancy compile-time computations. The wiki page is full of great examples, so I won't include any here.

The Nim language also includes templates which are really close to their macros, operating on the AST. An example log template, that only prints while debugging:

# Definition
template log(msg: string) =
  if debug: stdout.writeLine(msg)

# Usage
log("Hello, world!")

The syntax is very similar to simple procedures.

Other solutions

  • The D programming language has a bunch of metaprogramming tools: templates, string mixins and template mixins
  • JAI macros are essentially functions that are guaranteed to be inlined in the AST. This means that all language features - like named arguments - automatically work on macros as well. The only extra feature they support is that they can access variables in their surrounding scope using a special notation.
  • Also JAI allows the user to write event hooks for the compiler. The handlers will receive the AST and allows to modify them.
  • LISP metaprogramming is beautiful in general, mainly because the language is homoiconic
  • C# Source Generators are very similar to what JAI does, but on source text level. In my opinion it's fairly inconvenient, as it litters user-code with partial and doesn't allow for modifying existing code, disabling the possibility for nice decorators. Working with strings is also very inconvenient, and the alternative of using Roslyn syntax-trees is hard to integrate with the semantic information we might work with in a Source Generator. To simply put it, it lacks tooling and fatally limits itself.

Decorators

The concept of decorators as metaprogramming elements arise from the Decorator Pattern being implemented with some metaprogramming feature. Very often it allows us to decorate some entity in our program in a very declarative way - usually in the form of annotations/attributes. Since decorators are - in many cases - a desirable way to do metaprogramming, I believe it's worth a section to talk about how different languages allow defining them.

Python

I believe one of the most beautiful ways to write decorators is coming from Python:

def uppercase_decorator(function):
    def wrapper():
        result = function()
        return result.upper()
    return wrapper

@uppercase_decorator
def say_hello():
    return 'Hello, World!'

print(say_hello()) # Prints HELLO, WORLD!

In Python decorators are nothing more than functions that are invoked with @ before the entity they wrap. They receive the wrapped entity as a parameter and return the transformed entity. This requires almost no new language concepts to be introduced and is fairly simple to understand.

Rust

Rust chose to shove all the metaprogramming - including decorators - into procedural macros. As mentioned before, this is really flexible, but really tedious to work with without the extra tools (syn and quote). Another problem is that there is no semantic information at the time the macros are invoked. Still, it's powerful enough to implement the most common trait derives and many-many other things.

Derive macros in Rust are essentially procedural macros that are invoked with the annotation/attribute syntax over entities and they append to the existing entity instead of modifying it. Procedural macros of course can modify their annotated entity, but then they are called attribute macros. This is a really insignificant convenience thing, derive macros simply can be written as attribute macros.

C#

Currently C# decorators are very limited in nature, because Source Generators do not allow for code modification, only addition. This means that while most derive-style features can be implemented, things like null-checks or memoization logic can't be simply injected. Since C# doesn't allow external definitions by default, this usually means that decorated elements have to be marked partial.

What we would like for the new language

Again, I'd like to emphasise that looking at how other languages have progressed, we should really get metaprogramming in early and as close to perfect as possible. We might be able to slow down feature-creep, or not have to implement certain features at all.

I think the prettiest decorator is from Python. The only extra language feature it requires is invoking with @, which would transform the entity after it. As much as I like it, I don't yet see a way to make this work without a lot of reflection and runtime overhead for a non-duck-typing language.

Rust and Nim build their powerful metaprogramming capabilities on source transformation and generation. They both rely on being able to quote and template source code, but Nim has the advantage of having a built-in for it. If we decide to go that way, we might also want to have it as a built-in feature instead of ending up with a module that just reimplements the language syntax and that everyone includes anyway.

I'm not sure whether we want semantic information to be available to macros/decorators. It can certainly be handy, but the only such system I've seen so far is C# Source Generators.

Semicolon and brace significance

Following up #25 , the purpose of { } and ; came up.

Keeping semicolons and blocks

One idea was to have it similar to OCaml/F#/Rust: ; would essentially chain statements, and a block would evaluate to the last expression without a semicolon at the end. Example:

val a = {
    f();
    g();
    12 * 2
};
// 'a' is now 24

This would be essentially identical to

f();
g();
val a = 12 * 2;

The only difference is the extra lexical scope introduced.

I'm really not against this syntax. The braces for block are fairly familiar. Not needing semicolon for returned expressions would mean that (again, based on #25 ) the if-expression example could reduce to:

var a = if x() then y() else z(); // last semicolon terminating assignment

Not keeping semicolons or blocks, newline and indentation significance

Other languages require less punctuation or none at all in this regard. That usually means that either whitespace or newlines are significant to aid parsing.

I'm not a fan of dropping them, as some punctuation does give a little visual aid. Again, I'm open to discussion about this.

Case excluding operator

So

ThingDU =
    Ok of T
    [CanExclude] Error of string 
let myInt = parse("3.4") ?error -> throw ($"Error while parsing: {error}")

Or

Langs =
    GoodLangs =
        CSharp
        FSharp
    [CanExclude] BadLangs =
        JavaScript
        CantThinkOfAnyOther
var lang = GetLang().Name ?badlang -> "We don't do bad langs here"

Goto, implicit loop labels

The design of goto

There is a lot of hate for goto, but I believe there's no reason not to have it: it's a powerful feature, even if it's easy to abuse by beginners. I believe the C# syntax is pretty good here, we could stick to that:

foo: // label
    goto foo;

The usual rule would apply, you can't jump out of a function or into another function.

Break and continue

I believe break and continue are really weak features by themselves and having goto makes them completely redundant. That doesn't mean that they have no use, so I believe we could make a compromise. We could have loops implicitly define a break and continue label that the user can jump to with goto. Example:

func main() {
    for (i in Range(0, 10)) {
        if (i rem 2 == 0) goto continue;
        WriteLine("i is $i");
        if (i == 7) goto break;
    }
}

Generalizing labels

This gave me an idea: In the future, we could even generalize this to have more label expressions. For example, breaking out of a nested loop could be:

func main() {
    for (i in Range(0, 10)) {
        for (j in Range(0, 10)) {
            goto break(2);
        }
    }
}

This would have the added benefit of not having to define a label, even when you break from nested loops, which is a relatively common use for goto in C#.

Colon as operator for tuples

While we all can probably agree that tuples will be created as (a, b) or (a, b, c) etc., I suggest (as a crazy idea) operator : for creating tuples.

Definition

Type

In fact it's neither ValueTuple nor Tuple. It's ColonTuple. It's implicitly convertible from/to ValueTuple and Tuple though. And it's immutable.

The reason I want it to be a different type is that in many cases it makes sense to use what was created by the colon, but not a regular tuple (for example sequences and slicing, see below).

ColonTuple<T1, T2>

ColonTuple<int, int> impl IEnumerable<int>

ColonTuple<float, float> impl IEnumerable<float>

ColonTuple<double, double> impl IEnumerable<double>

ColonTuple<T1, T2, T3>

ColonTuple<int, int, int> impl IEnumerable<int>

ColonTuple<float, float, float> impl IEnumerable<float>

ColonTuple<double, double, double> impl IEnumerable<double>

ColonTuple<T1, T2, T3, T4>

ColonTuple<T1, T2, T3, T4, T5>

ColonTuple<T1, T2, T3, T4, T5, T6>

Syntax

E. g.

// (1, 2)
val tup = 1 : 2 

// (1, 2, 3)
val tup = 1 : 2 : 3

// ((1, 2), 3)
val tup = (1 : 2) : 3

// (1, unit)
val tup = (1 : )

// (unit, unit)
val tup = ( : )

// (unit, 1, unit)
val tup = (:1:)

It's low priority, so e. g. 1 * 2 : 3 would be (1 * 2, 3) and 1 : 3.Method() would be (1, 3.Method()).

But it's higher priority than comma, so Method(1 : 3, 5) would be Method((1, 3), 5)

What problems can be solved

DSL

This definitely helps with writing GUI, CSS, JSON, and probably much more. It'd create great opportunities for whatever looks like a range or a map. E. g.

css [
    "margin": "5pt"
    "top": "100px"
]

Similarly,

json [
    "property": [
        "aaa": 5
    ]
]

Dictionaries

They can be created from a sequence of tuples of TKey and TValue, so

val dict = [
    "Peter": "LPeter1997"
    "WhiteBlackGoose": "WhiteBlackGoose"
]

As sequences

Numeric colon types can be interpreted as sequences!

val squares1to100 = (1 : 10) map (x -> x ** 2)
val evenSquares1to100 = (1 : 2 : 100) map (x -> x ** 2)
for i in (1 : 3 : 10) do
    print i

As ranges for slicing

We can create overloads for BCL types by adding indexers for ColonTuple:

val list = [1, 2, 3, 4, 5, 6, 7][0 : 2 : 6][2 : 3]

(what those numbers mean exactly for the indexers is yet to be decided)

Nominal and structural typing

Introduction

Since most popular languages expose nominal typing or duck typing, it's worth talking about the two concepts and their uses.

Nominal typing

Nominal typing is about naming your concepts and being able to refer to them uniquely. Languages like C, C++, C#, Java, Rust and Swift are nominally typed. For example, if you want to define an interface that users can implement, you give the interface a name, and document what concept it encapsulates:

// Implementing this interface means that your type is serializable to a text format.
interface ISerializable
{
    public void Serialize(StringBuilder sb);
}

When you implement this interface for your type, you know exactly what concept it encapsulates. When you write a type, it is not enough to simply implement the members of the interface, you have to explicitly state that you implement it. For example, the following will not compile, because the interface implementation is not stated explicitly:

class Person
{
    public string Name { get; init; }
    public int Age { get; init; }

    // This is not ISerializable.Serialize, because the class does not implement ISerializable
    public void Serialize(StringBuilder sb) =>
        sb.AppendLine($"Name: {Name}, Age: {Age}");
}

static void SerializeToFile(ISerializable s) { ... }
SerializeToFile(new Person() { ... }); // ERROR: Person does not implement ISerializable

The key takeaway is that in nominal typing you attach concepts to names/entities and not to the shape/implemented methods of a type.

Structural typing

Structural typing, on the other hand, does not care about explicitly stated concepts; it cares about how types look, what fields or methods they have. This is similar to C++ templates, where you do not state constraints on template parameters; the substitution simply fails if a type cannot be used with a template:

// No explicit constraints on the type to ensure '+' exists for two types of T
template <typename T>
T add(T x, T y) {
    return x + y;
}

add(1, 2); // Ok, substituting integers into 'add' is fine
add(true, "hello"); // Error, substitution causes an error, no proper overload found

This means that constraints on types can mostly be expressed as a set of required fields and methods we expect from a type. For example, if we want to expect a generic type in Haxe to be hashable, we can write something like:

function compareHashes<T: { function hashCode():Int; }>(a:T, b:T):Bool {
    return a.hashCode() == b.hashCode();
}

This does not mean that we can't name structural components; we can of course alias them. But the important difference is that under nominal typing, two different interfaces with the same set of methods still mean different things, while under structural typing, two different aliases with the same set of constraints are perfectly equivalent.

Notable examples

Thoughts for the new language

To me it looks like most languages prefer primarily nominal typing. Its advantage is also its disadvantage: you have to explicitly implement the concept for your type. This ensures that it's not just an accidental signature match, but an explicit statement that you do indeed implement the contract. The requirement becomes a real disadvantage in languages like C#, where you can't implement an interface externally for a type.

Some languages like C# turn to structural typing because constraint-based generics can be limiting if there are not enough kinds of constraints to express the requirements. Languages like Rust allow for external trait implementations, and their generic constraints are more sophisticated, allowing one to require the existence of operators and such.

I'd say that a primarily nominal typing system could be desirable as long as we don't fall into the traps mentioned above. Allowing external trait/interface implementations could already improve a lot. Even if we do allow structural typing, I really wouldn't want to have a separate concept for that. I'd either have something like Haxe anonymous structures or allow to mark an interface constraint to mean structural equality, something like:

static void SerializeToFile<T>(T s)
    where T : structural ISerializable
{
    ...
}

Edit: One thing I forgot to mention in favor of nominal typing is that it can resolve name collisions between different interfaces. If two interfaces define a member with the same signature - which can be surprisingly common - nominal typing gives you a tool to disambiguate them. In languages like C# you can just write an explicit implementation; in Rust you can just use fully qualified syntax on invocation. For structural typing you'd have to require very specific names, which is tiring and verbose, and still doesn't avoid the problem 100%.
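
To illustrate the C# side of that, explicit interface implementations let one class give two colliding members different bodies, and the static type of the reference picks which one runs:

interface IDrawable { void Draw(); } // draw on screen
interface ICowboy { void Draw(); }   // draw a gun

class Renderer : IDrawable, ICowboy
{
    // Same signature, two different meanings, no collision.
    void IDrawable.Draw() => Console.WriteLine("rendering");
    void ICowboy.Draw() => Console.WriteLine("bang");
}

// ((IDrawable)new Renderer()).Draw(); // prints "rendering"
// ((ICowboy)new Renderer()).Draw();   // prints "bang"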

[WIP] Lambdas

Overview

Lambda functions (or anonymous functions) are functions or methods that have no statically resolved name, they are usually bound to a local variable or passed as an argument straight away. Their primary use in the .NET ecosystem is passing around single-use methods, like for configuration, factories or inside LINQ methods. They are also usually the primary way to write closures.

Requirements towards the feature

In addition to being very concise and readable, I believe it should not diverge from regular functions too much. C# started out with a very different mechanism, but lambdas are slowly being merged together with regular methods in terms of functionality. As has been suggested to us, we might be best off shaving down our regular functions to get something more compact.

Syntax

Requirements

It should be very concise and readable.

C# way

Definition:

() => 5
a => 5
(int a) => 5
(int a, string b) => 5
... => { return 5; }

Type declaration:

Func<T1, T2, ..., TOut>

Kotlin way

Definition:

{ 5 }
{ it + 5 } // "it" is the default parameter name for single-parameter lambdas
{ a: int -> 5 }
{ a: int, b: string -> 5 }
{ ... -> return 5; }

Type declaration:

(T1) -> T2
(T1, T2) -> T3
(T1) -> (T1, T2)

Implicit lambdas

As I've mentioned before, a large percentage of lambdas (mostly in LINQ) create a lot more noise than they should. Examples:

var ys = xy.Select(x => x.ToFoo()); // Calling a member function on each requires a full lambda
var zs = ns.Select(x => x + 1); // Incrementing each number requires a lambda
var ws = ns.Select(x => Compute(x, 1)); // Just because I have another constant parameter, I need a lambda

You might say that currying or partial application can solve this, and yes, they do somewhat. But a more general feature could be to promote expressions with implicitly defined lambda parameters to be lambda expressions themselves. The previous examples with a made-up syntax, where $n means the nth lambda parameter:

var ys = xy.Select($0.ToFoo());
var zs = ns.Select($0 + 1);
var ws = ns.Select(Compute($0, 1));

This is more general in the sense that it even works with multiple arguments, and they can be reordered in any way.

Note that there is still a big, unresolved problem with this idea! Namely, it is unclear which subexpressions to promote to lambdas, where the "boundaries" are. For example, in ns.Select(Compute($0, 1)) it is ambiguous whether the whole call Compute($0, 1) or just $0 itself should become the lambda. This will likely require some separator characters, like in Swift or Kotlin.

Wip...

Internals

There are three ways

  1. Using .NET's Func and Action
  2. Creating our own type, similar to how F# does it
  3. Creating our own interface to allow the user to manually implement it

Using .NET's Func and Action

Pros:

  1. 100% compatibility with C#
  2. No overhead converting/invoking Fresh's delegates into C#'s (because they are the same type)

Cons:

  1. C# delegates have so far always added call overhead.
  2. Two different things: Func and Action, whereas it'd be nice to have one

Creating our own type similar to how F# does it

Pros:

  1. We can inline them ourselves somehow (?)
  2. We have more control over them (e. g. we can add source code for string representation of a function in debug mode 🤔 )
  3. Can be implicitly converted to C#'s delegates and back

Cons:

  1. Incompatible with C# directly, so conversion (and potentially every invocation) will have an overhead

Creating our own interface to allow the user to manually implement it

Pros:

  1. Performance without hacks. All we need is to auto-generate readonly structs for lambdas and constrain callees that expect a function to the given interface
  2. Flexibility - the user may want to implement the method themselves for some particular reason (eliminating any overhead at all, especially if we get extensions)

Cons:

  1. Not compatible with C#
  2. No implicit operators definable either - we will have to either detect this case ourselves or write extension methods (or both) to convert back and forth
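
A rough C# sketch of option 3, with a made-up IFunc interface and the kind of struct the compiler might generate for a capturing lambda (names are illustrative, not a committed design):

// The function interface callees would be constrained to.
public interface IFunc<in TIn, out TOut>
{
    TOut Invoke(TIn arg);
}

// What the compiler might emit for 'x -> x + offset':
public readonly struct AddOffset : IFunc<int, int>
{
    private readonly int offset; // captured variable
    public AddOffset(int offset) => this.offset = offset;
    public int Invoke(int arg) => arg + offset;
}

// Constrained generic callee: the struct is never boxed, and the JIT
// can devirtualize and inline Invoke.
public static int ApplyTwice<TFunc>(TFunc f, int x) where TFunc : IFunc<int, int> =>
    f.Invoke(f.Invoke(x));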

Operator/infix function `implies`

Problem

Imagine you want to perform the following common pattern: check that if something is valid, then some other condition about it also holds.

For example, you want to filter out people whose name definitely doesn't end with "mia"; however, some accounts don't have names, and you can't say anything about those:

string? name = ...
if (!(name is null || name.EndsWith("mia")))
    Exclude(name);

Definition

Syntax

If we're going to do literal logical operators, like and, or, etc., then the syntax should be implies.

Otherwise, it could be ==>, for example.

Meaning

The meaning of A implies B is (not A) or B.

Its precedence should be the lowest, even lower than equality: a + b == 0 implies a - b < 0 parses as (a + b == 0) implies (a - b < 0).

Under the hood

It could be either an operator or an infix function. The concern with infix functions is precedence, since implies should have a very low precedence.
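
For reference, the semantics are easy to model as a plain helper in today's C#; a minimal sketch (note that a real operator could short-circuit, while the eager overload below always evaluates b):

public static class BoolExtensions
{
    // A implies B == (not A) or B
    public static bool Implies(this bool a, bool b) => !a || b;

    // Lazy variant that preserves short-circuiting on the consequent.
    public static bool Implies(this bool a, Func<bool> b) => !a || b();
}

// Usage:
// if (input.IsValid.Implies(() => input.Length > 3)) ...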

Examples

1.

string? name = ...
if (name is not null implies name.EndsWith("mia"))
    return "These aren't the droids you're looking for";

2.

if (input.IsValid implies input.Length > 3)
    ...

[WIP] Declarations and definitions

Here I want to give an overview of two things: the order of files in a project, and the order of declared symbols in a file.

Questions

  • Order of files: should the dependency graph of files be a tree, that is, if file A depends directly or indirectly on file B, should B be prohibited from depending directly or indirectly on A?
  • Order of files: if the order matters, should the included files be listed explicitly, or should the order be detected automatically?
  • Order of symbols: can a function reference a symbol defined below in the file? Or maybe the order is the opposite?

C#

There's no order of files in C#. Any symbol within an assembly can reference any other, assuming the visibility level allows it.

C# Top-level statements

In C# top-level statements the following rules hold:

  1. Any declared symbol can reference any other declared symbol (assuming the visibility allows).
  2. Entrypoint code is at the top of the file.
  3. Functions come strictly after entrypoint code.
  4. Types come strictly after functions.

F#

  1. Order of files matters. Symbols from file A can reference symbols from file B only if B is higher in the hierarchy.
  2. The included files are listed explicitly in MSBuild (it's more of a tooling question perhaps, but it quite relates to the core of the language imo, especially given that MSBuild is the dominant build system for the .NET langs).
  3. Within a file, A can reference B only if B is defined above A.
  4. Functions, types and values can be defined in any order within a file.
  5. Each file has some number of modules or namespaces.

OCaml

1, 3, 4 are the same as in F#.

TODO: is it necessary to specify the order of files in OCaml default build system?

  1. OCaml is shipped with ocamlopt which can figure out the correct order of files.
  2. File = module

Rust

In Rust the order of the files doesn't matter, but they form a strong hierarchy based on the file structure. Each folder has a mod.rs that is essentially the root of the subhierarchy. It declares and marks the submodules to be exposed outside of this hierarchy. Anything else is basically inaccessible from the outside.

Base language, compiler, tooling

If we could work out some minimal, core functionality, we could start working on the compiler architecture. I believe that a very small set of features should be sufficient. After that we could just fork the compiler to see if we like a feature or not, slowly building up the base language (this is sort of how Rust does it, just on a way smaller scale).

End goal

  • A minimal set of language features that are here to stay (doesn't have to stay syntactically, but semantically shouldn't change too much)
  • A compiler that manages to produce an executable/DLL/NuGet package
  • A VS Code language server that gives basic support, like error reporting and go-to definition (with LSP)
  • (Questionable, I've never done this before) A decent debugger, also for VS Code (with DAP)

Language features

I'm thinking of implementing the bare minimum.

  • A few primitive types
    • 32 bit integer
    • a void-type (maybe call it unit and have it be a zero-sized value?)
    • bool
  • Function definition
  • Function calls
  • Variable definition
  • Return statement
  • If-else expression (let it return a value)
  • Basic integer arithmetic operators (+, -, *, /, %)

No namespaces, visibility, classes, ... This is really about connecting up the major components of the compiler. This is little enough that - if needed - we can just tear it out and reimplement almost everything without hurting the architecture. Again, this is so minimal that syntax shouldn't matter yet.

Compiler

The focus should be the architecture. Again, the features are so minimal that those almost don't even matter. The compiler should follow a query-based architecture. More on the query-based compilers here and here.

We should also lay out the proper error handling method, not throwing on the first error. It is really hard to tear out a "stop-at-first-error" design, once it's been established. The one described in the Salsa video is good enough and should work.

The queries should be designed in a way that makes implementing the basic Language Server implementation simple.

The compiler should be asynchronous and allow passing in a cancellation token from day one. This is very important for tooling.
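
A minimal sketch of what a cached, cancellable query could look like (names and shape are made up, just to pin down the idea; eviction of cancelled computations is elided):

public sealed class Query<TKey, TResult> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, Task<TResult>> cache = new();
    private readonly Func<TKey, CancellationToken, Task<TResult>> compute;

    public Query(Func<TKey, CancellationToken, Task<TResult>> compute) =>
        this.compute = compute;

    // Each result is computed once, asynchronously, and the caller
    // can cancel at any point.
    public Task<TResult> GetAsync(TKey key, CancellationToken ct) =>
        cache.GetOrAdd(key, k => compute(k, ct));
}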

Language Server

The OmniSharp project has an LSP implementation, we can piggyback on that. The Language Server should:

  • Report compiler errors
  • Have go-to definition support
  • Tell the type of a symbol on hover

Debugger

Again, I'm unsure about how a debugger should look for a .NET language, as I'm really inexperienced with writing debuggers. I'd be really happy if we could write a Debug Adapter (again, OmniSharp has the protocol implemented) and have:

  • Breakpoints with continue
  • List locals and their values

Wish list (WIP)

Doable things

  1. Records: F#'s DUs + C#'s records + overridable equality on subtrees
  2. Deep type argument inference (e. g. you pass an A : IInterface<B> somewhere an IInterface is expected, then T = B, IInterface = A)
  3. Concision. Includes type argument inference and type inference

Non-doable things

  1. Computation expressions or some other syntactically-modified inline decorators for code itself (not only methods)
  2. Value delegates (including capturing ones - apparently, Rust can do it (in some cases of course, not in all))
  3. Member constraints
  4. Somehow design HKTs :trollface:
  5. Tuples and other core library types to prefer structs over classes (looking at you, F#)
  6. Union of fields (but as a wrapper over Unsafe, not via field offsets >:| )
  7. Custom operators (ternary too?)
  8. Custom literal postfixes
  9. Syntax for units
  10. Force dispose or flow-dependent analysis (that ownership thingy right?)
  11. Constant structs (this one you can imitate by casting bytes; of course this has some limitations, like the layout should be sequential, and all its members' layouts should be sequential; or just create it and cache it, I dunno, whatever)

[WIP] Language pre-proposal 0.2

IMPORTANT: The idea of building up the proposal from nothing has been scrapped, the actual 0.2 proposal can be found here

For the second feature set proposal I'd like to try a different approach. In the previous proposal (#33) I've laid out a set of features based on previous discussions, then we started modifying features and reducing scope based on the feedback. So the proposal built up too much, then we tore away what we couldn't agree on yet. It worked out fine, but I'd like to see how it would go the other way.

This proposal starts out empty, and we should collect issues related to the features we'd like to see in the 0.2 proposal. When we see that we've reached a decent amount of work, we'll make it into a proposal and refine from there.

Pre-Proposed so far for 0.2

  • Documentation comments (issue: #37)
  • User-defined types (issue: TBD)

Nullables being Result by default!

IMAGINE

Parse(string s) : int?
    match s with
    | "" -> Error "Fucking hell, it's empty, what's wrong with you, piece of crap"
    | null -> Error "Your mom is null"
    | other -> Ok (...)

Now usage

var parsed = Parse(Console.ReadLine()) ?error -> throw (error)

Sum-types/DUs vs inheritance

There are two big agendas when it comes to subtyping: inheritance and discriminated unions. Both express that a given type has multiple alternatives that can stand in for the base type.

In my experience:

  • Class-based inheritance is rarely useful
  • Interface/trait implementations are the primary way of polymorphism
  • The only pain-point of interfaces is that they can't provide method implementations (technically in C# they can, but only as long as you hold the interface type itself)
    • This shows that we primarily want to compose objects with the same interfaces instead of having an inheritance tree, the language should embrace composition instead
  • Wanting to associate data with things like enums is very common (which is a sign of the need for DUs)
  • Many times the exact subtypes are known and there is no need for extensions

I think that DUs and some kind of inheritance should be merged, as they are not orthogonal concepts. Of course, there is the expression problem which should be considered. Both the addition of data-types and the addition of functionality should be possible.

I believe that if we provide the things below, they would solve a lot of my concerns:

  • Mixins/traits for polymorphism
  • Multimethods for dispatch
  • Allowing a hierarchy to be closed (or maybe opening it up should be explicit?)

These together would help in:

  • There would be only one mechanism for polymorphism: trait/mixin implementations
  • Mixins allow for providing implementation alongside virtual functions, which embraces composition
  • Extension of functionality can be done with multimethods, easing on the expression problem
  • Closed hierarchies still allow for proper discriminated unions
  • Mixins/traits still allow the user to provide external extensions
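
For reference, a closed hierarchy can already be emulated in today's C# with a private constructor and nested variants; a proper DU feature would essentially bless this pattern and make the matching exhaustive without the fallback arm:

// All variants must live inside Shape, because only nested types
// can call the private constructor.
public abstract record Shape
{
    private Shape() { }
    public sealed record Circle(double Radius) : Shape;
    public sealed record Square(double Side) : Shape;
}

static double Area(Shape s) => s switch
{
    Shape.Circle c => Math.PI * c.Radius * c.Radius,
    Shape.Square q => q.Side * q.Side,
    _ => throw new InvalidOperationException("unreachable: the hierarchy is closed")
};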

[WIP] Type inference

Goal of the document

This document describes the type-inference the language allows for and the general principle we would want to follow to make the type inference practical and predictable.

General principle

The general principle that the rules should follow is:

Type inference should not generalize further than what the user does/explicitly states.

This means that we should not search further for a common base type or generalize as far as possible while making the code compile, but only involve types that the user mentions explicitly. We believe this makes the inference algorithm much more predictable and local to the inferred element.

Inference rules

The following subsections will describe the inference (informally) for different elements with some samples. Hopefully the examples and explanations will make the rules clear.

Return type

Functions with inline expressions should allow return-type inference and the inferred return type should be whatever the determined expression type is.

func f1(x: int) = x; // OK, inferred to be (int) -> int
func f2() = Console.WriteLine(""); // OK, inferred to be () -> unit
func f3() { // ERROR: functions with a body need explicit return type,
            // assumed to be unit otherwise
    return 1;
}

Assignment, variables

A bit formally: If a variable - without explicit type specification - is assigned values v1: T1, v2: T2, ..., vn: Tn, then the variable is inferred to have the type Tm, where Tm is one of T1, ..., Tn and for all 1 <= i <= n: Tm is a supertype of Ti. If there is no such Tm, there is a type error.

Informally: if a variable is assigned multiple values, then the variable is inferred to have the type of the assigned value that is the base (or supertype) of all the other assigned values. If none of the types is a base type of the others (for example, they are all different derived types of some unmentioned base type, or they are just plain incompatible), then there is a type error.

Let's use the following C# class hierarchy for the examples:

interface ICommonType {}
class Base {}
class Derived1 : Base {}
class SubDerived : Derived1 {}
class Derived2 : Base, ICommonType {}
class Derived3 : Base, ICommonType {}

Trivially, if there's a single assignment, the variable is inferred to be the type of that expression:

{
    var x;
    x = Derived2();
    var y = Derived1();
} // x: Derived2, y: Derived1

When there is a common base type assigned, that will be the inferred type:

{
    var x;
    x = Derived1();
    x = Base();
    x = Derived2();
} // x: Base

Not mentioning a common base type results in a type error:

{
    var x;
    x = Derived1();
    x = Derived2();
} // TYPE ERROR: Could not infer type of 'x'

// Correct:
{
    var x: Base;
    x = Derived1();
    x = Derived2();
}

There is no generalization, the inferred type will always be the most general mentioned type:

{
    var x;
    x = Derived1();
    x = SubDerived();
} // x: Derived1

Interfaces are also fine base types:

{
    var x: ICommonType = Derived2();
    var y;
    y = Derived2();
    y = Derived3(); // This would be an error, if there were no more assignments
    y = x; // This allows for generalizing to the interface
} // y: ICommonType

Literal types

The type of literals is always a tricky question: 12 could be int, uint, float32, float64, or any other numeric type. C# greedily infers it to be int, meaning that we have to use suffixes if we want inference to properly kick in:

var a = 1; // int
var b = 1U; // uint
var c = 1.0; // double
var d = 1.0f; // float

A system that does not greedily do this is not much more complex, but removes significant noise from the literals. There are languages that are smarter about this, so we could also try to imitate that.

An integer literal could put a constraint on the inferred type, that it has to be an integral type that is able to hold the specified value. If usage can uniquely determine the type, that's great. If not, then we can use some default, like int (which is a decent default, given that C# uses it for literals already).

Branches

Since if-else returns a value, we need to resolve ambiguities there too. Still using the hierarchy defined above, the following should be correct:

var x = if (true) Derived1() else Base(); // x: Base

But the following should not be correct, because a common base was never mentioned:

var y = if (true) Derived1() else Derived2(); // ERROR

Specifying the common base type from the outside should resolve this:

var x: Base = if (true) Derived1() else Derived2(); // Ok

func f(): Base = if (true) Derived1() else Derived2(); // Ok

Which means that type annotations always get priority, when doing inference.

Generics

TODO

Loops as expressions for sequences

It came up in a discussion that since most control structures will return a value, we could also define some sensible evaluation value for loops. There are two semantics that are sensible enough to consider (at least the ones we could think of), plus the usual safe default:

Singular return value

This would be what Rust does. The loop evaluates to a single value. In Rust this is only valid for the loop construct (infinite loop), and every break has an associated return value; for example, let x = loop { break 5; }; makes x equal to 5.

Sequential return value

This would essentially be implicit generators. Something like this:

println(string.Join(", ", for i in 0:5 do i * i));

Could print 0, 1, 4, 9, 16.

This brings up a few questions, for example will this make the loop essentially lazy? What will the following piece of code do?

val squares = for i in 0:5 do { println("a"); i * i };

Will this immediately print the five "a"s, or only when squares is enumerated? Wouldn't this behavior be too misleading?
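
For comparison, C# iterators have exactly this pitfall, since yield return makes the body run only on enumeration; a small sketch:

static IEnumerable<int> Squares()
{
    for (var i = 0; i < 5; i++)
    {
        Console.WriteLine("a"); // runs only when the sequence is enumerated
        yield return i * i;
    }
}

// var squares = Squares();     // prints nothing yet
// var list = squares.ToList(); // now prints "a" five times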

Evaluate to unit

The simplest semantic would be that all loops simply evaluate to unit. This means they can still be used where expressions can (like in arrow-bodied methods), but their result is essentially meaningless.

Inference

Require explicit signatures (including type parameter constraints) for public (including protected) members.

Signatures could be inferred if the member cannot be referenced from the user's code.

Replace the . member access operator with the more grammatically correct 's operator

The . member access operator makes absolutely no logical or grammatical sense as the member access operator, as . is grammatically used to end sentences. Replacing said operator with the more grammatically correct 's member access operator would make the language as a whole a lot more readable, as you wouldn't have to confuse a member access operation with the end of a sentence.

someInstance.someMember

The code in modern C# above would become the following:

someInstance's someMember

As is evident, this is much more readable and makes way more grammatical sense.

[WIP] Documentation comments

In my opinion, a very unloved but important part of language design is documentation comments. We read docs all day and we love good documentation, but we usually dislike writing it. We could definitely help that a little in multiple ways.

Summary of the main styles

C# uses XML, which is horrific in multiple ways:

  • Many relatively common programming symbols need to be escaped, making inserting example code really, really painful.
  • The attribute names are really cryptic in places, like cref.
  • There are way too many reference tags, like see, paramref, typeparamref, ... Damn it, let me just use see everywhere!
  • I also don't think that anyone finds XML particularly easy to read or write.

Rust uses Markdown, which has many advantages, but also a few drawbacks:

  • It's a relatively well-known format.
  • It's an extremely flexible format, allows embedding code, almost arbitrary formatting capabilities.
  • No need for separate documentation files, a module-level documentation can serve as the "module-showcase".
  • It has simplified references.
  • It lacks a bit of structure, which they solve by convention, expecting certain sections to contain certain kinds of information. For example, argument documentation will be in an # Arguments section, and in there will be an enumeration of some sort.
  • Since code snippets can be inserted, the compiler is even able to execute them, so there's automatic validation of sample code.

Many languages use javadoc-style tags:

  • It's extremely lightweight and easy to read/write.
  • There's a wide variety of existing tooling.
  • Inserting example code is possible, but really annoying, as they are done with the <pre> HTML tag, bringing in the same escaping problem as with C# XML docs.

My thoughts

I think that XML definitely has to go. I really like the idea of not having to write extra documentation outside of my module, so I'm leaning towards the Rust-way, having full markdown support. But I think that the Rust format could be helped a bit with some convenience syntax. I don't think it would be a horrible thing to add javadoc tags to markdown.

Also, executing code in comments is a really great way to keep examples in the docs from going stale; I'm a fan of that.
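
To make that concrete, a Rust-style markdown doc comment on a function could look something like this (the syntax is purely illustrative):

/// Computes the square of a number.
///
/// # Examples
///
/// ```
/// Debug.Assert(Square(4) == 16);
/// ```
static int Square(int x) => x * x;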

Syntax

Since the current comment syntax is //, we could make doc comments something similar, like: ///, /!, ... Personally I'm fine with ///, but you can suggest a syntax in the comments.

val and var

About val

val declares a readonly field/variable/property, similar to what Kotlin has. It does not guarantee that the underlying structure is readonly, but the binding itself is. Syntax:

val a = ...

About var

var is identical to C#'s var. It is normally highlighted by the editor.

Declaration:

var a = ...

Explicit type

Define it after :

var a : string = ...
val a : string = ...

Unresolved

Should we make reassignment more explicit, like this:

a <- a + 1

or more "usual", like this:

a += 1

[WIP] What I like and dislike about Kotlin

I'm not an expert in it at all, but from what I learned

I dislike

  • Poor pattern matching (no way to deconstruct a value)
  • Cannot unbox/deconstruct a value after checking its type or nullability, e. g. assume someFunc() returns Int?, this obviously doesn't work:
if (someFunc() != null && someFunc().method())

If we followed the C# way, it'd be

if (someFunc() is { } some && some.method())

as opposed to

if (someFunc()?.let { it.method() } != null)
  • No DUs

I like

  • val and var
  • Good syntax for functions (e. g. Int -> (String, Long))
  • Enforcement of nullables
  • tailrec
  • fun (...): Type syntax
  • No need for ;
  • Lambdas have it parameter unless explicitly specified, e. g.
someMethod({ it.method() })

as opposed to having to name the parameter explicitly, as we would in C#:

someMethod({ it -> it.method() })
  • Trailing lambdas can be taken outside:
someMethod(1, 2, 3, { it.method() })

->

someMethod(1, 2, 3) { it.method() }
  • Extension methods. Unlike C#, in Kotlin you can define them wherever needed, even as local functions. It encourages extension methods over static methods and helps discoverability a lot
  • data class unifies primary constructors, public/private immutable/mutable properties (by specifying var/val or not specifying at all)
