hkust-taco / mlscript Goto Github PK

The MLscript programming language. Functional and object-oriented; structurally typed and sound; with powerful type inference. Soon to have full interop with TypeScript!

Home Page: https://hkust-taco.github.io/mlscript

License: MIT License

Scala 83.56% CSS 0.11% HTML 0.16% Standard ML 15.29% TypeScript 0.88%

functional-programming type-inference

mlscript's People

Contributors

Stargazers

Watchers

mlscript's Issues

Support "this types"

Would actually be quite straightforward to support. Example:

class A
  method Foo: Int -> this

class B: A
  method Foo n = console.log n; this

a = B{}
a.Foo 1 // : B

b: A = a
b.Foo 1 // : A

Insert prelude code in a neat way

The code generator often needs to insert some pre-defined code snippets (temporarily named “prelude code”) in the beginning of generated code. One example is with construct (#33, #35). Another example is error function. (In current JavaScript syntax, throw x is a statement rather than expression). Apart from them, I’d also like to implement prettyPrint as a prelude functions so the generated code is more readable. Current prelude code are implemented ad-hoc. It is not extensible and introduces mutable states to JSBackend class. To solve this, we may need a helper class maintaining which prelude code are needed.

`checkRegular` returns false negatives

class Foo[A]: { value: A }
class Bar[A]: Foo[Foo[A]]
//│ Defined class Foo

The TypeDef of Bar is not registered to the context even though no error are emitted. A quick inspection reveals that checkRegular returns false in this case.

Wrong semantics of re-declarations

Look at this example first.

Code:

def x = 0
def getX () = x
def x = 1
def y = getX ()

Types and results:

val x: 0 = 0
val getX: () -> 0 = () => x
val x: 1 = 1
val y: 0 = 1 // <---- notice this

My current solution is to rebind the re-declared x to another name. For example, generated code should be like this:

let x = 0;
let getX = () => x;
let x0 = 1;
let y = getX(); // Evaluates to `0`, which matches the type.

Don't repeat previously-shown locations in origin-provenance hints

Example from shared/src/test/diff/mlscript/BadMethods.mls:

class Simple2[A]: { a: A }
    method Get: A
//│ Defined class Simple2
//│ Declared Simple2.Get: ((Simple2['A],) & {_1: Simple2['A]}) -> 'A

class Simple3[A, B]: Simple2[A]
    method Get: B
//│ ╔══[ERROR] Type mismatch in method declaration:
//│ ║  l.157: 	    method Get: B
//│ ║         	           ^^^^^^
//│ ╟── class type parameter of type `B` is not an instance of type A
//│ ║  l.156: 	class Simple3[A, B]: Simple2[A]
//│ ║         	                 ^
//│ ╟── Note: constraint arises from class type parameter:
//│ ║  l.156: 	class Simple3[A, B]: Simple2[A]
//│ ║         	              ^
//│ ╟── Note: class type parameter B is defined at: 
//│ ║  l.156: 	class Simple3[A, B]: Simple2[A]
//│ ║         	                 ^
//│ ╟──       class type parameter A is defined at: 
//│ ║  l.156: 	class Simple3[A, B]: Simple2[A]
//│ ╙──       	              ^
//│ Defined class Simple3
//│ Declared Simple3.Get: ((Simple3['A, 'B],) & {_1: Simple3['A, 'B]}) -> 'B

We could use a mutable set to remember all the locations we've shown so far, so as not to show the same location again.

`TypeVariable.nameHint` lost in constraint solving and type simplification

class Test[A]: { x: A }
    method Mth[B]: (A -> B) -> B
    method Mth[B] (f: A -> B) = f this.x
//│ Defined class Test
//│ Declared Test.Mth: Test['A] -> ('A -> 'B) -> 'B
//│ Defined Test.Mth: Test['A] -> ('A -> 'a) -> 'a

The name hint for the type parameter B in the method definition is lost. I have identified two sources of this issue:

nameHint is not preserved in extrusion

mlscript/shared/src/main/scala/mlscript/ConstraintSolver.scala

Line 392 in 9930721

val nv = freshVar(tv.prov)(lvl)

(That's a downside of providing default arguments. You are not forced to review every invocation and provide the appropriate argument...)
nameHint is not unified when simplifying polar variables
Looking at the debug output of the above code snippet (after fixing 1.), the name hint is still present in the canonical inferred type:
//│ Canon: ((test<> & {Test#A: (A31 -> A31), x: A31}) -> ((((A31 | α32) -> (α33 & B34)) & α35) -> α33))
But it is later simplified away since it only cooccurs with α33 in one polarity:
//│ [sub] α32 -> None, B34 -> None, α35 -> None

Handle class type parameters not appearing in body

mlscript/shared/src/test/diff/mlscript/Methods.mls

Lines 109 to 112 in 193deba

    
           class Simple[A] 
        
               method Get: A 
        
           //│ Defined class Simple 
        
           //│ Declared Simple.Get: simple -> nothing

Currently polymorphic class types are represented by ClassTags, with no reference to the type parameters. Type parameters not appearing in the body are thus ignored. This leads to soundness problems as type variables lose connection with such type parameters.

Forbid instantiating abstract classes

For example, this should be rejected:

class C: {}
  method M: int
c = C {}
c.M

Tooling based formatter for consistent style and better readability

Being new to the project and Scala, I found the dense code difficult to navigate and understand. By having a tooling based formatter the styling will stay consistent with good settings for readability.

I found that scala metals vs code extension natively supports formatting by creating a .scalafmt.conf. The bare minimum file with default settings contains this -

version = "3.0.5"

More settings are here.

Support explicit and inferred self types

Example:

class A
  this: { x: int }
  method Foo = this.x  // ok

class B: A  // fine, transfers the `this`-type requirement

B{} // error: does not satisfy `this`-type

class C: A { x: 0 | 1 }

C{x = 0}   // ok

class A2
  // self inferred
  method Foo = this.x  // ok

A2{}  // illegal

class B2: A2

B2{}  // illegal

class C2: A2 { x: 0 | 1 }

C2{x = 0}  // ok

Properly support tuple field accessors typing

There are currently at least two ways in which they are broken:

While they work on simple types thanks to the case in ConstraintSolver:

t = (1, 2, 3)
//│ t: (1, 2, 3,)
//│  = [ 1, 2, 3 ]

t._1
//│ res: 1
//│    = undefined

they don't work for more complex cases, because the normal forms don't know about them:

t = (1, 2, 3) with {x = 1}
// t = (1, 2, 3)
//│ t: (1, 2, 3,) & {x: 1}
//│  = Array { '0': 1, '1': 2, '2': 3, x: 1 }

t._1
//│ ╔══[ERROR] Type mismatch in field selection:
//│ ║  l.301: 	t._1
//│ ║         	^^^^
//│ ╟── expression of type `(1, 2, 3,) & {x: 1}` does not match type `{_1: ?a}`
//│ ║  l.294: 	t = (1, 2, 3) with {x = 1}
//│ ║         	     ^^^^^^^^^^^^^^^^^^^^^
//│ ╟── but it flows into reference with expected type `{_1: ?a}`
//│ ║  l.301: 	t._1
//│ ╙──       	^

We can subvert them, causing a soundness problem:

t = (1, 2, 3) with {_1 = "oops"}
//│ t: (1, 2, 3,) & {_1: "oops"}
//│  = Array { '0': 1, '1': 2, '2': 3, _1: 'oops' }

(t: (int,int,int,))._1
//│ res: int
//│    = 'oops'

Harden type simplification and subsumption checking

There is a lot we can do to make sure that type simplification does the right thing. (Currently, the result of type simplification is only used to pretty-print type inference results, and there are some known bugs, such as this.)

For example, we should probably add a "full" testing mode that makes sure the simplified type is always equivalent to the raw inferred one. We could also experiment with using simplified types in following test blocks, which might actually improve the overall type inference running time (to verify). This could be done in tests run separately, making sure the new inferred types are the same (or at least, equivalent).

After we have this, the implementation of type simplification will become significantly hardened.

Recover parametricity using a `matchable` top type

Currently, parametricity is violated because we can pattern-match on anything, including abstract types, thereby retrieving information that should not have been accessible.

We can easily fix this by requiring pattern matching scrutinees to be subtypes of some "matchable" upper bound which all concrete types subtype (but not type parameters).

This is similar to Scala 3's Matchable type (https://docs.scala-lang.org/scala3/reference/other-new-features/matchable.html).

Add Array types

MLscript already supports tuples, which in JS/TS are represented using arrays.

We should add proper (immutable for now) array types which are supertypes of tuple types.

The easiest would be to have some Array[T] with subtyping relationships like (1, 2, 3) <: Array[1 | 2 | 3].

Note for later: it would be nice if we could type spreads, as in types of shapes such as (S, T, ...U, V, W, ...X) with an arbitrary number of "spliced" types. These types would precisely type the corresponding value spreads, and they would reduce when more information becomes known about the involved types. Cc @chengluyu.

Support precise type inference for element indexing

The goal is to support precise type inference for the indexing operator a[i].

For instance, if a turns out to have type Array<int>, then a[0] should have type int | undefined; but if a[0] turns out to have a more precise type such as [int, bool, nat], then a[0] should have type int.

(Note: the ... | undefined return type for imprecise indexing type inference is in the process of being implemented in mahzoun99#1.)

The way to do this is as follows.

We should add some meta information in type variables, in addition to upper and lower bounds:

A list of "indexed in" info (A, R), which specifies what type the type variable is used to index into A and that the type resulting from this operation is R.
A list of "indexed by" info (T, R), which specifies what type the type variable is indexed into by some concrete T and that the type resulting from this operation is R.

For example, if a: ?A and i: ?I , then typing a[i] will result in creating a new type variable ?R and adding (?A, ?R) to the "indexed in" info of type variable ?I.

Then, the job of type inference is simply making sure that all ?I::(?A, ?R) and ?A::(?I, ?R) info remain consistent, by propagating concrete bounds that are added to the relevant type variables.

More specifically: when constraining a type variable with a new concrete lower bound B in the rec function of ConstraintSolver.scala, we should check:

its "indexed in" info. For each such (?A, ?R), we should add (B, ?R) to it, and while doing so we should make sure that this (B, ?R) is consistent with the existing lower bounds in ?A;
its "indexed by" info. For each such (T, ?R), we should make sure that B is consistent with it.

Example: if add a lower bound 1 | 2 to type variable I0 which has "indexed in" info of (?A0, ?R0), then I add (1 | 2, ?R0) as "indexed by" into in ?A0. Then, if [S, T, U] is a lower bound on?A0, I should look at what the index 1 | 2 yields from that tuple, in this case T and U, and add these as lower bounds in R0.

There is one subtlety to deal with, though: if the (?A, ?R) info we want to register into some ?I has a higher level than ?A, we need to extrude its components to the correct level, using the extrude method.

Add mutable assignment

The scheme is to use an invariant Ref type (or call it Mut) everywhere, where we have Ref[T] <: T, and with a Ref: 'a -> Ref['a] constructor. For example we have { x = Ref 1 } : { x: Ref[0 | 1] } <: { x: int }

The advantage over explicitly making record fields optionally mutable is to make the implementation in MLscript easy and modular. But we need to put a restriction on where these Ref types can appear. The restriction could be checked in a pass after type checking:
One can only call the Ref constructor as part of a field initialization, as in { x = Ref 1 } or as part of a local variable initialization, as in let x = Ref 1 in ....

The code generator will simply ignore these Ref constructors, as in JS all fields are mutable by default.

Support `with` construct in JS

For instance:

class C[A]: { x: A }
//│ Defined class C

c = C{x = 1}
//│ c: c & {C#A: 'A | 1 .. 'A, x: 1}

d = c with { x = "hi"; y = 2 }
//│ d: c & {C#A: 'A | 1 .. 'A, y: 2, x: "hi"}

d.x
//│ res: "hi"

To do this, I guess we should create a new object, copy over all the fields of the previous object (there is a JS method to do that but I forget its name), and set the new fields in the new object, potentially overriding old ones. We may have to also explicitly set the prototype of the new object to the same thing as the old one, so we retain class instance-ness.

MLscript-core: Add short syntax reference to the web demo

Implement JS code generation for methods

Following up for #22 after is merged.

Code generator should special-case primitive-typed patterns

Example 1

def tmp = if true then 1 else "oops"
case tmp of { int -> tmp | string -> 0 }

Current output:

Runtime error occurred:
ReferenceError: int is not defined

Example 2

def tmp = if true then 1 else "oops"
case tmp of { 1 -> 1 | "oops" -> 0 }

Current output:

Runtime error occurred:
TypeError: Right-hand side of 'instanceof' is not an object

Support `undefined`, `null`, and related operations/features

First step: support undefined

add a new syntax form UnitLit(undefinedOrNull: Bool) extends Lit
make undefined a keyword in MLParser, parsing as UnitLit(true)
make undefined a builtin type in Typer.scala
update Typer's typeTerm to assign type undefined to keyword undefined
update JS code generation in JSBackend to account for undefined

Variance analysis ignores types inferred from method definitions

@twitu I just realized that variance analysis only looks at method declarations and simply ignores definitions...

:dv
class C0[A]
  method M0 (x: A) = x
//│ VARIANCE ANALYSIS
//│ ⬤ ITERATION 1
//│ | class C0  ± A73'
//│ | | upd[+] ⊤
//│ DONE
//│ Defined class C0[±A]
//│ Defined C0.M0: C0['A] -> 'A -> 'A
//│ ╔══[WARNING] Type definition C0 has bivariant type parameters:
//│ ║  l.41: 	class C0[A]
//│ ║        	      ^^
//│ ╟── A is irrelevant and may be removed
//│ ║  l.41: 	class C0[A]
//│ ╙──      	         ^

This is obviously wrong!

Solve annoying constraints in a smarter way

Currently, the annoying resolution routine is way too eager to decompose subtyping problems into smaller ones. For example, it turns anything of the shape A <: B & C into A <: B and A <: C. However, that results in exponential complexity for solving fairly common patterns, such as pattern matching:

class A[T]: { fA: T }
class B[T]: { fB: T }
class C[T]: { fC: T }
class D[T]: { fD: T }
class E[T]: { fE: T }
class F[T]: { fF: T }
class G[T]: { fG: T }
//│ Defined class A
//│ Defined class B
//│ Defined class C
//│ Defined class D
//│ Defined class E
//│ Defined class F
//│ Defined class G

:stats
def foo x = case x of {
  | A -> x.fA
  | B -> x.fB
  | C -> x.fC
  | D -> x.fD
  | E -> x.fE
  | F -> x.fF
  | G -> x.fG
  }
def arg: A[int] | B[int] | C[int] | D[int] | E[int] | F[int] | G[int]
foo arg
//│ foo: (A[?] & {fA: 'a} | B[?] & {fB: 'a} | C[?] & {fC: 'a} | D[?] & {fD: 'a} | E[?] & {fE: 'a} | F[?] & {fF: 'a} | G[?] & {fG: 'a}) -> 'a
//│ arg: A[int] & {fA: int} | B[int] & {fB: int} | C[int] & {fC: int} | D[int] & {fD: int} | E[int] & {fE: int} | F[int] & {fF: int} | G[int] & {fG: int}
//│ res: int
//│ constrain calls: 11627
//│ annoying  calls: 77290

(This took about 10s on my machine.)

This is because pattern matching infers types of the form A | B & ~A | C & ~A & ~B | ... (internally, before the type is simplified for showing to the user), which are implicitly put into conjunctive normal form during constraint solving, as in (A | B | C | ...) & (A | ~A | C | ...) & (A | B | ~A | ...) & (A | ~A | ~A | ...) & (A | B | ~B | ...) & (A | ~A | ~B | ...) | ....

Two things to note here:

inferred types like B & ~A can actually usually be reduced to just B thanks to single-inheritance of classes, but the solver does not currently do that;
if we were matching traits, we could't actually reduce these (lack of single-inheritance restriction for traits).

Thankfully, it is fairly easy to solve these constraints in polynomial time. We just need to put both sides in DNF first to whittle down the possible matches, and only decompose constraints when tricky things appear, such as type variables in only one of the RHS conjuncts.

Generate TypeScript type definition files from MLscript modules

This would allow reusing MLscript definitions from TypeScript.

I don't foresee any major difficulty to implement this, although the handling of recursive types will need the introduction of auxiliary type alias definitions on the TS side, as TS does not support "inline" recursive types such as (int -> ‘a) as 'a.

Simplify complements

This code:

def test3 x = case x of
  { 1 -> true
  | true -> true
  | _ -> false
  }

currently infers as:

test3: (1 | true | ~1 & ~true) -> bool

We do need to infer this shape of types when typing pattern matching, because it's possible the type variables originally attached to each case do not get simplified away:

def test3 x = case x of
  { 1 -> x
  | true -> true
  | _ -> false
  }
//│ test3: (1 & 'a | true | ~1 & ~true) -> (false | true | 'a)

So this simplification needs to be done much later, after removing polar type variables.

Add JS tests

The JS code-gen has become quite non-trivial and needs to be tested. We should update the diff-tests to also compile and run the JS, probably by shelling out to node.

Make sure to import the tests documented in #30

Handle polarity of `TypeRef` in `extrude`

mlscript/shared/src/main/scala/mlscript/ConstraintSolver.scala

Line 399 in 193deba

    
           case tr @ TypeRef(d, ts) => TypeRef(d, ts.map(extrude(_, lvl, pol)))(tr.prov, tr.ctx) // FIXME pol...

ConstraintSolver.extrude currently ignores the polarity of the type arguments in TypeRefs, which is unsound. This can be fixed by implementing polarity analysis, or by expanding the TypeRef before extrusion.

JavaScript code generation backlog

When working on #53, I met a lot of things that we could improved in the future. Here are them.

Find a neat way to store query results. Requirements:
- If the initialization fails, the variable should not be declared.
- Declaration should not pollute global context.
Source
~~Implement Class.Member instance calling convention. Source~~ This syntax might be changed in the new parser.
Add timeout to stream.read() otherwise we will occasionally suffer from infinite loops. Source (Fixed in #126)
Automatically check references to unimplemented variables. Source
~~Support merging arrays in withConstruct. Source TBD.~~ The semantics of with might be changed.

Some // TODO introduced in commits from #53. Source TBD.

As for 1 Aug, 2022. The // TODOs are as follows. They are not urgent at this moment.

Translation of Blk.

mlscript/shared/src/main/scala/mlscript/JSBackend.scala

Lines 177 to 178 in ba99965

    
           // TODO: find out if we need to support this. 
        
           case _: Def | _: TypeDef => throw CodeGenError("unexpected definitions in blocks")

Consider traits when sorting classes.

mlscript/shared/src/main/scala/mlscript/JSBackend.scala

Lines 482 to 485 in ba99965

    
           topLevelScope.getType(name) match { 
        
             case S(sym: ClassSymbol) => S(sym) 
        
             case S(sym: TraitSymbol) => N // TODO: inherit from traits 
        
             case S(sym: TypeAliasSymbol) =>

Improve parentheses in code generation.

mlscript/shared/src/main/scala/mlscript/codegen/Codegen.scala

Lines 415 to 416 in ba99965

    
           // TODO: Figure out how `=>` competes with other operators. 
        
           case L(expr: JSRecord) => expr.toSourceCode.parenthesized

Load built-in symbols from Typer.

mlscript/shared/src/main/scala/mlscript/codegen/Scope.scala

Lines 23 to 26 in ba99965

    
           def this(name: Str) = { 
        
             this(name, N) 
        
             // TODO: allow types and values to have the same name 
        
             // TODO: read built-in symbols from `Typer`.

mlscript/shared/src/main/scala/mlscript/codegen/Scope.scala

Lines 45 to 51 in ba99965

    
           // TODO: add `true`, `false`, and `error` to this list 
        
           register(TypeAliasSymbol("anything", Nil, Top)) 
        
           register(TypeAliasSymbol("nothing", Nil, Bot)) 
        
           // TODO: register them in the same way as `Typer` does. 
        
           Ls("int", "number", "bool", "string", "unit") foreach { name => 
        
             register(TypeAliasSymbol(name, Nil, TypeName(name))) 
        
           }

Trait test in pattern matching has not been implemented. This can be done via Symbol.

Support inheritance in JS code-gen

class A: {}
class B: A

Unexpected error: scala.NotImplementedError: an implementation is missing

This should do the obvious thing, using JS class extension, so that it also works in pattern matching.

Make functions take multiple parameters

The goal is to allow easier interaction with JS environments.

Make FunctionType take a TupleType in the LHS instead of a general SimpleType
Interpret calls like f(a, b, c) appropriately
Interpret calls like f 1 as f(1), passing a single argument

Other possible improvements:

Named arguments, as in f(a, x = c)
Auto-currying (type-based), as in def f(a, b) = a + b; ... f 1 2 ...

Allow consuming TypeScript types and trees from MLscript

The main idea is to interface with the TypeScript compiler and leverage its parser and type checker.

The benefits would be twofold:

We can reuse TS types from TS libraries or JS libraries with existing TS bindings, such as those in the popular DefinitelyTyped project. This would be a huge boon and would make MLscript actually usable for practical, real-world coding.
We can reuse the TS parser to retrieve functions that are missing some type annotations (for example, missing the return type, or plain JS functions missing all types), and still perform type inference for them.

The second point is actually really interesting on its own. Together with #57, it would mean we could use MLscript as a standalone TypeScript type-inferencer for JavaScript (and also for partially-annotated TypeScript files).

Remove type bloat/duplication when showing class type arguments

The way class type arguments are currently expanded into type ranges is not actually sustainable and creates really ugly/bloated things, especially when recursive types are involved.

For example, see the outrageous example in Arrays.mls and here: #87 (comment).

Here is a rather minimal example:

def c: C['a] as 'a
//│ c: 'a | C[C['c .. 'd] & 'a .. 'b] as 'b
//│  = <missing implementation>

def c: C[C['a]] as 'a
//│ c: 'a | C[C['c .. 'd] .. C[C[C['e .. 'b] .. 'f] & 'a as 'e .. 'b]] as 'b
//│  = <missing implementation>

A better way would be to output properly bounded type variables in some instances.

Simplify positive intersection co-occurrences too

The following is currently not simplified:

class Some[A]
//│ Defined class Some

def f: (Some['A] & 'this) -> (Some['A] & 'this)
//│ f: (Some['A] & 'this) -> (Some['A] & 'this)
//│  = <missing implementation>

But we could simplify it to (Some['A] & 'this) -> 'this because 'this always co-occurs with Some['A] negatively.

We could also probably do the same for unions, though it would be less straightforward to implement since we put types in DNF, not CNF.

Simplification bug with methods and type parameters

Discovered in c87010f.

:NoJS

class Some[a]: { val: a }
  method Get: a
//│ Defined class Some[+a]
//│ Declared Some.Get: Some['a] -> 'a

// simplification bug here
def g x = (x.Get 0, x.Get "a")
//│ g: Some[nothing -> ('a | 'b)] -> ('b, 'a,)

g (error: Some[nothing -> 0])
//│ ╔══[ERROR] Type mismatch in application:
//│ ║  l.12: 	g (error: Some[nothing -> 0])
//│ ║        	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
//│ ╟── string literal of type `"a"` does not match type `nothing`
//│ ║  l.9: 	def g x = (x.Get 0, x.Get "a")
//│ ║       	                          ^^^
//│ ╟── Note: constraint arises from type reference:
//│ ║  l.12: 	g (error: Some[nothing -> 0])
//│ ╙──      	               ^^^^^^^
//│ res: (0, 0,) | error

// expected behavior
def g (x: Some['a]) = (x.val 0, x.val "a")
//│ g: Some["a" -> 'a & 0 -> 'b] -> ('b, 'a,)

g (error: Some[nothing -> 0])
//│ ╔══[ERROR] Type mismatch in application:
//│ ║  l.28: 	g (error: Some[nothing -> 0])
//│ ║        	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
//│ ╟── string literal of type `"a"` does not match type `nothing`
//│ ║  l.25: 	def g (x: Some['a]) = (x.val 0, x.val "a")
//│ ║        	                                      ^^^
//│ ╟── Note: constraint arises from type reference:
//│ ║  l.28: 	g (error: Some[nothing -> 0])
//│ ╙──      	               ^^^^^^^
//│ res: (0, 0,) | error

In the first definition of g using x.Get, the inferred type is erroneously printed with a nothing. This appears to be a simplification bug since actually passing something of the function parameter type to g results in an error. The error message also refers to type "a", which does not appear in the printed type of g. The expected behavior should be identical to the second definition of g using x.val.

Don't generate ill-formed TS polymorphic types

Currently we use parameter lists in places which are not valid in TS.

Example:

:ts
let t = { a = 5; b = "world"} in if t.a == 1 then 1 else (fun f -> f.b )
//│ res: {b: 'a} -> 'a | 1
//│ // start ts
//│ export declare const res<a>: (arg1: {b: a}) => a | 1
//│ // end ts

We should just fail TS generation here, or widen the type until it's representable.

Sound and improved treatment of polymorphism and type variables

We should not generalize non-function values, which is unsound in the presence of mutation.

Also, it would be good to handle all 'a variables as rigid and have a different ?a syntax for flexible ones.

Also, we should be able to refer to the same variables across type annotations in a given def. So we should store type variables in the typing context, not just in an inner map of typeType2.

Invalid names in `TypeDef`s

Should we only process TypeDefs with valid (i.e. capitalized and distinct) names?

mlscript/shared/src/test/diff/mlscript/BadTypes.mls

Lines 3 to 7 in 0723a1a

    
           class oops: { x: int } 
        
           //│ ╔══[ERROR] Type names must start with a capital letter 
        
           //│ ║  l.3: 	class oops: { x: int } 
        
           //│ ╙──     	      ^^^^ 
        
           //│ Defined class oops

Fix the handling of free type variables in type definitions

As exposed in 2d12cba, these are currently unsound.

We could either forbid them or make them into implicit type parameters behind the scenes.

Emit error for duplicate type parameters

class Test[A, A]: { x: A }
//│ Defined class Test

t = Test { x = 42 }
//│ t: Test['A | 'A0 .. 42 & 'A | 'A & 'A0, 'A | 'A0 .. 42 & 'A | 'A & 'A0] with {x: 42}

t : Test[bool, int]
//│ res: Test[bool | int .. nothing, bool | int .. nothing]

t : Test[int, bool]
//│ ╔══[ERROR] Type mismatch in type ascription:
//│ ║  l.10: 	t : Test[int, bool]
//│ ║        	^
//│ ╟── expression of type `42` does not match type `bool`
//│ ║  l.4: 	t = Test { x = 42 }
//│ ║       	               ^^
//│ ╟── but it flows into reference with expected type `Test[int, bool]`
//│ ║  l.10: 	t : Test[int, bool]
//│ ║        	^
//│ ╟── Note: constraint arises from type reference:
//│ ║  l.10: 	t : Test[int, bool]
//│ ║        	              ^^^^
//│ ╟── from record type:
//│ ║  l.1: 	class Test[A, A]: { x: A }
//│ ╙──     	                  ^^^^^^^^
//│ res: Test[bool | int .. nothing, bool | int .. nothing]

Currently the last one takes precedence, and the output contains duplicate type variables. (It breaks surprisingly few things besides ascription.)

NodeJS REPL integration swallows errors

As we've seen before, such as here: #65 (comment)

When there is a top-level syntax error in the NodeJS REPL, nothing in the test output indicates that anything is wrong. Even the :ShowREPL option does not show the syntax error output.

We should have a way of more effectively debugging such issues. Ideally by properly reporting the error, though that may be annoying to implement. At the very least, :ShowREPL should let us know what's going on by printing all the NodeJS output.

Support recursive definitions in JS

Currently, recursive definitions don't work:

:js
rec def foo =
  let _ = console.log "hey" in foo
//│ // Query 1
//│ let foo1;
//│ foo1 = (function (_) {
//│   return foo;
//│ })(console.log("hey"));
//│ res = foo1;
//│ // End of generated code
//│ foo: nothing
//│    = [Function: foo]

:js
rec def foo() =
  let _ = console.log "hey" in foo()
//│ // Query 1
//│ let foo2;
//│ foo2 = () => (function (_) {
//│   return foo1();
//│ })(console.log("hey"));
//│ res = foo2;
//│ // End of generated code
//│ foo: () -> nothing
//│    = [Function: foo2]

Weirdly, sometimes they do seem to work:

// TODO define concat
:e
def concat: string -> string -> string
concat s t = s + t
//│ concat: string -> string -> string
//│ int -> int -> int
//│   <:  concat:
//│ string -> string -> string
//│ ╔══[ERROR] Type mismatch in def definition:
//│ ║  l.56: 	concat s t = s + t
//│ ║        	^^^^^^^^^^^^^^^^^^
//│ ╟── expression of type `string` does not match type `int`
//│ ║  l.55: 	def concat: string -> string -> string
//│ ║        	            ^^^^^^
//│ ╟── Note: constraint arises from reference:
//│ ║  l.56: 	concat s t = s + t
//│ ╙──      	             ^
//│ ╔══[ERROR] Type mismatch in def definition:
//│ ║  l.56: 	concat s t = s + t
//│ ║        	^^^^^^^^^^^^^^^^^^
//│ ╟── expression of type `int` does not match type `string`
//│ ║  l.56: 	concat s t = s + t
//│ ║        	             ^^^^^
//│ ╟── but it flows into function of type `?a -> ?b`
//│ ║  l.56: 	concat s t = s + t
//│ ║        	         ^^^^^^^^^
//│ ╟── which does not match type `string -> string`
//│ ╟── Note: constraint arises from type reference:
//│ ║  l.55: 	def concat: string -> string -> string
//│ ╙──      	                                ^^^^^^
//│       = [Function: concat]

:js
rec def test n acc =
  if n == 0 then acc
  else concat (test (n - 1) (concat acc "+")) "!"
//│ // Query 1
//│ let test;
//│ test = (n) => (acc) => (n == 0 ? acc : concat(test(n - 1)(concat(acc)("+")))("!"));
//│ res = test;
//│ // End of generated code
//│ test: int -> string -> string
//│     = [Function: test]

test 1 "hello"
//│ res: string
//│    = 'hello+!'

test 10 "hello"
//│ res: string
//│    = 'hello++++++++++!!!!!!!!!!'

Is this inconsistency a bug in the handling of scopes/names?

Other examples:

:js
def pow x =
  let rec go n =
    if n > 0 then go (n - 1) * x
    else 1
  in go
//│ // Query 1
//│ let pow;
//│ pow = (x) => (function (go) {
//│   return go;
//│ })((n) => (n > 0 ? go(n - 1) * x : 1));
//│ res = pow;
//│ // End of generated code
//│ pow: int -> int -> int
//│    = [Function: pow]

pow 3 4
//│ res: int
//│ Unexpected runtime error
//│   ReferenceError: go is not defined

:js
rec def pow n = { call = fun x -> if n > 0 then (pow (n - 1)).call x * x else 1 }
//│ // Query 1
//│ let pow1;
//│ pow1 = (n) => { call: (x) => (n > 0 ? pow(n - 1).call(x) * x : 1) };
//│ res = pow1;
//│ // End of generated code
//│ pow: int -> {call: int -> int}
//│    = [Function: pow1]

(pow 3).call 4
//│ res: int
//│ Unexpected runtime error
//│   TypeError: Cannot read properties of undefined (reading 'call')

Support shadowing in JS code-gen

def tmp = 1
def tmp = succ tmp

Runtime error occurred:
SyntaxError: Identifier 'tmp' has already been declared

Implement variance analysis

This will be important to simplify inferred types further.

Implement method parsing and type checking

As in the later versions of the paper. This is just an issue to track the progress of the current implementation in https://github.com/hkust-taco/mlscript/tree/methods

Implement `originProv`

As discussed in the meeting.

Can access fields from `anything`?

To reproduce, add tests with the following:

res = if true then (1,2, "hello") else (true, 3)
res._1
//│ res: anything
//│    = [ 1, 2, 'hello' ]
//│ res: 1 | true
//│    = undefined

It is tested using the code in master branch.

Support argument list and array splices `...xs`

Support typing and type inference for argument list and array splices.

Examples: f(a, b, ...c, d, ...e) and [a, b, ...c, d, ...e].

Type inference should do its best to infer the most precise types, but it may sometimes hit ambiguities, such as when constraining (a, b) <: (...?s, ...?t). In such cases, reporting a type error will be fine, prompting users to write a type annotation in order to remove the ambiguity.

Simplify the `with` construct semantics

Currently, this construct can override fields from classes with incompatible types while keeping the class identity (nominal tag). This is sound because the bare identity only informs about runtime instance testing and not about any specific field types – for that, one needs the full class type.

The flip side of the coin is that we cannot assume anythings from bare class identity although we'd often like to. This makes the types inferred from pattern matching very precise but also weird looking/alien, and it means the type simplifier has to work a lot harder when trying to reconstruct good looking full class types. It also pushes us to represent the field sets of type normal forms extensively rather than intensively, making handling them inefficient. Also, the concept of bare nominal tags doesn't have an equivalent in TS, making inferring them so easily extra awkward for interop.

It would be much nicer to make with strip class identity (both statically and dynamically), so that we can assume that class nominal tags imply the corresponding fields. So with will only ever create plain records. This is incidentally already what it does with tuples.

A different construct could be used to extend class values while getting a class instance as a result instead of a plain record. Something like new Foo {...foo; x = v} instead of the current foo with {x = v}. Making the target class Foo explicit solves the issue by curbing the feature's expressiveness and dynamicity.

JS code-gen is unhygienic

Example:

def f x = case 1 of { 1 -> x }

Generates, for the body of f:

(x) => (function (x) { if (x == 1) { return x; } throw new Error("non-exhaustive case expression"); })(1)

The normal way of dealing with this is the "gensym" approach, for "generating symbols". For every name x you want to use, you call gensym("x"), and a new name distinct with every other name in the program is picked.

We could also have a more local version that only picks names different from names currently in scope, which would avoid generating too many harder-to-read unique names.

Support basic operators in MLParser

Such as 2 + 2 instead of add 2 2.

	class Simple[A]
	method Get: A
	//│ Defined class Simple
	//│ Declared Simple.Get: simple -> nothing

	// TODO: find out if we need to support this.
	case _: Def \| _: TypeDef => throw CodeGenError("unexpected definitions in blocks")

	topLevelScope.getType(name) match {
	case S(sym: ClassSymbol) => S(sym)
	case S(sym: TraitSymbol) => N // TODO: inherit from traits
	case S(sym: TypeAliasSymbol) =>

	// TODO: Figure out how `=>` competes with other operators.
	case L(expr: JSRecord) => expr.toSourceCode.parenthesized

	def this(name: Str) = {
	this(name, N)
	// TODO: allow types and values to have the same name
	// TODO: read built-in symbols from `Typer`.

	// TODO: add `true`, `false`, and `error` to this list
	register(TypeAliasSymbol("anything", Nil, Top))
	register(TypeAliasSymbol("nothing", Nil, Bot))
	// TODO: register them in the same way as `Typer` does.
	Ls("int", "number", "bool", "string", "unit") foreach { name =>
	register(TypeAliasSymbol(name, Nil, TypeName(name)))
	}

	class oops: { x: int }
	//│ ╔══[ERROR] Type names must start with a capital letter
	//│ ║ l.3: class oops: { x: int }
	//│ ╙── ^^^^
	//│ Defined class oops

hkust-taco / mlscript Goto Github PK

mlscript's People

Contributors

Stargazers

Watchers

Forkers

mlscript's Issues

Example 1

Example 2

Recommend Projects

Recommend Topics

Recommend Org