Code Monkey home page Code Monkey logo

Comments (6)

erszcz avatar erszcz commented on August 30, 2024 2

Hi @xxdavid!

I still think the Local Type Inference approach is doable in our case.
If you agree, I will now dig deeper in the paper and try to extend the theory to our situation (which in particular includes type unions and any()).

Sounds great! I'm pretty sure such an addition will be welcome by everyone 👍

I was trying to get a more complete picture of how Gradualizer works with polymorphism (I also came to an interesting realisation, will write about it later).

I'm quite intrigued now! I'm looking forward to learning your findings.

I have a few thoughts to share, too, I just have to find a while to write them down.

from gradualizer.

xxdavid avatar xxdavid commented on August 30, 2024 2

Hi!

So I've been researching Local Type Inference (LTI) for the past months, and here's what I found out.

In essence, the LTI algorithm goes like this. When inferring the type of a polymorphic function application, you first generate constraints on involved type variables by declaring that the actual arguments should be subtypes of the function parameters. For each type variable, you manage a single lower and a single upper bound (these are the constraints). You start with a lower bound equal to Bot/none(), and a lower bound equal to Top. When you want to combine constraints for a type variable, you make LUB of the lower bounds and GLB of the upper bounds. Whenever we get constraints such that a lower bound of a type variable is not a subtype of an upper bound of the type variable, we raise a type error. At the end (of the call-expression type-checking), you look at the position of each involved type variable in relation to the function result type. If the type variable X is constant (not appearing) or covariant in the result type, you assign X its lower bound. If it is contravariant, you assign it its upper bound.

For example, suppose that we have these functions:

  • filter([A], fun((A) -> boolean())) -> [A]
  • integers() -> [integer()]
  • positive(number()) -> boolean()

then the call filter(integers(), positive()) would be typed in the following way:

  • we start with the constraints Bot <: A <: Top
  • we verify that [integer()] <: [A] and obtain the constraint that integer() <: A, which we combine with Bot <: A <: Top into integer() <: A <: Top (because LUB of Bot and integer() is integer())
  • we verify that (number() -> boolean()) <: (A -> boolean()) which yields us A <: number(), we then again combine it with the current constraints and get integer() <: A <: number()
  • we verify that indeed integer() <: number()
  • A is covariant in [A], so we take the lower bound of A, integer(), and use it as the type of the whole call expression

The LTI comprises (mainly) of these components:

  • subtyping relation
  • greatest lower bound (GLB) and lowest upper bound (LUB) relations
  • constraint generation relation
  • variable elimination relations
  • minimal substitution algorithm
  • rule for application of polymorphic functions

The first four are dependent on the set of types, whereas the last two are not (mostly). This means that if you can provide the first four relations for your set of types, you can probably have LTI. Therefore unions, tuples, maps and things like that should pose no problem in implementing the LTI, as we already have three of four implemented in Gradualizer. In our case the subtyping and constraint generation relations are both implemented via the subtype/3 function, and GLB and LUB have their own functions. We don't have variable elimination but that should be pretty straightforward to implement.

A thing that is much harder to add is gradual typing. I haven't found any paper combining LTI with gradual typing. I've been playing with it quite a lot myself and at the end I came up with a trivial extension of the original LTI algorithm that works nicely in most cases and, as it turned out, is also used in practice.

In pure subtyping (without any), we have the subtyping relation that forms a partial order. With gradual typing, we add the consistency relation (to quote the original definition, two types are consistent iff they are equal where they are both known, i.e., where they are not any). In practice we usually combine these two relations into the consistent-subtyping relation (like our subtype/3) which is neither transitive nor antisymmetric, and thus is not a partial order.

A simple approach to adding gradual typing to the LTI would be (as also suggested by @zuiderkwast here) to just ignore all anys and instead use Bot in the lower bounds and Top in upper bounds. But this does not really work because we would for example infer none() for id(dynamic) where dynamic :: any(), which is not what we want.

Another possible and trivial change to the LTI is to use consistent-subtyping in all places where subtyping was originally used, and to add any to the syntax of types and treat it like every other type. This approach actually seems to work nicely in most cases. Here are just a few examples:

  • id(dynamic)any()
  • id(integers)[integer()]
  • filter(integers, positive())integers()
  • map([dynamic()], positive)[boolean()]
  • append(integers, dynamic)[integer() | any()]
    • this one is interesting because here the union with any() acts as a dynamic type with a lower bound, i.e., we don't know precisely what type the expression has, but it is surely a list that must include integers and maybe also something else

given this typing environment: id(A) -> A, filter([A], fun((A) -> boolean())) -> [A], integers :: [integer()], positive(number()) -> boolean(), map([A], fun((A) -> boolean())) -> [B], dynamic :: any(), append([A], A) -> [A].

This approach is also taken by eqWAlizer as we can see in the docs (1, 2) and in the code.

A problem with this approach is that it lacks theoretical foundations, that is, we don't know (formally) how good the modified algorithm exactly is. The property of always inferring the most informative type, proved in the paper, no longer makes sense as we are not dealing with a partial order here. For example, all of the following types would be valid outputs according to the original property when inferring type of the expression id(some_bool, some_int): {boolean(), integer()}, {any(), integer()}, {boolean(), any()}, {any(), any()}, any(). Clearly, {boolean(), integer()} is the best of these. On the other hand, when inferring type of the expression id(completely_unknown), it is better to infer any()rather than for example boolean() or none().

It should also be noted that this method is a bit weaker than carrying all the constraints throughout a function and solving them only at the end. For example, we would infer [Hd | Rest] = [any()] for filter([dynamic()], positive) and thus would not raise an error when subsequently calling atom_to_binary(Hd). In the current implementation of Gradualizer, we would record that Hd has an upper bound of number() and later also a lower bound of atom(), and that these two are not compatible. This might be the reason why @josefs once wrote that the constraints must be solved at the end.

However, I think that having the LTI is still worth it. First and foremost, it enables us to provide better error messages which are local (each variable has a concrete type) and which can pinpoint the problematic line (as shown in the first post). Second, I think it would greatly simplify our codebase as constraints would probably be needed only in subtype/3 (and related functions), and in not all of the other functions that now have to carry and update them (as they would deal only with concrete types). Third, this is not a major argument, but also many other tools are based on LTI. I've mentioned eqWAlizer but also the prepared Elixir type system is going to have LTI (but one based on set-theoretic types so the theory does not apply to us). I've also checked TypeScript and Flow and based on the fact that you can hover on a variable (bound to the result of a polymorphic call) and see its inferred type, I also assume they use some kind of LTI (Flow has even announced it some time ago).

What do you think? If you agree, I will start with the implementation. I originally wanted to do the theoretical part first (and prove how good/strong the gradual LTI is) and then implement it. But I feel a bit stuck now, so I would rather reverse the order, given that eqWAlizer has already successfully implemented it in a similar way.

from gradualizer.

xxdavid avatar xxdavid commented on August 30, 2024 1

Hi @erszcz!

Thank you for your perspective and sorry for taking it so long for me to respond.

I think it was good that you saved @josefs' solver code, cleaned it, and merged to master, even though it wasn't complete. Without it, we probably would not have it.

In the previous weeks I was trying to get a more complete picture of how Gradualizer works with polymorphism (I also came to an interesting realisation, will write about it later). And I still think the Local Type Inference approach is doable in our case.
If you agree, I will now dig deeper in the paper and try to extend the theory to our situation (which in particular includes type unions and any()). And if everything works out, I'll try to implement it in Gradualizer. I'll post updates in this thread.

And I also agree that the LTI approach should also fix the case you linked, thanks for pointing it out!

from gradualizer.

xxdavid avatar xxdavid commented on August 30, 2024 1

Hi! I have some progress on the LTI implementation to share. I've implemented the core of LTI into Gradualizer, and while it still needs quite a lot of refactoring and tuning for edge cases, it already behaves well in some cases.

I am very happy about how accurate error messages it now produces. Take a look:

1 -module(test_lti).
2 
3 -spec id(A) -> A.
4 id(X) -> X.
5
6 f() ->
7     Answer = id(42),
8     atom_to_list(Answer).

Old:

test_lti.erl: Lower bound 42 of type variable A_typechecker_3742_2 on line 6 is not a subtype of atom()

New:

test_lti.erl: The variable on line 8 at column 18 is expected to have type atom() but it has type 42

f() ->
    Answer = id(42),
    atom_to_list(Answer).
                 ^^^^^^

Notice that there are no type variables involved and that the precise location of the error is printed. It's because at line 8 we already know that Answer is of type 42.

The second type of errors related to polymorphism may be in the call itself. Here I came up with a new error message (not fully polished yet) and I really like how informative it is.

1 -module(test_lti).
2
3 -spec map([A], fun((A) -> B)) -> [B].
4 map([], _F) -> [];
5 map([X | Xs], F) -> [F(X) | map(Xs, F)].
6
7 g() ->
8     map([1, 2, 3], fun atom_to_binary/1),
9     ok.

Old:

test_lti.erl: Lower bound 1 of type variable A_typechecker_3742_5 on line 7 is not a subtype of atom()

New:

test_lti.erl: The type variable A in the call to map on line 8 at column 5 is instantiated to both 1..3 and atom(), and these types are not compatible.

g() ->
    map([1, 2, 3], fun atom_to_binary/1),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Relevant bindings:
    map :: fun(([A], fun((A) -> B)) -> [B])
    [1, 2, 3] :: [1..3, ...]
    fun atom_to_binary/1 :: fun((atom()) -> binary())

So I think we are headed in the right direction. I'll continue working on it and I expect that in a few weeks, I'll have it ready for code review. 🙂

from gradualizer.

erszcz avatar erszcz commented on August 30, 2024 1

Wow! These results are awesome 🎉 Keep up the great work, @xxdavid!

from gradualizer.

erszcz avatar erszcz commented on August 30, 2024

Hi, @xxdavid!

Interesting findings! I haven't read Pierce and Turner's Local Type Inference completely, so I have only a superficial understanding of the paper - mostly what they achieved, not how exactly. However, your reasoning seems correct and I'm sure there's room to improve the constraint solver.

What I know for certain is that the constraint solver code that's merged to the master branch was code that I salvaged from an old @josefs's PR which, according to him, never got completely finished. I rebased it onto the then current master, tried running on some examples and it turned out it does report some errors which were previously false negatives, so I cleaned it up a bit, added some tests, and we merged it. Otherwise, it would probably bitrot even more, whereas now it provides at least a little more feedback. It could definitely be improved, though.

On top of what you said, I think there's at least one more place where, I think, solving constraints could help make the message more precise - https://github.com/josefs/Gradualizer/blob/master/src/typechecker.erl#L3349-L3352. There might be more, but I don't have examples at hand.

In any case, good thinking!

from gradualizer.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.