tree-sitter-unison's Introduction

Tree sitter for Unison

This tree sitter grammar uses some logic borrowed from the Haskell tree sitter grammar, particularly the way it handles indentation tracking for code blocks.

Every feature of the Unison language is implemented here.

If you have recommendations for improvements, please create an issue. Thank you!

tree-sitter-unison's People

Contributors

kylegoetz, chuwy, dependabot[bot], fmguerreiro, blinxen, zetashift

tree-sitter-unison's Issues

Enable use in WASM by replacing usages of `strcat` with `strncat`

When compiling Tree-sitter grammars to WASM, only a limited set of functions from the C stdlib are available. The list of functions is defined here: https://github.com/tree-sitter/tree-sitter/blob/master/lib/src/wasm/stdlib-symbols.txt.

One use case for compiling grammars to WASM is adding them as extensions to the Zed editor. @zetashift has begun work on a Unison extension for Zed, here: https://github.com/zetashift/unison-zed.

It looks like the WASM build is currently failing due to a use of strcat. This function is not available under Tree-sitter's WASM runtime, but its safer cousin, strncat, is (as of the latest master branch of Tree-sitter).

To check that the grammar works in WASM, you can run the tree-sitter build-wasm command. This might be a good thing to do in CI.

Note that the strncat function was just added on master, and isn't yet in the latest release.
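
A minimal sketch of what such a replacement could look like, assuming the scanner appends into a fixed-size buffer. The helper name append_bounded and the buffer layout are hypothetical and are not taken from this grammar's scanner. Lengths are counted by hand so that strncat is the only stdlib call the sketch relies on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the scanner): append src to the
 * NUL-terminated string in dst without writing past dst_size bytes.
 * Returns 1 if all of src fit, 0 if it was truncated or dst was not
 * terminated inside its buffer. */
int append_bounded(char *dst, size_t dst_size, const char *src) {
    size_t used = 0;
    while (used < dst_size && dst[used] != '\0') used++;
    if (used >= dst_size) return 0; /* no terminator found in the buffer */

    size_t room = dst_size - used - 1; /* free bytes, excluding the NUL */
    size_t need = 0;
    while (src[need] != '\0') need++;

    /* strncat copies at most `room` bytes from src and then writes a
     * terminating NUL, so it never writes past dst_size. */
    strncat(dst, src, room);
    return need <= room;
}
```

Unlike a bare strcat(dst, src), the third argument caps how much is copied, so an oversized src is truncated instead of overflowing the buffer.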

complex FA failure

> z = (*) a b -> a * b fails to parse:

(unison [0, 0] - [0, 22]
  (watch_expression [0, 0] - [0, 22]
    (function_application [0, 2] - [0, 22]
      function_name: (wordy_id [0, 2] - [0, 3])
      (ERROR [0, 4] - [0, 5]
        (kw_equals [0, 4] - [0, 5]))
      (function_application [0, 6] - [0, 22]
        function_name: (operator [0, 7] - [0, 8])
        (wordy_id [0, 10] - [0, 11])
        (literal_function [0, 12] - [0, 18]
          param: (wordy_id [0, 12] - [0, 13])
          (arrow_symbol [0, 14] - [0, 16])
          (wordy_id [0, 17] - [0, 18]))
        (operator [0, 19] - [0, 20])
        (wordy_id [0, 21] - [0, 22])))))

Operator definition breaks the grammar

Operators in Unison are typically defined in parentheses:

(Numeric.>=) : Numeric -> Numeric -> Boolean
(Numeric.>=) = todo "implement"

This results in multiple errors in the playground, even if I replace >= with foo.

Duplicate symbols in ld

Hey!

Sorry if it's a misconfiguration rather than a bug, I know nothing about tree-sitter, but I tried to install tree-sitter-unison in Neovim and got the following error:

nvim-treesitter[unison]: Error during compilation                                                                         
duplicate symbol '_just' in:                                                                                              
    /tmp/scanner-131ce1.o                                                                                                 
    /tmp/maybe-d5ab1f.o                                                                                                   
duplicate symbol '_justLong' in:                                                                                          
    /tmp/scanner-131ce1.o                                                                                                 
    /tmp/maybe-d5ab1f.o                                                                                                   
duplicate symbol '_nothing' in:                                                                                           
    /tmp/scanner-131ce1.o                                                                                                 
    /tmp/maybe-d5ab1f.o                                                                                                   
duplicate symbol '_justDouble' in:                                                                                        
    /tmp/scanner-131ce1.o                                                                                                 
    /tmp/maybe-d5ab1f.o                                                                                                   
ld: 4 duplicate symbols for architecture x86_64                                                                           
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)  
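
One common cause of this class of linker error is a function that is defined (not just declared) with external linkage in code that ends up compiled into two object files. That is an assumption here, not a confirmed diagnosis of this report. The sketch below shows the usual remedy for helpers that live in a header: mark the definitions static inline. The type and function names are hypothetical stand-ins, not the grammar's actual maybe helpers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a small "maybe" helper header. */
typedef struct {
    bool has_value;
    long value;
} maybe_long;

/* `static inline` gives each definition internal linkage: every .c file
 * that includes this code gets its own private copy, so the linker never
 * sees two external symbols with the same name. */
static inline maybe_long nothing_long(void) {
    maybe_long m = { false, 0 };
    return m;
}

static inline maybe_long just_long(long value) {
    maybe_long m = { true, value };
    return m;
}
```

The alternative is to keep only declarations in the header and compile the definitions in exactly one .c file.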

Hex-formatted numbers not yet supported?

On version 1.0.2, getting some error nodes when parsing this bit of code (taken from https://exercism.org/tracks/unison/exercises/zebra-puzzle/solutions/runarorama):

zebraPuzzle.eachHouse : '{Each} Nat
zebraPuzzle.eachHouse = do each [0x10,0x08,0x04,0x02,0x01]

output:

term_declaration:
    type_signature:
      path:
      wordy_id:
      type_signature_colon:
      term_type:
        delayed:
          effect:
            wordy_id:
          wordy_id:
    term_definition:
      path:
      wordy_id:
      kw_equals:
      delay_block:
        do:
        function_application:
          wordy_id:
          literal_list:
            ERROR:
              ERROR:
              ERROR:
              ERROR:
              ERROR:
              ERROR:
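
For illustration, the literals in the failing list all have the shape 0x followed by hex digits. Below is a rough sketch of a recognizer for that shape, written as a plain C function rather than as a grammar rule. The function name is hypothetical, and accepting upper-case digits is an assumption; only the lower-case 0x prefix seen in the example is handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of a literal of the form 0x<hex digits> at the
 * start of s, or 0 if s does not start with one. */
size_t hex_literal_length(const char *s) {
    if (s[0] != '0' || s[1] != 'x') return 0;
    size_t i = 2;
    while (isxdigit((unsigned char)s[i])) i++;
    return i > 2 ? i : 0; /* require at least one digit after 0x */
}
```

Note that an input such as 0xsdeadbeef is rejected by this check, because s is not a hex digit; that form would need its own rule.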

abilities blowing up

for example, "one line structural ability":

structural ability Throw e where throw : e ->{Throw e} a

is yielding

(unison
      +(ERROR
        +(identifier)
        +(wordy_id)
        +(wordy_id)
        +(ERROR
          +(UNEXPECTED '))
        +(UNEXPECTED 'T')
        +(UNEXPECTED '\0')))
-      (ability_declaration
-        (structural)
-        (ability)
-        (type_constructor
-          (wordy_id)
-          (type_parameter))
-        (where)
-        (request_operation
-          (request)
-          (type))))

watch expressions should also allow watch /binds/

according to the unison FileParser.hs,

-- A stanza is either a watch expression like:
--   > 1 + x
--   > z = x + 1
-- Or it is a binding like:
--   foo : Nat -> Nat
--   foo x = x + 42

data Stanza v term
  = WatchBinding UF.WatchKind Ann ((Ann, v), term)
  | WatchExpression UF.WatchKind Text Ann term
  | Binding ((Ann, v), term)
  | Bindings [((Ann, v), term)]
  deriving (Foldable, Traversable, Functor)

WatchBinding and WatchExpression are parsed using the combined function watchExpression:

watchExpression = do
      (kind, guid, ann) <- watched
      _ <- guardEmptyWatch ann
      msum
        [ WatchBinding kind ann <$> TermParser.binding,
          WatchExpression kind guid ann <$> TermParser.blockTerm
        ]

Here, the crucial functions are:

watched :: (Var v) => P v (UF.WatchKind, Text, Ann)
watched = P.try $ do
  kind <- optional wordyIdString
  guid <- uniqueName 10
  op <- optional (L.payload <$> P.lookAhead symbolyIdString)
  guard (op == Just ">")
  tok <- anyToken
  guard $ maybe True (`L.touches` tok) kind
  pure (maybe UF.RegularWatch L.payload kind, guid, maybe mempty ann kind <> ann tok)

-- | Rules for the annotation of the resulting binding is as follows:
-- * If the binding has a type signature, the top level scope of the annotation for the type
-- Ann node will contain the _entire_ binding, including the type signature.
-- * The body expression of the binding contains the entire lhs (including the name of the
-- binding) and the entire body.
-- * If the binding is a lambda, the  lambda node includes the entire LHS of the binding,
-- including the name as well.
binding :: forall v. (Var v) => P v ((Ann, v), Term v Ann)
binding = label "binding" do
  typ <- optional typedecl
  -- a ++ b = ...
  let infixLhs = do
        (arg1, op) <-
          P.try $
            (,) <$> prefixDefinitionName <*> infixDefinitionName
        arg2 <- prefixDefinitionName
        pure (ann arg1, op, [arg1, arg2])
  let prefixLhs = do
        v <- prefixDefinitionName
        vs <- many prefixDefinitionName
        pure (ann v, v, vs)
  let lhs :: P v (Ann, L.Token v, [L.Token v])
      lhs = infixLhs <|> prefixLhs
  case typ of
    Nothing -> do
      -- we haven't seen a type annotation, so lookahead to '=' before commit
      (lhsLoc, name, args) <- P.try (lhs <* P.lookAhead (openBlockWith "="))
      body <- block "="
      verifyRelativeName' (fmap Name.unsafeFromVar name)
      pure $ mkBinding (lhsLoc <> ann body) (L.payload name) args body
    Just (nameT, typ) -> do
      (lhsLoc, name, args) <- lhs
      verifyRelativeName' (fmap Name.unsafeFromVar name)
      when (L.payload name /= L.payload nameT) $
        customFailure $
          SignatureNeedsAccompanyingBody nameT
      body <- block "="
      pure $
        fmap
          (\e -> Term.ann (ann nameT <> ann e) e typ)
          (mkBinding (ann lhsLoc <> ann body) (L.payload name) args body)
  where
    mkBinding loc f [] body = ((loc, f), body)
    mkBinding loc f args body =
      ((loc, f), Term.lam' (loc <> ann body) (L.payload <$> args) body)

-- We disallow type annotations and lambdas,
-- just function application and operators
blockTerm :: (Var v) => TermP v
blockTerm = lam term <|> infixAppOrBooleanOp

varid being matched with empty string

x = let y = 5
        z = 1
        y + z

results in (observe term_name: (varid [0, 9] - [0, 9])):

(unison [0, 0] - [2, 13]
  (ERROR [0, 0] - [2, 13]
    (varid [0, 0] - [0, 1])
    (kw_equals [0, 2] - [0, 3])
    (kw_let [0, 4] - [0, 7])
    (stmt [0, 8] - [0, 9]
      (identifier [0, 8] - [0, 9]))
    (stmt [0, 9] - [0, 13]
      (term_definition [0, 9] - [0, 13]
        term_name: (varid [0, 9] - [0, 9])
        (kw_equals [0, 10] - [0, 11])
        (nat [0, 12] - [0, 13])))

test: "list literal" failing

x = [1, 2, 3]

being parsed as

 (literal_list
            (ERROR
              (UNEXPECTED '1')
              (UNEXPECTED '2')
              (UNEXPECTED '3'))))))

rather than (literal_list (nat) (nat) (nat))

wordy_id not matching "a"

wordy_id: $ => choice('a', regex.varid),

this matches "a" but

wordy_id: $ => regex.varid,

does not match "a"

comments should not affect indentation

blah = cases
       {- A comment -}
          -- A one-line comment
     0 -> "hi"
     1 -> "bye"

The indentation of the block comment and the one-line comment should be ignored when scanning for the indentation of the `0` case, which is what sets the indentation of the block.
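
A rough sketch of the intended behaviour, assuming a simplified model: spaces, tabs and carriage returns each count as one column, -- starts a comment that runs to the end of the line, and {- ... -} comments do not nest. The function name is hypothetical and this is not the scanner's actual logic.

```c
#include <stddef.h>

/* Returns the 0-based column of the first character that is neither
 * whitespace nor part of a comment, or -1 if there is none. */
int first_code_column(const char *s) {
    int col = 0;
    size_t i = 0;
    while (s[i] != '\0') {
        if (s[i] == '\n') {
            col = 0; /* a new line resets the column */
            i++;
        } else if (s[i] == ' ' || s[i] == '\t' || s[i] == '\r') {
            col++;
            i++;
        } else if (s[i] == '-' && s[i + 1] == '-') {
            /* one-line comment: skip to the end of the line */
            while (s[i] != '\0' && s[i] != '\n') i++;
        } else if (s[i] == '{' && s[i + 1] == '-') {
            /* block comment: skip to the closing -} (no nesting) */
            i += 2;
            col += 2;
            while (s[i] != '\0' && !(s[i] == '-' && s[i + 1] == '}')) {
                if (s[i] == '\n') col = 0; else col++;
                i++;
            }
            if (s[i] != '\0') { i += 2; col += 2; }
        } else {
            return col; /* first real token decides the indentation */
        }
    }
    return -1;
}
```

Applied to the lines that follow `blah = cases` in the example, this skips both comments and reports the column of the `0`.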

investigate nested identifiers

Thanks!

Btw, not sure if related or not, but I also discovered that identifiers like namespaces.>=.nested are not accepted at the moment. Could be changed in future if I understood it right.

Originally posted by @chuwy in #39 (comment)

Basic comments at start of file do not parse correctly

I have done zero debugging other than consistently seeing this in Nova, so it is possible (if unlikely?) that it is a bug in Nova, but this simple test case should do the trick to find out: just start a file with this—

-- this is a comment

At least in Nova's use of the parser, it parses as an operator instead of a comment start. Regular doc blocks and all other items I have tried parse correctly in that position, though.

Consider committing `grammar.json`

While keeping parser.c out of version control is reasonable (and in fact recommended by upstream), the generated grammar.json is much smaller and has meaningful diffs. The benefit of keeping this in the repo is that downstream users (like nvim-treesitter) do not require node for building and can just do tree-sitter g src/grammar.json && tree-sitter build -o parser.so (which is what nvim-treesitter is transitioning to for 1.0).

In addition, consider making proper releases including the generated artifacts (which is what nvim-treesitter will require for "stable" parsers). Upstream is in the process of adding reusable workflows and tree-sitter support to make this easier.

The unique modifier is optional now

Parsing breaks because the unique modifier now isn't required:

type core.Recipe = {
  metadata: [Text],
  description: Text,
  ingredients: [Text],
  steps: [Step]
}
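
In grammar terms, the fix amounts to making the modifier optional in front of the type keyword. The same idea as a plain C check, as a sketch only: the function names are hypothetical, only a single space is treated as a word separator, and the structural modifier is included on the assumption that it remains valid alongside unique.

```c
#include <stddef.h>
#include <string.h>

/* If s starts with the word kw followed by a space or the end of the
 * string, returns the number of bytes consumed (including one trailing
 * space); otherwise returns 0. */
static size_t eat_word(const char *s, const char *kw) {
    size_t n = strlen(kw);
    if (strncmp(s, kw, n) != 0) return 0;
    if (s[n] == ' ') return n + 1;
    return s[n] == '\0' ? n : 0;
}

/* Returns 1 if s begins a type declaration: an optional `unique` or
 * `structural` modifier followed by the `type` keyword. */
int starts_type_declaration(const char *s) {
    size_t skipped = eat_word(s, "unique");
    if (skipped == 0) skipped = eat_word(s, "structural");
    return eat_word(s + skipped, "type") != 0;
}
```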

support @rewrite blocks

For an example, see here

eitherToOptional e a =
  @rewrite
    term Left e ==> None
    term Right a ==> Some a
    case Left e ==> None
    case Right a ==> Some a
    signature e a . Either e a ==> Optional a
eitherToOptional

use clauses

need to support use clauses, which must permit operators

function literal

y = x -> x + 2

(unison
      (term_declaration
        (term_definition
          (wordy_id)
          (kw_equals)
          (function_application
            (identifier)
-            (function_param
-              (function_application
-                (operator)
-                (function_param
-                  (function_application
-                    (identifier
-                      (wordy_id))
-                    (function_param
-                      (function_application
-                        (operator)
-                        (function_param
-                          (nat))))))))))))
+            (operator)
+            (nat)))))

parsing error re prefix op

> (+) 1 1 bar has parsing errors

should parse as

(function app
  (function app
    (function app
      (op)
      (nat))
    (nat))
  (wordy_id))

byte literal

x = 0xsdeadbeef should parse as a byte literal but does not. I'm getting

(unison
-      (ERROR
-        (wordy_id)
-        (kw_equals)
-        (nat)
-        (UNEXPECTED 'x')))
+      (term_declaration
+        (term_definition
+          (wordy_id)
+          (kw_equals)
+          (literal_byte))))
