tree-sitter-haskell's People

Contributors

414owen, amaanq, brandonspark, crumbtoo, dependabot[bot], farbodsz, kazatsuyu, lf-, lunixbochs, mattmassicotte, maxbrunsfeld, maxdeviant, molleweide, patrickt, philipturnbull, rewinfrey, seasonedfish, tek, ubavic, vixietsq, wenkokke, zweimach

tree-sitter-haskell's Issues

Update on the state of the grammar

Today @maxbrunsfeld and I met to talk about the current state of the Haskell grammar. We were interested in trying to get master back to a happy state. However, after untangling some of the conflicts with the grammar we found the remaining conflicts were increasing in complexity, which prompted us to re-evaluate our approach to parsing Haskell.

The original goal for this grammar was to be able to parse 100% of semantic, which is an open source program analysis library we maintain and use at GitHub. Semantic uses several default language extensions as well as additional language extensions in specific modules. Supporting these language extensions is a great goal, but adds a considerable amount of complexity to the grammar in terms of the number of productions used and handling the conflicts they can generate.

Max and I agreed that the current state of the grammar, including the external scanner, is extremely complex. As more and more language extension support was added during the initial development of the grammar, the number of conflicts increased, which has the unfortunate side effect of making it very hard to reason about how to fix conflicts or make changes.

I proposed to start a fresh grammar targeting only Haskell 2010 with an external scanner for correctly parsing Haskell's whitespace-sensitive layout rules. From there, Max and I think a layered strategy to support the most common language extensions would be appropriate (similar to the way the typescript parser works in conjunction with the javascript parser). We think this approach, with an emphasis on long-term maintainability, will make it easier for future iterations to make changes to the grammar, support new language extensions, and also make it easier for community contributions. Max and I also agree that it may not be possible for tree-sitter to fully support all of Haskell's language extensions, as tree-sitter is a generic incremental parsing library and Haskell is one of the more complex programming languages because of its language extension system. Time and experience will tell.

@maxbrunsfeld please feel free to include any other important points I may have left out. 🙇

Incorrect LAYOUT_CLOSE_BRACE applied too early by scanner

This edge case currently causes the scanner to close the last where clause prematurely, resulting in an error state:

runStateR m s = loop s m
 where
   loop s' (E u q) = case decompose u of
     Left  u'  -> case decompose u' of
       Right Reader -> k s' s'
    where k s'' = q >>> loop s''
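
For reference, a minimal sketch of what the layout rule implies for this input, assuming the columns shown above (top-level declarations at column 0, the where block at 3, the two case blocks at 5 and 7). The function name `layouts_closed_by` and the plain array standing in for the scanner's layout stack are hypothetical, not taken from the scanner. A line that starts at column 4 should close the two case blocks and leave the where block open:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts how many implicit layout blocks are closed by
 * a line whose first token sits at column `indent`.
 * `columns` lists the columns of the open blocks, outermost first. */
size_t layouts_closed_by(const uint32_t *columns, size_t depth, uint32_t indent) {
  assert(columns != NULL || depth == 0);
  size_t closed = 0;
  /* Walk from the innermost block outwards; a block is closed only when the
   * new line is strictly to the left of the block's column. */
  while (closed < depth && indent < columns[depth - 1 - closed]) {
    closed++;
  }
  return closed;
}
```

With columns `{0, 3, 5, 7}` and an indent of 4 this returns 2, so the trailing `where` would attach to the `loop` equation rather than ending the outer where block.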

layout elements with implicit semicolon break queries

When a semicolon is deduced from indent in the scanner, the next node is skipped when I use tree-sitter for neovim highlighting.

Example:

import Foo.Bar
import Foo.Bar

In this case, only the first line is highlighted, and the second line doesn't even appear in the tree displayed by :TSPlaygroundToggle. However, with tree-sitter parse, it is correctly included in the tree.

On the other hand, with an explicit semicolon:

import Foo.Bar;
import Foo.Bar

the second line is highlighted as well.

The scanner consumes the newline as part of the implicit semicolon. Maybe that's the problem?

@maxbrunsfeld @patrickt any advice?
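
One way to test that hypothesis, sketched here with a hypothetical `MockLexer` rather than the real `TSLexer`: tree-sitter's external scanner API has `mark_end`, and characters advanced over after calling it are not included in the token. Calling it before skipping the newline and indentation would make the implicit semicolon zero-width. Whether that resolves the neovim behaviour is an assumption, not something verified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for tree-sitter's TSLexer, reduced to what this
 * sketch needs. It is not the real API. */
typedef struct {
  const char *input;
  size_t pos;        /* index of the lookahead character */
  size_t token_end;  /* where the emitted token would end */
  bool end_marked;   /* true once mark_end has been called */
} MockLexer;

static char lookahead(const MockLexer *l) { return l->input[l->pos]; }

/* Like the real advance: consumed input extends the token unless the end
 * has already been marked. */
static void advance(MockLexer *l) {
  if (lookahead(l) != '\0') l->pos++;
  if (!l->end_marked) l->token_end = l->pos;
}

static void mark_end(MockLexer *l) {
  l->token_end = l->pos;
  l->end_marked = true;
}

/* Skips newlines and spaces and returns the indentation of the next line.
 * With zero_width set, the token end is fixed before anything is consumed. */
size_t scan_newline(MockLexer *l, bool zero_width) {
  assert(l != NULL);
  if (zero_width) mark_end(l);
  size_t indent = 0;
  while (lookahead(l) == '\n' || lookahead(l) == ' ') {
    indent = (lookahead(l) == '\n') ? 0 : indent + 1;
    advance(l);
  }
  return indent;
}
```

In the zero-width variant the indentation is still measured, but the token covers no input, so the newline is left for the grammar's extras.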

Grammar does not fully support -XBlockArguments extension

GHC 8.6 comes with an extension, “block arguments”, that entails a drastic change to the parser. You can find the change specification here, but the tl;dr is that you can now often omit parentheses and $ invocations associated with do blocks:

local f do
  thing1
  thing2

rather than

local f $ do
  thing1
  thing2

This has drastic effects on the grammar—in the words of the proposal author:

Unless a special care is taken, an implementation will add a large number of shift-reduce conflicts to the parser, due to the reliance on the meta-rule mentioned above

This is a nontrivial change, but it’s one we should ultimately make.

Native and WASM parsers behave differently

For instance, if we parse examples/postgrest/test/Main.hs:

  • using native, the top-level node is haskell;
  • using wasm, the top-level node is ERROR.

In this case, the problem seems to be absent module headers.

Help with creating a new parser!

Hi guys,

First of all, thanks for the amazingly organized and commented code; couldn't find anything like this for a month!

So, I'm willing to use your external scanner as a base for my own, and I don't seem to get what's needed to compose a simple parser.
My parser is straightforward, just keeps consuming characters and advances the lexer until it encounters "{", ";", "}" or white space; so I thought the following would work, but no 😢 :

bool non_identifier_char(const uint32_t c) { return iswspace(c) || eq(';')(c) || eq('{') || eq('}') || eq('$'); };
const bool non_identifier_chars(State & state) { return non_identifier_char(state::next_char(state)); };

// If identifier symbol is active, fail if not an identifier char
Parser identifier = sym(Sym::identifier)(iff(cond::non_identifier_chars)(fail));
// Do nothing else, just check for identifiers
all = identifier;

Can anyone help? Thanks in advance!
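
Not an answer in terms of the scanner's combinators, but the rule described above (consume characters until "{", ";", "}" or white space) can be sketched in plain C. The names `ends_identifier` and `identifier_length` are hypothetical, the input is a NUL-terminated string rather than a lexer, and `$` is left out because the prose description does not list it:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True for the characters that terminate an identifier in this sketch. */
static bool ends_identifier(char c) {
  return c == '\0' || c == '{' || c == ';' || c == '}' ||
         isspace((unsigned char)c);
}

/* Number of leading characters of `s` that belong to the identifier. */
size_t identifier_length(const char *s) {
  assert(s != NULL);
  size_t n = 0;
  while (!ends_identifier(s[n])) {
    n++;
  }
  return n;
}
```

A result of 0 means no identifier starts at that position, which is the case where the scanner would have to fail.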

Parse errors when using DerivingVia

This happens to me when using neovim with nvim-treesitter.

Steps to reproduce:

  1. git clone https://github.com/luc-tielen/souffle-haskell
  2. open the file: "DerivingViaSpec.hs", notice the highlighting is not fully working
  3. in neovim: :TSPlaygroundToggle, errors show up in the AST.

Probably some uncovered case because it is using a lot of extensions/type-level features?

Outermost function when using $ operator isn't parsed as a function

When I have multiple functions chained together using the $ operator, the outermost function doesn't get highlighted as a function.

The problem

Here is my code:

exactMatches :: Code -> Code -> Int
exactMatches actualPegs guessPegs =
    length $ filter foundMatch $ zip actualPegs guessPegs where
        foundMatch :: (Peg, Peg) -> Bool
        foundMatch (actual, guess) = actual == guess

Here is what it looks like with the onedark.nvim colorscheme in neovim:
image
The outermost function, length, is not highlighted the same color as the functions filter and zip.
So, I suspect that tree-sitter isn't parsing it as a function.

Expected behavior

I expected length to be highlighted the same color as other functions.
When I use parentheses instead, length is highlighted correctly.
image

Grammar does not support CPP Directives

CPP directives like the following (taken from the lens repository) are not supported:

#if MIN_VERSION_base(4,10,0)
----------------------------------------------------------------------------
-- CompactionFailed
----------------------------------------------------------------------------

-- | Compaction found an object that cannot be compacted.
-- Functions cannot be compacted, nor can mutable objects or pinned objects.
class AsCompactionFailed t where
  -- | Information about why a compaction failed.
  --
  -- @
  -- '_CompactionFailed' :: 'Prism'' 'CompactionFailed' ()
  -- '_CompactionFailed' :: 'Prism'' 'SomeException'    ()
  -- @
  _CompactionFailed :: Prism' t String

instance AsCompactionFailed CompactionFailed where
  _CompactionFailed = _Wrapping CompactionFailed
  {-# INLINE _CompactionFailed #-}

instance AsCompactionFailed SomeException where
  _CompactionFailed = exception._Wrapping CompactionFailed
  {-# INLINE _CompactionFailed #-}

pattern CompactionFailed_ e <- (preview _CompactionFailed -> Just e) where
  CompactionFailed_ e = review _CompactionFailed e
#endif

Originally I considered parsing CPP directives and their contents as an extra. This solution is sub-optimal because the resulting parse tree is lossy. Treating each possible CPP directive supported by Haskell as a production in the grammar would prevent the loss, but would add additional symbols and further increase generation and compile time.
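
For illustration, recognizing a directive line is the easy part either way. A minimal sketch, assuming directives start with `#` in the first column and using a hypothetical helper `is_cpp_directive_line`; it only classifies a line and does nothing about the lossy-tree problem:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `line` looks like a CPP directive.
 * Assumes '#' must be the first character of the line. */
bool is_cpp_directive_line(const char *line) {
  assert(line != NULL);
  static const char *const names[] = {"if",   "ifdef", "ifndef", "elif", "else",
                                      "endif", "define", "undef", "include"};
  if (line[0] != '#') return false;
  const char *p = line + 1;
  while (*p == ' ' || *p == '\t') p++;
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
    size_t n = strlen(names[i]);
    if (strncmp(p, names[i], n) == 0) {
      char next = p[n];
      /* The name must end here, so "#iffy" is not taken for "#if". */
      if (next == '\0' || next == ' ' || next == '\t' || next == '\n' ||
          next == '(') {
        return true;
      }
    }
  }
  return false;
}
```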

Fails to build on Windows

Hi, we've imported the grammar into helix, now that #27 was addressed! A bug report was just opened (helix-editor/helix#117) regarding building the grammar on Windows. It seems that there are some errors in scanner.cc:

  languages\tree-sitter-haskell\src\scanner.cc(1555): error C3861: 'to_string': identifier not found
  languages\tree-sitter-haskell\src\scanner.cc(1556): error C3861: 'to_string': identifier not found
  languages\tree-sitter-haskell\src\scanner.cc(1557): error C3536: 'col': cannot be used before it is initialized
  languages\tree-sitter-haskell\src\scanner.cc(1557): error C2676: binary '+': 'std::basic_string<char,std::char_traits<char>,std::allocator<char>>' does not define this operator or a conversion to a type acceptable to the predefined operator
  languages\tree-sitter-haskell\src\scanner.cc(1557): error C2672: 'operator __surrogate_func': no matching overloaded function found
  exit code: 2

Typing a quasiquote causes tree-sitter-haskell (and by extension, neovim) to segfault

Steps to reproduce:

  1. Open nvim
  2. :e Hmm.hs
  3. Type: [i|. The final | will cause a segfault

GDB tells us:

#0  0x00007ffff7bb73e9 in malloc () from /nix/store/9bh3986bpragfjmr32gay8p95k91q4gy-glibc-2.33-47/lib/libc.so.6                                                             
#1  0x00007fffe8fef8d5 in operator new(unsigned long) () from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                         
#2  0x00007fffe8fe6a87 in std::_Function_base::_Base_manager<state::mark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::$_1>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) () from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so               
#3  0x00007fffe8feb8c4 in std::function<void (State&)>::function(std::function<void (State&)> const&) ()                                                                     
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#4  0x00007fffe8fdfc71 in parser::effect(std::function<void (State&)>) () from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so        
#5  0x00007fffe8fe0956 in parser::mark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()                                                   
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#6  0x00007fffe8fea49a in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#7  0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#8  0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#9  0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#10 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#11 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#12 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#13 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#14 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#15 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#16 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()                                                  
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#17 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so                                                                               
#18 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#19 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
   from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so

Comments following function included in function pattern

For functions with a do block, the comments following the function get included in the function, for example:

f = do a

-- | haddock
g = b

here the function pattern will include all of f and the doc comment of g. This isn't the case when there is no do block:

f = a

-- | haddock
g = b

in this case it works as I expected, only matching f = a.

I tested this out using the latest commit on the master branch, using the following tree sitter query:

(function rhs: (_) @function.inside) @function.around

(both captures end up including the doc comment)

`undefined symbol: tree_sitter_haskell_external_scanner_create`

I think I've followed instructions correctly:

  • I cloned this repo
  • I ran tree-sitter generate using Tree Sitter 0.2
  • I added Haskell manually to my Neovim config
  • I ran TSInstall haskell which succeeded without an error

Now, when I open a Haskell file, I get:

E5108: Error executing lua Failed to load parser: uv_dlopen: /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so: undefined symbol: tree_sitter_haskell_external_scanner_create

I'm not really sure what's up at this point. The generated .so certainly has tree_sitter_haskell_external_scanner_create when I lookup strings haskell.so

as of `381dca04`, parsing hangs on any input when called from Rust

Hey there! It appears that as of 381dca0, this grammar fails to terminate on any Haskell file when called from the tree-sitter library for Rust (I used 0.19.3.) The behavior shows up as parsing failing to terminate. I haven't found any syntax in particular that triggers or avoids this; it happens on every file I tested (even a blank file).

I bisected, and 1b00b9f (the commit before 381dca0) was the last commit where parsing terminated.

Here's a SSCCE project that demonstrates the issue: https://git.bytes.zone/brian/tree-sitter-haskell-sscce. I used Nix to get a reproducible environment for this. Happy to help y'all get it set up locally if necessary!

scanner counts newlines multiple times

In the wake of neovim merging support for TS 0.19, I am getting back to it.

First thing I noticed is that the line numbers are off – certain scanner results cause the line number to be incremented by two when parsing a newline.
I am fiddling around with how these get emitted in the scanner. So far the exact reason is still unclear to me; the only thing I have achieved is that they are counted even more times.

As far as I can tell, there are these factors:

  • Having /\n/ in extras
  • Having /\n/ in externals
  • Actually emitting the newline symbol in the scanner

@maxbrunsfeld Would greatly appreciate some advice!

UnicodeSyntax support

I may be holding it wrong, but at least some unicode symbols are not supported as syntax:

e.g.:

processStateUpdater ∷
  ∀ a m.
  (NOMInput a, UpdateMonad m) ⇒
  Config →
  u  a 
  StateT (ProcessState a) m ([NOMError], ByteString)

gives me

(haskell [0, 0] - [5, 52]
  (top_splice [0, 0] - [5, 52]
    (exp_infix [0, 0] - [5, 52]
      (exp_apply [0, 0] - [1, 9]
        (exp_name [0, 0] - [0, 19]
          (variable [0, 0] - [0, 19]))
        (ERROR [0, 20] - [1, 5]
          (ERROR [0, 20] - [0, 23]))
        (exp_name [1, 6] - [1, 7]
          (variable [1, 6] - [1, 7]))
        (exp_name [1, 8] - [1, 9]
          (variable [1, 8] - [1, 9])))
      (operator [1, 9] - [1, 10])
      (exp_apply [2, 2] - [5, 52]
        (exp_tuple [2, 2] - [2, 29]
          (exp_apply [2, 3] - [2, 13]
            (exp_name [2, 3] - [2, 11]
              (constructor [2, 3] - [2, 11]))
            (exp_name [2, 12] - [2, 13]
              (variable [2, 12] - [2, 13])))
          (comma [2, 13] - [2, 14])
          (exp_apply [2, 15] - [2, 28]
            (exp_name [2, 15] - [2, 26]
              (constructor [2, 15] - [2, 26]))
            (exp_name [2, 27] - [2, 28]
              (variable [2, 27] - [2, 28]))))
        (ERROR [2, 30] - [2, 33]
          (ERROR [2, 30] - [2, 33]))
        (exp_name [3, 2] - [3, 8]
          (constructor [3, 2] - [3, 8]))
        (ERROR [3, 9] - [3, 12]
          (ERROR [3, 9] - [3, 12]))
        (exp_name [4, 2] - [4, 3]
          (variable [4, 2] - [4, 3]))
        (ERROR [4, 4] - [4, 7]
          (ERROR [4, 4] - [4, 7]))
        (exp_name [5, 2] - [5, 8]
          (constructor [5, 2] - [5, 8]))
        (exp_parens [5, 9] - [5, 25]
          (exp_apply [5, 10] - [5, 24]
            (exp_name [5, 10] - [5, 22]
              (constructor [5, 10] - [5, 22]))
            (exp_name [5, 23] - [5, 24]
              (variable [5, 23] - [5, 24]))))
        (exp_name [5, 26] - [5, 27]
          (variable [5, 26] - [5, 27]))
        (exp_tuple [5, 28] - [5, 52]
          (exp_list [5, 29] - [5, 39]
            (exp_name [5, 30] - [5, 38]
              (constructor [5, 30] - [5, 38])))
          (comma [5, 39] - [5, 40])
          (exp_name [5, 41] - [5, 51]
            (constructor [5, 41] - [5, 51])))))))

scanner .... handling of layout_ symbols not clear

In the scanner I don't get this part.
Especially, what is lookahead == 0 ?
I don't see the connection that should exist between newlines/indentation and the LAYOUT_SEMICOLON.

...

if (lexer->lookahead != '\n') return false;
    advance(lexer); 

    bool next_token_is_comment = false;
    uint32_t indent_length = 0;
    for (;;) {
      if (lexer->lookahead == '\n') {
        indent_length = 0;
        advance(lexer);
      } else if (lexer->lookahead == ' ') {
        indent_length++;
        advance(lexer);
      } else if (lexer->lookahead == '\t') {
        indent_length += 8;
        advance(lexer);
      } else if (lexer->lookahead == 0) {
        if (valid_symbols[LAYOUT_SEMICOLON]) {
          lexer->result_symbol = LAYOUT_SEMICOLON;
          return true;
        }
        if (valid_symbols[LAYOUT_CLOSE_BRACE]) {
          lexer->result_symbol = LAYOUT_CLOSE_BRACE;
          return true;
        }
        return false;
      } else {
        if (lexer->lookahead == '-' || lexer->lookahead == '{') {
          advance(lexer);
          next_token_is_comment = lexer->lookahead == '-';
        }
        break;
      }
    }
...

Unable to setup on mac

I have this error while trying to setup tree-sitter haskell on a mac:

image

Do you have any ideas what I might be doing wrong?
I tried updating gcc on mac, but this still happens

Error with a file using quasiquotes

If I open tree-sitter-playground on this file, I get a bunch of error nodes, though the file compiles just fine:

{-# language BlockArguments #-}
{-# language QuasiQuotes #-}
{-# language RecordWildCards #-}
{-# language TemplateHaskell #-}

module CircuitHub.PNP.Camera
  ( -- * Initialising Pylon
    Pylon
  , withPylon

    -- * Opening cameras
  , Camera
  , maxNumBuffers
  , withCamera
  , startGrabbing
  , onImageGrabbed
  , PylonImage
  , cloneGrabResultImage
  , saveTiff
  ) where

-- StateVar
import Data.StateVar ( SettableStateVar, makeSettableStateVar )

-- base
import Foreign.C.Types ( CInt )
import Foreign.C ( withCString )
import Foreign.Ptr ( Ptr )

-- inline-c
import qualified Language.C.Inline as C

-- inline-c-cpp
import qualified Language.C.Inline.Cpp as C
import qualified Language.C.Inline.Cpp.Exceptions as C

-- pnp
import CircuitHub.PNP.Camera.Context ( CGrabResultPtr, CInstantCamera, CPylonImage, pylonCtx )

-- unliftio
import UnliftIO ( MonadIO, MonadUnliftIO, bracket, bracket_, liftIO, withRunInIO )


C.context (C.cppCtx <> C.funCtx <> pylonCtx)


C.include "pylon/PylonIncludes.h"


C.include "HardwareTriggerConfiguration.h"


C.include "SaveHandler.h"


C.include "RTSSignal.h"


C.using "namespace Pylon"


C.using "namespace CircuitHub"


data Pylon = Pylon


-- | Initialize the Pylon SDK. Pylon must be initialized before any other
-- functions can be called.
withPylon :: MonadUnliftIO m => (Pylon -> m a) -> m a
withPylon k = bracket_ create destroy $ k Pylon
  where
    create = liftIO do
      [C.exp| void { PylonInitialize() } |]

    destroy = liftIO do
      [C.exp| void { PylonTerminate() } |]


-- | The Pylon SDK doesn't do a good job of dealing with interupted syscalls,
-- which can happen when GHC's RTS sends VALRM signals. This wrapped blocks
-- these signals, allowing foreign code to execute without interuption.
withBlockedSignals :: MonadUnliftIO m => m a -> m a
withBlockedSignals k = bracket block unblock \_ -> k
  where
    block = liftIO do
      [C.exp| void* { new RtsSignalBlocker() } |]

    unblock rtsSignalBlocker = liftIO do
      [C.exp| void { delete (RtsSignalBlocker*)$(void* rtsSignalBlocker) } |]


-- | The maximum number of buffers available by the camera's grab loop.
maxNumBuffers :: Camera -> SettableStateVar CInt
maxNumBuffers (Camera cameraPtr) = makeSettableStateVar \n ->
  [C.throwBlock| void { $(CInstantCamera* cameraPtr)->MaxNumBuffer = $(int n); } |]


newtype Camera = Camera (Ptr CInstantCamera)


-- | Open a Basler camera.
withCamera :: MonadUnliftIO m => Pylon -> (Camera -> m a) -> m a
withCamera _ = bracket create destroy
  where
    create = withBlockedSignals $ liftIO $ Camera <$>
      [C.throwBlock| CInstantCamera* {
        CInstantCamera* camera = new CInstantCamera(CTlFactory::GetInstance().CreateFirstDevice());
        camera->RegisterConfiguration( new HardwareTriggerConfiguration, RegistrationMode_ReplaceAll, Cleanup_Delete );
        return camera;
      }|]

    destroy (Camera camera) = liftIO do
      [C.exp| void { delete $(CInstantCamera* camera) } |]


startGrabbing :: MonadIO m => Camera -> m ()
startGrabbing (Camera cameraPtr) = liftIO do
  [C.throwBlock| void {
    $(CInstantCamera* cameraPtr)->StartGrabbing(
      GrabStrategy_OneByOne,
      GrabLoop_ProvidedByInstantCamera
    );
  } |]


newtype GrabResultPtr = GrabResultPtr (Ptr CGrabResultPtr)


onImageGrabbed :: MonadUnliftIO m => Camera -> (Camera -> GrabResultPtr -> m ()) -> m ()
onImageGrabbed (Camera cameraPtr) callback = withRunInIO \run -> do
  callbackPtr <- liftIO do
    $( C.mkFunPtr [t| Ptr CInstantCamera -> Ptr CGrabResultPtr -> IO () |] ) \cameraPtr' cGrabResultPtrPtr ->
      run $ callback (Camera cameraPtr') (GrabResultPtr cGrabResultPtrPtr)

  [C.throwBlock| void {
      $(CInstantCamera* cameraPtr)->RegisterImageEventHandler(
        new SaveHandler( $(void (*callbackPtr)(CInstantCamera*, const CGrabResultPtr*)) ),
        RegistrationMode_Append,
        Cleanup_Delete
      );
    } |]


newtype PylonImage = PylonImage (Ptr CPylonImage)


cloneGrabResultImage :: MonadUnliftIO m => GrabResultPtr -> m PylonImage
cloneGrabResultImage (GrabResultPtr grabResultPtr) = liftIO $ PylonImage <$> liftIO do
  [C.throwBlock| CPylonImage* {
      CPylonImage src;
      src.AttachGrabResultBuffer(*$(CGrabResultPtr* grabResultPtr));

      CPylonImage* pylonImage = new CPylonImage();
      pylonImage->CopyImage(src);

      return pylonImage;
    } |]


saveTiff :: MonadIO m => PylonImage -> FilePath -> m ()
saveTiff (PylonImage pylonImage) destination = liftIO do
  withCString destination \destPtr ->
    [C.throwBlock| void {
      $(CPylonImage* pylonImage)->Save(ImageFileFormat_Tiff, $(char* destPtr));
    }|]

Haskell indentation rules

Running the test "function_declarations" with this broken corpus file

...
=================================================
Function Declarations With Let
=================================================

f = let y = x
x = 1
  in y

...

doesn't result in the expected indentation error.
Is correct indentation a goal for this grammar? If so, could you point me to where this needs to be implemented?

problem with multiline comments

When the closing '-}' is on a line of its own, the multiline comment is parsed correctly:
[image: multiline-comment-ok]

But if it is on the same line as comment text, it is parsed as ERROR:
[image: multiline-comment-error]
[image: multiline-comment-error2]

Freezes on typing stuff at the end of a file

The whole editor would freeze (and spike CPU usage to 99%) if I type some stuff at the end of a file.

I haven't been able to identify the exact sequence of actions to reproduce the freeze, because it would almost always freeze when I try to type anything.

After some tinkering I think the problem comes from the scanner.

Instance with associated type, following TH top level splice, misparsed as function

I have done a run of executing tree-sitter-haskell against my employer Mercury's (closed source) codebase (350k LOC) and found a few parser bugs, which I have committed simplified testcases for on https://github.com/lf-/tree-sitter-haskell/tree/top-level-splices-oopsie. I am going to try to fix the ones that I think I can fix over the next few weeks.


I'm filing this incorrect parse as a bug because I am baffled by it and don't know how to fix it. My best guess is that for some reason the function parser gets greedy and ignores the semicolons?

The following test case misparses as a function node. Notably, if the instance is changed to have an associated value or such, it will parse correctly.

================================================================================
template haskell: top level splice without parens, but weird
================================================================================

someTemplateHaskell $(spliceOne)

instance SomeClass Something where
  type Assoc Something = ()

---

AST:

(haskell [0, 0] - [4, 0]
  (function [0, 0] - [3, 27]
    name: (variable [0, 0] - [0, 19])
    patterns: (patterns [0, 20] - [3, 22]
      (splice [0, 20] - [0, 32]
        (exp_parens [0, 21] - [0, 32]
          (exp_name [0, 22] - [0, 31]
            (variable [0, 22] - [0, 31]))))
      (pat_name [2, 0] - [2, 8]
        (variable [2, 0] - [2, 8]))
      (pat_name [2, 9] - [2, 18]
        (constructor [2, 9] - [2, 18]))
      (pat_name [2, 19] - [2, 28]
        (constructor [2, 19] - [2, 28]))
      (pat_name [2, 29] - [2, 34]
        (variable [2, 29] - [2, 34]))
      (pat_name [3, 2] - [3, 6]
        (variable [3, 2] - [3, 6]))
      (pat_name [3, 7] - [3, 12]
        (constructor [3, 7] - [3, 12]))
      (pat_name [3, 13] - [3, 22]
        (constructor [3, 13] - [3, 22])))
    rhs: (exp_literal [3, 25] - [3, 27]
      (con_unit [3, 25] - [3, 27]))))

WASM build fails with error

Using the wasm generated with npx tree-sitter build-wasm fails with the following error:

Aborted(Assertion failed: undefined symbol `_ZNSt3__24cerrE`. 
    perhaps a side module was not linked in? 
    if this global was expected to arrive from a system library, 
    try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment)

node_module version error when testing grammar inside atom-beta...

getting this, when I do

  • npm run-script build inside tree-sitter-haskell
  • apm-beta rebuild inside language-haskell (the --dev linked atom-package that is using /tree-sitter-haskell as a subdir inside /grammars)

atom-beta

Installing this in devDependencies: "electron-rebuild": "^1.7.3"
and running ./node_modules/.bin/electron-rebuild --version 1.7.11 --force inside the tree-sitter-... module resolved this error for me.

...so is using npm's 'electron-rebuild' the valid solution for this?

Grammar does not support GHC 8.6 source plugins

If we find a {-# OPTIONS -fplugin=SomePlugin #-} pragma atop a source file, we should bail out (or take some other appropriate action), as GHC source plugins can run arbitrary transformations on source code. An example of source-plugin-modified code that we can’t parse (without pulling in the plugin architecture, which is of course out of scope for a plain parser) can be found here.
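
A minimal sketch of the detection step, assuming it is enough to look for `-fplugin` inside any `{-# ... #-}` pragma. The helper name `mentions_source_plugin` is hypothetical, and the sketch scans the whole input rather than only the pragmas atop the file:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if some pragma in `src` contains "-fplugin". */
bool mentions_source_plugin(const char *src) {
  assert(src != NULL);
  const char *p = src;
  while ((p = strstr(p, "{-#")) != NULL) {
    const char *end = strstr(p, "#-}");
    if (end == NULL) return false; /* unterminated pragma */
    const char *flag = strstr(p, "-fplugin");
    /* Only count the flag when it sits inside this pragma. */
    if (flag != NULL && flag < end) return true;
    p = end + 3;
  }
  return false;
}
```

What to do once this returns true (bail out, warn, or parse anyway) is left open, as in the description above.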

Nested multi-line comments not parsing

Currently the grammar does not support nested multi-line comments:

{-
  {-
  -}
-}

And also errors when parsing pragmas within comments:

{-
{-# INLINE a #-}
-}

The external scanner will need to handle comments in a similar way as the tree-sitter OCaml scanner.
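For illustration, nested comments are usually handled with a depth counter rather than a regular expression. The sketch below works on a std::string instead of the real TSLexer interface, and skip_nested_comment is a hypothetical name, so treat it as a model of the technique rather than the scanner's actual code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: assuming src[start..] begins with "{-", return the
// index just past the matching "-}", or std::string::npos if the comment is
// unterminated or does not start with "{-".
std::size_t skip_nested_comment(const std::string &src, std::size_t start) {
  if (start + 2 > src.size() || src.compare(start, 2, "{-") != 0) {
    return std::string::npos;
  }
  std::size_t depth = 0;
  std::size_t i = start;
  while (i + 1 < src.size()) {
    if (src[i] == '{' && src[i + 1] == '-') {
      ++depth;  // opening delimiter, including the "{-" of a "{-#" pragma
      i += 2;
    } else if (src[i] == '-' && src[i + 1] == '}') {
      --depth;  // closing delimiter, including the "-}" of a "#-}" pragma
      i += 2;
      if (depth == 0) return i;
    } else {
      ++i;
    }
  }
  return std::string::npos;
}
```

Because a pragma opener contains "{-" and a pragma closer contains "-}", the same counter covers both examples above without a special case.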

Terrible performance on medium and large files

Good morning,

I like tree-sitter-haskell very much, but it seems to slow down considerably once a file exceeds a certain number of characters. I don't know whether the cause is the complexity of the file's patterns or something else, but it makes the parser hard to use.

Here's an example of a slow file if you want to reproduce it:

module Evaluator
    ( evaluate,
      evaluateRepl,
      evaluateDefines,
      Value (..),
      Context,
    ) where

import Text.Read (readMaybe)
import Data.Maybe (fromMaybe)
import Control.Exception (throw)

import qualified Data.Map.Strict as Map

import Parser (Expression (..))
import Exception (HExceptions (EvaluationException))

type Context = Map.Map String Value

data Function = Defined [String] Expression | Builtin ([Value] -> Value) | Spe (Context -> [Expression] -> Value)
data Value = Function Function | Number Int | String String | List [Value] | Nil

instance Show Value where
    show (Function _) = "#<procedure>"
    show (Number n)   = show n
    show (String s)   = s
    show (List   l)   = Evaluator.showList l
    show Nil          = "()"

showList :: [Value] -> String
showList []         = "()"
showList [x, Nil]   = '(' : show x ++ ")"
showList (first:xs) = '(' : show first ++ showList' xs

showList' :: [Value] -> String
showList' [v, Nil] = (' ': show v) ++ ")"
showList' [v]      = (" . " ++ show v) ++ ")"
showList' (v:xs)   = (' ' : show v) ++ showList' xs
showList' []       = ")"

evaluateDefines :: [Expression] -> Context
evaluateDefines = evaluateDefines' baseContext

evaluateDefines' :: Context -> [Expression] -> Context
evaluateDefines' c []                                  = c
evaluateDefines' c (Seq (Atom "define" : define) : xs) = evaluateDefines' (fst $ evaluateDefine c define) xs
evaluateDefines' c (_                            : xs) = evaluateDefines' c xs

evaluate :: [Expression] -> [Value]
evaluate = evaluate' baseContext

evaluate' :: Context -> [Expression] -> [Value]
evaluate' _ []                                  = []
evaluate' c (Seq (Atom "define" : define) : xs) = evaluate' (fst $ evaluateDefine c define) xs
evaluate' c (expr:xs)                           = evaluateExpr c expr : evaluate' c xs

evaluateRepl :: Context -> [Expression] -> (Context, [Value])
evaluateRepl = evaluateRepl' []

evaluateRepl' :: [Value] -> Context -> [Expression] -> (Context, [Value])
evaluateRepl' v c []                                  = (c, reverse v)
evaluateRepl' v c (Seq (Atom "define" : define) : xs) = evaluateRepl'' v xs $ evaluateDefine c define
evaluateRepl' v c (expr:xs)                           = evaluateRepl' (evaluateExpr c expr : v) c xs

evaluateRepl'' :: [Value] -> [Expression] -> (Context, String) -> (Context, [Value])
evaluateRepl'' v (expr:xs) (c, name) = evaluateRepl' (evaluateExpr c expr : String name : v) c xs
evaluateRepl'' v []        (c, name) = (c, reverse $ String name : v)

evaluateDefine :: Context -> [Expression] -> (Context, String)
evaluateDefine c [Atom symbol, expr]              = (Map.insert symbol (evaluateExpr c expr) c, symbol)
evaluateDefine c [Seq (Atom symbol : args), func] = (Map.insert symbol (createFunction args func) c, symbol)
evaluateDefine _ _                                = throw $ EvaluationException "define : Invalid arguments"

createFunction :: [Expression] -> Expression -> Value
createFunction args func = Function $ Defined (map asAtom args) func

evaluateExpr :: Context -> Expression -> Value
evaluateExpr _ (Quoted expr) = evaluateQuoted expr
evaluateExpr c (Seq exprs)   = evaluateSeq c exprs
evaluateExpr c (Atom a)      = evaluateAtom c a

evaluateAtom :: Context -> String -> Value
evaluateAtom c s = Map.lookup s c
                ?: ((Number <$> readMaybe s)
                ?: throw (EvaluationException (show s ++ " is not a variable")))

evaluateSeq :: Context -> [Expression] -> Value
evaluateSeq _ []        = Nil
evaluateSeq c (expr:xs) = evaluateSeq' c (evaluateExpr c expr) xs

evaluateSeq' :: Context -> Value -> [Expression] -> Value
evaluateSeq' c (Function (Spe s)) exprs = s c exprs
evaluateSeq' c v exprs                  = evaluateSeq'' c $ v:map (evaluateExpr c) exprs

evaluateSeq'' :: Context -> [Value] -> Value
evaluateSeq'' c (Function f : xs) = invokeFunction c f xs
evaluateSeq'' _ []                = Nil
evaluateSeq'' _ _                 = throw $ EvaluationException "Sequence is not a procedure"

evaluateQuoted :: Expression -> Value
evaluateQuoted (Atom a)   = evaluateQuotedAtom a
evaluateQuoted (Seq  [])  = Nil
evaluateQuoted (Seq  q)   = List $ evaluateQuotedSeq q
evaluateQuoted (Quoted q) = evaluateQuoted q

evaluateQuotedAtom :: String -> Value
evaluateQuotedAtom s = (Number <$> readMaybe s) ?: String s

evaluateQuotedSeq :: [Expression] -> [Value]
evaluateQuotedSeq = foldr ((:) . evaluateQuoted) [Nil]

invokeFunction :: Context -> Function -> [Value] -> Value
invokeFunction _ (Builtin b)            args = b args
invokeFunction c (Defined symbols func) args = evaluateExpr (functionContext c symbols args) func
invokeFunction _ (Spe _)                _    = throw $ EvaluationException "The impossible has happened"

functionContext :: Context -> [String] -> [Value] -> Context
functionContext c (symbol:sxs) (value:vxs) = functionContext (Map.insert symbol value c) sxs vxs
functionContext c []           []          = c
functionContext _ _            _           = throw $ EvaluationException "Invalid number of arguments"

baseContext :: Context
baseContext = Map.fromList builtins

builtins :: [(String, Value)]
builtins = [("+",      Function $ Builtin add),
            ("-",      Function $ Builtin sub),
            ("*",      Function $ Builtin mult),
            ("div",    Function $ Builtin division),
            ("mod",    Function $ Builtin modulo),
            ("<",      Function $ Builtin inferior),
            ("eq?",    Function $ Builtin eq),
            ("atom?",  Function $ Builtin atom),
            ("cons",   Function $ Builtin cons),
            ("car",    Function $ Builtin car),
            ("cdr",    Function $ Builtin cdr),
            ("cond",   Function $ Spe cond),
            ("lambda", Function $ Spe lambda),
            ("let"   , Function $ Spe slet),
            ("quote" , Function $ Spe quote),
            ("#t" ,    String "#t"),
            ("#f" ,    String "#f")
           ]

add :: [Value] -> Value
add = Number . sum . map asNumber

sub :: [Value] -> Value
sub [Number n]       = Number $ -n
sub (Number n:xs)    = Number $ foldl (-) n $ map asNumber xs
sub _                = throw $ EvaluationException "- : Invalid arguments"

mult :: [Value] -> Value
mult = Number . product . map asNumber

division :: [Value] -> Value
division [Number lhs, Number rhs] = Number $ quot lhs rhs
division [_         , _]          = throw $ EvaluationException "div : Invalid arguments"
division _                        = throw $ EvaluationException "div : Invalid number of arguments"

modulo :: [Value] -> Value
modulo [Number lhs, Number rhs] = Number $ mod lhs rhs
modulo [_         , _]          = throw $ EvaluationException "mod : Invalid arguments"
modulo _                        = throw $ EvaluationException "mod : Invalid number of arguments"

inferior :: [Value] -> Value
inferior [Number lhs, Number rhs] = fromBool $ (<) lhs rhs
inferior [_         , _]          = throw $ EvaluationException "< : Invalid arguments"
inferior _                        = throw $ EvaluationException "< : Invalid number of arguments"

cons :: [Value] -> Value
cons [List l, Nil] = List l
cons [lhs, List l] = List $ lhs:l
cons [lhs, rhs]    = List [lhs, rhs]
cons _             = throw $ EvaluationException "cons : Invalid number of arguments"

car :: [Value] -> Value
car [List (f : _)] = f
car _              = throw $ EvaluationException "car : Invalid arguments"

cdr :: [Value] -> Value
cdr [List [_, v]]  = v
cdr [List (_ : l)] = List l
cdr _              = throw $ EvaluationException "cdr : Invalid arguments"

cond :: Context -> [Expression] -> Value
cond c (Seq [expr, ret] : xs) = cond' c (evaluateExpr c expr) ret xs
cond _ _                      = throw $ EvaluationException "cond : invalid branch"

cond' :: Context -> Value -> Expression -> [Expression] -> Value
cond' c (String "#f") _   xs = cond c xs
cond' c _             ret _  = evaluateExpr c ret

eq :: [Value] -> Value
eq [Number lhs, Number rhs] | lhs == rhs = fromBool True
eq [String lhs, String rhs] | lhs == rhs = fromBool True
eq [Nil       , Nil       ]              = fromBool True
eq [_         , _         ]              = fromBool False
eq _                                     = throw $ EvaluationException "eq? : Invalid number of arguments"

atom :: [Value] -> Value
atom []       = throw $ EvaluationException "atom? : no argument"
atom [List _] = fromBool False
atom _        = fromBool True

lambda :: Context -> [Expression] -> Value
lambda _ [args, func] = lambda' args func
lambda _ _            = throw $ EvaluationException "lambda : Invalid number of arguments"

lambda' :: Expression -> Expression -> Value
lambda' (Seq args) func = Function $ Defined (map asAtom args) func
lambda' _ _             = throw $ EvaluationException "lambda : Invalid arguments"

slet :: Context -> [Expression] -> Value
slet c [Seq defs, expr] = evaluateExpr (letContext c defs) expr
slet _ _                = throw $ EvaluationException "let : Invalid number of arguments"

letContext :: Context -> [Expression] -> Context
letContext c (Seq [Atom key, value] : xs) = letContext (Map.insert key (evaluateExpr c value) c) xs
letContext c []                           = c
letContext _ _                            = throw $ EvaluationException "let : Invalid variable declaration"

quote :: Context -> [Expression] -> Value
quote _ [expr] = evaluateQuoted expr
quote _ _      = throw $ EvaluationException "quote : Invalid arguments"

fromBool :: Bool -> Value
fromBool True  = String "#t"
fromBool False = String "#f"

asAtom :: Expression -> String
asAtom (Atom a) = a
asAtom _        = throw $ EvaluationException "Invalid atom"

asNumber :: Value -> Int
asNumber (Number n) = n
asNumber v          = throw $ EvaluationException $ show v ++ " is not a number"

(?:) :: Maybe a -> a -> a
(?:) = flip fromMaybe

Configuration:

  • Nixos
  • Nvim 0.6.0-dev (upstream)
  • Tree-sitter 0.19.3

Thank you for your help!

infixr and infixl not respected

Using infixr and infixl for a given operator should result in different parse-trees.

For example, parsing infixl 7 +>; a +> c +> d and infixr 7 +>; a +> c +> d results in the same parse tree, but really we would expect the +> operator to be right-associative in the second example.

For completeness, I've included the output of tree-sitter parse with those inputs below.

Now, I don't even know if it is possible to amend this issue while using tree-sitter, or if you want to, so feel free to close this issue at your convenience.

$ tree-sitter parse <(echo 'infixl 7 +>; a +> c +> d')
(haskell [0, 0] - [1, 0]
  (fixity [0, 0] - [0, 11]
    (integer [0, 7] - [0, 8])
    (varop [0, 9] - [0, 11]
      (operator [0, 9] - [0, 11])))
  (top_splice [0, 13] - [0, 24]
    (exp_infix [0, 13] - [0, 24]
      (exp_infix [0, 13] - [0, 19]
        (exp_name [0, 13] - [0, 14]
          (variable [0, 13] - [0, 14]))
        (operator [0, 15] - [0, 17])
        (exp_name [0, 18] - [0, 19]
          (variable [0, 18] - [0, 19])))
      (operator [0, 20] - [0, 22])
      (exp_name [0, 23] - [0, 24]
        (variable [0, 23] - [0, 24])))))

$ tree-sitter parse <(echo 'infixr 7 +>; a +> c +> d')
(haskell [0, 0] - [1, 0]
  (fixity [0, 0] - [0, 11]
    (integer [0, 7] - [0, 8])
    (varop [0, 9] - [0, 11]
      (operator [0, 9] - [0, 11])))
  (top_splice [0, 13] - [0, 24]
    (exp_infix [0, 13] - [0, 24]
      (exp_infix [0, 13] - [0, 19]
        (exp_name [0, 13] - [0, 14]
          (variable [0, 13] - [0, 14]))
        (operator [0, 15] - [0, 17])
        (exp_name [0, 18] - [0, 19]
          (variable [0, 18] - [0, 19])))
      (operator [0, 20] - [0, 22])
      (exp_name [0, 23] - [0, 24]
        (variable [0, 23] - [0, 24])))))
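To make the expected difference concrete, the following C++ sketch prints the grouping that each fixity declaration implies for a chain of operands. It is only an illustration of associativity; parenthesize and Assoc are hypothetical names, and nothing here reflects how the grammar builds exp_infix nodes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Assoc { Left, Right };

// Hypothetical illustration: fully parenthesize `operands` joined by `op`
// according to the given associativity.
std::string parenthesize(const std::vector<std::string> &operands,
                         const std::string &op, Assoc assoc) {
  if (operands.empty()) return "";
  if (assoc == Assoc::Left) {
    // Fold from the left: ((a op c) op d)
    std::string acc = operands.front();
    for (std::size_t i = 1; i < operands.size(); ++i) {
      acc = "(" + acc + " " + op + " " + operands[i] + ")";
    }
    return acc;
  }
  // Fold from the right: (a op (c op d))
  std::string acc = operands.back();
  for (std::size_t i = operands.size() - 1; i-- > 0;) {
    acc = "(" + operands[i] + " " + op + " " + acc + ")";
  }
  return acc;
}
```

With the operands a, c, d and the operator +>, the left-associative call yields ((a +> c) +> d), which matches both parse trees above, while the right-associative call yields (a +> (c +> d)).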

How do I generate a valid WASM file

We need C++14 here, but there is no way to pass --std=c++14 to the emcc used by tree-sitter build-wasm.
The Makefile mentions a patched version of web-tree-sitter. I've tried it, but it does not solve the issue.

When I use the generated WASM file (with npx tree-sitter build-wasm) I get:

bad export type for `_ZNSt3__25ctypeIcE2idE`: undefined

I think that is related to build-wasm not building with C++14, so how was it done here? (The Makefile does not specify C++14.)

Please help me steal your scanner

Greetings!

I'm working on a tree-sitter grammar for Agda, a language whose syntax is heavily influenced by Haskell. Like Haskell, Agda relies on spaces for indentation, which requires some scanner preprocessing magic. And that leads me to the magical scanner you've built for Haskell.

However, I've failed to transplant the scanner to Agda.
Here's what I've done:

  1. copy src/scanner.cc to the corresponding location at tree-sitter-agda and replace all occurrences of haskell with agda.
  2. add "src/scanner.cc" to binding.gyp at tree-sitter-agda.
  3. copy the externals from grammar.js.

I get the following error message when building the project.

> [email protected] build-scanner /Users/banacorn/node/tree-sitter-agda
> node-gyp build --debug

  CC(target) Debug/obj.target/tree_sitter_agda_binding/src/parser.o
../src/parser.c:26238:43: error: use of undeclared identifier 'sym__layout_semicolon'
  [ts_external_token__layout_semicolon] = sym__layout_semicolon,
                                          ^
../src/parser.c:26239:44: error: use of undeclared identifier 'sym__layout_open_brace'
  [ts_external_token__layout_open_brace] = sym__layout_open_brace,
                                           ^
../src/parser.c:26240:45: error: use of undeclared identifier 'sym__layout_close_brace'
  [ts_external_token__layout_close_brace] = sym__layout_close_brace,
                                            ^
../src/parser.c:26258:6: error: use of undeclared identifier 'sym__layout_semicolon'
    [sym__layout_semicolon] = ACTIONS(1),
     ^
../src/parser.c:26259:6: error: use of undeclared identifier 'sym__layout_open_brace'
    [sym__layout_open_brace] = ACTIONS(1),
     ^
../src/parser.c:26260:6: error: use of undeclared identifier 'sym__layout_close_brace'
    [sym__layout_close_brace] = ACTIONS(1),
     ^
6 errors generated.
make: *** [Debug/obj.target/tree_sitter_agda_binding/src/parser.o] Error 1

Could you please point out what I may have done wrong? Thanks!

Out-of-bound read in Scanner::scan

Fuzzing with AddressSanitizer triggers an oob-read:

==32493==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020002affee at pc 0x00000051a4dd bp 0x7ffee1c62390 sp 0x7ffee1c62388
READ of size 2 at 0x6020002affee thread T0
SCARINESS: 24 (2-byte-read-heap-buffer-overflow-far-from-bounds)
    @0 0x51a4dc in (anonymous namespace)::Scanner::scan(TSLexer*, bool const*) /src/octofuzz/tree-sitter/test/fixtures/grammars/haskell/src/scanner.cc:316:54
    @1 0x76d2d1 in parser__lex /src/octofuzz/tree-sitter/src/runtime/parser.c:257:11
    @2 0x767d97 in parser__get_lookahead /src/octofuzz/tree-sitter/src/runtime/parser.c:489:12
    @3 0x76519a in parser__advance /src/octofuzz/tree-sitter/src/runtime/parser.c:1081:21
    @4 0x7642c9 in parser_parse /src/octofuzz/tree-sitter/src/runtime/parser.c:1303:9
    @5 0x75ccbf in ts_document_parse_with_options /src/octofuzz/tree-sitter/src/runtime/document.c:137:16

which corresponds to:

if (indent_length_stack.size() > 0) {
  if (indent_length < indent_length_stack.back()) {
    while (indent_length < indent_length_stack.back()) {
      if (indent_length_stack.size() > 0) {
        indent_length_stack.pop_back();
      }
      queued_close_brace_count++;
    }

indent_length_stack.size() > 0 is not guaranteed to be true after the first iteration of the while loop. When indent_length_stack.size() == 0 then indent_length_stack.back() on line 316 triggers an out-of-bound read.

Three example inputs that trigger the issue (hexdumped):

00000000  6d 0d 62 0a 6f                                    |m.b.o|
00000005

00000000  0d 4d 0a 3d                                       |.M.=|
00000004

00000000  6d 28 0a 71                                       |m(.q|
00000004

I think the fix looks like:

diff --git a/src/scanner.cc b/src/scanner.cc
index 7e6faa4..3ff7faa 100644
--- a/src/scanner.cc
+++ b/src/scanner.cc
@@ -313,10 +313,8 @@ struct Scanner {

       if (indent_length_stack.size() > 0) {
         if (indent_length < indent_length_stack.back()) {
-          while (indent_length < indent_length_stack.back()) {
-            if (indent_length_stack.size() > 0) {
-              indent_length_stack.pop_back();
-            }
+          while (indent_length_stack.size() > 0 && indent_length < indent_length_stack.back()) {
+            indent_length_stack.pop_back();
             queued_close_brace_count++;
           }

but I'm not 100% sure that gives the same behaviour 🤷‍♂️
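The guarded loop can be tried in isolation with a small stand-alone C++ sketch. The function name close_layouts is hypothetical, and the uint16_t element type is an assumption based on the 2-byte read in the sanitizer report; only the loop condition is taken from the proposed diff.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-alone version of the dedent loop: pops every layout
// context deeper than `indent_length` and returns how many were closed.
unsigned close_layouts(std::vector<uint16_t> &indent_length_stack,
                       uint16_t indent_length) {
  unsigned queued_close_brace_count = 0;
  // Checking for emptiness first means back() is never called on an empty
  // vector, which is the undefined behaviour behind the reported read.
  while (!indent_length_stack.empty() &&
         indent_length < indent_length_stack.back()) {
    indent_length_stack.pop_back();
    ++queued_close_brace_count;
  }
  return queued_close_brace_count;
}
```

On inputs where the stack never empties, the loop pops and counts exactly as the original one does. The two differ only in the case that previously read out of bounds, where this version stops instead.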

none, left beef, closed

Using a couple of easy-to-get-at measures, find some popular repos to test the parser on.

Our current set is:

  • number of stars
  • number of logged-in viewers in the last 30 days
  • number of committers in the last 30 days

(These are defined in hive.entities.repos and thus are extremely easy for me to grab; I'll include SQL below. Let me know if some others occur to you, whether they are in entities.repos or not!)

Misparse of explicit-braced code

I was reading some GHC sources when I found my highlighting was broken.

Here's a minimized example (which does compile with GHC):

module OhNo where
lookupIdInScope :: Monad m => m ()
lookupIdInScope
  = do { case () of
           () -> do { return () } }
  where
    blah = ()

AST:

module: module [0, 7] - [0, 11]
where [0, 12] - [0, 17]
ERROR [1, 0] - [6, 13]
  pat_typed [1, 0] - [2, 15]
    pattern: pat_name [1, 0] - [1, 15]
      variable [1, 0] - [1, 15]
    type: context [1, 19] - [2, 15]
      constraint [1, 19] - [1, 26]
        class: class_name [1, 19] - [1, 24]
          type [1, 19] - [1, 24]
        type_name [1, 25] - [1, 26]
          type_variable [1, 25] - [1, 26]
      type_apply [1, 30] - [2, 15]
        type_name [1, 30] - [1, 31]
          type_variable [1, 30] - [1, 31]
        type_literal [1, 32] - [1, 34]
          con_unit [1, 32] - [1, 34]
        type_name [2, 0] - [2, 15]
          type_variable [2, 0] - [2, 15]
  exp_literal [3, 14] - [3, 16]
    con_unit [3, 14] - [3, 16]
  alt [4, 11] - [4, 33]
    pat_literal [4, 11] - [4, 13]
      con_unit [4, 11] - [4, 13]
    exp_do [4, 17] - [4, 33]
      stmt [4, 22] - [4, 31]
        exp_apply [4, 22] - [4, 31]
          exp_name [4, 22] - [4, 28]
            variable [4, 22] - [4, 28]
          exp_literal [4, 29] - [4, 31]
            con_unit [4, 29] - [4, 31]
  ERROR [4, 34] - [6, 10]
  pat_literal [6, 11] - [6, 13]
    con_unit [6, 11] - [6, 13]

Update to latest tree-sitter version

The version mismatch is creating problems with the Rust binding, where Language from tree-sitter 0.19.4 is different from Language from tree-sitter 0.20.6. I'm unsure whether this just requires an update to Cargo.toml or an update to the whole repository. It may be that Cargo.toml simply hasn't been regenerated and tree-sitter is actually up to date. I'm not sure how versioning works exactly.

Published most recent version to npm

This package hasn't been published to npm in a while, and the most recently published version no longer compiles with the recent versions of tree-sitter.

Would it be possible to publish a more recent version? What is currently blocking a new version release?

Is this effort going forward?

From your perspective, is there a conceivable way out of the scanner/layout issues?
I am referring only to "Core Haskell", as even that would be a major step for Haskell editing and IDE-like support. To my knowledge there is hardly another incremental Haskell parser anywhere, and it would be sad if this effort did not go forward.
The added complexity of source plugins and some language extensions can, I think, be ignored for the moment.

Ranges of syntax-nodes starting are off by one line

In the following, the package 'tree-sitter-syntax-visualizer' highlights the selected span, which is computed by the language's tree-sitter grammar.
The span (hopefully the original one from the parser) is at the very bottom of each screenshot.

With tree-sitter-python, I get these (correct) spans in Python code:
[screenshots: python-1, python-2]

With tree-sitter-haskell (current master, locally installed/referenced in the language-haskell package), I get these spans:

This one looks good:
[screenshot: haskell-1]

...but these look wrong, or at least their starting line seems wrong:
[screenshots: haskell-2, haskell-3, haskell-4]

Comparing the Python scanner.cc to the Haskell one, my wild guess is that, for the special token type LAYOUT_SEMICOLON, there may be a missing invocation of

advance(lexer, ...);

?

unmatched parentheses

I understand this is WIP - but I noticed missing closing parens cause problems. Is this something that needs work in the scanner? I'd be willing to look into it.

test = ((1)

test = 2
(module [0, 0] - [3, 0]
  (ERROR [0, 0] - [2, 8]
    (variable_identifier [0, 0] - [0, 4])
    (infix_operator_application [0, 8] - [2, 8]
      (function_application [0, 8] - [2, 4]
        (parenthesized_expression [0, 8] - [0, 11]
          (integer [0, 9] - [0, 10]))
        (variable_identifier [2, 0] - [2, 4]))
      (variable_operator [2, 5] - [2, 6]
        (variable_symbol [2, 5] - [2, 6]))
      (integer [2, 7] - [2, 8]))))

release for 0.15.x

There are some API changes in 0.15.x.

Code from 0.13.0 is working as expected; I just want to track the problems with updating to the latest API version. @maxbrunsfeld, any blockers?

"undefined symbol: tree_sitter_haskell_external_scanner_create" when running "tree-sitter test"

Version: tree-sitter 0.20.7

Hi! I'm trying to re-use the grammar for learning with a Haskell-like language, and I get this message when I run tree-sitter test:

❯ tree-sitter test          
Error opening dynamic library "/home/hecate/.cache/tree-sitter/lib/haskell.so"

Caused by:
    /home/hecate/.cache/tree-sitter/lib/haskell.so: undefined symbol: tree_sitter_haskell_external_scanner_create

It's fairly hard to look up on search engines. Any idea what could be the cause?
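One thing worth checking (an assumption; the thread does not confirm the cause) is whether the compiled library actually contains a scanner exporting the entry points tree-sitter looks up for a grammar named haskell. The sketch below shows the five expected symbols with C linkage. TSLexer is only forward-declared here so the example stands alone, and the empty Scanner struct is a placeholder for real layout state.

```cpp
struct TSLexer;  // opaque here; the real definition lives in tree_sitter/parser.h

namespace {
struct Scanner {};  // placeholder state; a real scanner would track layout here
}  // namespace

extern "C" {

// The symbol names must follow tree_sitter_<grammar name>_external_scanner_*,
// where <grammar name> is the `name` field in grammar.js.
void *tree_sitter_haskell_external_scanner_create() { return new Scanner(); }

void tree_sitter_haskell_external_scanner_destroy(void *payload) {
  delete static_cast<Scanner *>(payload);
}

// Returns the number of bytes written to the buffer (none in this stub).
unsigned tree_sitter_haskell_external_scanner_serialize(void *, char *) {
  return 0;
}

void tree_sitter_haskell_external_scanner_deserialize(void *, const char *,
                                                      unsigned) {}

// Returning false tells the parser that no external token was recognized.
bool tree_sitter_haskell_external_scanner_scan(void *, TSLexer *, const bool *) {
  return false;
}

}  // extern "C"
```

If the grammar declares externals but the scanner source is missing from the build, or its functions carry a different language name, the dynamic loader reports an undefined symbol of exactly this form.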

Crashing (possibly while editing markdown)

Context of the crash:

I was editing some Markdown containing Haskell while also having some pretty big Haskell files open.

~ » coredumpctl debug
           PID: 436573 (nvim)
           UID: 1000 (jade)
           GID: 100 (users)
        Signal: 6 (ABRT)
     Timestamp: Wed 2022-05-18 12:35:22 PDT (2min 21s ago)
  Command Line: /run/current-system/sw/bin/nvim --cmd $'let g:loaded_node_provider=0' --cmd $'let g:loaded_python_provider=0' --cmd $'let g:python3_host_prog=\'/nix/store/mv12ajfnyndzdc1isj0kgmwdjm61n023-neovim-0.7.0/bin/nvim-python3\'' --cmd $'let g:ruby_host_prog=\'/nix/store/mv12ajfnyndzdc1isj0kgmwdjm61n023-neovim-0.7.0/bin/nvim-ruby\'' -S a.vim
    Executable: /nix/store/02z6kwgc1ma1ra9ir2x1mnvm3qlz8s6l-neovim-unwrapped-0.7.0/bin/nvim
 Control Group: /user.slice/user-1000.slice/[email protected]/app.slice/app-alacritty-ec1cd60ff21046dd880e5716a8652bdd.scope
          Unit: [email protected]
     User Unit: app-alacritty-ec1cd60ff21046dd880e5716a8652bdd.scope
         Slice: user-1000.slice
     Owner UID: 1000 (jade)
       Boot ID: 350d1f0cacfd430f83ff6cb345ec866e
    Machine ID: 4fc42215004f4b53bc919a5207a4b10e
      Hostname: chonkpad
       Storage: /var/lib/systemd/coredump/core.nvim.1000.350d1f0cacfd430f83ff6cb345ec866e.436573.1652902522000000.zst (present)
     Disk Size: 25.9M
       Message: Process 436573 (nvim) of user 1000 dumped core.

                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/typescript.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/ruby.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/yaml.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/markdown.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/bash.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/make.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/jsdoc.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/nix.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/comment.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so without build-id.
                Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/json.so without build-id.
                Module linux-vdso.so.1 with build-id 4d8f4ed93b54cb340698507a4a43e87763b45c66
                Module libstdc++.so.6 without build-id.
                Module libgcc_s.so.1 without build-id.
                Module ld-linux-x86-64.so.2 with build-id b9f66b930ff8f91e4f0c5a5166a2a646b8dd7392
                Module libpthread.so.0 with build-id 0fb27e00574442bff3b8e065ea25ee63a2a0a9a7
                Module libc.so.6 with build-id 3f866b74dd769cad8eb7a7cad6229ee4a6824184
                Module libluajit-5.1.so.2 without build-id.
                Module libutil.so.1 with build-id aa9275b88f13303064d81bd40899c4a86e5aa694
                Module libm.so.6 with build-id 995265d7140c8259c70e0e4ceef5651d8c37ab54
                Module libtree-sitter.so.0 without build-id.
                Module libunibilium.so.4 without build-id.
                Module libtermkey.so.1 without build-id.
                Module libvterm.so.0 without build-id.
                Module libmsgpackc.so.2 without build-id.
                Module librt.so.1 with build-id 51805a6bde589e18188284277aba28e598ed5020
                Module libdl.so.2 with build-id 6c0e4c7d7e709d6d0b6a41dd881875f8a3dafd80
                Module libuv.so.1 without build-id.
                Module libluv.so.1 without build-id.
                Module nvim without build-id.
                Stack trace of thread 436573:
                #0  0x00007f8a7c113adf __pthread_kill_implementation (libc.so.6 + 0x8badf)
                #1  0x00007f8a7c0c9062 raise (libc.so.6 + 0x41062)
                #2  0x00007f8a7c0b445c abort (libc.so.6 + 0x2c45c)
                #3  0x00007f8a7c0b4395 __assert_fail_base.cold.0 (libc.so.6 + 0x2c395)
                #4  0x00007f8a7c0c2082 __assert_fail (libc.so.6 + 0x3a082)
                #5  0x00007f8a6da1823f n/a (/home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so + 0x923f)
                ELF object binary architecture: AMD x86-64

Reading symbols from /nix/store/02z6kwgc1ma1ra9ir2x1mnvm3qlz8s6l-neovim-unwrapped-0.7.0/bin/nvim...
(No debugging symbols found in /nix/store/02z6kwgc1ma1ra9ir2x1mnvm3qlz8s6l-neovim-unwrapped-0.7.0/bin/nvim)

warning: Can't open file /run/nscd/dboLhMRf (deleted) during file-backed mapping note processing
[New LWP 436573]
[New LWP 436973]
[New LWP 436972]
[New LWP 436971]
[New LWP 436974]
[New LWP 436574]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libthread_db.so.1".
Core was generated by `/run/current-system/sw/bin/nvim --cmd let g:loaded_node_provider=0 --cmd let g:'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f8a7c113adf in __pthread_kill_implementation ()
   from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
[Current thread is 1 (Thread 0x7f8a7c064740 (LWP 436573))]
warning: File "/nix/store/69brclzxp7mg927k6986hrfzyd1hpqgd-gcc-11.2.0-lib/lib/libstdc++.so.6.0.29-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/nix/store/pv1vnwdlqscmyvv1yqgpdw3hbh0flnrh-gcc-11.3.0-lib".
To enable execution of this file add
	add-auto-load-safe-path /nix/store/69brclzxp7mg927k6986hrfzyd1hpqgd-gcc-11.2.0-lib/lib/libstdc++.so.6.0.29-gdb.py
line to your configuration file "/home/jade/.config/gdb/gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/home/jade/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) c
The program is not being run.
(gdb) bt
#0  0x00007f8a7c113adf in __pthread_kill_implementation ()
   from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#1  0x00007f8a7c0c9062 in raise () from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#2  0x00007f8a7c0b445c in abort () from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#3  0x00007f8a7c0b4395 in __assert_fail_base.cold.0 ()
   from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#4  0x00007f8a7c0c2082 in __assert_fail () from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#5  0x00007f8a6da1823f in tree_sitter_haskell_external_scanner_serialize ()
   from /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#6  0x00007f8a7c404f34 in ts_parser_parse ()
   from /nix/store/b0zvjj1gf6axy2sdqk1j2ddnf6zm55x6-tree-sitter-0.20.6/lib/libtree-sitter.so.0
#7  0x00000000005b11ce in parser_parse ()
#8  0x00007f8a7c292a36 in ?? ()
   from /nix/store/qcynznv8nr1kk0zrwjbw90pfik1yv0hs-luajit-2.1.0-2022-04-05-env/lib/libluajit-5.1.so.2
#9  0x00007f8a7c2ed334 in lua_pcall ()
   from /nix/store/qcynznv8nr1kk0zrwjbw90pfik1yv0hs-luajit-2.1.0-2022-04-05-env/lib/libluajit-5.1.so.2
#10 0x00000000005a0d48 in nlua_pcall.lto_priv ()
#11 0x00000000005af198 in nlua_call_ref ()
#12 0x0000000000736aaf in decor_provider_invoke.constprop ()
#13 0x00000000004b1d8a in decor_providers_invoke_buf ()
#14 0x0000000000678d4a in update_screen ()
#15 0x00000000004c43a2 in ins_compl_show_pum ()
#16 0x00000000004c4894 in ins_compl_new_leader.lto_priv ()
#17 0x00000000004c7a4e in insert_execute ()
#18 0x00000000006c84f0 in state_enter ()
#19 0x00000000004c3796 in edit ()
#20 0x00000000005fa64d in invoke_edit ()
#21 0x00000000005f210d in normal_execute.lto_priv ()
#22 0x00000000006c84f0 in state_enter ()
#23 0x00000000005edabb in normal_enter ()
#24 0x000000000044dcda in main ()
(gdb)

I can't provide the coredump as it contains confidential information, but let me know if there's anything I can get out of it that would be useful to debug.

I just ran :TSUpdate, so I should be on the latest version of the parser. nvim-treesitter was version 9069849, and my nvim is this:

:ver
NVIM v0.7.0
Build type: Release
LuaJIT 2.1.0-beta3
Compiled by nixbld

Features: +acl +iconv +tui
See ":help feature-compile"

   system vimrc file: "$VIM/sysinit.vim"
  fall-back for $VIM: "/nix/store/02z6kwgc1ma1ra9ir2x1mnvm3qlz8s6l-neovim-unwrapped-0.7.0/share/nvim"

Run :checkhealth for more info
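The backtrace shows only that an assertion failed inside tree_sitter_haskell_external_scanner_serialize; which assertion it was is not known. If it were a check that the scanner state fits into the serialization buffer, a non-aborting variant could look like the C++ sketch below. The function serialize_indents, the uint16_t state layout, and the constant name are hypothetical; the value 1024 is assumed to mirror TREE_SITTER_SERIALIZATION_BUFFER_SIZE from tree_sitter/parser.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed to mirror TREE_SITTER_SERIALIZATION_BUFFER_SIZE in tree_sitter/parser.h.
constexpr unsigned kSerializationBufferSize = 1024;

// Hypothetical serializer: copies the indent stack into `buffer` and returns
// the number of bytes written, or 0 when the state does not fit, instead of
// aborting the host process with an assertion.
unsigned serialize_indents(const std::vector<uint16_t> &indents, char *buffer) {
  const std::size_t bytes = indents.size() * sizeof(uint16_t);
  if (bytes > kSerializationBufferSize) return 0;  // too large: store nothing
  if (bytes > 0) std::memcpy(buffer, indents.data(), bytes);
  return static_cast<unsigned>(bytes);
}
```

Returning 0 loses the scanner state for that position, which may cost incremental parsing accuracy, but it keeps the editor running. Whether that trade-off is acceptable is a design decision for the maintainers.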
