tree-sitter / tree-sitter-haskell
Haskell grammar for tree-sitter.
License: MIT License
Today @maxbrunsfeld and I met to talk about the current state of the Haskell grammar. We were interested in trying to get master back to a happy state. However, after untangling some of the conflicts with the grammar we found the remaining conflicts were increasing in complexity, which prompted us to re-evaluate our approach to parsing Haskell.
The original goal for this grammar was to be able to parse 100% of semantic, which is an open source program analysis library we maintain and use at GitHub. Semantic uses several default language extensions as well as additional language extensions in specific modules. Supporting these language extensions is a great goal, but adds a considerable amount of complexity to the grammar in terms of the number of productions used and handling the conflicts they can generate.
Max and I agreed that the current state of the grammar is extremely complex, including the external scanner. As more and more language extension support was added during the initial development of the grammar, the number of conflicts increased, which has the unfortunate side effect of making it very hard to reason about how to fix conflicts or make changes.
I proposed to start a fresh grammar targeting only Haskell 2010, with an external scanner for correctly parsing Haskell's whitespace-sensitive layout rules. From there, Max and I think a layered strategy to support the most common language extensions would be appropriate (similar to the way the typescript parser works in conjunction with the javascript parser). We think this approach, with an emphasis on long-term maintainability, will make it easier for future iterations to change the grammar, to support new language extensions, and to accept community contributions. Max and I also agree that it may not be possible for tree-sitter to fully support all of Haskell's language extensions: tree-sitter is a generic incremental parsing library, and Haskell is one of the more complex programming languages because of its language extension system. Time and experience will tell.
@maxbrunsfeld please feel free to include any other important points I may have left out. 🙇
This edge case currently causes the scanner to close the last where
clause prematurely, resulting in an error state:
runStateR m s = loop s m
  where
    loop s' (E u q) = case decompose u of
      Left u' -> case decompose u' of
        Right Reader -> k s' s'
          where k s'' = q >>> loop s''
When a semicolon is deduced from indent in the scanner, the next node is skipped when I use tree-sitter for neovim highlighting.
Example:
import Foo.Bar
import Foo.Bar
In this case, only the first line is highlighted, and the second line doesn't even appear in the tree displayed by :TSPlaygroundToggle. However, with tree-sitter parse, it is correctly included in the tree.
On the other hand, with an explicit semicolon:
import Foo.Bar;
import Foo.Bar
the second line is highlighted as well.
The scanner consumes the newline as part of the implicit semicolon. Maybe that's the problem?
@maxbrunsfeld @patrickt any advice?
GHC 8.6 comes with an extension, “block arguments”, that entails a drastic change to the parser. You can find the change specification here, but the tl;dr is that you can now often omit parentheses and $ invocations associated with do blocks:
local f do
  thing1
  thing2
rather than
local f $ do
  thing1
  thing2
This has drastic effects on the grammar—in the words of the proposal author:
Unless a special care is taken, an implementation will add a large number of shift-reduce conflicts to the parser, due to the reliance on the meta-rule mentioned above
This is a nontrivial change, but it’s one we should ultimately make.
For instance, if we parse examples/postgrest/test/Main.hs, the result contains an ERROR node. In this case, the problem seems to be absent module headers.
Hi guys,
First of all, thanks for the amazingly organized and commented code; couldn't find anything like this for a month!
So, I'm willing to use your external scanner as a base for my own, and I don't seem to get what's needed to compose a simple parser.
My parser is straightforward, just keeps consuming characters and advances the lexer until it encounters "{", ";", "}" or white space; so I thought the following would work, but no 😢 :
bool non_identifier_char(const uint32_t c) { return iswspace(c) || eq(';')(c) || eq('{') || eq('}') || eq('$'); };
const bool non_identifier_chars(State & state) { return non_identifier_char(state::next_char(state)); };
// If identifier symbol is active, fail if not an identifier char
Parser identifier = sym(Sym::identifier)(iff(cond::non_identifier_chars)(fail));
// Do nothing else, just check for identifiers
all = identifier;
Can anyone help? Thanks in advance!
This happens to me when using neovim with nvim-treesitter.
Steps to reproduce: open a Haskell file and run :TSPlaygroundToggle; errors show up in the AST. Probably some uncovered case, because the file uses a lot of extensions/type-level features?
When I have multiple functions chained together using the $ operator, the outermost function doesn't get highlighted as a function.
Here is my code:
exactMatches :: Code -> Code -> Int
exactMatches actualPegs guessPegs =
  length $ filter foundMatch $ zip actualPegs guessPegs where
    foundMatch :: (Peg, Peg) -> Bool
    foundMatch (actual, guess) = actual == guess
Here is what it looks like with the onedark.nvim colorscheme in neovim (screenshot omitted): the outermost function, length, is not highlighted the same color as the functions filter and zip.
So, I suspect that tree-sitter isn't parsing it as a function.
I expected length to be highlighted the same color as other functions. When I use parentheses instead, length is highlighted correctly.
CPP directives like the following (taken from the lens repository) are not supported:
#if MIN_VERSION_base(4,10,0)
----------------------------------------------------------------------------
-- CompactionFailed
----------------------------------------------------------------------------

-- | Compaction found an object that cannot be compacted.
-- Functions cannot be compacted, nor can mutable objects or pinned objects.
class AsCompactionFailed t where
  -- | Information about why a compaction failed.
  --
  -- @
  -- '_CompactionFailed' :: 'Prism'' 'CompactionFailed' ()
  -- '_CompactionFailed' :: 'Prism'' 'SomeException' ()
  -- @
  _CompactionFailed :: Prism' t String

instance AsCompactionFailed CompactionFailed where
  _CompactionFailed = _Wrapping CompactionFailed
  {-# INLINE _CompactionFailed #-}

instance AsCompactionFailed SomeException where
  _CompactionFailed = exception._Wrapping CompactionFailed
  {-# INLINE _CompactionFailed #-}

pattern CompactionFailed_ e <- (preview _CompactionFailed -> Just e) where
  CompactionFailed_ e = review _CompactionFailed e
#endif
Originally I considered parsing CPP directives and their contents as an extra. This solution is sub-optimal because the resulting parse tree is lossy. Treating each possible CPP directive supported by Haskell as a production in the grammar would prevent the loss, but would add additional symbols and further increase generation and compile time.
Hi, we've imported the grammar into helix, now that #27 was addressed! A bug report was just opened (helix-editor/helix#117) regarding building the grammar on Windows; it seems there are some errors in scanner.cc:
languages\tree-sitter-haskell\src\scanner.cc(1555): error C3861: 'to_string': identifier not found
languages\tree-sitter-haskell\src\scanner.cc(1556): error C3861: 'to_string': identifier not found
languages\tree-sitter-haskell\src\scanner.cc(1557): error C3536: 'col': cannot be used before it is initialized
languages\tree-sitter-haskell\src\scanner.cc(1557): error C2676: binary '+': 'std::basic_string<char,std::char_traits<char>,std::allocator<char>>' does not define this operator or a conversion to a type acceptable to the predefined operator
languages\tree-sitter-haskell\src\scanner.cc(1557): error C2672: 'operator __surrogate_func': no matching overloaded function found
exit code: 2
Steps to reproduce:
- nvim
- :e Hmm.hs
- type [i|
The final | will cause a segfault. GDB tells us:
#0 0x00007ffff7bb73e9 in malloc () from /nix/store/9bh3986bpragfjmr32gay8p95k91q4gy-glibc-2.33-47/lib/libc.so.6
#1 0x00007fffe8fef8d5 in operator new(unsigned long) () from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#2 0x00007fffe8fe6a87 in std::_Function_base::_Base_manager<state::mark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::$_1>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) () from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#3 0x00007fffe8feb8c4 in std::function<void (State&)>::function(std::function<void (State&)> const&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#4 0x00007fffe8fdfc71 in parser::effect(std::function<void (State&)>) () from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#5 0x00007fffe8fe0956 in parser::mark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#6 0x00007fffe8fea49a in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#7 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#8 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#9 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#10 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#11 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#12 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#13 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#14 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#15 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#16 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#17 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#18 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#19 0x00007fffe8fea7a0 in std::_Function_handler<Result (State&), logic::$_42>::_M_invoke(std::_Any_data const&, State&) ()
from /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
For functions with a do block, the comments following the function get included in the function, for example:
f = do a
-- | haddock
g = b
here the function pattern will include all of f and the doc comment of g. This isn't the case when there is no do block:
f = a
-- | haddock
g = b
in this case it works as I expected, only matching f = a.
I tested this out using the latest commit on the master branch, using the following tree sitter query:
(function rhs: (_) @function.inside) @function.around
(both captures end up including the doc comment)
I see that LambdaCase, ScopedTypeVariables and DataKinds are mentioned in the test directory. There are also references in other issues to Template Haskell.
I think I've followed the instructions correctly:
- tree-sitter generate using Tree Sitter 0.2
- :TSInstall haskell, which succeeded without an error
Now, when I open a Haskell file, I get:
E5108: Error executing lua Failed to load parser: uv_dlopen: /home/ollie/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so: undefined symbol: tree_sitter_haskell_external_scanner_create
I'm not really sure what's up at this point. The generated .so certainly has tree_sitter_haskell_external_scanner_create when I look it up with strings haskell.so.
Hey there! It appears that as of 381dca0, this grammar fails to terminate on any Haskell file when called from the tree-sitter library for Rust (I used 0.19.3). I haven't found any particular syntax that triggers or avoids this; it happens on every file I tested (even a blank file).
I bisected, and 1b00b9f (the commit before 381dca0) was the last commit where parsing terminated.
Here's a SSCCE project that demonstrates the issue: https://git.bytes.zone/brian/tree-sitter-haskell-sscce. I used Nix to get a reproducible environment for this. Happy to help y'all get it set up locally if necessary!
In the wake of neovim merging support for TS 0.19, I am getting back to it.
First thing I noticed is that the line numbers are off – certain scanner results cause the line number to be incremented by two when parsing a newline.
I am fiddling around with how these get emitted in the scanner; it's still unclear to me what the exact reason is, and the only thing I've achieved so far is making them count even more times.
As far as I can tell, these factors are involved:
- /\n/ in extras
- /\n/ in externals
@maxbrunsfeld Would greatly appreciate some advice!
I may be holding it wrong, but at least some unicode symbols are not supported as syntax:
e.g.:
processStateUpdater ∷
  ∀ a m.
  (NOMInput a, UpdateMonad m) ⇒
  Config →
  u a →
  StateT (ProcessState a) m ([NOMError], ByteString)
gives me
(haskell [0, 0] - [5, 52]
  (top_splice [0, 0] - [5, 52]
    (exp_infix [0, 0] - [5, 52]
      (exp_apply [0, 0] - [1, 9]
        (exp_name [0, 0] - [0, 19]
          (variable [0, 0] - [0, 19]))
        (ERROR [0, 20] - [1, 5]
          (ERROR [0, 20] - [0, 23]))
        (exp_name [1, 6] - [1, 7]
          (variable [1, 6] - [1, 7]))
        (exp_name [1, 8] - [1, 9]
          (variable [1, 8] - [1, 9])))
      (operator [1, 9] - [1, 10])
      (exp_apply [2, 2] - [5, 52]
        (exp_tuple [2, 2] - [2, 29]
          (exp_apply [2, 3] - [2, 13]
            (exp_name [2, 3] - [2, 11]
              (constructor [2, 3] - [2, 11]))
            (exp_name [2, 12] - [2, 13]
              (variable [2, 12] - [2, 13])))
          (comma [2, 13] - [2, 14])
          (exp_apply [2, 15] - [2, 28]
            (exp_name [2, 15] - [2, 26]
              (constructor [2, 15] - [2, 26]))
            (exp_name [2, 27] - [2, 28]
              (variable [2, 27] - [2, 28]))))
        (ERROR [2, 30] - [2, 33]
          (ERROR [2, 30] - [2, 33]))
        (exp_name [3, 2] - [3, 8]
          (constructor [3, 2] - [3, 8]))
        (ERROR [3, 9] - [3, 12]
          (ERROR [3, 9] - [3, 12]))
        (exp_name [4, 2] - [4, 3]
          (variable [4, 2] - [4, 3]))
        (ERROR [4, 4] - [4, 7]
          (ERROR [4, 4] - [4, 7]))
        (exp_name [5, 2] - [5, 8]
          (constructor [5, 2] - [5, 8]))
        (exp_parens [5, 9] - [5, 25]
          (exp_apply [5, 10] - [5, 24]
            (exp_name [5, 10] - [5, 22]
              (constructor [5, 10] - [5, 22]))
            (exp_name [5, 23] - [5, 24]
              (variable [5, 23] - [5, 24]))))
        (exp_name [5, 26] - [5, 27]
          (variable [5, 26] - [5, 27]))
        (exp_tuple [5, 28] - [5, 52]
          (exp_list [5, 29] - [5, 39]
            (exp_name [5, 30] - [5, 38]
              (constructor [5, 30] - [5, 38])))
          (comma [5, 39] - [5, 40])
          (exp_name [5, 41] - [5, 51]
            (constructor [5, 41] - [5, 51])))))))
foo x = x + 1
-- Hello world
bar y = y + 1
The position of the comment here covers all the preceding blank lines.
In the scanner I don't understand this part. In particular, what is lookahead == 0? I don't see the connection that should exist between newlines/indentation and LAYOUT_SEMICOLON.
...
// Only runs at the end of a line.
if (lexer->lookahead != '\n') return false;
advance(lexer);

bool next_token_is_comment = false;
uint32_t indent_length = 0;
for (;;) {
  if (lexer->lookahead == '\n') {
    // Blank line: restart the indent measurement on the next line.
    indent_length = 0;
    advance(lexer);
  } else if (lexer->lookahead == ' ') {
    indent_length++;
    advance(lexer);
  } else if (lexer->lookahead == '\t') {
    // A tab counts as 8 columns.
    indent_length += 8;
    advance(lexer);
  } else if (lexer->lookahead == 0) {
    // lookahead == 0 signals end of file: close the open layout with
    // an implicit ';' or '}' if the parser will accept one here.
    if (valid_symbols[LAYOUT_SEMICOLON]) {
      lexer->result_symbol = LAYOUT_SEMICOLON;
      return true;
    }
    if (valid_symbols[LAYOUT_CLOSE_BRACE]) {
      lexer->result_symbol = LAYOUT_CLOSE_BRACE;
      return true;
    }
    return false;
  } else {
    // First non-whitespace character: check whether it starts a
    // comment ("--" or "{-"), then stop measuring the indent.
    if (lexer->lookahead == '-' || lexer->lookahead == '{') {
      advance(lexer);
      next_token_is_comment = lexer->lookahead == '-';
    }
    break;
  }
}
...
If I open tree-sitter-playground on this file, I get a bunch of error nodes, though the file compiles just fine:
{-# language BlockArguments #-}
{-# language QuasiQuotes #-}
{-# language RecordWildCards #-}
{-# language TemplateHaskell #-}
module CircuitHub.PNP.Camera
  ( -- * Initialising Pylon
    Pylon
  , withPylon
    -- * Opening cameras
  , Camera
  , maxNumBuffers
  , withCamera
  , startGrabbing
  , onImageGrabbed
  , PylonImage
  , cloneGrabResultImage
  , saveTiff
  ) where
-- StateVar
import Data.StateVar ( SettableStateVar, makeSettableStateVar )
-- base
import Foreign.C.Types ( CInt )
import Foreign.C ( withCString )
import Foreign.Ptr ( Ptr )
-- inline-c
import qualified Language.C.Inline as C
-- inline-c-cpp
import qualified Language.C.Inline.Cpp as C
import qualified Language.C.Inline.Cpp.Exceptions as C
-- pnp
import CircuitHub.PNP.Camera.Context ( CGrabResultPtr, CInstantCamera, CPylonImage, pylonCtx )
-- unliftio
import UnliftIO ( MonadIO, MonadUnliftIO, bracket, bracket_, liftIO, withRunInIO )
C.context (C.cppCtx <> C.funCtx <> pylonCtx)
C.include "pylon/PylonIncludes.h"
C.include "HardwareTriggerConfiguration.h"
C.include "SaveHandler.h"
C.include "RTSSignal.h"
C.using "namespace Pylon"
C.using "namespace CircuitHub"
data Pylon = Pylon

-- | Initialize the Pylon SDK. Pylon must be initialized before any other
-- functions can be called.
withPylon :: MonadUnliftIO m => (Pylon -> m a) -> m a
withPylon k = bracket_ create destroy $ k Pylon
  where
    create = liftIO do
      [C.exp| void { PylonInitialize() } |]

    destroy = liftIO do
      [C.exp| void { PylonTerminate() } |]

-- | The Pylon SDK doesn't do a good job of dealing with interrupted syscalls,
-- which can happen when GHC's RTS sends VTALRM signals. This wrapper blocks
-- these signals, allowing foreign code to execute without interruption.
withBlockedSignals :: MonadUnliftIO m => m a -> m a
withBlockedSignals k = bracket block unblock \_ -> k
  where
    block = liftIO do
      [C.exp| void* { new RtsSignalBlocker() } |]

    unblock rtsSignalBlocker = liftIO do
      [C.exp| void { delete (RtsSignalBlocker*)$(void* rtsSignalBlocker) } |]

-- | The maximum number of buffers available by the camera's grab loop.
maxNumBuffers :: Camera -> SettableStateVar CInt
maxNumBuffers (Camera cameraPtr) = makeSettableStateVar \n ->
  [C.throwBlock| void { $(CInstantCamera* cameraPtr)->MaxNumBuffer = $(int n); } |]

newtype Camera = Camera (Ptr CInstantCamera)

-- | Open a Basler camera.
withCamera :: MonadUnliftIO m => Pylon -> (Camera -> m a) -> m a
withCamera _ = bracket create destroy
  where
    create = withBlockedSignals $ liftIO $ Camera <$>
      [C.throwBlock| CInstantCamera* {
        CInstantCamera* camera = new CInstantCamera(CTlFactory::GetInstance().CreateFirstDevice());
        camera->RegisterConfiguration( new HardwareTriggerConfiguration, RegistrationMode_ReplaceAll, Cleanup_Delete );
        return camera;
      }|]

    destroy (Camera camera) = liftIO do
      [C.exp| void { delete $(CInstantCamera* camera) } |]

startGrabbing :: MonadIO m => Camera -> m ()
startGrabbing (Camera cameraPtr) = liftIO do
  [C.throwBlock| void {
    $(CInstantCamera* cameraPtr)->StartGrabbing(
      GrabStrategy_OneByOne,
      GrabLoop_ProvidedByInstantCamera
    );
  } |]

newtype GrabResultPtr = GrabResultPtr (Ptr CGrabResultPtr)

onImageGrabbed :: MonadUnliftIO m => Camera -> (Camera -> GrabResultPtr -> m ()) -> m ()
onImageGrabbed (Camera cameraPtr) callback = withRunInIO \run -> do
  callbackPtr <- liftIO do
    $( C.mkFunPtr [t| Ptr CInstantCamera -> Ptr CGrabResultPtr -> IO () |] ) \cameraPtr' cGrabResultPtrPtr ->
      run $ callback (Camera cameraPtr') (GrabResultPtr cGrabResultPtrPtr)

  [C.throwBlock| void {
    $(CInstantCamera* cameraPtr)->RegisterImageEventHandler(
      new SaveHandler( $(void (*callbackPtr)(CInstantCamera*, const CGrabResultPtr*)) ),
      RegistrationMode_Append,
      Cleanup_Delete
    );
  } |]

newtype PylonImage = PylonImage (Ptr CPylonImage)

cloneGrabResultImage :: MonadUnliftIO m => GrabResultPtr -> m PylonImage
cloneGrabResultImage (GrabResultPtr grabResultPtr) = liftIO $ PylonImage <$> liftIO do
  [C.throwBlock| CPylonImage* {
    CPylonImage src;
    src.AttachGrabResultBuffer(*$(CGrabResultPtr* grabResultPtr));
    CPylonImage* pylonImage = new CPylonImage();
    pylonImage->CopyImage(src);
    return pylonImage;
  } |]

saveTiff :: MonadIO m => PylonImage -> FilePath -> m ()
saveTiff (PylonImage pylonImage) destination = liftIO do
  withCString destination \destPtr ->
    [C.throwBlock| void {
      $(CPylonImage* pylonImage)->Save(ImageFileFormat_Tiff, $(char* destPtr));
    }|]
Running the test "function_declarations" with this broken corpus file
...
=================================================
Function Declarations With Let
=================================================
f = let y = x
x = 1
in y
...
doesn't result in the expected indentation error.
Is correct indentation handling a goal for this grammar? If so, could you point me to where this needs to be implemented?
Firstly: It's fantastic to see this working with Haskell!
I notice that with the tree-sitter playground things don't seem to update (and I assume this indicates a deeper problem).
Here's a recording of me editing a C file and a Haskell file, demonstrating the difference in experience:
https://asciinema.org/a/J38cw04yWrtdxFYfjWPwZXAvd
I'm using neovim master (as of today) and treesitter 0.19.5
The whole editor would freeze (and spike CPU usage to 99%) if I type some stuff at the end of a file.
I haven't been able to identify the exact sequence of actions to reproduce the freeze, because it would almost always freeze when I try to type anything.
After some tinkering I think the problem comes from the scanner.
I have done a run of executing tree-sitter-haskell against my employer Mercury's (closed source) codebase (350k LOC) and found a few parser bugs, which I have committed simplified testcases for on https://github.com/lf-/tree-sitter-haskell/tree/top-level-splices-oopsie. I am going to try to fix the ones that I think I can fix over the next few weeks.
I'm filing this incorrect parse as a bug because I am baffled by it and don't know how to fix it. My best guess is that for some reason the function parser gets greedy and ignores the semicolons?
The following test case misparses as a function node. Notably, if the instance is changed to have an associated value or such, it will parse correctly.
================================================================================
template haskell: top level splice without parens, but weird
================================================================================
someTemplateHaskell $(spliceOne)

instance SomeClass Something where
  type Assoc Something = ()
---
AST:
(haskell [0, 0] - [4, 0]
  (function [0, 0] - [3, 27]
    name: (variable [0, 0] - [0, 19])
    patterns: (patterns [0, 20] - [3, 22]
      (splice [0, 20] - [0, 32]
        (exp_parens [0, 21] - [0, 32]
          (exp_name [0, 22] - [0, 31]
            (variable [0, 22] - [0, 31]))))
      (pat_name [2, 0] - [2, 8]
        (variable [2, 0] - [2, 8]))
      (pat_name [2, 9] - [2, 18]
        (constructor [2, 9] - [2, 18]))
      (pat_name [2, 19] - [2, 28]
        (constructor [2, 19] - [2, 28]))
      (pat_name [2, 29] - [2, 34]
        (variable [2, 29] - [2, 34]))
      (pat_name [3, 2] - [3, 6]
        (variable [3, 2] - [3, 6]))
      (pat_name [3, 7] - [3, 12]
        (constructor [3, 7] - [3, 12]))
      (pat_name [3, 13] - [3, 22]
        (constructor [3, 13] - [3, 22])))
    rhs: (exp_literal [3, 25] - [3, 27]
      (con_unit [3, 25] - [3, 27]))))
Using the wasm generated with npx tree-sitter build-wasm fails with the following error:
Aborted(Assertion failed: undefined symbol `_ZNSt3__24cerrE`.
perhaps a side module was not linked in?
if this global was expected to arrive from a system library,
try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment)
Getting this when I do:
- npm run-script build inside tree-sitter-haskell
- apm-beta rebuild inside language-haskell (the --dev linked atom package that is using tree-sitter-haskell as a subdir inside /grammars)
Installing "electron-rebuild": "^1.7.3" in devDependencies and running ./node_modules/.bin/electron-rebuild --version 1.7.11 --force inside the tree-sitter-... module resolved this error for me.
...so is using npm's 'electron-rebuild' the valid solution for this?
If we find a {-# OPTIONS -fplugin=SomePlugin #-} pragma atop a source file, we should bail out (or take some other appropriate action), as GHC source plugins can run arbitrary transformations on source code. An example of source-plugin-modified code that we can’t parse (without pulling in the plugin architecture, which is of course out of scope for a plain parser) can be found here.
Ranges in code with explicit braces and semicolons are correct when selecting the syntax element function_declaration. Ranges in code with implicit (virtual) braces and semicolons are not as expected when selecting the same syntax element; they should ideally be the same! (Here, the selection shows the range of the exact same node as above.)
Currently the grammar does not support nested multi-line comments:
{-
{-
-}
-}
It also errors when parsing pragmas within comments:
{-
{-# INLINE a #-}
-}
The external scanner will need to handle comments in a similar way as the tree-sitter OCaml scanner.
Good morning,
I like tree-sitter-haskell very much, but it seems to slow down considerably when a file passes a certain number of characters. I don't actually know if the cause is the file's pattern complexity or something else, but this is very penalizing.
Here's an example of a slow file if you want to reproduce it:
module Evaluator
  ( evaluate,
    evaluateRepl,
    evaluateDefines,
    Value (..),
    Context,
  ) where
import Text.Read (readMaybe)
import Data.Maybe (fromMaybe)
import Control.Exception (throw)
import qualified Data.Map.Strict as Map
import Parser (Expression (..))
import Exception (HExceptions (EvaluationException))
type Context = Map.Map String Value
data Function = Defined [String] Expression | Builtin ([Value] -> Value) | Spe (Context -> [Expression] -> Value)
data Value = Function Function | Number Int | String String | List [Value] | Nil
instance Show Value where
  show (Function _) = "#<procedure>"
  show (Number n) = show n
  show (String s) = s
  show (List l) = Evaluator.showList l
  show Nil = "()"
showList :: [Value] -> String
showList [] = "()"
showList [x, Nil] = '(' : show x ++ ")"
showList (first:xs) = '(' : show first ++ showList' xs
showList' :: [Value] -> String
showList' [v, Nil] = (' ': show v) ++ ")"
showList' [v] = (" . " ++ show v) ++ ")"
showList' (v:xs) = (' ' : show v) ++ showList' xs
showList' [] = ")"
evaluateDefines :: [Expression] -> Context
evaluateDefines = evaluateDefines' baseContext
evaluateDefines' :: Context -> [Expression] -> Context
evaluateDefines' c [] = c
evaluateDefines' c (Seq (Atom "define" : define) : xs) = evaluateDefines' (fst $ evaluateDefine c define) xs
evaluateDefines' c (_ : xs) = evaluateDefines' c xs
evaluate :: [Expression] -> [Value]
evaluate = evaluate' baseContext
evaluate' :: Context -> [Expression] -> [Value]
evaluate' _ [] = []
evaluate' c (Seq (Atom "define" : define) : xs) = evaluate' (fst $ evaluateDefine c define) xs
evaluate' c (expr:xs) = evaluateExpr c expr : evaluate' c xs
evaluateRepl :: Context -> [Expression] -> (Context, [Value])
evaluateRepl = evaluateRepl' []
evaluateRepl' :: [Value] -> Context -> [Expression] -> (Context, [Value])
evaluateRepl' v c [] = (c, reverse v)
evaluateRepl' v c (Seq (Atom "define" : define) : xs) = evaluateRepl'' v xs $ evaluateDefine c define
evaluateRepl' v c (expr:xs) = evaluateRepl' (evaluateExpr c expr : v) c xs
evaluateRepl'' :: [Value] -> [Expression] -> (Context, String) -> (Context, [Value])
evaluateRepl'' v (expr:xs) (c, name) = evaluateRepl' (evaluateExpr c expr : String name : v) c xs
evaluateRepl'' v [] (c, name) = (c, reverse $ String name : v)
evaluateDefine :: Context -> [Expression] -> (Context, String)
evaluateDefine c [Atom symbol, expr] = (Map.insert symbol (evaluateExpr c expr) c, symbol)
evaluateDefine c [Seq (Atom symbol : args), func] = (Map.insert symbol (createFunction args func) c, symbol)
evaluateDefine _ _ = throw $ EvaluationException "define : Invalid arguments"
createFunction :: [Expression] -> Expression -> Value
createFunction args func = Function $ Defined (map asAtom args) func
evaluateExpr :: Context -> Expression -> Value
evaluateExpr _ (Quoted expr) = evaluateQuoted expr
evaluateExpr c (Seq exprs) = evaluateSeq c exprs
evaluateExpr c (Atom a) = evaluateAtom c a
evaluateAtom :: Context -> String -> Value
evaluateAtom c s = Map.lookup s c
  ?: ((Number <$> readMaybe s)
  ?: throw (EvaluationException (show s ++ " is not a variable")))
evaluateSeq :: Context -> [Expression] -> Value
evaluateSeq _ [] = Nil
evaluateSeq c (expr:xs) = evaluateSeq' c (evaluateExpr c expr) xs
evaluateSeq' :: Context -> Value -> [Expression] -> Value
evaluateSeq' c (Function (Spe s)) exprs = s c exprs
evaluateSeq' c v exprs = evaluateSeq'' c $ v:map (evaluateExpr c) exprs
evaluateSeq'' :: Context -> [Value] -> Value
evaluateSeq'' c (Function f : xs) = invokeFunction c f xs
evaluateSeq'' _ [] = Nil
evaluateSeq'' _ _ = throw $ EvaluationException "Sequence is not a procedure"
evaluateQuoted :: Expression -> Value
evaluateQuoted (Atom a) = evaluateQuotedAtom a
evaluateQuoted (Seq []) = Nil
evaluateQuoted (Seq q) = List $ evaluateQuotedSeq q
evaluateQuoted (Quoted q) = evaluateQuoted q
evaluateQuotedAtom :: String -> Value
evaluateQuotedAtom s = (Number <$> readMaybe s) ?: String s
evaluateQuotedSeq :: [Expression] -> [Value]
evaluateQuotedSeq = foldr ((:) . evaluateQuoted) [Nil]
invokeFunction :: Context -> Function -> [Value] -> Value
invokeFunction _ (Builtin b) args = b args
invokeFunction c (Defined symbols func) args = evaluateExpr (functionContext c symbols args) func
invokeFunction _ (Spe _) _ = throw $ EvaluationException "The impossible has happened"
functionContext :: Context -> [String] -> [Value] -> Context
functionContext c (symbol:sxs) (value:vxs) = functionContext (Map.insert symbol value c) sxs vxs
functionContext c [] [] = c
functionContext _ _ _ = throw $ EvaluationException "Invalid number of arguments"
baseContext :: Context
baseContext = Map.fromList builtins
builtins :: [(String, Value)]
builtins = [("+", Function $ Builtin add),
            ("-", Function $ Builtin sub),
            ("*", Function $ Builtin mult),
            ("div", Function $ Builtin division),
            ("mod", Function $ Builtin modulo),
            ("<", Function $ Builtin inferior),
            ("eq?", Function $ Builtin eq),
            ("atom?", Function $ Builtin atom),
            ("cons", Function $ Builtin cons),
            ("car", Function $ Builtin car),
            ("cdr", Function $ Builtin cdr),
            ("cond", Function $ Spe cond),
            ("lambda", Function $ Spe lambda),
            ("let" , Function $ Spe slet),
            ("quote" , Function $ Spe quote),
            ("#t" , String "#t"),
            ("#f" , String "#f")
           ]
add :: [Value] -> Value
add = Number . sum . map asNumber
sub :: [Value] -> Value
sub [Number n] = Number $ -n
sub (Number n:xs) = Number $ foldl (-) n $ map asNumber xs
sub _ = throw $ EvaluationException "- : Invalid arguments"
mult :: [Value] -> Value
mult = Number . product . map asNumber
division :: [Value] -> Value
division [Number lhs, Number rhs] = Number $ quot lhs rhs
division [_ , _] = throw $ EvaluationException "div : Invalid arguments"
division _ = throw $ EvaluationException "div : Invalid number of arguments"
modulo :: [Value] -> Value
modulo [Number lhs, Number rhs] = Number $ mod lhs rhs
modulo [_ , _] = throw $ EvaluationException "mod : Invalid arguments"
modulo _ = throw $ EvaluationException "mod : Invalid number of arguments"
inferior :: [Value] -> Value
inferior [Number lhs, Number rhs] = fromBool $ (<) lhs rhs
inferior [_ , _] = throw $ EvaluationException "< : Invalid arguments"
inferior _ = throw $ EvaluationException "< : Invalid number of arguments"
cons :: [Value] -> Value
cons [List l, Nil] = List l
cons [lhs, List l] = List $ lhs:l
cons [lhs, rhs] = List [lhs, rhs]
cons _ = throw $ EvaluationException "cons : Invalid number of arguments"
car :: [Value] -> Value
car [List (f : _)] = f
car _ = throw $ EvaluationException "car : Invalid arguments"
cdr :: [Value] -> Value
cdr [List [_, v]] = v
cdr [List (_ : l)] = List l
cdr _ = throw $ EvaluationException "cdr : Invalid arguments"
cond :: Context -> [Expression] -> Value
cond c (Seq [expr, ret] : xs) = cond' c (evaluateExpr c expr) ret xs
cond _ _ = throw $ EvaluationException "cond : invalid branch"
cond' :: Context -> Value -> Expression -> [Expression] -> Value
cond' c (String "#f") _ xs = cond c xs
cond' c _ ret _ = evaluateExpr c ret
eq :: [Value] -> Value
eq [Number lhs, Number rhs] | lhs == rhs = fromBool True
eq [String lhs, String rhs] | lhs == rhs = fromBool True
eq [Nil , Nil ] = fromBool True
eq [_ , _ ] = fromBool False
eq _ = throw $ EvaluationException "eq? : Invalid number of arguments"
atom :: [Value] -> Value
atom [] = throw $ EvaluationException "atom? : no argument"
atom [List _] = fromBool False
atom _ = fromBool True
lambda :: Context -> [Expression] -> Value
lambda _ [args, func] = lambda' args func
lambda _ _ = throw $ EvaluationException "lambda : Invalid number of arguments"
lambda' :: Expression -> Expression -> Value
lambda' (Seq args) func = Function $ Defined (map asAtom args) func
lambda' _ _ = throw $ EvaluationException "lambda : Invalid arguments"
slet :: Context -> [Expression] -> Value
slet c [Seq defs, expr] = evaluateExpr (letContext c defs) expr
slet _ _ = throw $ EvaluationException "let : Invalid number of arguments"
letContext :: Context -> [Expression] -> Context
letContext c (Seq [Atom key, value] : xs) = letContext (Map.insert key (evaluateExpr c value) c) xs
letContext c [] = c
letContext _ _ = throw $ EvaluationException "let : Invalid variable declaration"
quote :: Context -> [Expression] -> Value
quote _ [expr] = evaluateQuoted expr
quote _ _ = throw $ EvaluationException "quote : Invalid arguments"
fromBool :: Bool -> Value
fromBool True = String "#t"
fromBool False = String "#f"
asAtom :: Expression -> String
asAtom (Atom a) = a
asAtom _ = throw $ EvaluationException "Invalid atom"
asNumber :: Value -> Int
asNumber (Number n) = n
asNumber v = throw $ EvaluationException $ show v ++ " is not a number"
(?:) :: Maybe a -> a -> a
(?:) = flip fromMaybe
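The (?:) helper defined at the end is the fallback glue used throughout the evaluator above. Here is a minimal, self-contained sketch of its behaviour (the example values are mine, not from the original code):

```haskell
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- Flipped fromMaybe: return the Just value, or the fallback on Nothing.
(?:) :: Maybe a -> a -> a
(?:) = flip fromMaybe

parsed, fallback :: Int
parsed   = (readMaybe "42"   :: Maybe Int) ?: 0  -- 42
fallback = (readMaybe "oops" :: Maybe Int) ?: 0  -- 0
```

This is the same pattern evaluateAtom uses: try the environment lookup, then try parsing a number, then fall through to an exception.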
Configuration:
Thank you for your help!
Using infixr and infixl for a given operator should result in different parse trees. For example, parsing infixl 7 +>; a +> c +> d and infixr 7 +>; a +> c +> d results in the same parse tree, but we would expect the +> operator to be right-associative in the second example. For completeness, I've included the output of tree-sitter parse with those inputs below.
Now, I don't know whether it is even possible to fix this while using tree-sitter, or whether you want to, so feel free to close this issue at your convenience.
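To spell out the expected semantics: with infixr, the second example should group to the right. A tiny sketch with a hypothetical (+>) that records the grouping as a string:

```haskell
-- Hypothetical (+>) operator that makes the grouping visible, showing
-- how a fixity declaration changes the parse: infixr groups rightward.
infixr 7 +>
(+>) :: String -> String -> String
x +> y = "(" ++ x ++ " +> " ++ y ++ ")"

grouped :: String
grouped = "a" +> "c" +> "d"  -- with infixr 7: "(a +> (c +> d))"
```

Under infixl 7 the same expression would instead group as "((a +> c) +> d)".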
$ tree-sitter parse <(echo 'infixl 7 +>; a +> c +> d')
(haskell [0, 0] - [1, 0]
(fixity [0, 0] - [0, 11]
(integer [0, 7] - [0, 8])
(varop [0, 9] - [0, 11]
(operator [0, 9] - [0, 11])))
(top_splice [0, 13] - [0, 24]
(exp_infix [0, 13] - [0, 24]
(exp_infix [0, 13] - [0, 19]
(exp_name [0, 13] - [0, 14]
(variable [0, 13] - [0, 14]))
(operator [0, 15] - [0, 17])
(exp_name [0, 18] - [0, 19]
(variable [0, 18] - [0, 19])))
(operator [0, 20] - [0, 22])
(exp_name [0, 23] - [0, 24]
(variable [0, 23] - [0, 24])))))
$ tree-sitter parse <(echo 'infixr 7 +>; a +> c +> d')
(haskell [0, 0] - [1, 0]
(fixity [0, 0] - [0, 11]
(integer [0, 7] - [0, 8])
(varop [0, 9] - [0, 11]
(operator [0, 9] - [0, 11])))
(top_splice [0, 13] - [0, 24]
(exp_infix [0, 13] - [0, 24]
(exp_infix [0, 13] - [0, 19]
(exp_name [0, 13] - [0, 14]
(variable [0, 13] - [0, 14]))
(operator [0, 15] - [0, 17])
(exp_name [0, 18] - [0, 19]
(variable [0, 18] - [0, 19])))
(operator [0, 20] - [0, 22])
(exp_name [0, 23] - [0, 24]
(variable [0, 23] - [0, 24])))))
We need C++14 here, but there is no way to pass --std=c++14 to the emcc invocation used by tree-sitter build-wasm.
The Makefile mentions a patched version of web-tree-sitter; I've tried it, but it does not solve the issue.
When I use the generated WASM file (built with npx tree-sitter build-wasm) I get:
bad export type for `_ZNSt3__25ctypeIcE2idE`: undefined
I think this is related to build-wasm not building with C++14, so how was it done here? (The Makefile does not specify C++14.)
Greetings!
I'm working on a tree-sitter grammar for Agda, a language whose syntax is heavily influenced by Haskell. Like Haskell, Agda relies on spaces for indentation, which requires some scanner preprocessing magic. And that leads me to the magical scanner you've built for Haskell.
However, I've failed to transplant the scanner to Agda. Here's what I've done:
- Copied src/scanner.cc to the corresponding location in tree-sitter-agda and replaced all occurrences of haskell with agda.
- Added "src/scanner.cc" to binding.gyp in tree-sitter-agda.
- Updated grammar.js.
I get the following error message when building the project.
> [email protected] build-scanner /Users/banacorn/node/tree-sitter-agda
> node-gyp build --debug
CC(target) Debug/obj.target/tree_sitter_agda_binding/src/parser.o
../src/parser.c:26238:43: error: use of undeclared identifier 'sym__layout_semicolon'
[ts_external_token__layout_semicolon] = sym__layout_semicolon,
^
../src/parser.c:26239:44: error: use of undeclared identifier 'sym__layout_open_brace'
[ts_external_token__layout_open_brace] = sym__layout_open_brace,
^
../src/parser.c:26240:45: error: use of undeclared identifier 'sym__layout_close_brace'
[ts_external_token__layout_close_brace] = sym__layout_close_brace,
^
../src/parser.c:26258:6: error: use of undeclared identifier 'sym__layout_semicolon'
[sym__layout_semicolon] = ACTIONS(1),
^
../src/parser.c:26259:6: error: use of undeclared identifier 'sym__layout_open_brace'
[sym__layout_open_brace] = ACTIONS(1),
^
../src/parser.c:26260:6: error: use of undeclared identifier 'sym__layout_close_brace'
[sym__layout_close_brace] = ACTIONS(1),
^
6 errors generated.
make: *** [Debug/obj.target/tree_sitter_agda_binding/src/parser.o] Error 1
Could you please point out where I may have done wrong? Thanks!
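Not a maintainer, but for what it's worth: these undeclared sym__layout_* errors usually mean the external tokens the scanner emits were never declared in the grammar itself. In tree-sitter, grammar.js must list them in an externals array whose order matches the scanner's token enum. A sketch of what that section might look like (the rule names are assumed from the error messages, not taken from tree-sitter-agda):

```javascript
// grammar.js (sketch): the externals array must declare, in the same
// order as the scanner's TokenType enum, every token the external
// scanner produces. The hidden-rule names below are guessed from the
// sym__layout_* identifiers in the compiler errors above.
module.exports = grammar({
  name: 'agda',

  externals: $ => [
    $._layout_semicolon,
    $._layout_open_brace,
    $._layout_close_brace,
  ],

  rules: {
    // ... the rest of the grammar ...
  },
});
```

After changing externals, tree-sitter generate must be re-run so parser.c picks up the new symbols.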
Fuzzing with AddressSanitizer triggers an out-of-bounds read:
==32493==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020002affee at pc 0x00000051a4dd bp 0x7ffee1c62390 sp 0x7ffee1c62388
READ of size 2 at 0x6020002affee thread T0
SCARINESS: 24 (2-byte-read-heap-buffer-overflow-far-from-bounds)
@0 0x51a4dc in (anonymous namespace)::Scanner::scan(TSLexer*, bool const*) /src/octofuzz/tree-sitter/test/fixtures/grammars/haskell/src/scanner.cc:316:54
@1 0x76d2d1 in parser__lex /src/octofuzz/tree-sitter/src/runtime/parser.c:257:11
@2 0x767d97 in parser__get_lookahead /src/octofuzz/tree-sitter/src/runtime/parser.c:489:12
@3 0x76519a in parser__advance /src/octofuzz/tree-sitter/src/runtime/parser.c:1081:21
@4 0x7642c9 in parser_parse /src/octofuzz/tree-sitter/src/runtime/parser.c:1303:9
@5 0x75ccbf in ts_document_parse_with_options /src/octofuzz/tree-sitter/src/runtime/document.c:137:16
which corresponds to tree-sitter-haskell/src/scanner.cc, lines 314 to 321 at 2be101d.
indent_length_stack.size() > 0 is not guaranteed to remain true after the first iteration of the while loop. When indent_length_stack.size() == 0, the call to indent_length_stack.back() on line 316 triggers an out-of-bounds read.
Three example inputs that trigger the issue (hexdumped):
00000000 6d 0d 62 0a 6f |m.b.o|
00000005
00000000 0d 4d 0a 3d |.M.=|
00000004
00000000 6d 28 0a 71 |m(.q|
00000004
I think the fix looks like:
diff --git a/src/scanner.cc b/src/scanner.cc
index 7e6faa4..3ff7faa 100644
--- a/src/scanner.cc
+++ b/src/scanner.cc
@@ -313,10 +313,8 @@ struct Scanner {
     if (indent_length_stack.size() > 0) {
       if (indent_length < indent_length_stack.back()) {
-        while (indent_length < indent_length_stack.back()) {
-          if (indent_length_stack.size() > 0) {
-            indent_length_stack.pop_back();
-          }
+        while (indent_length_stack.size() > 0 && indent_length < indent_length_stack.back()) {
+          indent_length_stack.pop_back();
           queued_close_brace_count++;
         }
but I'm not 100% sure that gives the same behaviour 🤷‍♂️
Using a couple of easy-to-get-at measures, find some popular repos to test the parser on.
Our current set is:
number of stars
number of logged-in viewers in the last 30 days
number of committers in the last 30 days
(These are defined in hive.entities.repos and thus are extremely easy for me to grab; I'll include the SQL below. Let me know if some others occur to you, whether they are in entities.repos or not!)
I was reading some GHC sources when I found my highlighting was broken.
Here's a minimized example (which does compile with GHC):
module OhNo where
lookupIdInScope :: Monad m => m ()
lookupIdInScope
= do { case () of
() -> do { return () } }
where
blah = ()
AST:
module: module [0, 7] - [0, 11]
where [0, 12] - [0, 17]
ERROR [1, 0] - [6, 13]
pat_typed [1, 0] - [2, 15]
pattern: pat_name [1, 0] - [1, 15]
variable [1, 0] - [1, 15]
type: context [1, 19] - [2, 15]
constraint [1, 19] - [1, 26]
class: class_name [1, 19] - [1, 24]
type [1, 19] - [1, 24]
type_name [1, 25] - [1, 26]
type_variable [1, 25] - [1, 26]
type_apply [1, 30] - [2, 15]
type_name [1, 30] - [1, 31]
type_variable [1, 30] - [1, 31]
type_literal [1, 32] - [1, 34]
con_unit [1, 32] - [1, 34]
type_name [2, 0] - [2, 15]
type_variable [2, 0] - [2, 15]
exp_literal [3, 14] - [3, 16]
con_unit [3, 14] - [3, 16]
alt [4, 11] - [4, 33]
pat_literal [4, 11] - [4, 13]
con_unit [4, 11] - [4, 13]
exp_do [4, 17] - [4, 33]
stmt [4, 22] - [4, 31]
exp_apply [4, 22] - [4, 31]
exp_name [4, 22] - [4, 28]
variable [4, 22] - [4, 28]
exp_literal [4, 29] - [4, 31]
con_unit [4, 29] - [4, 31]
ERROR [4, 34] - [6, 10]
pat_literal [6, 11] - [6, 13]
con_unit [6, 11] - [6, 13]
It should be written in Haskell and compiled to an object file.
The version mismatch is creating problems with the Rust binding, where Language from tree-sitter 0.19.4 is different from Language from tree-sitter 0.20.6. I'm unsure whether this just requires an update to cargo.toml or whether this would need to be an update for the whole repository. This may be a problem where cargo.toml simply hasn't been regenerated and tree-sitter actually is up to date. I'm not sure how versioning works exactly.
This package hasn't been published to npm in a while, and the most recently published version no longer compiles with recent versions of tree-sitter.
Would it be possible to publish a more recent version? What is currently blocking a new release?
From your perspective, is there a conceivable way out of the scanner/layout issues?
I am referring just to "core Haskell", as even that would be a major step for Haskell editing and IDE-like support. To my knowledge there is hardly another incremental Haskell parser anywhere(?), and it would be sad if this effort did not go forward. The added complexity of source plugins and some language extensions can, I think, be ignored for the moment.
In the following, the package tree-sitter-syntax-visualizer highlights the selected span, which is computed by the language's tree-sitter grammar. The span (hopefully the original one from the parser) is at the very bottom of each screenshot.
With tree-sitter-python, I get these (correct) spans in Python code:
With tree-sitter-haskell (current master, locally installed and referenced in the language-haskell package), I get these spans:
...but these look, if not completely wrong, then at least off: the starting line seems wrong.
Comparing the Python scanner.cc to the one from Haskell, my wild guess is that maybe, for the special token type LAYOUT_SEMICOLON, there is a missing invocation of advance(lexer, ...)?
I understand this is WIP, but I noticed that missing closing parens cause problems. Is this something that needs work in the scanner? I'd be willing to look into it.
test = ((1)
test = 2
(module [0, 0] - [3, 0]
(ERROR [0, 0] - [2, 8]
(variable_identifier [0, 0] - [0, 4])
(infix_operator_application [0, 8] - [2, 8]
(function_application [0, 8] - [2, 4]
(parenthesized_expression [0, 8] - [0, 11]
(integer [0, 9] - [0, 10]))
(variable_identifier [2, 0] - [2, 4]))
(variable_operator [2, 5] - [2, 6]
(variable_symbol [2, 5] - [2, 6]))
(integer [2, 7] - [2, 8]))))
There are some API changes in 0.15.x. Code built against 0.13.0 is working as expected; I just want to work through the problems of updating to the latest API version. @maxbrunsfeld, any blockers?
Version: tree-sitter 0.20.7
Hi! I'm trying to re-use the grammar for learning with a Haskell-like language, and I get this message when I run tree-sitter test:
❯ tree-sitter test
Error opening dynamic library "/home/hecate/.cache/tree-sitter/lib/haskell.so"
Caused by:
/home/hecate/.cache/tree-sitter/lib/haskell.so: undefined symbol: tree_sitter_haskell_external_scanner_create
It's fairly hard to look up on search engines. Any idea what could be the cause?
Context of the crash:
I was editing some markdown containing haskell, while also having pretty big haskell files open.
~ » coredumpctl debug
PID: 436573 (nvim)
UID: 1000 (jade)
GID: 100 (users)
Signal: 6 (ABRT)
Timestamp: Wed 2022-05-18 12:35:22 PDT (2min 21s ago)
Command Line: /run/current-system/sw/bin/nvim --cmd $'let g:loaded_node_provider=0' --cmd $'let g:loaded_python_provider=0' --cmd $'let g:python3_host_prog=\'/nix/store/mv12ajfnyndzdc1isj0kgmwdjm61n023-neovim-0.7.0/bin/nvim-python3\'' --cmd $'let g:ruby_host_prog=\'/nix/store/mv12ajfnyndzdc1isj0kgmwdjm61n023-neovim-0.7.0/bin/nvim-ruby\'' -S a.vim
Executable: /nix/store/02z6kwgc1ma1ra9ir2x1mnvm3qlz8s6l-neovim-unwrapped-0.7.0/bin/nvim
Control Group: /user.slice/user-1000.slice/[email protected]/app.slice/app-alacritty-ec1cd60ff21046dd880e5716a8652bdd.scope
Unit: [email protected]
User Unit: app-alacritty-ec1cd60ff21046dd880e5716a8652bdd.scope
Slice: user-1000.slice
Owner UID: 1000 (jade)
Boot ID: 350d1f0cacfd430f83ff6cb345ec866e
Machine ID: 4fc42215004f4b53bc919a5207a4b10e
Hostname: chonkpad
Storage: /var/lib/systemd/coredump/core.nvim.1000.350d1f0cacfd430f83ff6cb345ec866e.436573.1652902522000000.zst (present)
Disk Size: 25.9M
Message: Process 436573 (nvim) of user 1000 dumped core.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/typescript.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/ruby.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/yaml.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/markdown.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/bash.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/make.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/jsdoc.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/nix.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/comment.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so without build-id.
Module /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/json.so without build-id.
Module linux-vdso.so.1 with build-id 4d8f4ed93b54cb340698507a4a43e87763b45c66
Module libstdc++.so.6 without build-id.
Module libgcc_s.so.1 without build-id.
Module ld-linux-x86-64.so.2 with build-id b9f66b930ff8f91e4f0c5a5166a2a646b8dd7392
Module libpthread.so.0 with build-id 0fb27e00574442bff3b8e065ea25ee63a2a0a9a7
Module libc.so.6 with build-id 3f866b74dd769cad8eb7a7cad6229ee4a6824184
Module libluajit-5.1.so.2 without build-id.
Module libutil.so.1 with build-id aa9275b88f13303064d81bd40899c4a86e5aa694
Module libm.so.6 with build-id 995265d7140c8259c70e0e4ceef5651d8c37ab54
Module libtree-sitter.so.0 without build-id.
Module libunibilium.so.4 without build-id.
Module libtermkey.so.1 without build-id.
Module libvterm.so.0 without build-id.
Module libmsgpackc.so.2 without build-id.
Module librt.so.1 with build-id 51805a6bde589e18188284277aba28e598ed5020
Module libdl.so.2 with build-id 6c0e4c7d7e709d6d0b6a41dd881875f8a3dafd80
Module libuv.so.1 without build-id.
Module libluv.so.1 without build-id.
Module nvim without build-id.
Stack trace of thread 436573:
#0 0x00007f8a7c113adf __pthread_kill_implementation (libc.so.6 + 0x8badf)
#1 0x00007f8a7c0c9062 raise (libc.so.6 + 0x41062)
#2 0x00007f8a7c0b445c abort (libc.so.6 + 0x2c45c)
#3 0x00007f8a7c0b4395 __assert_fail_base.cold.0 (libc.so.6 + 0x2c395)
#4 0x00007f8a7c0c2082 __assert_fail (libc.so.6 + 0x3a082)
#5 0x00007f8a6da1823f n/a (/home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so + 0x923f)
ELF object binary architecture: AMD x86-64
Reading symbols from /nix/store/02z6kwgc1ma1ra9ir2x1mnvm3qlz8s6l-neovim-unwrapped-0.7.0/bin/nvim...
(No debugging symbols found in /nix/store/02z6kwgc1ma1ra9ir2x1mnvm3qlz8s6l-neovim-unwrapped-0.7.0/bin/nvim)
warning: Can't open file /run/nscd/dboLhMRf (deleted) during file-backed mapping note processing
[New LWP 436573]
[New LWP 436973]
[New LWP 436972]
[New LWP 436971]
[New LWP 436974]
[New LWP 436574]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libthread_db.so.1".
Core was generated by `/run/current-system/sw/bin/nvim --cmd let g:loaded_node_provider=0 --cmd let g:'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f8a7c113adf in __pthread_kill_implementation ()
from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
[Current thread is 1 (Thread 0x7f8a7c064740 (LWP 436573))]
warning: File "/nix/store/69brclzxp7mg927k6986hrfzyd1hpqgd-gcc-11.2.0-lib/lib/libstdc++.so.6.0.29-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/nix/store/pv1vnwdlqscmyvv1yqgpdw3hbh0flnrh-gcc-11.3.0-lib".
To enable execution of this file add
add-auto-load-safe-path /nix/store/69brclzxp7mg927k6986hrfzyd1hpqgd-gcc-11.2.0-lib/lib/libstdc++.so.6.0.29-gdb.py
line to your configuration file "/home/jade/.config/gdb/gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/home/jade/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
(gdb) c
The program is not being run.
(gdb) bt
#0 0x00007f8a7c113adf in __pthread_kill_implementation ()
from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#1 0x00007f8a7c0c9062 in raise () from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#2 0x00007f8a7c0b445c in abort () from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#3 0x00007f8a7c0b4395 in __assert_fail_base.cold.0 ()
from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#4 0x00007f8a7c0c2082 in __assert_fail () from /nix/store/ybkkrhdwdj227kr20vk8qnzqnmj7a06x-glibc-2.34-115/lib/libc.so.6
#5 0x00007f8a6da1823f in tree_sitter_haskell_external_scanner_serialize ()
from /home/jade/.local/share/nvim/site/pack/packer/start/nvim-treesitter/parser/haskell.so
#6 0x00007f8a7c404f34 in ts_parser_parse ()
from /nix/store/b0zvjj1gf6axy2sdqk1j2ddnf6zm55x6-tree-sitter-0.20.6/lib/libtree-sitter.so.0
#7 0x00000000005b11ce in parser_parse ()
#8 0x00007f8a7c292a36 in ?? ()
from /nix/store/qcynznv8nr1kk0zrwjbw90pfik1yv0hs-luajit-2.1.0-2022-04-05-env/lib/libluajit-5.1.so.2
#9 0x00007f8a7c2ed334 in lua_pcall ()
from /nix/store/qcynznv8nr1kk0zrwjbw90pfik1yv0hs-luajit-2.1.0-2022-04-05-env/lib/libluajit-5.1.so.2
#10 0x00000000005a0d48 in nlua_pcall.lto_priv ()
#11 0x00000000005af198 in nlua_call_ref ()
#12 0x0000000000736aaf in decor_provider_invoke.constprop ()
#13 0x00000000004b1d8a in decor_providers_invoke_buf ()
#14 0x0000000000678d4a in update_screen ()
#15 0x00000000004c43a2 in ins_compl_show_pum ()
#16 0x00000000004c4894 in ins_compl_new_leader.lto_priv ()
#17 0x00000000004c7a4e in insert_execute ()
#18 0x00000000006c84f0 in state_enter ()
#19 0x00000000004c3796 in edit ()
#20 0x00000000005fa64d in invoke_edit ()
#21 0x00000000005f210d in normal_execute.lto_priv ()
#22 0x00000000006c84f0 in state_enter ()
#23 0x00000000005edabb in normal_enter ()
#24 0x000000000044dcda in main ()
(gdb)
I can't provide the coredump as it contains confidential information, but let me know if there's anything I can get out of it that would be useful to debug.
I just ran :TSUpdate, so I should be on the latest version of the parser. nvim-treesitter was at version 9069849, and my nvim is this:
:ver
NVIM v0.7.0
Build type: Release
LuaJIT 2.1.0-beta3
Compiled by nixbld
Features: +acl +iconv +tui
See ":help feature-compile"
system vimrc file: "$VIM/sysinit.vim"
fall-back for $VIM: "/nix/store/02z6kwgc1ma1ra9ir2x1mnvm3qlz8s6l-neovim-unwrapped-0.7.0/share/nvim"
Run :checkhealth for more info