tree-sitter / py-tree-sitter
Python bindings to the Tree-sitter parsing library
License: MIT License
Invoking build_library with typescript results in a missing src/parser.c error. Is there any documentation on which languages are supported by the Python binding?
The file api.h contains the public tree-sitter API, right? So shouldn't it be fully exposed to Python?
Which leads me to another question. The way you've created the wrapper is really nice (it's minimal, user friendly, and dependency-free), but why didn't you consider using cffi, SWIG, or similar? Was it because you didn't want to add a dependency, or was there another reason?
For example, I have this piece of code:
// this is a comment
fn main() {
    /* this is another comment */
}
I traverse the AST to remove all of the comments, and expect to get this code as output:
fn main() { }
I have the edited version of the AST, but I am not sure how to convert it back to text. What is the function to call?
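One possible approach, as a minimal sketch: a tree-sitter tree does not store the source text, only byte ranges into it, so the usual route is to edit the source bytes rather than the tree. The helper below is hypothetical (remove_ranges is not part of py-tree-sitter); it assumes you have already collected the (start_byte, end_byte) pair of every comment node while traversing. Here the two ranges are located by hand so the example runs without a compiled grammar.

```python
def remove_ranges(source: bytes, ranges):
    """Return `source` with every (start_byte, end_byte) range cut out.

    Hypothetical helper: in real use the ranges would come from the
    start_byte / end_byte attributes of the comment nodes.
    """
    kept = []
    pos = 0
    for start, end in sorted(ranges):
        if start > pos:
            kept.append(source[pos:start])
        pos = max(pos, end)  # tolerate overlapping or nested ranges
    kept.append(source[pos:])
    return b"".join(kept)


source = b"// this is a comment\nfn main() { /* this is another comment */ }\n"

# Stand-ins for the byte ranges of the two comment nodes.
line_comment = (0, source.index(b"\n"))
block_comment = (source.index(b"/*"), source.index(b"*/") + 2)

stripped = remove_ranges(source, [line_comment, block_comment])
print(stripped.decode("utf8"))
```

After stripping, the new bytes can be parsed again if an up-to-date tree is needed.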
How can I solve this problem?
I am using tree_sitter==0.19.0 and I am not able to parse swift, verilog, and agda code. I cloned the grammars and installed the newest bindings. Is there anything I can do?
Swift:
ValueError: Incompatible Language version 10. Must be between 13 and 13
Verilog:
ValueError: Incompatible Language version 12. Must be between 13 and 13
Agda:
ValueError: Incompatible Language version 11. Must be between 13 and 13
Can TreePropertyCursor be made accessible along with TreeCursor? It seems to have useful functionality.
If one can give me pointers, I can try to add the bindings.
I saw it at https://github.com/tree-sitter/rust-tree-sitter/blob/375e6b4b59961da4d62db1dda90c99263d1abfdc/src/lib.rs.
The usage I saw was at https://github.com/maxbrunsfeld/tree-tags/blob/master/src/crawler.rs. There, the propertyMatcher is used to find the scope_type, definition, etc.
I would like to get the scope of any variable/identifier I encounter in the AST. From these examples, I got the impression that tree_sitter does some kind of bookkeeping for it.
I am currently using the following method to walk the tree:
def traverse(tree):
    def _traverse(node):
        print_node(node)
        for child in node.children:
            _traverse(child)
    _traverse(tree.root_node)
How could someone do the same using TreeCursor?
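A minimal sketch of a non-recursive pre-order walk, assuming a cursor that offers goto_first_child(), goto_next_sibling(), goto_parent() and a node attribute, which is the interface of the TreeCursor returned by tree.walk(). FakeCursor below is a hypothetical stand-in over nested tuples, included only so the example runs without a compiled grammar; with a real tree you would call traverse(tree.walk()).

```python
def traverse(cursor):
    """Yield every node in pre-order using only cursor movements."""
    reached_root = False
    while not reached_root:
        yield cursor.node
        if cursor.goto_first_child():
            continue
        if cursor.goto_next_sibling():
            continue
        # No child and no sibling: climb until a sibling exists or we hit the root.
        retracing = True
        while retracing:
            if not cursor.goto_parent():
                retracing = False
                reached_root = True
            elif cursor.goto_next_sibling():
                retracing = False


class FakeCursor:
    """Hypothetical stand-in for TreeCursor; a node is (name, [children])."""

    def __init__(self, root):
        self._stack = [(root, 0)]  # (node, index among its siblings)

    @property
    def node(self):
        return self._stack[-1][0]

    def goto_first_child(self):
        children = self.node[1]
        if not children:
            return False
        self._stack.append((children[0], 0))
        return True

    def goto_next_sibling(self):
        if len(self._stack) < 2:
            return False
        index = self._stack[-1][1] + 1
        siblings = self._stack[-2][0][1]
        if index >= len(siblings):
            return False
        self._stack[-1] = (siblings[index], index)
        return True

    def goto_parent(self):
        if len(self._stack) < 2:
            return False
        self._stack.pop()
        return True


tree = ("module", [
    ("function_definition", [("identifier", []), ("parameters", [])]),
    ("comment", []),
])
names = [node[0] for node in traverse(FakeCursor(tree))]
print(names)
```

Because the walk keeps no Python call stack, it does not hit the recursion limit on deeply nested trees.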
I am struggling to get the identifier node. I've read the introduction page and the "using parsers" page, but I still have no clue.
Using the example in the README:
tree = parser.parse(
    bytes(
        """
def foo():
    if bar:
        baz()
""",
        "utf8",
    )
)
I want to get the function's identifier node name, which is "foo".
I've tried several things without success:
ipdb> function_name_node.is_named
True
ipdb> function_name_node.__str__()
'<Node kind=identifier, start_point=(1, 4), end_point=(1, 7)>'
Thank you, and sorry for the beginner question.
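A minimal sketch of one way to recover the name: nodes carry positions rather than text, so the text has to be sliced out of the source bytes. With a real Node the direct route would be to slice with node.start_byte and node.end_byte; the hypothetical helpers below (point_to_byte and text_between are not part of py-tree-sitter) work from the (row, column) points printed above instead, so the example runs without a compiled grammar. It assumes columns are byte offsets within the row.

```python
def point_to_byte(source: bytes, point):
    """Convert a (row, column) point into a byte offset into `source`."""
    row, column = point
    lines = source.split(b"\n")
    # Each earlier line contributes its length plus one newline byte.
    return sum(len(line) + 1 for line in lines[:row]) + column


def text_between(source: bytes, start_point, end_point):
    """Return the source text covered by the two points."""
    start = point_to_byte(source, start_point)
    end = point_to_byte(source, end_point)
    return source[start:end].decode("utf8")


source = bytes(
    """
def foo():
    if bar:
        baz()
""",
    "utf8",
)

# Points taken from the identifier node shown in the ipdb session.
name = text_between(source, (1, 4), (1, 7))
print(name)  # foo
```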
Is there a way to get a node's parent?
It would be great to expose the ts_tree_get_changed_ranges binding for Python:
/**
 * Compare an old edited syntax tree to a new syntax tree representing the same
 * document, returning an array of ranges whose syntactic structure has changed.
 *
 * For this to work correctly, the old syntax tree must have been edited such
 * that its ranges match up to the new tree. Generally, you'll want to call
 * this function right after calling one of the `ts_parser_parse` functions.
 * You need to pass the old tree that was passed to parse, as well as the new
 * tree that was returned from that function.
 *
 * The returned array is allocated using `malloc` and the caller is responsible
 * for freeing it using `free`. The length of the array will be written to the
 * given `length` pointer.
 */
TSRange *ts_tree_get_changed_ranges(
  const TSTree *old_tree,
  const TSTree *new_tree,
  uint32_t *length
);
https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h
Happened on Windows 7 x64
D:\>python -V
Python 3.8.5
D:\>python test_treesitter.py
parser.c
scanner.cc
D:\Repositories\eko\tree-sitter-python\src\scanner.cc(104): warning C4267: '=': conversion from 'size_t' to 'char', possible loss of data
D:\Repositories\eko\tree-sitter-python\src\scanner.cc(114): warning C4244: '=': conversion from 'unsigned short' to 'char', possible loss of data
D:\Repositories\eko\tree-sitter-python\src\scanner.cc(117): warning C4267: 'return': conversion from 'size_t' to 'unsigned int', possible loss of data
Creating library build/python.lib and object build/python.exp
Generating code
Finished generating code
Traceback (most recent call last):
File "test_treesitter.py", line 16, in <module>
parser.set_language(PY_LANGUAGE)
ValueError: Incompatible Language version 13. Must not be between 9 and 12
D:\>dir build
Directory of D:\build
05.03.2021 23:56 <DIR> .
05.03.2021 23:56 <DIR> ..
05.03.2021 23:56 195.072 python.dll
05.03.2021 23:56 675 python.exp
05.03.2021 23:56 1.740 python.lib
Content of test_treesitter.py
from tree_sitter import Language, Parser
Language.build_library(
    # Store the library in the `build` directory
    'build/python.dll',
    # Include one or more languages
    [
        r'D:\Repositories\eko\tree-sitter-python'
    ]
)
PY_LANGUAGE = Language('build/python.dll', 'python')
parser = Parser()
parser.set_language(PY_LANGUAGE)
tree = parser.parse(bytes("""
def foo():
    if bar:
        baz()
""", "utf8")
)
print(tree)
When I run Language.build_library('languages.so', ['tree-sitter-python']) on a machine using an Anaconda Python distribution (which uses libstdc++), on an Ubuntu 16 system that has both libc++ and libstdc++ installed, build_library gives higher precedence to libc++, even though libstdc++ is what my Anaconda Python needs. The result is the error
/usr/bin/ld: cannot find -lc++
I recognize this is strange since I have both installed, but perhaps the distutils compiler only pays attention to Anaconda library paths?
I fixed it on my system by giving higher precedence to libstdc++ (and can submit a PR), but would this break py-tree-sitter for others? Is there another way to fix this logic so it works? Since libc++ is 'newer' than libstdc++, perhaps giving precedence to the older one makes sense?
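The precedence change proposed above could look roughly like the sketch below. It is only an illustration under assumptions: pick_cpp_runtime is a hypothetical helper, not the actual build_library code, and it assumes the runtime is chosen by probing for each library by name with ctypes.util.find_library. The lookup function is injectable so the ordering can be checked without depending on what is installed locally.

```python
from ctypes.util import find_library


def pick_cpp_runtime(find=find_library):
    """Return the C++ runtime name to link against, preferring libstdc++.

    Hypothetical helper: `find` takes a library name and returns a
    truthy value when that library can be located.
    """
    for name in ("stdc++", "c++"):  # older runtime first, as proposed
        if find(name):
            return name
    return None


# Both installed: libstdc++ wins.
print(pick_cpp_runtime(lambda name: "lib" + name + ".so"))
# Only libc++ installed: fall back to it.
print(pick_cpp_runtime(lambda name: "libc++.so" if name == "c++" else None))
```

Whether this ordering is safe for other platforms (macOS in particular) is exactly the open question raised above.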
pip install tree_sitter
...snipped...
cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Itree_sitter/core/lib/include -Itree_sitter/core/lib/utf8proc -Ic:\users\laura\appdata\local\programs\python\python37\include -Ic:\users\laura\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" /Tctree_sitter/core/lib/src/lib.c /Fobuild\temp.win-amd64-3.7\Release\tree_sitter/core/lib/src/lib.obj -std=c99
error: command 'cl.exe' failed: No such file or directory
I think it should be included in the docs that Visual Studio is a dependency for installing tree_sitter
{{ is used to write a literal "{" in an f-string, but it's not parsed correctly if it's at the end of the string:
f"{my_var} {{"
module [0, 0] - [1, 0]
  ERROR [0, 0] - [0, 14]
    interpolation [0, 2] - [0, 10]
      identifier [0, 3] - [0, 9]
Adding a space after the {{ causes it to parse correctly:
f"{my_var} {{ "
module [0, 0] - [1, 0]
  expression_statement [0, 0] - [0, 15]
    string [0, 0] - [0, 15]
      interpolation [0, 2] - [0, 10]
        identifier [0, 3] - [0, 9]
I'd like to come up with a proper highlighting algorithm to use with QScintilla, and I need a little help/advice. In QScintilla you basically apply styles to chunks of text using byte positions; here you can find a perfect explanation of how it works.
Right now I've coded a little snippet to understand the basics of tree-sitter. The goal would be to apply a Monokai style using the existing Python grammar. Consider this:
import textwrap
from tree_sitter import Language, Parser
def print_node(node):
    pos_point = f"[{node.start_point},{node.end_point}]"
    pos_byte = f"({node.start_byte},{node.end_byte})"
    print(
        f"{repr(node.type):<25}{'is_named' if node.is_named else '-':<20}"
        f"{pos_point:<30}{pos_byte}"
    )
def traverse(tree):
    def _traverse(node):
        print_node(node)
        for child in node.children:
            _traverse(child)
    _traverse(tree.root_node)
if __name__ == '__main__':
    code = textwrap.dedent("""\
        # こんにちは
        def hello():
            if world:
                foo()"""
    )
    print(repr(code), len(code.encode("utf-8")))
    print('-'*80)
    for l in code.splitlines(True):
        print(repr(l), len(l.encode("utf-8")))
    print('-'*80)
    grammar_name = "python"
    parser = Parser()
    parser.set_language(PY_LANGUAGE)  # <-- IMPORTANT: Set PY_LANGUAGE to use your shared library
    tree = parser.parse(bytes(code, "utf8"))
    traverse(tree)
You should get this output:
'# こんにちは\ndef hello():\n    if world:\n        foo()' 58
--------------------------------------------------------------------------------
'# こんにちは\n' 18
'def hello():\n' 13
'    if world:\n' 14
'        foo()' 13
--------------------------------------------------------------------------------
'module'                 is_named            [(0, 0),(3, 13)]              (0,58)
'comment'                is_named            [(0, 0),(0, 17)]              (0,17)
'function_definition'    is_named            [(1, 0),(3, 13)]              (18,58)
'def'                    -                   [(1, 0),(1, 3)]               (18,21)
'identifier'             is_named            [(1, 4),(1, 9)]               (22,27)
'parameters'             is_named            [(1, 9),(1, 11)]              (27,29)
'('                      -                   [(1, 9),(1, 10)]              (27,28)
')'                      -                   [(1, 10),(1, 11)]             (28,29)
':'                      -                   [(1, 11),(1, 12)]             (29,30)
'if_statement'           is_named            [(2, 4),(3, 13)]              (35,58)
'if'                     -                   [(2, 4),(2, 6)]               (35,37)
'identifier'             is_named            [(2, 7),(2, 12)]              (38,43)
':'                      -                   [(2, 12),(2, 13)]             (43,44)
'expression_statement'   is_named            [(3, 8),(3, 13)]              (53,58)
'call'                   is_named            [(3, 8),(3, 13)]              (53,58)
'identifier'             is_named            [(3, 8),(3, 11)]              (53,56)
'argument_list'          is_named            [(3, 11),(3, 13)]             (56,58)
'('                      -                   [(3, 11),(3, 12)]             (56,57)
')'                      -                   [(3, 12),(3, 13)]             (57,58)
In that tree I see some missing information. How should I tweak my algorithm to print all the information I need for syntax highlighting?
In any case, could you please explain the overall algorithm to use in combination with tree-sitter? In the past, when using other libraries, I've always found that syntax highlighting large texts wasn't a trivial task. Hopefully with tree-sitter this will be much easier :)
Thanks.
P.S. This is basically a question, but by the end of this thread maybe we'll have some useful explanations that you can add to the docs. Also, once I've got a QScintilla example working, I could make a PR to add it as an example/test if you want.
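One possible shape for such an algorithm, as a rough sketch rather than a definitive answer: the output above shows that whitespace is not covered by any leaf node (for example bytes 17 to 18 and 21 to 22), so a highlighter that styles by byte position has to fill those gaps itself. The function below turns leaf tokens into contiguous (length, style) runs that cover the whole buffer. STYLE_FOR_TYPE, the style numbers, and style_runs are all hypothetical, and the token tuples are copied by hand from the output above instead of coming from a real tree.

```python
DEFAULT_STYLE = 0

# Hypothetical mapping from node type to an editor style number.
STYLE_FOR_TYPE = {"comment": 1, "def": 2, "if": 2, "identifier": 3}


def style_runs(tokens, total_length):
    """Turn (node_type, start_byte, end_byte) leaves into (length, style) runs.

    Gaps between leaves (whitespace, newlines) get DEFAULT_STYLE so the
    runs cover every byte of the buffer exactly once.
    """
    runs = []
    pos = 0
    for node_type, start, end in sorted(tokens, key=lambda t: t[1]):
        if start > pos:
            runs.append((start - pos, DEFAULT_STYLE))
        runs.append((end - start, STYLE_FOR_TYPE.get(node_type, DEFAULT_STYLE)))
        pos = end
    if pos < total_length:
        runs.append((total_length - pos, DEFAULT_STYLE))
    return runs


# The first few leaves from the output above, followed by unstyled bytes.
tokens = [("comment", 0, 17), ("def", 18, 21), ("identifier", 22, 27)]
runs = style_runs(tokens, total_length=30)
print(runs)
```

Each run could then be handed to the editor in order, starting at byte 0; only leaf nodes should be fed in, since parent nodes overlap their children.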
Dear Community,
I found that there is a memory leak when using the query api.
Consider this small example, where I crawled the C++ repos with the most stars on GitHub into data/raw/:
# %% show memory leak
from tree_sitter import Language, Parser
from glob import glob
from tqdm import tqdm
RELATIVE_PATH_TO_PARSER = "build/my-languages.so"
LANGUAGE = Language(RELATIVE_PATH_TO_PARSER, "cpp")
parser = Parser()
parser.set_language(LANGUAGE)
files = glob("data/raw/**/*.cpp", recursive=True)
files.extend(glob("data/raw/**/*.c", recursive=True))
files.extend(glob("data/raw/**/*.cc", recursive=True))
use_query = True
query_statement = "(if_statement(condition_clause)@if_statement)"
query = LANGUAGE.query(query_statement)
for file in tqdm(files):
    # Close each file promptly so open handles are not mistaken for the leak
    with open(file, "rb") as f:
        code = f.read()
    tree = parser.parse(code)
    if use_query:
        file_captures = query.captures(tree.root_node)
With the "use_query" switch enabled, memory usage goes well beyond 50 GB while crawling through all of the files, even though the "file_captures" variable is never used and is overwritten on every iteration. The increase in memory is very steady and is not linked to the size of the file currently being read.
With the "use_query" switch disabled, memory usage stays well below a single GB.
Manually deleting the loop variables or running the garbage collector via gc.collect() every couple of iterations does not fix the bug.
I am not a good enough C++ developer to fix it myself. I hope someone else can fix it. I am very happy to provide additional info!
Cheers
A memoryview object exposes the C level buffer interface as a Python object which can then be passed around like any other object. https://docs.python.org/3/c-api/memoryview.html
I have a pyarrow table of strings where each string contains the content of a source code file.
Pyarrow exposes each row as a pyarrow.Buffer, which is compatible with the Python memoryview interface.
It would be nice if tree-sitter could be executed directly on the memoryview, instead of requiring users to copy the data into a new bytes object.
Example:
language = tree_sitter.Language(lib_path, 'python')
parser = tree_sitter.Parser()
parser.set_language(language)
parser.parse(memoryview(tbl['content'][0].as_buffer()))  # TypeError: First argument to parse must be bytes
parser.parse(tbl['content'][0].as_buffer().to_pybytes())  # Works, but performs unnecessary copy.
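To make the cost being described concrete, here is a small standard-library sketch (no pyarrow or tree-sitter involved; a bytearray stands in for the Arrow buffer). It shows that a memoryview shares memory with the underlying buffer, while converting to bytes produces an independent copy.

```python
# A mutable buffer stands in for the pyarrow.Buffer from the example.
source = bytearray(b"def foo(): pass")

view = memoryview(source)  # no copy: a window onto the same memory
copy = bytes(view)         # full copy of the data

# Change the underlying buffer in place (same length, so no resize).
source[4:7] = b"bar"

print(bytes(view[4:7]))  # b'bar'  (the view sees the change)
print(copy[4:7])         # b'foo'  (the copy does not)
```

For large files, it is that bytes(...) copy that accepting any buffer-protocol object in parse would avoid.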
0.19.0 release was not tagged in git. Latest tag is still v0.2.2
Could you please add a tag?
I tried to use a query containing a regex match predicate, but the predicate does not seem to be applied. It works in the playground, but not when running through Python.
I have the following code:
i = 0
class ArgParser:
    logger = logging.getLogger("ArgParser")
I am trying to use the query below:
((attribute
  object: (identifier) @cls
  attribute: (identifier) @clsvar
)
(match? @cls "^cls$"))
When running through Python, it gives getLogger as @clsvar (ignoring the match condition). In the playground, however, I get the correct result. (When I remove the match condition in the playground, I get results similar to those in Python, so I think the Python binding is not checking the condition.)
Is there any way around this?
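If the binding in use really does skip the predicate, one workaround is to apply the regex in Python after the query has run. The sketch below is only an illustration: filter_matches is a hypothetical helper, and it assumes each match has been gathered into a dict mapping capture name to a (start_byte, end_byte) pair, which is not necessarily the shape the binding returns. The spans are located by hand so the example runs without a compiled grammar.

```python
import re


def filter_matches(source: bytes, matches, capture_name: str, pattern: str):
    """Keep only the matches whose `capture_name` text matches `pattern`."""
    regex = re.compile(pattern)
    kept = []
    for match in matches:
        start, end = match[capture_name]
        text = source[start:end].decode("utf8")
        if regex.search(text):
            kept.append(match)
    return kept


source = b'logger = logging.getLogger("ArgParser")\ncls.count = 1\n'


def span(needle: bytes):
    """Byte range of the first occurrence of `needle` (stand-in for a node)."""
    start = source.index(needle)
    return (start, start + len(needle))


# Two attribute matches, as the query would produce without the predicate.
matches = [
    {"cls": span(b"logging"), "clsvar": span(b"getLogger")},
    {"cls": span(b"cls"), "clsvar": span(b"count")},
]

kept = filter_matches(source, matches, "cls", r"^cls$")
names = [source[s:e].decode("utf8") for s, e in (m["clsvar"] for m in kept)]
print(names)  # ['count']
```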
This is an awesome project.
I'm not familiar with this tool. Right now, all identifiers are replaced by the same word. I want to get an AST that includes the identifier values, but I don't know how to do it.
Hi, I'm debugging a memory growth issue in our product (which uses py-tree-sitter) and found what may be a memory leak in point_new:
static PyObject *point_new(TSPoint point) {
  PyObject *row = PyLong_FromSize_t((size_t)point.row);
  PyObject *column = PyLong_FromSize_t((size_t)point.column);
  if (!row || !column) {
    Py_XDECREF(row);
    Py_XDECREF(column);
    return NULL;
  }
  return PyTuple_Pack(2, row, column);
}
PyTuple_Pack adds a reference to each of its arguments:
PyObject *
PyTuple_Pack(Py_ssize_t n, ...)
{
    Py_ssize_t i;
    PyObject *o;
    PyObject **items;
    va_list vargs;
    if (n == 0) {
        return tuple_get_empty();
    }
    va_start(vargs, n);
    PyTupleObject *result = tuple_alloc(n);
    if (result == NULL) {
        va_end(vargs);
        return NULL;
    }
    items = result->ob_item;
    for (i = 0; i < n; i++) {
        o = va_arg(vargs, PyObject *);
        Py_INCREF(o);
        items[i] = o;
    }
    va_end(vargs);
    tuple_gc_track(result);
    return (PyObject *)result;
}
So after I changed point_new to the following code, our product's memory usage is stable:
static PyObject *point_new(TSPoint point)
{
    PyObject *row = PyLong_FromSize_t((size_t)point.row);
    PyObject *column = PyLong_FromSize_t((size_t)point.column);
    if (!row || !column)
    {
        Py_XDECREF(row);
        Py_XDECREF(column);
        return NULL;
    }
    PyObject *obj = PyTuple_Pack(2, row, column);
    Py_XDECREF(row);
    Py_XDECREF(column);
    return obj;
}
I'm not very familiar with Python C extensions, so I'm not sure whether it's a real memory leak.
Just playing around in the playground (https://tree-sitter.github.io/tree-sitter/playground): for the following Python code, it says for is an identifier and marks x as the error, instead of the other way around.
def func(x, list):
    if for x in list:
        print(x)
After installing tree-sitter and importing the library as explained in the README, I ran the following command:
>>> from tree_sitter import Language, Parser
>>> Language.build_library(
... 'build/test_tree_sitter.so',
... ['path/to/tree-sitter-python'])
Got the following error:
/usr/bin/ld: cannot find -lc++abi
collect2: error: ld returned 1 exit status
Traceback (most recent call last):
File "/usr/lib/python3.7/distutils/unixccompiler.py", line 215, in link
self.spawn(linker + ld_args)
File "/usr/lib/python3.7/distutils/ccompiler.py", line 910, in spawn
spawn(cmd, dry_run=self.dry_run)
File "/usr/lib/python3.7/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "/usr/lib/python3.7/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'cc' failed with exit status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/home/joe/.local/lib/python3.7/site-packages/tree_sitter/__init__.py", line 72, in build_library
compiler.link_shared_object(object_paths, output_path)
File "/usr/lib/python3.7/distutils/ccompiler.py", line 717, in link_shared_object
extra_preargs, extra_postargs, build_temp, target_lang)
File "/usr/lib/python3.7/distutils/unixccompiler.py", line 217, in link
raise LinkError(msg)
distutils.errors.LinkError: command 'cc' failed with exit status 1
I've installed py-tree-sitter, but something seems wrong: when trying to import Parser, I seem to be told it has no definition (which, among other things, means that in VSCode, autocomplete doesn't work for any of its functions, and it is not recognized as a class).
Going to the definition of Language shows that in the __init__.py file, the from tree_sitter.binding lines also do not work properly (no highlighting, and they have no definition).
The binding.c file also does not seem to exist in the tree_sitter folder in my virtual environment (instead there is a file that starts with binding.cpython), and I believe this is probably the cause of the problem. I should note that I am not getting any errors in execution (indeed, I am able to parse code, use functions, and access nodes); this is just the IDE not giving me any information about these functions (making development more difficult, as I cannot see at a glance, for example, what types nodes have without visiting GitHub and reading binding.c).
How can I fix this?
python -m pip install --upgrade pip
Collecting pip
Downloading https://files.pythonhosted.org/packages/d8/f3/413bab4ff08e1fc4828dfc59996d721917df8e8583ea85385d51125dceff/pip-19.0.3-py2.py3-none-any.whl (1.4MB)
100% |████████████████████████████████| 1.4MB 4.9MB/s
Installing collected packages: pip
Found existing installation: pip 18.1
Uninstalling pip-18.1:
Successfully uninstalled pip-18.1
Successfully installed pip-19.0.3
PS C:\Bitnami\wampstack-7.1.24-0\apache2\htdocs\autoteststuff> python -m pip install --upgrade pip
Requirement already up-to-date: pip in c:\users\laura\appdata\local\programs\python\python37\lib\site-packages (19.0.3)
PS C:\Bitnami\wampstack-7.1.24-0\apache2\htdocs\autoteststuff> pip3 install tree_sitter
Collecting tree_sitter
Using cached https://files.pythonhosted.org/packages/cf/c3/f1850242f8fb3676250fab00568310a2898d721c5f024a1e789e1de78ff7/tree_sitter-0.0.4.tar.gz
Installing collected packages: tree-sitter
Running setup.py install for tree-sitter ... error
Complete output from command c:\users\laura\appdata\local\programs\python\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Laura\\AppData\\Local\\Temp\\pip-install-y3rtodf6\\tree-sitter\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Laura\AppData\Local\Temp\pip-record-tjbljfeq\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\tree_sitter
copying tree_sitter\__init__.py -> build\lib.win-amd64-3.7\tree_sitter
running build_ext
building 'tree_sitter_binding' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\tree_sitter
creating build\temp.win-amd64-3.7\Release\tree_sitter\core
creating build\temp.win-amd64-3.7\Release\tree_sitter\core\lib
creating build\temp.win-amd64-3.7\Release\tree_sitter\core\lib\src
cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -Itree_sitter/core/lib/include -Itree_sitter/core/lib/utf8proc -Ic:\users\laura\appdata\local\programs\python\python37\include -Ic:\users\laura\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" /Tctree_sitter/core/lib/src/lib.c /Fobuild\temp.win-amd64-3.7\Release\tree_sitter/core/lib/src/lib.obj -std=c99
error: command 'cl.exe' failed: No such file or directory
----------------------------------------
Command "c:\users\laura\appdata\local\programs\python\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Laura\\AppData\\Local\\Temp\\pip-install-y3rtodf6\\tree-sitter\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Laura\AppData\Local\Temp\pip-record-tjbljfeq\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Laura\AppData\Local\Temp\pip-install-y3rtodf6\tree-sitter\
PS C:\Bitnami\wampstack-7.1.24-0\apache2\htdocs\autoteststuff> pip3 install tree_sitter
Collecting tree_sitter
Using cached https://files.pythonhosted.org/packages/cf/c3/f1850242f8fb3676250fab00568310a2898d721c5f024a1e789e1de78ff7/tree_sitter-0.0.4.tar.gz
Installing collected packages: tree-sitter
Running setup.py install for tree-sitter ... error
Complete output from command c:\users\laura\appdata\local\programs\python\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Laura\\AppData\\Local\\Temp\\pip-install-sdugcto3\\tree-sitter\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Laura\AppData\Local\Temp\pip-record-morby4_e\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\tree_sitter
copying tree_sitter\__init__.py -> build\lib.win-amd64-3.7\tree_sitter
running build_ext
building 'tree_sitter_binding' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\tree_sitter
creating build\temp.win-amd64-3.7\Release\tree_sitter\core
creating build\temp.win-amd64-3.7\Release\tree_sitter\core\lib
creating build\temp.win-amd64-3.7\Release\tree_sitter\core\lib\src
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Itree_sitter/core/lib/include -Itree_sitter/core/lib/utf8proc -Ic:\users\laura\appdata\local\programs\python\python37\include -Ic:\users\laura\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\cppwinrt" /Tctree_sitter/core/lib/src/lib.c /Fobuild\temp.win-amd64-3.7\Release\tree_sitter/core/lib/src/lib.obj -std=c99
cl : Command line warning D9002 : ignoring unknown option '-std=c99'
lib.c
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(47): warning C4477: 'fprintf' : format string '%lu' requires an argument of type 'unsigned long', but variadic argument 1 has type 'size_t'
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(47): note: consider using '%zu' in the format string
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(56): warning C4477: 'fprintf' : format string '%lu' requires an argument of type 'unsigned long', but variadic argument 1 has type 'size_t'
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(56): note: consider using '%zu' in the format string
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(65): warning C4477: 'fprintf' : format string '%lu' requires an argument of type 'unsigned long', but variadic argument 1 has type 'size_t'
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./alloc.h(65): note: consider using '%zu' in the format string
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./array.h(107): warning C4267: 'function': conversion from 'size_t' to 'uint32_t', possible loss of data
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./lexer.c(54): warning C4244: '=': conversion from 'utf8proc_ssize_t' to 'uint32_t', possible loss of data
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./lexer.c(62): warning C4244: '=': conversion from 'utf8proc_ssize_t' to 'uint32_t', possible loss of data
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./lexer.c(312): warning C4267: '=': conversion from 'size_t' to 'uint32_t', possible loss of data
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./parser.c(520): warning C4267: '=': conversion from 'size_t' to 'uint32_t', possible loss of data
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./parser.c(1582): warning C4996: 'fdopen': The POSIX name for this item is deprecated. Instead, use the ISO C and C++ conformant name: _fdopen. See online help for details.
C:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt\stdio.h(2457): note: see declaration of 'fdopen'
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./parser.c(1652): warning C4267: 'function': conversion from 'size_t' to 'unsigned int', possible loss of data
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./parser.c(1725): warning C4267: 'function': conversion from 'size_t' to 'unsigned int', possible loss of data
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./subtree.c(186): warning C4244: 'initializing': conversion from 'TSSymbol' to 'uint8_t', possible loss of data
c:\users\laura\appdata\local\temp\pip-install-sdugcto3\tree-sitter\tree_sitter\core\lib\src\./subtree.c(236): warning C4244: '=': conversion from 'TSSymbol' to 'uint8_t', possible loss of data
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Itree_sitter/core/lib/include -Itree_sitter/core/lib/utf8proc -Ic:\users\laura\appdata\local\programs\python\python37\include -Ic:\users\laura\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\cppwinrt" /Tctree_sitter/binding.c /Fobuild\temp.win-amd64-3.7\Release\tree_sitter/binding.obj -std=c99
cl : Command line warning D9002 : ignoring unknown option '-std=c99'
binding.c
tree_sitter/binding.c(7): error C2059: syntax error: ';'
tree_sitter/binding.c(10): error C2059: syntax error: '}'
tree_sitter/binding.c(13): error C2059: syntax error: ';'
tree_sitter/binding.c(15): error C2059: syntax error: '}'
tree_sitter/binding.c(18): error C2059: syntax error: ';'
tree_sitter/binding.c(20): error C2059: syntax error: '}'
tree_sitter/binding.c(41): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(41): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(41): error C2059: syntax error: ')'
tree_sitter/binding.c(41): error C2054: expected '(' to follow 'self'
tree_sitter/binding.c(46): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(46): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(46): error C2059: syntax error: ')'
tree_sitter/binding.c(46): error C2054: expected '(' to follow 'self'
tree_sitter/binding.c(63): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(63): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(63): error C2371: 'PyObject': redefinition; different basic types
c:\users\laura\appdata\local\programs\python\python37\include\object.h(110): note: see declaration of 'PyObject'
tree_sitter/binding.c(63): error C2143: syntax error: missing ';' before '*'
tree_sitter/binding.c(63): error C2059: syntax error: ')'
tree_sitter/binding.c(63): error C2054: expected '(' to follow 'args'
tree_sitter/binding.c(70): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(70): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(70): error C2059: syntax error: 'type'
tree_sitter/binding.c(70): error C2059: syntax error: ')'
tree_sitter/binding.c(74): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(74): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(74): error C2059: syntax error: 'type'
tree_sitter/binding.c(74): error C2059: syntax error: ')'
tree_sitter/binding.c(78): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(78): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(78): error C2059: syntax error: 'type'
tree_sitter/binding.c(78): error C2059: syntax error: ')'
tree_sitter/binding.c(82): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(82): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(82): error C2059: syntax error: 'type'
tree_sitter/binding.c(82): error C2059: syntax error: ')'
tree_sitter/binding.c(86): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(86): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(86): error C2059: syntax error: 'type'
tree_sitter/binding.c(86): error C2059: syntax error: ')'
tree_sitter/binding.c(90): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(90): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(90): error C2059: syntax error: 'type'
tree_sitter/binding.c(90): error C2059: syntax error: ')'
tree_sitter/binding.c(94): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(94): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(94): error C2059: syntax error: 'type'
tree_sitter/binding.c(94): error C2059: syntax error: ')'
tree_sitter/binding.c(120): error C2065: 'node_sexp': undeclared identifier
tree_sitter/binding.c(120): warning C4312: 'type cast': conversion from 'int' to 'PyCFunction' of greater size
tree_sitter/binding.c(118): error C2099: initializer is not a constant
tree_sitter/binding.c(118): warning C4047: 'initializing': 'PyCFunction' differs in levels of indirection from 'int'
tree_sitter/binding.c(118): warning C4047: 'initializing': 'int' differs in levels of indirection from 'char [42]'
tree_sitter/binding.c(128): error C2065: 'node_get_type': undeclared identifier
tree_sitter/binding.c(128): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
tree_sitter/binding.c(129): error C2065: 'node_get_is_named': undeclared identifier
tree_sitter/binding.c(129): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
tree_sitter/binding.c(130): error C2065: 'node_get_start_byte': undeclared identifier
tree_sitter/binding.c(130): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
tree_sitter/binding.c(131): error C2065: 'node_get_end_byte': undeclared identifier
tree_sitter/binding.c(131): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
tree_sitter/binding.c(132): error C2065: 'node_get_start_point': undeclared identifier
tree_sitter/binding.c(132): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
tree_sitter/binding.c(133): error C2065: 'node_get_end_point': undeclared identifier
tree_sitter/binding.c(133): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
tree_sitter/binding.c(134): error C2065: 'node_get_children': undeclared identifier
tree_sitter/binding.c(134): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
tree_sitter/binding.c(128): error C2099: initializer is not a constant
tree_sitter/binding.c(128): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [16]'
tree_sitter/binding.c(129): error C2099: initializer is not a constant
tree_sitter/binding.c(129): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [21]'
tree_sitter/binding.c(130): error C2099: initializer is not a constant
tree_sitter/binding.c(130): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [22]'
tree_sitter/binding.c(131): error C2099: initializer is not a constant
tree_sitter/binding.c(131): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [20]'
tree_sitter/binding.c(132): error C2099: initializer is not a constant
tree_sitter/binding.c(132): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [23]'
tree_sitter/binding.c(133): error C2099: initializer is not a constant
tree_sitter/binding.c(133): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [21]'
tree_sitter/binding.c(134): error C2099: initializer is not a constant
tree_sitter/binding.c(134): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [20]'
tree_sitter/binding.c(142): error C2065: 'Node': undeclared identifier
tree_sitter/binding.c(145): error C2065: 'node_dealloc': undeclared identifier
tree_sitter/binding.c(145): warning C4312: 'type cast': conversion from 'int' to 'destructor' of greater size
tree_sitter/binding.c(146): error C2065: 'node_repr': undeclared identifier
tree_sitter/binding.c(146): warning C4312: 'type cast': conversion from 'int' to 'reprfunc' of greater size
tree_sitter/binding.c(138): error C2099: initializer is not a constant
tree_sitter/binding.c(138): warning C4047: 'initializing': 'setattrofunc' differs in levels of indirection from 'unsigned long'
tree_sitter/binding.c(138): warning C4133: 'initializing': incompatible types - from 'char [14]' to 'PyBufferProcs *'
tree_sitter/binding.c(138): warning C4047: 'initializing': 'getiterfunc' differs in levels of indirection from 'PyMethodDef *'
tree_sitter/binding.c(138): warning C4133: 'initializing': incompatible types - from 'PyGetSetDef *' to 'PyMethodDef *'
tree_sitter/binding.c(152): error C2065: 'Node': undeclared identifier
tree_sitter/binding.c(152): error C2297: '*': illegal, right operand has type 'int *'
tree_sitter/binding.c(152): error C2059: syntax error: ')'
tree_sitter/binding.c(154): error C2223: left of '->node' must point to struct/union
tree_sitter/binding.c(155): error C2223: left of '->children' must point to struct/union
tree_sitter/binding.c(162): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(162): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(162): error C2059: syntax error: ')'
tree_sitter/binding.c(162): error C2054: expected '(' to follow 'self'
tree_sitter/binding.c(167): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(167): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(167): error C2059: syntax error: 'type'
tree_sitter/binding.c(167): error C2059: syntax error: ')'
tree_sitter/binding.c(176): error C2065: 'tree_get_root_node': undeclared identifier
tree_sitter/binding.c(176): warning C4312: 'type cast': conversion from 'int' to 'getter' of greater size
tree_sitter/binding.c(176): error C2099: initializer is not a constant
tree_sitter/binding.c(176): warning C4047: 'initializing': 'setter' differs in levels of indirection from 'char [10]'
tree_sitter/binding.c(184): error C2065: 'Tree': undeclared identifier
tree_sitter/binding.c(187): error C2065: 'tree_dealloc': undeclared identifier
tree_sitter/binding.c(187): warning C4312: 'type cast': conversion from 'int' to 'destructor' of greater size
tree_sitter/binding.c(180): error C2099: initializer is not a constant
tree_sitter/binding.c(180): warning C4047: 'initializing': 'PyBufferProcs *' differs in levels of indirection from 'unsigned long'
tree_sitter/binding.c(180): warning C4047: 'initializing': 'unsigned long' differs in levels of indirection from 'char [14]'
tree_sitter/binding.c(180): warning C4047: 'initializing': 'iternextfunc' differs in levels of indirection from 'PyMethodDef *'
tree_sitter/binding.c(180): warning C4133: 'initializing': incompatible types - from 'PyGetSetDef *' to 'PyMemberDef *'
tree_sitter/binding.c(193): error C2065: 'Tree': undeclared identifier
tree_sitter/binding.c(193): error C2297: '*': illegal, right operand has type 'int *'
tree_sitter/binding.c(193): error C2059: syntax error: ')'
tree_sitter/binding.c(194): error C2223: left of '->tree' must point to struct/union
tree_sitter/binding.c(205): error C2065: 'Parser': undeclared identifier
tree_sitter/binding.c(205): error C2297: '*': illegal, right operand has type 'int *'
tree_sitter/binding.c(205): error C2059: syntax error: ')'
tree_sitter/binding.c(206): error C2223: left of '->parser' must point to struct/union
tree_sitter/binding.c(210): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(210): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(210): error C2059: syntax error: ')'
tree_sitter/binding.c(210): error C2054: expected '(' to follow 'self'
tree_sitter/binding.c(215): error C2143: syntax error: missing ')' before '*'
tree_sitter/binding.c(215): error C2143: syntax error: missing '{' before '*'
tree_sitter/binding.c(215): error C2371: 'PyObject': redefinition; different basic types
c:\users\laura\appdata\local\programs\python\python37\include\object.h(110): note: see declaration of 'PyObject'
tree_sitter/binding.c(215): fatal error C1003: error count exceeds 100; stopping compilation
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.15.26726\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
----------------------------------------
Command "c:\users\laura\appdata\local\programs\python\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Laura\\AppData\\Local\\Temp\\pip-install-sdugcto3\\tree-sitter\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Laura\AppData\Local\Temp\pip-record-morby4_e\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Laura\AppData\Local\Temp\pip-install-sdugcto3\tree-sitter\
I'm not sure if tree_sitter can even be compiled on Windows like this? :(
I just built a .so file for the Python language (using py-tree-sitter), and when I try to use it, it throws an error:
ValueError: Incompatible Language version 12. Must not be between 9 and 11
I can't install this on macOS; it fails with the error message below. I tried several Python versions, and both pip3 and pipenv, to see whether it makes a difference.
I also updated pip3, setuptools and wheel.
I did get it working on Ubuntu 20.04 with Python 3.8.
Error message on macOS.
➜ NO pipenv install tree-sitter --python 3.8
Virtualenv already exists!
Removing existing virtualenv...
Creating a virtualenv for this project...
Pipfile: /Users/zensored/Desktop/NO/Pipfile
Using /usr/bin/python3 (3.8.2) to create virtualenv...
⠧ Creating virtual environment...created virtual environment CPython3.8.2.final.0-64 in 396ms
creator CPython3macOsFramework(dest=/Users/zensored/.local/share/virtualenvs/NO--wY_wDPi, clear=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/Users/zensored/Library/Application Support/virtualenv)
added seed packages: pip==20.2.4, setuptools==50.3.2, wheel==0.35.1
activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
✔ Successfully created virtual environment!
Virtualenv location: /Users/zensored/.local/share/virtualenvs/NO--wY_wDPi
Installing tree-sitter...
Error: An error occurred while installing tree-sitter!
Error text: Collecting tree-sitter
Using cached tree_sitter-0.19.0.tar.gz (112 kB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'done'
Building wheels for collected packages: tree-sitter
Building wheel for tree-sitter (PEP 517): started
Building wheel for tree-sitter (PEP 517): finished with status 'error'
Failed to build tree-sitter
ERROR: Command errored out with exit status 1:
command: /Users/zensored/.local/share/virtualenvs/NO--wY_wDPi/bin/python /Users/zensored/.local/share/virtualenvs/NO--wY_wDPi/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /var/folders/ww/9g40xcr51854gq69rrk2smt80000gn/T/tmpzxxy5bw0
cwd: /private/var/folders/ww/9g40xcr51854gq69rrk2smt80000gn/T/pip-install-40n0w3yk/tree-sitter
Complete output (26 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.14.6-x86_64-3.8
creating build/lib.macosx-10.14.6-x86_64-3.8/tree_sitter
copying tree_sitter/__init__.py -> build/lib.macosx-10.14.6-x86_64-3.8/tree_sitter
warning: build_py: byte-compiling is disabled, skipping.
running build_ext
building 'tree_sitter.binding' extension
creating build/temp.macosx-10.14.6-x86_64-3.8
creating build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter
creating build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core
creating build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core/lib
creating build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core/lib/src
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers -arch arm64 -arch x86_64 -Itree_sitter/core/lib/include -Itree_sitter/core/lib/src -I/Users/zensored/.local/share/virtualenvs/NO--wY_wDPi/include -I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8 -c tree_sitter/core/lib/src/lib.c -o build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core/lib/src/lib.o -std=c99 -Wno-unused-variable
clang: warning: using sysroot for 'iPhoneSimulator' but targeting 'MacOSX' [-Wincompatible-sysroot]
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers -arch arm64 -arch x86_64 -Itree_sitter/core/lib/include -Itree_sitter/core/lib/src -I/Users/zensored/.local/share/virtualenvs/NO--wY_wDPi/include -I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/include/python3.8 -c tree_sitter/binding.c -o build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/binding.o -std=c99 -Wno-unused-variable
clang: warning: using sysroot for 'iPhoneSimulator' but targeting 'MacOSX' [-Wincompatible-sysroot]
clang -bundle -undefined dynamic_lookup -Wl,-headerpad,0x1000 -arch arm64 -arch x86_64 build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/core/lib/src/lib.o build/temp.macosx-10.14.6-x86_64-3.8/tree_sitter/binding.o -o build/lib.macosx-10.14.6-x86_64-3.8/tree_sitter/binding.cpython-38-darwin.so
clang: warning: using sysroot for 'iPhoneSimulator' but targeting 'MacOSX' [-Wincompatible-sysroot]
ld: warning: -undefined dynamic_lookup is deprecated on iOS Simulator
ld: -platform_version passed unknown platform name 'macos-simulator'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'clang' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for tree-sitter
ERROR: Could not build wheels for tree-sitter which use PEP 517 and cannot be installed directly
✘ Installation Failed
macOS 11.1 (20C69)
XCode Version 12.4 (12D4e)
Python 3.8, 3.9, 3.10 (yes I tested them all)
pip 21.0.1 from /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip (python 3.10)
pipenv, version 2020.11.15
It is crashing on the line parser.set_language(PYTHON).
I built and installed it using a 32-bit Python installation (3.7), and it worked smoothly. However, as soon as I started using 64-bit Python, it failed. (I even copied over the 32-bit language library but, as expected, loading the library failed.)
While debugging, I found that it was crashing at ts_language_version (in binding.c). Further debugging revealed that the language_id is not passed correctly when the Language is handed over via set_language. (I printed the language_id under both 32-bit and 64-bit Python: on 32-bit it printed something like 6 or 7, while on 64-bit it was some garbage value. I am not sure how the object is passed to the bindings, but I printed it and that is what I observed.)
I am not sure if it matters, but while building the library for the languages, I saw that the compiler used was "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.23.28105\bin\HostX86\x64\cl.exe". There is also a folder named Hostx64 at ...\14.23.28105\bin, but I could not figure out how to force the build to use it.
Any pointers would be helpful. Is there an additional setting I should apply, or am I missing something?
(I run "python setup.py build", then "python setup.py install", and then "python setup.py test", so that it uses the freshly built tree-sitter.)
Thanks.
P.S. It worked perfectly on Ubuntu 18.0 64 bit
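One possible explanation, which is an assumption and is not confirmed in this thread, is pointer truncation: ctypes assumes a foreign function returns a C int unless restype is set, so a 64-bit pointer loses its upper half, while on 32-bit Python the pointer fits and nothing goes wrong. The sketch below only illustrates that truncation with a made-up address; it does not call tree-sitter.

```python
import ctypes

# Hypothetical 64-bit address, invented purely for illustration.
pointer_value = 0x00007FF612345678

# ctypes integer types do no overflow checking, so storing the address in a
# 32-bit C int keeps only the low 32 bits (what a default int restype keeps).
as_c_int = ctypes.c_int(pointer_value).value

# A pointer-sized type preserves the full value on a 64-bit interpreter.
as_void_p = ctypes.c_void_p(pointer_value).value

print(hex(as_c_int))
print(hex(as_void_p))
```

If this is what happens, declaring the language function's restype as c_void_p before calling it would avoid the truncation.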
Does tree-sitter support obtaining the argument types of a called function? I can use the query mechanism to obtain the method_invocation node, but I have no idea how to determine the argument types of the invoked function.
To confirm: can we obtain the argument types from the AST?
Thanks.
When writing code of the following form I ran into segfaults with py-tree-sitter (I'll also open a PR that reproduces it):
def parse(contents: bytes) -> List[Node]:
    tree = Parser.parse(contents)
    return tree.root_node.children
First I tried to find an issue with the creation of the list of child nodes or with the nodes themselves. Then I realised that this is probably due to the tree going out of scope.
I believe we need to add a reference to the tree (Python object) for every node so that the tree doesn't get destroyed while any of the nodes in the tree are still alive.
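Until the bindings hold such a reference themselves, a caller-side workaround is to return the tree together with its nodes. This is a minimal sketch: ParseResult and parse_children are hypothetical names, and parser is only assumed to behave like a tree_sitter.Parser instance.

```python
from typing import Any, List, NamedTuple


class ParseResult(NamedTuple):
    """Hypothetical container that bundles nodes with the tree owning them."""
    tree: Any
    nodes: List[Any]


def parse_children(parser: Any, contents: bytes) -> ParseResult:
    # `parser` is assumed to expose parse(bytes) like tree_sitter.Parser.
    tree = parser.parse(contents)
    # Returning the tree alongside the nodes keeps a live reference to it
    # for as long as the caller holds on to the result.
    return ParseResult(tree=tree, nodes=list(tree.root_node.children))
```

As long as the caller keeps the ParseResult around, the tree cannot be collected while its nodes are still in use.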
I'm using Python 3.7 on Windows. My Python program always crashes when executing parser.set_language(XX_LANGUAGE), no matter which language I bind to the parser. The issue does not occur on Linux or macOS, but it always does on Windows.
My PC is running on Windows 10 and the bug is reproducible in both python and ipython shells, via powershell or anaconda prompt (cmd).
Before starting on creating a parser, I am trying to run the simplest possible examples with python bindings but there is a rather obscure issue I am dealing with. In the simplest context:
import os
from tree_sitter import Language, Parser
Language.build_library(os.path.join('server','tree-sitter','build','LANG.so'),
                       [os.path.join('server','tree-sitter')])
LANGUAGE = Language(os.path.join('server','tree-sitter','build','LANG.so'), 'LANG')
parser = Parser()
parser.set_language(LANGUAGE)
for the grammar.js file
module.exports = grammar({
  name: 'LANG',
  rules: {
    // TODO: add the actual grammar rules
    source_file: $ => 'hello'
  }
});
The code above, when run in any Python shell, exits the shell abruptly at the last line without raising an exception or printing any error output. The build_library function returns True, and the parser can be created, so I am fairly sure that the set_language call is the source of the problem.
I tried a different rules dictionary from a repo that I know works, so the grammar is not the cause either. Sorry for not being able to provide more information, but I am also a bit perplexed, as I have never seen such a thing in Python before. Here's a pic if it tells you anything.
Thanks in advance!
Since there's no API to access the text right now, we do the following to obtain the text value of a node:
tree = parser.parse(source_code_bytes)
# ...
text = source_code_bytes[node.start_byte:node.end_byte]
However, this fails if the node text contains a multi-byte character.
from queue import Queue
from tree_sitter import Language, Parser
Language.build_library('build/my-languages.so', ['tree-sitter-javascript'])
JS_LANGUAGE = Language('build/my-languages.so', 'javascript')
parser = Parser()
parser.set_language(JS_LANGUAGE)
code = '''var t = '大';var k=1;'''
code_bytes = bytes(code, "utf8")
tree = parser.parse(code_bytes)
root_node = tree.root_node
queue = Queue()
queue.put(root_node)
while not queue.empty():
    node = queue.get()
    for child in node.children:
        if child.type == 'string':
            print(code[child.start_byte:child.end_byte])
        queue.put(child)
'大';v
'大';
If you subtract 2 from child.end_byte, it gives the expected result. It would be very convenient to have a .text method or similar to access the text, as opposed to accessing it with string slicing.
Please let me know if you need more information.
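For reference, start_byte and end_byte count bytes, so slicing the encoded bytes and decoding afterwards gives the expected text. A minimal sketch: node_text is a hypothetical helper, and the byte range is worked out by hand for this input rather than taken from a parser.

```python
def node_text(source: bytes, start_byte: int, end_byte: int) -> str:
    # Offsets count bytes, so slice the encoded bytes and decode afterwards.
    return source[start_byte:end_byte].decode("utf8")


code = "var t = '大';var k=1;"
code_bytes = code.encode("utf8")

# Byte range of the string literal, computed by hand for this input:
# 8 ASCII bytes, then a quote, three bytes for 大, and a closing quote.
start_byte, end_byte = 8, 13

print(code[start_byte:end_byte])                    # slicing the str overshoots
print(node_text(code_bytes, start_byte, end_byte))  # slicing the bytes does not
```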
Hi,
I have a usage question. The Node API supports get_child_by_field_id(id) and get_child_by_field_name(name), where you need to know the field id or field name. children() gives a list of child nodes.
How do I get a list of child nodes along with their field names, like [{field_name:Node}], when I don't know which fields a child node can have? Or is there a way to get a list of the child field names that are possible for a parent node or a node type?
Thanks!
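One approach that may work is to walk the children with a TreeCursor and ask it for the field name at each step. This is a sketch under assumptions: it relies on Node.walk() and on the cursor exposing current_field_name(), which some py-tree-sitter releases provide (other releases may name it differently), and children_with_field_names is a hypothetical helper.

```python
from typing import Any, List, Optional, Tuple


def children_with_field_names(node: Any) -> List[Tuple[Optional[str], Any]]:
    # Assumption: `node` has walk() returning a cursor with
    # goto_first_child(), goto_next_sibling(), current_field_name(), .node.
    cursor = node.walk()
    pairs = []
    if cursor.goto_first_child():
        while True:
            # The field name is None for children without a field.
            pairs.append((cursor.current_field_name(), cursor.node))
            if not cursor.goto_next_sibling():
                break
    return pairs
```

The result is a list of (field_name, node) pairs in child order, which keeps children without a field instead of dropping them.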
I followed your documentation and added a C parser to my project:
from tree_sitter import Language, Parser
Language.build_library(
    # Store the library in the `build` directory
    'build/c.so',
    # Include one or more languages
    [
        'tree-sitter-c'
    ]
)
C_LANGUAGE = Language('build/c.so', 'c')
The parser works, but I have problems with querying, I never get a capture:
query = C_LANGUAGE.query("""
(function_definition)
""")
captures = query.captures(tree.root_node)
Example source file:
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/cpu.h>
#include <linux/sort.h>
static cpumask_var_t *alloc_node_to_cpumask(void)
{
    cpumask_var_t *masks;
    int node;
    masks = kcalloc(nr_node_ids, sizeof(cpumask_var_t), GFP_KERNEL);
    if (!masks)
        return NULL;
    for (node = 0; node < nr_node_ids; node++)
    {
        if (!zalloc_cpumask_var(&masks[node], GFP_KERNEL))
            goto out_unwind;
    }
    return masks;
out_unwind:
    while (--node >= 0)
        free_cpumask_var(masks[node]);
    kfree(masks);
    return NULL;
}
I tried many different queries and never get a result. Are C queries not supported yet or do I have to fix my queries somehow?
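A likely cause, offered as an assumption rather than a confirmed diagnosis, is that the pattern has no capture name: query.captures only reports nodes bound to an @name, so a bare (function_definition) matches but captures nothing. A minimal sketch, where captured_nodes is a hypothetical helper and language is assumed to be the C_LANGUAGE object from above:

```python
QUERY_SOURCE = """
(function_definition) @function
"""


def captured_nodes(language, root_node):
    # `language` is assumed to expose query(str), as tree_sitter.Language does.
    query = language.query(QUERY_SOURCE)
    # Only nodes bound to a capture name such as @function are returned.
    return query.captures(root_node)
```

The capture name itself is arbitrary; @function is just the label chosen here.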
@maxbrunsfeld Hi Max, could you please check this little test?
I'm using the latest master of tree-sitter and py-tree-sitter; both built OK. Now I am trying to use a simple grammar:
(py364_32) d:\mcve>mkdir tree-sitter-hello
(py364_32) d:\mcve>cd tree-sitter-hello
(py364_32) d:\mcve\tree-sitter-hello>clipout > grammar.js
(py364_32) d:\mcve\tree-sitter-hello>cat grammar.js
module.exports = grammar({
  name: 'the_language_name',
  rules: {
    // The production rules of the context-free grammar
    source_file: $ => 'hello'
  }
});
(py364_32) d:\mcve\tree-sitter-hello>tree-sitter generate
(py364_32) d:\mcve\tree-sitter-hello>cd ..
(py364_32) d:\mcve>clipout > test.py
(py364_32) d:\mcve>cat test.py
from tree_sitter import Language, Parser
def build():
    Language.build_library(
        'build/parser.pyd',
        [
            "tree-sitter-hello",
        ]
    )
def main():
    LANGUAGE = Language('build/parser.pyd', 'the_language_name')
    parser = Parser()
    parser.set_language(LANGUAGE)
    tree = parser.parse(bytes("""hello""", "utf8"))
if __name__ == '__main__':
    build()
    main()
(py364_32) d:\mcve>python test.py
parser.c
Creating library build/parser.lib and object build/parser.exp
Generating code
Finished generating code
Traceback (most recent call last):
File "test.py", line 21, in <module>
main()
File "test.py", line 17, in main
tree = parser.parse(bytes("""hello""", "utf8"))
ValueError: Parsing failed
What's the meaning of that "ValueError: Parsing failed"? What am I doing wrong? Do you see anything suspicious here?
Thanks in advance.
I tried to use the python parser like in the readme example, but I cannot load the built language object file...
from tree_sitter import Language, Parser
Language.build_library('lang.so', ['/home/robo/.cache/wsyntree/python/tsrepo/'])
pylang = Language('./lang.so', 'python')
Traceback (most recent call last):
File "<input>", line 1, in <module>
pylang = Language('./lang.so', 'python')
File "/home/robo/.local/lib/python3.8/site-packages/tree_sitter/__init__.py", line 81, in __init__
self.lib = cdll.LoadLibrary(library_path)
File "/usr/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: ./lang.so: undefined symbol: _ZSt20__throw_length_errorPKc
Since it compiled successfully, I assume it's a problem with py-tree-sitter and not tree-sitter-python (commits c4282ba and 4cca050 tested), but I might be wrong, since tree-sitter-javascript worked just fine.
I tried both inside and outside a virtualenv on Pop!_OS 20.10; the same example worked without issue on Ubuntu 20.04.
Using cc/c++ 10.2.0 on 20.10 and 8.4.0 on 20.04, with Python 3.8 on both.
Using python3.6 (in a clean conda env) + tree-sitter==0.0.7 in WSL, running below
from tree_sitter import Language, Parser
Language.build_library("build/my-languages.so", ["vendor/tree-sitter-python"])
PY_LANGUAGE = Language("build/my-languages.so", "python")
parser = Parser()
parser.set_language(PY_LANGUAGE)
tree = parser.parse(
    bytes(
        """
from mymodule import f
a = f()
""",
        "utf8",
    )
)
cursor = tree.walk()
print(cursor.node.sexp())
print(dir(cursor))
print(cursor.node.type)
print(cursor.goto_first_child())
print(cursor.node.type)
print(cursor.goto_first_child())
print(cursor.node.type)
print(dir(cursor.node)) # X
# print(cursor.node.is_named) # Y
print(cursor.node.sexp())
print(cursor.node)
# Returns `False` because the `def` node has no children
print(not cursor.goto_first_child())
print(cursor.goto_next_sibling())
print(cursor.node.type)
print(cursor.goto_next_sibling())
print(cursor.node.type)
gives
...
Traceback (most recent call last):
File "bug.py", line 34, in <module>
print(cursor.node.sexp())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 1: invalid start byte
Can you also see this behaviour?
If you comment out line ... # X or uncomment line ... # Y, there's no error. I know nothing about tree-sitter and was just messing around (trying to see if tree-sitter follows imports, which it doesn't seem to), but I figure you'd want to know. My suspicion is that it's something related to the C binding, but it's a pretty weird bug (for example, if you delete the remainder of the code after the point where it errors, it stops erroring, which really baffles me unless this isn't synchronous?).
FYI, the error is from PyUnicode_FromString from here: whatever *string is, it isn't utf8.
Hello,
I'm using tree-sitter 0.2.0
[
% /usr/bin/pip3 freeze
tree-sitter==0.2.0
]
See this sample code (rename to parent_test.py to run):
parent_test.py.txt
I am just DFS walking the tree from the root.
Here's a small snippet of the output (beginning) it generates:
def main():
a = 2
Visiting node: id:4530382256, node.parent_id: None, actual_parent_id:None, b'def main():\n\t\ta = 2\n\t\t'
Node has 1 children..
Traversing child: 0
Visiting node: id:4530382384, node.parent_id: 4530382512, actual_parent_id:4530382256, b'def main():\n\t\ta = 2'
Node has 5 children..
Traversing child: 0
Visiting node: id:4530382512, node.parent_id: 4530382832, actual_parent_id:4530382384, b'def'
Node has 0 children..
Finished visiting node: id:4530382512, node.parent_id: 4530382832, actual_parent_id:4530382384, b'def'
Traversing child: 1
Look at the 2nd node visited:
Visiting node: id:4530382384, node.parent_id: 4530382512, actual_parent_id:4530382256, b'def main():\n\t\ta = 2'
The node id when using node.parent is id: 4530382512
But the actual parent of this node in this traversal was id: 4530382256
So in summary, node.parent looks incorrect.
Is this a known issue? Or is node.parent supposed to mean something different from "the parent of this node in the traversal"?
Hi guys, I've been using the library extensively and I think I've found a bug:
The following expression yields a boolean: (1*1) + num > 5. It should be interpreted as ((1*1) + num) > 5, whereas the library interprets the operation as (1*1) + (num > 5).
Is this a matter of flipping the operator precedence here: https://github.com/tree-sitter/tree-sitter-java/blob/master/grammar.js#L13 ?
Currently, there is some extra work on the user's side (compared to node-tree-sitter or the wasm bindings) involved in accessing the text that corresponds to a node, as mentioned elsewhere.
I looked a bit into how node-tree-sitter provides .text: #16 (comment)
As a proof of concept, I tried something similar for py-tree-sitter (though this makes no attempt to track changes) by roughly:
- modifying Tree to store the source code
- adding tree_get_text to pull out the stored source code
- adding node_get_text, which uses tree_get_text to access the retrieved source, uses start_byte and end_byte to construct a slice of the source, and finally decodes the result
This seems to work in limited testing. Does this approach seem sound? (sogaiu@c4f0a27)
(I don't know whether other bindings attempt to keep the "retained" source up-to-date when the parse tree changes and I don't have any good ideas about how that might be done here.)
Hello, how can I use your parser to work as I explain in the following lines?
code = 'System'
code = 'System.'
code = 'System.out'
code = 'System.out.'
code = 'System.out.println'
code = 'System.println('
code = 'System.println("hello")'
That is, the parser should be able to tell whether the Java syntax is still valid each time I add one or more tokens. I tried using tree-sitter, but it did not work, and I do not know if there is a trick.
The code I tested is:
from tree_sitter import Language, Parser
JAVA_LANGUAGE = Language('parser/my-languages.so', 'java')
parser = Parser()
parser.set_language(JAVA_LANGUAGE)
code = 'System.' #.out.println("hello");
tree = parser.parse(bytes(code,'utf8')).root_node
subtree= [x[0] for x in get_all_sub_trees(tree)]
When code = 'System.out.println("hello");' it works, but when code = 'System' it fails.
Thank you in advance
Specifically one that includes this commit: b4db17e
Hi, I'm trying to create a tree traversal using the cursor. The cursor should print the nodes as the recursive method in this other issue (#5). However, the recursive implementation without cursor is very inefficient as it ends up with a huge recursion depth.
This is what I wrote so far, it prints some parts of the tree, but then it stops.
def print_node(code, node):
    pos_point = f"[{node.start_point},{node.end_point}]"
    pos_byte = f"({node.start_byte},{node.end_byte}"
    print(
        f"{code[node.start_byte:node.end_byte]:<25}{node.type}"
    )
def itraverse(code, cursor):
    has_sibling = True
    while has_sibling:
        has_childs = True
        while has_childs:
            has_childs = cursor.goto_first_child()
            if not has_childs:
                print_node(code, cursor.node)
            #print_node(code, cursor.node)
        has_sibling = cursor.goto_next_sibling()
I think the problem is that goto_next_sibling() doesn't work the way I expected (going to the closest sibling of the subtree). Is there a way to traverse the tree without using a stack or memorizing the nodes that have been visited?
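A common pattern for this is to climb back up with goto_parent() whenever the current node has neither a child nor a next sibling, since goto_next_sibling() only moves among siblings of the current node. The sketch below assumes a cursor with goto_first_child(), goto_next_sibling(), goto_parent() and a node attribute; traverse is a hypothetical name.

```python
def traverse(cursor):
    """Yield every node in pre-order using only cursor moves (no stack)."""
    reached_root = False
    while not reached_root:
        yield cursor.node
        if cursor.goto_first_child():
            continue
        if cursor.goto_next_sibling():
            continue
        # No child and no sibling: climb until some ancestor has a next
        # sibling, or stop once the cursor cannot go any higher.
        retracing = True
        while retracing:
            if not cursor.goto_parent():
                retracing = False
                reached_root = True
            elif cursor.goto_next_sibling():
                retracing = False
```

With a real tree this would be driven as `for node in traverse(tree.walk()): ...`, and the only state kept is the cursor itself.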
I run py-tree-sitter on a Windows system in an almost clean 32bit 3.6 Python venv.
While parsing a LOT of files and running the same query on each of the resulting trees I found myself out of memory rather quickly. I tried to narrow down the source of my leak:
query = PY_LANG.query(""" some query string with captures """)
for root, _, files in os.walk("some/repos/"):
    for fn in files:
        if fn.endswith(".py"):
            with open(os.path.join(root, fn), mode="rb") as fd:
                tree = parser.parse(fd.read())
                query.captures(tree.root_node)
After playing around with the above code a little, query.captures(tree.root_node) seems to be the cause, since other tree operations, and even just parsing, work perfectly fine. Other repositories I know of that use the Python tree-sitter bindings don't really use queries, so I wondered whether this is known/intended behaviour and I'm just using it wrongly, or whether it is a genuine bug.
I tried looking at the C bindings, but unfortunately I'm quite inexperienced with C, so I couldn't find the exact cause.
Something weird is going on: the doc link on https://pypi.org/project/tree-sitter links to http://initd.org/psycopg/docs/ ...
After building the library, I ran
PY_LANGUAGE = Language('test_tree_sitter.so', 'python')
and got the following error:
OSError Traceback (most recent call last)
<ipython-input-4-da3a6dbada56> in <module>
----> 1 PY_LANGUAGE = Language('test_tree_sitter.so', 'python')
~/.local/lib/python3.7/site-packages/tree_sitter/__init__.py in __init__(self, library_path, name)
79 """
80 self.name = name
---> 81 self.lib = cdll.LoadLibrary(library_path)
82 language_function = getattr(self.lib, "tree_sitter_%s" % name)
83 language_function.restype = c_void_p
/usr/lib/python3.7/ctypes/__init__.py in LoadLibrary(self, name)
432
433 def LoadLibrary(self, name):
--> 434 return self._dlltype(name)
435
436 cdll = LibraryLoader(CDLL)
/usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
354
355 if handle is None:
--> 356 self._handle = _dlopen(self._name, mode)
357 else:
358 self._handle = handle
OSError: test_tree_sitter.so: undefined symbol: _ZSt20__throw_length_errorPKc
Consider latest repos of py-tree-sitter & tree-sitter & tree-sitter-python and then also add tree-sitter-hello with the below grammar:
tree-sitter-hello/grammar.js:
module.exports = grammar({
  name: 'the_language_name',
  rules: {
    source_file: $ => 'hello'
  }
});
test.py:
from tree_sitter import Language, Parser


def build():
    Language.build_library(
        'build/parser.pyd',
        [
            "tree-sitter-python",
            "tree-sitter-hello",
        ]
    )


def main():
    LANGUAGE = Language('build/parser.pyd', 'python')
    parser = Parser()
    parser.set_language(LANGUAGE)
    tree = parser.parse(bytes("""a = 10""", "utf8"))

if __name__ == '__main__':
    build()
    main()
If I run test.py from the Visual Studio command prompt (VS2015), I get this output:
(py364_32) d:\mcve>python test.py
parser.c
parser.c
Creating library build/parser.lib and object build/parser.exp
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_create
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_serialize
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_deserialize
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_scan
parser.obj : error LNK2001: unresolved external symbol _tree_sitter_python_external_scanner_destroy
build/parser.pyd : fatal error LNK1120: 5 unresolved externals
Traceback (most recent call last):
File "d:\software\python364_32\Lib\distutils\_msvccompiler.py", line 519, in link
self.spawn([self.linker] + ld_args)
File "d:\software\python364_32\Lib\distutils\_msvccompiler.py", line 542, in spawn
return super().spawn(cmd)
File "d:\software\python364_32\Lib\distutils\ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "d:\software\python364_32\Lib\distutils\spawn.py", line 38, in spawn
_spawn_nt(cmd, search_path, dry_run=dry_run)
File "d:\software\python364_32\Lib\distutils\spawn.py", line 81, in _spawn_nt
"command %r failed with exit status %d" % (cmd, rc))
distutils.errors.DistutilsExecError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\link.exe' failed with exit status 1120
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 21, in <module>
build()
File "test.py", line 9, in build
"tree-sitter-hello",
File "D:\virtual_envs\py364_32\lib\site-packages\tree_sitter\__init__.py", line 65, in build_library
compiler.link_shared_object(object_paths, output_path)
File "d:\software\python364_32\Lib\distutils\ccompiler.py", line 717, in link_shared_object
extra_preargs, extra_postargs, build_temp, target_lang)
File "d:\software\python364_32\Lib\distutils\_msvccompiler.py", line 522, in link
raise LinkError(msg)
distutils.errors.LinkError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\link.exe' failed with exit status 1120
What causes these linker errors?
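The five unresolved symbols are the external scanner entry points that the generated parser.c of tree-sitter-python expects to find at link time. One thing worth checking (an assumption, since the log only shows parser.c being compiled twice and no scanner file) is whether a scanner source sits next to each parser.c. A small stdlib-only sketch for that check follows. `list_grammar_sources` is a hypothetical helper, not part of py-tree-sitter, and the file names it looks for are the conventional ones in a grammar's src directory.

```python
import os

# Conventional source file names inside a grammar repo's src/ directory.
CANDIDATE_SOURCES = ("parser.c", "scanner.c", "scanner.cc")


def list_grammar_sources(repo_paths):
    """Map each grammar repo path to the candidate sources found in its src/ dir."""
    found = {}
    for repo in repo_paths:
        src_dir = os.path.join(repo, "src")
        found[repo] = [
            name
            for name in CANDIDATE_SOURCES
            if os.path.isfile(os.path.join(src_dir, name))
        ]
    return found


if __name__ == "__main__":
    # Repo names taken from test.py above; run from the directory holding them.
    for repo, sources in list_grammar_sources(
        ["tree-sitter-python", "tree-sitter-hello"]
    ).items():
        print(repo, "->", sources or "no sources found")
```

If a grammar lists only parser.c here while its parser references `external_scanner_*` functions, the scanner file is missing from the checkout or is not being picked up by the build.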
It is great to see the py binding is getting better with your recent addition of tree-queries, etc. Thank you for your efforts!
It would also be useful if node searches by position could be supported, i.e., *descendant_for_*_range / first_*_child_for_byte.
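For context, the C API describes ts_node_descendant_for_byte_range as returning the smallest node within a node that spans a given range of bytes. The idea can be sketched in pure Python over a toy tree. The `ToyNode` class below is hypothetical: it only mimics the `start_byte`, `end_byte` and `children` attributes and is not the binding's Node type, and the function is an illustration of the lookup, not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a syntax node (not tree_sitter.Node)."""
    type: str
    start_byte: int
    end_byte: int
    children: List["ToyNode"] = field(default_factory=list)


def descendant_for_byte_range(node: ToyNode, start: int, end: int) -> Optional[ToyNode]:
    """Return the smallest node under `node` that spans bytes [start, end], or None."""
    if not (node.start_byte <= start and end <= node.end_byte):
        return None
    current = node
    while True:
        for child in current.children:
            # Descend into the first child that still covers the whole range.
            if child.start_byte <= start and end <= child.end_byte:
                current = child
                break
        else:
            # No child covers the range, so `current` is the smallest match.
            return current


# Toy tree for the source text "a = 10" (byte offsets 0..6).
root = ToyNode("module", 0, 6, [
    ToyNode("assignment", 0, 6, [
        ToyNode("identifier", 0, 1),
        ToyNode("integer", 4, 6),
    ]),
])

if __name__ == "__main__":
    print(descendant_for_byte_range(root, 4, 6).type)
```

A binding-level version would delegate to the C function instead of walking children in Python; the sketch only shows what a caller would get back.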