rowan's Introduction

rust-analyzer is a modular compiler frontend for the Rust language. It is a part of a larger rls-2.0 effort to create excellent IDE support for Rust.

Quick Start

https://rust-analyzer.github.io/manual.html#installation

Documentation

If you want to contribute to rust-analyzer, check out CONTRIBUTING.md; if you are just curious about how things work under the hood, check the ./docs/dev folder.

If you want to use rust-analyzer's language server with your editor of choice, check the manual folder. It also contains some tips & tricks to help you be more productive when using rust-analyzer.

Security and Privacy

See the corresponding sections of the manual.

Communication

For usage and troubleshooting requests, please use the "IDEs and Editors" category of the Rust forum:

https://users.rust-lang.org/c/ide/14

For questions about development and implementation, join the rust-analyzer working group on Zulip:

https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Frust-analyzer

License

rust-analyzer is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details.

rowan's People

Contributors

adotinthevoid, atul9, azdavis, bors[bot], cad97, dependabot-support, idawer, jd91mzm2, jdomantas, jelmer, kjeremy, lnicola, maltejanz, matklad, mattfbacon, matthias247, mullr, nickolay, peamaeq, ruabmbua, technohacker, theo-lw, toyboot4e, trickster, veykril, waywardmonkeys


rowan's Issues

size assert not compatible with randomized struct layouts

rowan/src/green/node.rs

Lines 30 to 31 in 73ed5e7

#[cfg(target_pointer_width = "64")]
static_assert!(mem::size_of::<GreenChild>() == mem::size_of::<usize>() * 2);

This breaks when compiling with layout randomization.

If the assert is necessary for correctness then repr(C) should be used on the struct and its children to get deterministic layouts.
If it's just an optimization then it's probably better to just check it in tests/CI.

Found this while trying to build the rust compiler with randomized layouts. rust-lang/rust#101339
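A minimal sketch of the "check it in tests/CI" option (an assumption, not an actual rowan patch): the size check moves into a #[test], so library builds under -Zrandomize-layout no longer fail, while CI still catches size regressions. GreenChildStandIn is a placeholder for the crate-private GreenChild type.

#[cfg(target_pointer_width = "64")]
#[test]
fn green_child_is_two_words() {
    // Placeholder with the intended two-word footprint; the real test would
    // name rowan's GreenChild instead.
    struct GreenChildStandIn(u32, *const ());
    assert_eq!(
        std::mem::size_of::<GreenChildStandIn>(),
        std::mem::size_of::<usize>() * 2
    );
}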

Sending `SyntaxNode`s to other threads?

I have a need to send SyntaxNodes to other threads, but SyntaxNode is not Send. What is the recommended approach to such use cases? I know green nodes are both Send and Sync, but they don't provide the needed information (text offsets). SyntaxNode is preferred, as it can be used in typed AST wrappers.

One solution that comes to mind is sending the green node and turning it into a syntax node in the new thread. But this seems wasteful if I already have a syntax node (I would have to turn it into a green node just to rebuild the syntax node in a new thread). Is there a better solution?
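A minimal sketch of the "send the green node" approach, kept generic over the language so it stays self-contained: GreenNode is Send + Sync, and the (non-Send) SyntaxNode is rebuilt on the worker thread.

use rowan::{GreenNode, Language, SyntaxNode};

// Ship only the green tree across the thread boundary and rebuild the red
// tree (with its text offsets) on the worker.
fn text_on_worker<L: Language + 'static>(green: GreenNode) -> std::thread::JoinHandle<String> {
    std::thread::spawn(move || {
        let root: SyntaxNode<L> = SyntaxNode::new_root(green);
        root.text().to_string()
    })
}

If the worker needs a specific node rather than the root, one option is to also send that node's kind and range (e.g. a SyntaxNodePtr) and re-resolve it against the rebuilt root.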

[question] Line/column information

It seems like SyntaxNode only has TextRange, which contains only information about absolute offsets. But how do you handle line/column ranges (necessary for printing errors)?
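A minimal sketch (plain std, not a rowan API) of the usual approach: build a line index over the original source once, then convert absolute offsets from TextRange into line/column pairs on demand.

// Byte offsets of the first character of every line.
fn line_starts(text: &str) -> Vec<usize> {
    std::iter::once(0)
        .chain(text.match_indices('\n').map(|(i, _)| i + 1))
        .collect()
}

// 0-based (line, byte-column) for an absolute offset.
fn line_col(line_starts: &[usize], offset: usize) -> (usize, usize) {
    let line = line_starts.partition_point(|&start| start <= offset) - 1;
    (line, offset - line_starts[line])
}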

Bug with SyntaxPtr::to_node + many nodes with same range

SyntaxNodePtr::to_node uses SyntaxNode::child_or_token_at_range, whose docs state

If there are several intersecting elements, any one can be returned.

This leads to issues (i.e. spurious panics) when there are indeed several elements that share the same range.

In my project, I have a somewhat minimal reproducer:

$ cargo t -p tests rowan_panic
    Finished test [unoptimized + debuginfo] target(s) in 0.13s
     Running unittests src/lib.rs (target/debug/deps/tests-bbc3b3005ec8d90f)

running 1 test
test misc::rowan_panic ... FAILED

failures:

---- misc::rowan_panic stdout ----
root: [email protected]
  [email protected] "\n"
  [email protected]
    [email protected]
      [email protected]
        [email protected]
          [email protected]
            [email protected] "functor"
            [email protected] " "
            [email protected]
              [email protected] "MkThing"
              [email protected] "\n"
              [email protected]
                [email protected]
    [email protected]

=====
ptr: SyntaxNodePtr { kind: FunctorDec, range: 1..17 }
node: [email protected]
node: [email protected]
node: [email protected]
node: [email protected]
node: [email protected]
node: [email protected]
resolved: [email protected]
=====
ptr: SyntaxNodePtr { kind: Dec, range: 17..17 }
node: [email protected]
node: [email protected]
node: [email protected]
thread 'misc::rowan_panic' panicked at 'can't resolve local ptr to SyntaxNode: SyntaxNodePtr { kind: Dec, range: 17..17 }', crates/mlb-statics/src/lib.rs:358:24

The second ptr is to a Dec with range 17..17, which indeed exists, but it seems that node is never considered; the DecWithTail with the same range is chosen instead. It does not have the same SyntaxKind, leading to the panic.

Investigate non-natural tree embeddings

Today, Rowan's logical tree shape is identical to the physical shape of the tree data structure. That's natural (you might even wonder what the other option would be), but it is not necessarily optimal. In particular, natural trees sometimes have very wide nodes with hundreds of children. Traversing or modifying such nodes is sometimes O(len(children)), which can give poor worst-case complexity.

An alternative is to make sure that the physical shape of the data structure is always a balanced tree. For example, we can introduce a TRANSPARENT SyntaxKind, and represent long sequences as deep, balanced trees with TRANSPARENT internal nodes.

Clarify how Rowan could be used with a GLR parser

If I understand correctly, Rowan's use in rust-analyzer relies on a recursive descent parser that emits events for nodes beginning or ending and those events are converted by a TreeSink into calls to GreenNodeBuilder's start_node and finish_node.

I'm wondering if it could make sense to use Rowan's syntax tree abstraction with a different sort of parser, say GLR for example, which uses a shift-reduce mechanism for constructing trees and may even fork at some point to attempt multiple non-deterministic branches of the grammar in parallel. In such an algorithm, the context may not be known in advance, so nodes would have to be combined differently.

Can you clarify how I might accomplish this? Would I just make a different Builder abstraction?
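A hedged sketch of the bottom-up alternative (kind values are arbitrary placeholders): GreenNodeBuilder is not the only way in, because GreenNode::new accepts already-built children, which is exactly the shape a shift-reduce "reduce" step produces. Forked GLR branches could then build competing green subtrees independently and keep only the surviving one.

use rowan::{GreenNode, GreenToken, NodeOrToken, SyntaxKind};

type GreenElement = NodeOrToken<GreenNode, GreenToken>;

// A "reduce" step: pop already-built children off the parse stack and wrap
// them into a new green node.
fn reduce(kind: SyntaxKind, children: Vec<GreenElement>) -> GreenNode {
    GreenNode::new(kind, children)
}

fn main() {
    let lhs = NodeOrToken::Token(GreenToken::new(SyntaxKind(1), "1"));
    let plus = NodeOrToken::Token(GreenToken::new(SyntaxKind(2), "+"));
    let rhs = NodeOrToken::Token(GreenToken::new(SyntaxKind(1), "2"));
    let expr = reduce(SyntaxKind(10), vec![lhs, plus, rhs]);
    assert_eq!(expr.to_string(), "1+2");
}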

Add the ability to make a SyntaxNode which is offset by a certain amount from the source code

Hello, I have been using rowan in my own project and it works absolutely great. However, when parsing something like a markdown file, I run into issues with text ranges, because rowan assumes the root node starts at the start of the source. That means I need to offset the range of every single node, token, or element I make. It would be great if I could create a syntax node and tell rowan that its range is actually offset by a certain amount. I tried to implement this myself, but I ran into a good amount of issues, including:

  • A lot of things internally rely on range, this could probably be solved by making a raw_range internal method
  • There may be conflicts with how rowan uses the range to find stuff inside of green nodes, this could probably also be solved by the above.

Text would stay the same, as the green node will still have the right text; only the text ranges of the parent and children would be affected.
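A minimal workaround sketch under the current API (this is not a proposed rowan feature): keep the fragment's start offset next to the tree and apply it whenever a range is reported to the outside world.

use rowan::{TextRange, TextSize};

// Shift a range from fragment-relative coordinates into document coordinates.
fn to_document_range(range_in_fragment: TextRange, fragment_start: TextSize) -> TextRange {
    TextRange::new(
        range_in_fragment.start() + fragment_start,
        range_in_fragment.end() + fragment_start,
    )
}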

Undefined Behavior


  error: casting references to a bigger memory layout than the backing allocation is undefined behavior, even if the reference is unused
     --> src/green/node.rs:197:35
      |
  195 |         let repr: &Repr = &self.ptr;
      |                            -------- backing allocation comes from here
  196 |         unsafe {
  197 |             let repr: &ReprThin = &*(repr as *const Repr as *const ReprThin);
      |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
      = note: casting from `ThinArc<GreenNodeHead, GreenChild>` (8 bytes) to `HeaderSlice<GreenNodeHead, [GreenChild; 0]>` (16 bytes)
      = note: `#[deny(invalid_reference_casting)]` on by default
  
  error: casting references to a bigger memory layout than the backing allocation is undefined behavior, even if the reference is unused
     --> src/green/token.rs:139:35
      |
  138 |             let repr: &Repr = &self.ptr;
      |                                -------- backing allocation comes from here
  139 |             let repr: &ReprThin = &*(repr as *const Repr as *const ReprThin);
      |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
      = note: casting from `ThinArc<GreenTokenHead, u8>` (8 bytes) to `HeaderSlice<GreenTokenHead, [u8; 0]>` (16 bytes)
  
  error: could not compile `rowan` (lib) due to 2 previous errors

Exporting some version of Cache

In PR #41 it was mentioned

We should probably consider eventually exposing some version of the Cache implemented here, such that bottom-up builders can use it and get the deduplication for free.

This would seem very helpful for integrating lalrpop.

How to integrate with codespan or similar? Or: how to get the original source?

I'm looking into moving my parsing to rowan, and one thing looks complicated. I also want nice error reporting, e.g. using codespan, but I'm having trouble making the integration work. As I see it today, I need to keep two copies of the whole source code to make both work:

pub type File = SimpleFile<String, String>; // <-- copy for reports
pub struct FilesDb {
    files: SlotMap<FileId, File>,
}

//Interface to my source code, this is where I need to mix...
impl<'a> Files<'a> for FilesDb {
    type FileId = FileId;
    type Name = String;
    type Source = &'a str;

    fn name(&self, file_id: FileId) -> Result<Self::Name, Error> {
        Ok(self.get(file_id)?.name().to_string())
    }

    fn source(&'a self, file_id: FileId) -> Result<Self::Source, Error> {
        Ok(self.get(file_id)?.source().as_ref())
    }

    fn line_index(&self, file_id: Self::FileId, byte_index: usize) -> Result<usize, Error> {
        self.get(file_id)?.line_index((), byte_index)
    }

    fn line_range(&self, file_id: Self::FileId, line_index: usize) -> Result<Range<usize>, Error> {
        self.get(file_id)?.line_range((), line_index)
    }
}

Now, I see I can recreate the source from the green nodes:

parser.green_node.to_string()

But then it's again a double copy. How can I get the nodes that are part of the range of an error? Or index into the line/column, etc.?
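A hedged sketch of the direction an answer could take, generic over the language: the root's text is the source (materialized on demand, not stored twice), and covering_element finds the smallest node or token for an error range; line/column conversion can then be done with a separate line index over that text.

use rowan::{Language, SyntaxElement, SyntaxNode, TextRange};

// Smallest node or token whose range contains the error range.
fn element_for_error<L: Language>(root: &SyntaxNode<L>, error_range: TextRange) -> SyntaxElement<L> {
    root.covering_element(error_range)
}

// One copy of the source, produced from the green tree when a report is built.
fn source_text<L: Language>(root: &SyntaxNode<L>) -> String {
    root.text().to_string()
}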

Is it possible to mutate the SyntaxNode?

I am using rowan to write a transpiler, but rowan's syntax nodes are immutable, so converting the AST is very hard. I am wondering whether there is a best practice for doing this kind of thing.
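A minimal sketch of the usual editing pattern over immutable trees, generic over the language: build a replacement green subtree, then replace_with returns a new green root with that subtree swapped in, from which a fresh SyntaxNode is created.

use rowan::{GreenNode, Language, SyntaxNode};

// Produce a new tree in which `node` is replaced by `new_green`.
fn replace_subtree<L: Language>(node: &SyntaxNode<L>, new_green: GreenNode) -> SyntaxNode<L> {
    let new_root_green = node.replace_with(new_green);
    SyntaxNode::new_root(new_root_green)
}

Recent rowan versions also ship a mutable tree API (SyntaxNode::clone_for_update and friends), which may be a better fit for a transpiler that rewrites many nodes.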

Comparing GreenNodes

How can I compare two SyntaxNode<T>s? I see that the Eq trait is implemented for GreenNode, but I don't see how to get a GreenNode from a SyntaxNode. .green() gives me a GreenNodeData which I can't turn into a GreenNode.

The motivation for this is that I have a SyntaxNode child that I want to remove. So, I get the parent SyntaxNode, iterate through its children to find the index of the child SyntaxNode, and then call remove_child. This gives me a type error, though. Here's the code I'm working with:

pub fn kill_node_attribute(
    node: &SyntaxNode<NixLanguage>,
) -> Result<SyntaxNode<NixLanguage>, String> {
    let parent = node.parent().unwrap();
    match parent.kind() {
        NODE_ATTR_SET => {
            let idx =
                parent
                    .green()
                    .children()
                    .enumerate()
                    .fold(0, |correct_idx, (idx, val)| -> usize {
                        if let Some(inner_node) = val.into_node() {
                            //if Borrow::<GreenNodeData>::borrow(inner_node) == node.green() {
                            if *inner_node == (*node.green()).into() {
                                idx
                            } else {
                                correct_idx
                            }
                        } else {
                            correct_idx
                        }
                    });
            let mut new_root =
                SyntaxNode::<NixLanguage>::new_root(parent.green().remove_child(idx));
            loop {
                println!("did one iteration of the inner loop");
                if let Some(parent) = new_root.parent() {
                    new_root = parent;
                } else {
                    break;
                }
            }
            Ok(Root::cast(new_root).unwrap().inner().unwrap())
        }
        NODE_STR => unimplemented!(),
    }
}

And the error:

0% ❯ bash run.sh
warning: unused import: `GreenNodeData`
 --> src/parser.rs:2:48
  |
2 | use rowan::{api::SyntaxNode, GreenNodeBuilder, GreenNodeData};
  |                                                ^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

error[E0277]: the trait bound `GreenNode: From<GreenNodeData>` is not satisfied
  --> src/parser.rs:42:63
   |
42 | ...                   if *inner_node == (*node.green()).into() {
   |                                                         ^^^^ the trait `From<GreenNodeData>` is not implemented for `GreenNode`
   |
   = note: required because of the requirements on the impl of `Into<GreenNode>` for `GreenNodeData`

error: aborting due to previous error; 1 warning emitted

For more information about this error, try `rustc --explain E0277`.
error: could not compile `flake_generator`

To learn more, run the command again with --verbose.
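A hedged sketch of the comparison, assuming a rowan version around 0.15 where SyntaxNode::green() returns Cow<'_, GreenNodeData>: Cow::into_owned yields an owned GreenNode, which implements Eq and can be compared directly.

use rowan::{GreenNode, Language, SyntaxNode};

// Equality of the underlying green nodes (GreenNode implements Eq).
fn greens_equal<L: Language>(a: &SyntaxNode<L>, b: &SyntaxNode<L>) -> bool {
    let ga: GreenNode = a.green().into_owned();
    let gb: GreenNode = b.green().into_owned();
    ga == gb
}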

Timeline for 0.13 release?

Is there a timeline for an official 0.13 release?

I am trying to use rowan through taplo and I am not sure if taplo would be fine with depending on an RC.

Separate types for token kinds and node kinds

Although using a single type for both (SyntaxKind) is easier, it doesn’t make illegal states unrepresentable. For example, if I’m matching on a SyntaxKind produced by my lexer, I know that the kind cannot be one of SyntaxKind’s node variants, e.g. BinaryExpr. Similarly, when I’m deciding what kind to give a node, I shouldn’t be able to create a node of kind LetKw.
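A sketch of the distinction this asks for (hypothetical types, not a rowan API): the lexer only ever produces TokenKind, tree construction only ever consumes NodeKind, and the raw SyntaxKind is derived from either at the rowan boundary.

#[derive(Debug, Clone, Copy)]
enum TokenKind { LetKw, Ident, Plus }

#[derive(Debug, Clone, Copy)]
enum NodeKind { Root, BinaryExpr, LetDecl }

fn token_to_raw(kind: TokenKind) -> rowan::SyntaxKind {
    rowan::SyntaxKind(kind as u16)
}

fn node_to_raw(kind: NodeKind) -> rowan::SyntaxKind {
    // Offset node kinds past the token kinds so raw values never collide.
    rowan::SyntaxKind(kind as u16 + 100)
}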

NodeCache hashes twice

NodeCache does a get, then an insert if it didn't find the node. We could avoid hashing twice by doing something like:

diff --git i/src/green/builder.rs w/src/green/builder.rs
index 35ecbc3..82ce808 100644
--- i/src/green/builder.rs
+++ w/src/green/builder.rs
@@ -1,4 +1,6 @@
-use rustc_hash::FxHashSet;
+use std::collections::hash_map::Entry;
+
+use rustc_hash::FxHashMap;
 
 use crate::{
     green::{GreenElement, GreenNode, GreenToken, SyntaxKind},
@@ -7,8 +9,8 @@ use crate::{
 
 #[derive(Default, Debug)]
 pub struct NodeCache {
-    nodes: FxHashSet<GreenNode>,
-    tokens: FxHashSet<GreenToken>,
+    nodes: FxHashMap<GreenNode, ()>,
+    tokens: FxHashMap<GreenToken, ()>,
 }
 
 impl NodeCache {
@@ -17,7 +19,7 @@ impl NodeCache {
         I: IntoIterator<Item = GreenElement>,
         I::IntoIter: ExactSizeIterator,
     {
-        let mut node = GreenNode::new(kind, children);
+        let node = GreenNode::new(kind, children);
         // Green nodes are fully immutable, so it's ok to deduplicate them.
         // This is the same optimization that Roslyn does
         // https://github.com/KirillOsenkov/Bliki/wiki/Roslyn-Immutable-Trees
@@ -27,21 +29,29 @@ impl NodeCache {
         // 17% of the memory for green nodes!
         // Future work: make hashing faster by avoiding rehashing of subtrees.
         if node.children().len() <= 3 {
-            match self.nodes.get(&node) {
-                Some(existing) => node = existing.clone(),
-                None => assert!(self.nodes.insert(node.clone())),
+            match self.nodes.entry(node) {
+                Entry::Occupied(entry) => entry.key().clone(),
+                Entry::Vacant(entry) => {
+                    let node = entry.key().clone();
+                    entry.insert(());
+                    node
+                }
             }
+        } else {
+            node
         }
-        node
     }
 
     fn token(&mut self, kind: SyntaxKind, text: SmolStr) -> GreenToken {
-        let mut token = GreenToken::new(kind, text);
-        match self.tokens.get(&token) {
-            Some(existing) => token = existing.clone(),
-            None => assert!(self.tokens.insert(token.clone())),
+        let token = GreenToken::new(kind, text);
+        match self.tokens.entry(token) {
+            Entry::Occupied(existing) => existing.key().clone(),
+            Entry::Vacant(entry) => {
+                let token = entry.key().clone();
+                entry.insert(());
+                token
+            }
         }
-        token
     }
 }

But I can't confirm that this is actually faster.

Make it easier to define high-level ASTs?

Currently we provide traits and auxiliary functions in rowan::ast, but they are still difficult to use.

  • In the example, we define ASTs using a custom macro.
    ast_node!(Root, ROOT);
    ast_node!(Atom, ATOM);
    ast_node!(List, LIST);
  • In rust-analyzer, we even use code generation to produce a 4.8k lines file crates/syntax/src/ast/generated/nodes.rs.

Could we make it easier to define ASTs and child extractors?
Maybe using some DSL with proc-macros or build.rs for generation?

Miri UB with -Zmiri-track-raw-pointers

Hi! I got UB when running Miri with raw pointer tracking enabled (-Zmiri-track-raw-pointers) against the invocation of GreenNodeBuilder::token. The UB is not reported when running Miri without raw pointer tracking.
To reproduce the UB, run the following function with MIRIFLAGS='-Zmiri-track-raw-pointers' cargo +nightly miri run.

use rowan::{GreenNodeBuilder, SyntaxKind};

fn main() {
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(SyntaxKind(0));
    builder.token(SyntaxKind(1), "foo");
}

Here is the backtrace:

error: Undefined Behavior: no item granting write access to tag <3629> at alloc1681+0x18 found in borrow stack.
   --> /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:887:9
    |
887 |         copy_nonoverlapping(&src as *const T, dst, 1);
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no item granting write access to tag <3629> at alloc1681+0x18 found in borrow stack.
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
            
    = note: inside `std::ptr::write::<u8>` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:887:9
    = note: inside `rowan::arc::ThinArc::<rowan::green::token::GreenTokenHead, u8>::from_header_and_iter::<std::str::Bytes>` at /home/pan/.cargo/registry/src/github.com-1ecc6299db9ec823/rowan-0.12.6/src/arc.rs:382:21
    = note: inside `rowan::GreenToken::new` at /home/pan/.cargo/registry/src/github.com-1ecc6299db9ec823/rowan-0.12.6/src/green/token.rs:110:19
    = note: inside `rowan::green::builder::NodeCache::token` at /home/pan/.cargo/registry/src/github.com-1ecc6299db9ec823/rowan-0.12.6/src/green/builder.rs:93:29
    = note: inside `rowan::GreenNodeBuilder::token` at /home/pan/.cargo/registry/src/github.com-1ecc6299db9ec823/rowan-0.12.6/src/green/builder.rs:133:29
note: inside `main` at src/main.rs:5:5
   --> src/main.rs:5:5
    |
5   |     builder.token(SyntaxKind(1), "foo");
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
    = note: inside closure at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:63:18
    = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
    = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
    = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
    = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
    = note: inside closure at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:45:48
    = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
    = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
    = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
    = note: inside `std::rt::lang_start_internal` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:45:20
    = note: inside `std::rt::lang_start::<()>` at /home/pan/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:62:5

error: aborting due to previous error

It looks like Miri is not satisfied with ThinArc construction.

Remove double indirection in NodeData

NodeData looks like this:

#[derive(Debug)]
struct NodeData {
    kind: Kind,
    green: ptr::NonNull<GreenNode>,
}

Note how we need to chase two pointers to get to the node data! This should probably be fixed.

AstPtr can de-sync from mutable Syntax Tree after mutation

If I keep AstPtrs to two nodes in a tree (appearing one after the other in the source code) and make changes to the first one, the second AstPtr becomes invalid and will panic when to_node is called.

In retrospect I should've seen this coming, considering that AstPtr only keeps a source-location reference to the node in the tree, and the tree's source physically changes when mutations occur.

Pseudocode to demonstrate the issue:

println!("Create Tree");
let mut tree = make_module("root");

println!("Add 2 children");
tree.append_child(make_module("child_1"));
tree.append_child(make_module("child_2"));

println!("Create child pointers");
let child_1_ptr = AstPtr::new(&tree.ast_children().nth(0).unwrap());
let child_2_ptr = AstPtr::new(&tree.ast_children().nth(1).unwrap());

println!("Resolve both pointers");
let mut child_1 = child_1_ptr.to_node(tree.syntax());
let child_2 = child_2_ptr.to_node(tree.syntax()); // All fine here, no mutations done

println!("Mutate tree under child 1");
child_1.append_child(make_module("child_1_1")).unwrap();

println!("Try to resolve pointer 2");
let child_2 = child_2_ptr.to_node(tree.syntax()); // Panic!

I'm fairly certain this is by design, but is there any way around it? If not, I could make a PR to add this as a note to the docs.

Testing of an interpreter

Hi,

I’m currently working on an interpreted language that uses rowan to represent its syntax trees. To interpret code, I’ve created a typed layer on top of Syntax{Node,Element,Token}s using the macro-based approach used in the s-expressions example. I added tests as I wrote the parser, asserting that the Debug representation of SyntaxNode looked as expected.

However, I did not add tests as I wrote either the typed layer or the interpreter. Just for context, the interpreter is wholly based on the typed layer, and does not use any other data structures. After completing the initial implementation of the interpreter, I found that it was laden with bugs – I guess that’s what you get when you don’t write tests.

How can I test the interpreter if I can’t construct Syntax{Node,Element,Token}s by hand? Is the recommended way to invoke the relevant sub-parsers to create typed layer nodes from strings?
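One practical option, sketched below: hand-build small green trees with GreenNodeBuilder in the test suite and cast them into the typed layer, so interpreter tests don't have to go through the full parser (the kind numbers are placeholders for the language's real kinds).

use rowan::{GreenNodeBuilder, SyntaxKind};

// A tiny hand-built tree: ROOT(LIST("(" "atom" ")")).
fn hand_built_list() -> rowan::GreenNode {
    let mut builder = GreenNodeBuilder::new();
    builder.start_node(SyntaxKind(0)); // ROOT
    builder.start_node(SyntaxKind(1)); // LIST
    builder.token(SyntaxKind(2), "(");
    builder.token(SyntaxKind(3), "atom");
    builder.token(SyntaxKind(2), ")");
    builder.finish_node();
    builder.finish_node();
    builder.finish()
}

The other common approach is indeed to parse small string snippets with the relevant sub-parser and cast the result into the typed layer; both keep the tests at the level the interpreter actually consumes.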

Make rowan generic over the type of token text

Currently, the token texts are stored in SmolStrs, which limits the source code to Unicode. IMHO, making the token text generic would widen the scope of rowan, e.g. to representing a syntax tree for a binary protocol.
(Actually, I'm planning to write a parser for a programming language whose source code is non-Unicode and cannot be converted to Unicode, since it contains some unique emojis that don't exist in Unicode. Having a generic lossless syntax tree would make my work much easier. That's where my idea comes from.)

Way to change node order

It would be really useful if there was a way to change the syntax kind of the current node.
The reason is that the language I'm using has both grouping expressions, i.e. (1+10), and tuple expressions. So when a parse that started as a grouping expression turns out to actually be a tuple expression, it would be nice if GreenNodeBuilder offered an API to swap the current node's kind to something else.
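A sketch of the existing workaround with checkpoints, rather than the retroactive kind swap asked for here: defer deciding the node's kind until after its contents are parsed, then wrap them with start_node_at (kind values are placeholders).

use rowan::{GreenNodeBuilder, SyntaxKind};

const L_PAREN: SyntaxKind = SyntaxKind(1);
const R_PAREN: SyntaxKind = SyntaxKind(2);
const PAREN_EXPR: SyntaxKind = SyntaxKind(10);
const TUPLE_EXPR: SyntaxKind = SyntaxKind(11);

fn parse_paren_or_tuple(builder: &mut GreenNodeBuilder<'_>, is_tuple: bool) {
    // Record the position before the node's contents...
    let checkpoint = builder.checkpoint();
    builder.token(L_PAREN, "(");
    // ... parse the contents here, discovering whether a `,` makes it a tuple ...
    builder.token(R_PAREN, ")");
    // ...and only now choose the kind, wrapping everything since the checkpoint.
    let kind = if is_tuple { TUPLE_EXPR } else { PAREN_EXPR };
    builder.start_node_at(checkpoint, kind);
    builder.finish_node();
}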

v0.15.16 is not published on crates.io

I see that main contains a commit for 0.15.16, but it seems not to have been uploaded to crates.io. I am guessing this is an oversight, so I am reporting it.

Add smart debugging function for `GreenNodeBuilder`

As it is, debugging issues with GreenNodeBuilder isn't easy. Sure, you can dbg! it, but it doesn't label what the SyntaxKinds are; it just displays their raw values.

I think it would be great to add a function that helps with this. Ideally it would log something like this:

The last node (and/or token) would likely be highlighted in green to signify that it is the current node.

If this would be a welcome addition, I would be more than happy to implement it, but I wanted to ask first.

Niching for Checkpoint

Since Checkpoint is a fairly arbitrary integer newtype, it would be nice to use the NonZero* variant of the internal integer type to allow niching. I think it would be fairly simple to do: just add one when wrapping the type and subtract one when unwrapping it.
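A sketch of the suggested niching on a stand-in type (not rowan's actual Checkpoint): store index + 1 in a NonZeroUsize so that Option<Checkpoint> gets the niche and stays pointer-sized.

use std::num::NonZeroUsize;

#[derive(Clone, Copy, Debug)]
struct Checkpoint(NonZeroUsize);

impl Checkpoint {
    fn new(index: usize) -> Checkpoint {
        // Add one when wrapping...
        Checkpoint(NonZeroUsize::new(index + 1).unwrap())
    }
    fn index(self) -> usize {
        // ...and subtract one when unwrapping.
        self.0.get() - 1
    }
}

fn main() {
    assert_eq!(
        std::mem::size_of::<Option<Checkpoint>>(),
        std::mem::size_of::<usize>()
    );
    assert_eq!(Checkpoint::new(7).index(), 7);
}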

Bad offset: range 0..558 offset 564 in cursor.rs

Hi,

I ran into this error message when trying to add tailwindcss to my leptos project, following this comment.

Is there a missing boundary check in cursor.rs?

thread 'Worker' panicked at 'Bad offset: range 0..558 offset 564', /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/rowan-0.15.10/src/cursor.rs:751:9
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: rowan::cursor::SyntaxNode::token_at_offset
   3: ide_completion::context::analysis::expand_and_analyze
   4: ide_completion::context::CompletionContext::new
   5: ide_completion::completions
   6: std::panicking::try
   7: rust_analyzer::handlers::handle_completion
   8: std::panicking::try
   9: <F as threadpool::FnBox>::call_box
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[Error - 11:34:53 AM] Request textDocument/completion failed.
  Message: request handler panicked: Bad offset: range 0..558 offset 564
  Code: -32603 

Is Ord strictly required for Lang Kind?

Hi!

I've been poking around at using rowan for a project, and I was wondering what the logic behind Lang & Kind being Ord was. It's not a massive burden (like... #[derive(PartialOrd, Ord)] is pretty easy :-P), but it feels a bit strange to make my SyntaxKind Ord when it shouldn't really be compared like that (what does Identifier > Operator mean?). I tried cargo build --examples and cargo test without it and nothing seemed to fail to compile (I also tried cargo build on rust-analyzer itself with the dependency pointed at the local copy, and it seemed to build fine), so I got curious what the original reasoning was.

Allow non UTF-8 source code

Currently, rowan requires string slices. However, not all programming languages require UTF-8, which makes it impossible to represent them using a rowan tree without adding arbitrary rules to the existing specification. It would be convenient if we could, for instance, use byte slices as our base source code representation. If a maintainer has an idea of how this would work (how to specify whether or not Unicode is enforced using &str), I would love to draft a PR.

Breaking renames

GreenNode name doesn't make sense, as we don't have RedNodes. FunNode or PureNode would make more sense.

While we are at it, Node -> Tree, so that it's TreeOrToken.

NodeCache not exposed

GreenNodeBuilder has a public with_cache constructor, but the NodeCache type is not public and I can't see any way to construct one outside of the library.
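A sketch of the usage this issue is after, assuming NodeCache were exported (kinds are placeholders): one cache shared across many parses, so identical small subtrees are deduplicated between files.

use rowan::{GreenNode, GreenNodeBuilder, NodeCache, SyntaxKind};

fn parse_many(sources: &[&str]) -> Vec<GreenNode> {
    let mut cache = NodeCache::default();
    let mut trees = Vec::new();
    for src in sources {
        // Each builder borrows the shared cache instead of owning its own.
        let mut builder = GreenNodeBuilder::with_cache(&mut cache);
        builder.start_node(SyntaxKind(0)); // ROOT
        builder.token(SyntaxKind(1), src); // single token holding the text
        builder.finish_node();
        trees.push(builder.finish());
    }
    trees
}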

Consider adding a `CHANGELOG.md`

Hi!
I have a crate using Rowan 0.10 and want to update to the latest version. I am getting a few compiler errors but should be able to sort it out.
I know that the main user of this lib is rust-analyzer, but it would still be nice to provide a CHANGELOG.md in this repo to help other people with updates.

prev_sibling does not work as expected

A call to syntax_node.prev_sibling() actually returns the node two positions before itself instead of the one directly before it.
I found out that syntax_node.prev_sibling_or_token() works as expected and also finds the directly preceding node.

To illustrate this issue, take the following tree of syntax nodes (only nodes, not tokens):

Root
    A
    B
    C

The calls to prev_sibling and prev_sibling_or_token report the following (per current node)

A
    prev_sibling=None
    prev_sibling_or_token=None
B
    prev_sibling=None
    prev_sibling_or_token=Some(Node(A))
C
    prev_sibling=Some(A)
    prev_sibling_or_token=Some(Node(B))

I found this while working with rowan syntax trees in my rewrite of ( https://github.com/MalteJanz/ludtwig/tree/bachelor-thesis ), where I use rowan (still in a very early state). Parsing is done by the GreenNodeBuilder (abstracted away behind a sink / events). I have not yet found the time to produce a minimal reproducible example, so it would be nice if someone could confirm this behaviour in their rowan syntax trees or prove me wrong in my understanding of the function 🙈 .

I guess I have already found the underlying issue and will submit a PR shortly (off-by-one error in src/cursor.rs line 393) 🙂.

Investigate allocating all nodes on the Arena

Currently, each green node is allocated on the heap, which helps with incrementality. However, the bulk of files are parsed only once, so it would be nice to have a non-incremental mode as well, where the whole tree is allocated in a bump allocator.

SyntaxNode/GreenNode setup allows us to do just that! We can have a setup like this

// either the incremental Arc-based green tree, or a flat arena of green data
enum GreenRoot {
    Incremental(GreenNode),
    Arena(Vec<GreenData>),
}

struct SyntaxRoot {
    green: GreenRoot,
}

struct SyntaxNode {
    green: *const GreenData,
}

type GreenNode = Arc<GreenData>;

The details need to be figured out: ideally, green nodes should always be Clone, SyntaxNodes should not do any mode checks when getting the green node, and there should only be a single "flag" for incremental mode in the root of the tree.

My gut feeling is that we'll need some kind of shared_from_this unsafe setup here, such that we arena-allocate GreenNodeData, store pointers to GreenNodeData in SyntaxNodes, and store pointers to GreenNodeData in the children array in GreenNodeData, but, in the case of an incremental green node, we know that the pointer actually points into an Arc<GreenNodeData>.

We should use https://github.com/fitzgen/bumpalo for the arena of GreenData.

Update s_expressions example to use rowan::ast

The s_expressions example has an ast_node macro that wraps SyntaxNodes, but reading the docs for rowan::ast, it sounds like this might no longer be the way to go? Having an updated example would be really helpful.
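For reference, a hedged sketch of roughly what the macro would need to expand to when written against rowan::ast::AstNode directly; Lang, Kind, and the kind numbering are assumptions standing in for the example's own definitions.

use rowan::ast::AstNode;
use rowan::{Language, SyntaxNode};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(u16)]
enum Kind { Root = 0, Atom, List }

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
enum Lang {}

impl Language for Lang {
    type Kind = Kind;
    fn kind_from_raw(raw: rowan::SyntaxKind) -> Kind {
        match raw.0 {
            0 => Kind::Root,
            1 => Kind::Atom,
            _ => Kind::List,
        }
    }
    fn kind_to_raw(kind: Kind) -> rowan::SyntaxKind {
        rowan::SyntaxKind(kind as u16)
    }
}

// The typed wrapper the macro would generate for ROOT nodes.
#[derive(Debug, Clone)]
struct Root(SyntaxNode<Lang>);

impl AstNode for Root {
    type Language = Lang;
    fn can_cast(kind: Kind) -> bool {
        kind == Kind::Root
    }
    fn cast(syntax: SyntaxNode<Lang>) -> Option<Root> {
        if Self::can_cast(syntax.kind()) { Some(Root(syntax)) } else { None }
    }
    fn syntax(&self) -> &SyntaxNode<Lang> {
        &self.0
    }
}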

Consider using succinct data structures for read-only trees

I've recently been reading the succinct data structures literature, and I am wondering if it might be applicable to rowan. The tagline of SD is to represent data using approximately as few bits as entropy allows, while keeping all the operations fast.

In particular, for trees, we generally use 2 * n * sizeof<usize> bytes (parent/children pointers). SD allows using roughly 2n bits, while still allowing for efficient parent/child queries. This works due to the following tricks:

  • bitvectors can be represented very efficiently (you can pack 64 bits into a single word)
  • working with bitmasks is fast (popcount + precomputed tables)
  • by spending a few extra bytes, it's possible to augment bitvector with fast prefix-sum datastructures (rank & select)
  • it's possible to encode a tree as a bitvec, using a couple of bits per node, where parent/child relations are expressible via rank/select

It would be interesting to see if it makes sense to use something like this for rowan.

Some reasons to do this:

  • using fewer bits for storing tree topology means that larger trees will fit into CPU caches, potentially improving performance
  • trees are heavy-weight, reducing RAM seems like a good idea
  • algorithmic complexity of operations becomes independent of the topology of the tree, i.e. the tree can be arbitrarily deep or arbitrarily wide

Some reasons not to do this:

  • performance will probably be worse
  • works only for read-only trees
  • high complexity of the implementation

Integration testing by implementing a toy language

There should be a simple toy language and toy syntax tree to test all / most of the tree api methods (traversal and accessing specific nodes / tokens).

This issue tries to improve the testing situation in rowan, so things like #139 (off-by-one error) can be caught at an early stage and projects depending on this library don't have to find these bugs the hard way. This should also allow for easier refactoring / internal changes with confidence that the public API still works as expected. Future PRs which provide bug fixes for the public API can also come with a test to help increase coverage.

steps to do this (out of my head / only a suggestion for the implementation):

  • Add a test directory for the integration tests
  • Write a module toy_language inside the test directory
    • Implement all the necessary things for rowan (see examples and Make A Language Blog )
    • Instead of writing a complete lexer / parser, simply write a method build_toy_tree which uses the GreenNodeBuilder and constructs a meaningful tree by hand (which provides enough depth / siblings / ... to use for testing)
  • Confirm the toy_language tree and underlying source string look as expected by writing two tests for this
  • Start implementing tests for the tree traversal / accessing tree nodes API
