Comments (4)
That's a good problem to think through. Here is what I'd do:
Don't model whitespace and comments as tokens. Instead, follow the Roslyn model, where they are considered trivia. Trivia is associated with a token: a token can have leading and trailing trivia. Trailing trivia extends only to the end of the line; all subsequent trivia is leading trivia for the following token. Therefore, only the first token on a line can have leading trivia. Comments at the bottom of a file are leading trivia of the synthetic end-of-file token.
Why this way? Because it follows how we think of comments:
- If a comment is on a line by itself, it usually refers to the following code.
- If a comment is on a line with code, it usually refers to the code before it.
Let's look at a few examples:
// Comment 1
function Foo(a: int, // Comment 2
             b: int) // Comment 3
{
    let x = a /* Comment 4 */ + /* Comment 5 */ b
    /* Comment 6 */ let y = x // Comment 7
    // Comment 8
}
Comment | Leading/Trailing | Token |
---|---|---|
Comment 1 | Leading | function |
Comment 2 | Trailing | , |
Comment 3 | Trailing | ) |
Comment 4 | Trailing | a |
Comment 5 | Trailing | + |
Comment 6 | Leading | let |
Comment 7 | Trailing | x |
Comment 8 | Leading | } |
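The attachment rule behind this table can be sketched with a toy lexer. Everything here is an assumption for illustration: `Token`, `TriviaLexer`, and the simplified scanning rules (runs of letters or digits, or a single punctuation character, count as a token) are hypothetical and are not Minsk's or Roslyn's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token type for illustration; not Minsk's or Roslyn's actual type.
public sealed class Token
{
    public string Text = "";
    public List<string> Leading = new List<string>();
    public List<string> Trailing = new List<string>();
}

public static class TriviaLexer
{
    public static List<Token> Lex(string s)
    {
        var tokens = new List<Token>();
        var pos = 0;
        while (true)
        {
            var token = new Token();
            // Leading trivia: everything up to the next token, line breaks included.
            ReadTrivia(s, ref pos, token.Leading, stopAfterLineBreak: false);
            var start = pos;
            if (pos < s.Length)
            {
                if (char.IsLetterOrDigit(s[pos]))
                    while (pos < s.Length && char.IsLetterOrDigit(s[pos])) pos++;
                else
                    pos++; // single-character punctuation token
            }
            token.Text = s.Substring(start, pos - start); // empty text marks end of file
            // Trailing trivia: stops after the first line break.
            if (token.Text.Length > 0)
                ReadTrivia(s, ref pos, token.Trailing, stopAfterLineBreak: true);
            tokens.Add(token);
            if (token.Text.Length == 0) return tokens;
        }
    }

    private static void ReadTrivia(string s, ref int pos, List<string> trivia, bool stopAfterLineBreak)
    {
        while (pos < s.Length)
        {
            var start = pos;
            if (s[pos] == '\n' || s[pos] == '\r')
            {
                if (s[pos] == '\r' && pos + 1 < s.Length && s[pos + 1] == '\n') pos++;
                pos++;
            }
            else if (char.IsWhiteSpace(s[pos]))
            {
                while (pos < s.Length && char.IsWhiteSpace(s[pos]) && s[pos] != '\n' && s[pos] != '\r') pos++;
            }
            else if (string.CompareOrdinal(s, pos, "//", 0, 2) == 0)
            {
                while (pos < s.Length && s[pos] != '\n' && s[pos] != '\r') pos++;
            }
            else if (string.CompareOrdinal(s, pos, "/*", 0, 2) == 0)
            {
                var end = s.IndexOf("*/", pos + 2, StringComparison.Ordinal);
                pos = end < 0 ? s.Length : end + 2;
            }
            else
            {
                return; // start of a real token
            }
            var text = s.Substring(start, pos - start);
            trivia.Add(text);
            if (stopAfterLineBreak && (text[0] == '\n' || text[0] == '\r')) return;
        }
    }
}
```

Because trailing trivia stops at the line break, a comment on its own line can only be picked up as leading trivia of the next token, and a comment after the last real token ends up on the end-of-file token.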
It's worth noting that comments are just one kind of trivia. Others are:
- whitespace & line breaks
- preprocessor directives
- tokens that were skipped by the parser (this ensures the `SyntaxTree` can always be used to print the input, character for character, which we currently can't do)
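Keeping skipped tokens as trivia is what makes the character-for-character round trip work. A minimal sketch, assuming hypothetical `PrintableToken` and `SourcePrinter` types (not part of Minsk) and trivia stored as plain strings:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical shapes for illustration only.
public sealed class PrintableToken
{
    public string[] Leading = new string[0];
    public string Text = "";
    public string[] Trailing = new string[0];
}

public static class SourcePrinter
{
    // Writes leading trivia, token text, then trailing trivia for each token in order.
    public static string Print(IEnumerable<PrintableToken> tokens)
    {
        var sb = new StringBuilder();
        foreach (var t in tokens)
        {
            foreach (var trivia in t.Leading) sb.Append(trivia);
            sb.Append(t.Text);
            foreach (var trivia in t.Trailing) sb.Append(trivia);
        }
        return sb.ToString();
    }
}
```

If the parser skips an unexpected `$` and stores its text as leading trivia of the next token, `Print` still emits it, so nothing from the input is lost.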
Rough sketch:
partial class SyntaxToken
{
    public ImmutableArray<SyntaxTrivia> LeadingTrivia { get; }
    public ImmutableArray<SyntaxTrivia> TrailingTrivia { get; }
}

class SyntaxTrivia
{
    public SyntaxKind Kind { get; }
    public string Text { get; }
}
Now add a few APIs to `SyntaxNode`:
- `Descendents()`. Returns all nodes and tokens by recursively walking the current node and all its children.
- `GetFirstToken() => Descendents().OfType<SyntaxToken>().First();`
- `GetLastToken() => Descendents().OfType<SyntaxToken>().Last();`
- `GetLeadingTrivia() => GetFirstToken().LeadingTrivia;`
- `GetTrailingTrivia() => GetLastToken().TrailingTrivia;`
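A minimal sketch of how these helpers could fit together. `Node`, `TokenNode`, and `ListNode` are hypothetical stand-ins for Minsk's real syntax types, trivia is simplified to strings, and including the node itself in `Descendents()` is an assumption made so that the helpers also work when called on a token.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal node types; Minsk's real SyntaxNode differs.
public abstract class Node
{
    public virtual IEnumerable<Node> GetChildren() => Enumerable.Empty<Node>();

    // Pre-order, left-to-right walk, so tokens come out in source order.
    public IEnumerable<Node> Descendents()
    {
        yield return this;
        foreach (var child in GetChildren())
            foreach (var descendent in child.Descendents())
                yield return descendent;
    }

    public TokenNode GetFirstToken() => Descendents().OfType<TokenNode>().First();
    public TokenNode GetLastToken() => Descendents().OfType<TokenNode>().Last();
    public IReadOnlyList<string> GetLeadingTrivia() => GetFirstToken().LeadingTrivia;
    public IReadOnlyList<string> GetTrailingTrivia() => GetLastToken().TrailingTrivia;
}

public sealed class TokenNode : Node
{
    public TokenNode(string text, string[] leading, string[] trailing)
    {
        Text = text;
        LeadingTrivia = leading;
        TrailingTrivia = trailing;
    }

    public string Text { get; }
    public IReadOnlyList<string> LeadingTrivia { get; }
    public IReadOnlyList<string> TrailingTrivia { get; }
}

// A node that only groups children, standing in for any non-token syntax node.
public sealed class ListNode : Node
{
    private readonly Node[] _children;
    public ListNode(params Node[] children) { _children = children; }
    public override IEnumerable<Node> GetChildren() => _children;
}
```

`First()` and `Last()` give the right tokens only because the walk visits children in source order. `Last()` enumerates the whole subtree, so an implementation that cares about cost might instead descend through the last child directly.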
With that, your problem becomes pretty straightforward. When declaring a function, you do something like this:
void BindFunctionDeclaration(FunctionDeclarationSyntax syntax)
{
    var comments = new List<string>();
    var leadingTrivia = syntax.GetLeadingTrivia();
    foreach (var trivia in leadingTrivia)
    {
        if (trivia.Kind == SyntaxKind.DocumentationCommentTrivia)
        {
            // Get text without slashes
            var text = trivia.Text.Substring(3).Trim();
            comments.Add(text);
        }
    }
    var summary = string.Join(Environment.NewLine, comments);
    ...
    var symbol = new FunctionSymbol(..., summary);
    ...
}
Does this make sense?
from minsk.
To all of you parser developers! I'm making a parser for a custom language. This language uses JSDoc as a way of adding function documentation and metadata. My idea was to tokenize the input with the following types of comments:
// my line comment
/* my multi line comment */
/// this is a documentation
Then I would strip them from the list:
var comments = tokens.Where(x => x.Kind == Kind.SingleLComment || x.Kind == Kind.MultiLComment);
var documentation = tokens.Where(x => x.Kind == Kind.Documentation);
var code = tokens.Where(x => !comments.Contains(x) && !documentation.Contains(x));
Now the `code` tokens can be parsed as intended. In the parser, I want to be able to check, at any time, whether a documentation token precedes a declaration, and attach it to the declaration model as metadata. For example:
ParseFunctionDeclaration()
{
    var functionKeyword = MatchToken(SyntaxKind.FunctionKeyword);
    var identifier = MatchToken(SyntaxKind.IdentifierToken);
    var openParenthesisToken = MatchToken(SyntaxKind.OpenParenthesisToken);
    var parameters = ParseParameterList();
    var closeParenthesisToken = MatchToken(SyntaxKind.CloseParenthesisToken);
    var meta = functionKeyword.TryMatchPrevious(SyntaxKind.Documentation, out var token) ? token : null;
}
Even though the tokens are no longer next to each other (I stripped the original list), I was thinking of keeping them linked using a pointer to the previous token, set during tokenization.
My problem is that I may be breaking the model structure here: the tokens will have access to each other and will need a `TryMatchPrevious` method, which doesn't really make sense because a token shouldn't have match functionality. On the other hand, I can just put the function in the `Parser` and have its signature be:
TryMatchPrevious(kindToMatch, out var token);
or even
TryMatchPrevious(startToken, kindToMatch, out var token);
What do you think of this approach? Am I overthinking it? Is this too much? Is there a simpler way of implementing this?
What is really Parse
I'll implement this on stream, which is tracked by #87 & #101.