Summary
Razor (in the editor) runs a parsing loop that responds to edits by updating the extents of the projection buffers.
The primary goal of this process is to do as little work as possible while ensuring that the state of the various buffers reflects an accurate understanding of the document. When an edit occurs, the editor attempts to apply it to a buffer based on the buffer's bounds. Our algorithm tries to determine whether the edit changes the document in a way that makes the underlying buffers incorrect; if it does, we run the code generation process and, after a background parse, update the buffers to the correct extents. We refer to an edit that requires no action from us as accepted, and an edit that requires us to update the buffers as rejected.
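To make the decision concrete, here is a minimal sketch of the accept/reject classification. All of the types and names below (EditResult, TextChange, Span, PartialParser) are invented for illustration; none of this is the actual Razor implementation:

```csharp
using System.Collections.Generic;

// Hypothetical sketch only; the real Razor types differ.
public enum EditResult { Accepted, Rejected }

public readonly struct TextChange
{
    public TextChange(int position, int oldLength, string newText)
    {
        Position = position;
        OldLength = oldLength;
        NewText = newText;
    }

    public int Position { get; }
    public int OldLength { get; }
    public string NewText { get; }
}

public abstract class Span
{
    public int Start { get; set; }
    public int Length { get; set; }

    public bool Contains(TextChange change)
        => change.Position >= Start &&
           change.Position + change.OldLength <= Start + Length;

    // Each span kind decides whether it can absorb an edit without a reparse.
    public abstract bool CanAcceptChange(TextChange change);
}

public static class PartialParser
{
    public static EditResult Classify(IEnumerable<Span> spans, TextChange change)
    {
        foreach (var span in spans)
        {
            // An edit wholly inside a span that can absorb it is accepted:
            // no buffer boundaries move and no reparse is needed.
            if (span.Contains(change) && span.CanAcceptChange(change))
            {
                return EditResult.Accepted;
            }
        }

        // Otherwise the edit may invalidate the projection boundaries:
        // run code generation on a background parse and redraw the extents.
        return EditResult.Rejected;
    }
}
```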
As a secondary goal, the parsing loop must avoid interfering with the editor systems that provide IntelliSense features. This means that we need to defer changes that would update the underlying buffers while completion is being shown, or is about to be shown. We do this by introducing the concept of a provisional change: we temporarily treat an edit as accepted, but queue a background update of the buffers to ensure that the result is eventually correct.
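Building on the same invented types, a provisional change might look roughly like this; the completionActive flag stands in for the editor's completion session state:

```csharp
using System.Collections.Generic;

// Hypothetical sketch of provisional acceptance; not the actual Razor API.
public sealed class EditDispatcher
{
    private readonly List<Span> _spans;
    private bool _pendingProvisionalReparse;

    public EditDispatcher(IEnumerable<Span> spans)
    {
        _spans = new List<Span>(spans);
    }

    public bool HasPendingProvisionalReparse => _pendingProvisionalReparse;

    public EditResult Classify(TextChange change, bool completionActive)
    {
        var result = PartialParser.Classify(_spans, change);
        if (result == EditResult.Rejected && completionActive)
        {
            // Redrawing the buffers now would dismiss the completion session.
            // Provisionally accept the edit, but queue a background parse so
            // the buffers eventually converge on the correct extents.
            _pendingProvisionalReparse = true;
            QueueBackgroundParse(change);
            return EditResult.Accepted;
        }

        return result;
    }

    private void QueueBackgroundParse(TextChange change)
    {
        // Stub: the real system schedules this on the dedicated
        // per-editor background parser thread.
    }
}
```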
To determine which of these three states an edit falls into, we have a subsystem referred to as partial parsing, consisting of an annotated syntax tree and a dedicated thread per editor. The partial parsing system is pessimistic and does not reuse the actual Razor parser. Instead, it has special-case code that must be correct for many basic editing and IntelliSense features to function.
These two goals are in tension. An algorithm that is maximally pessimistic about accepting changes would always partition the buffers correctly, but would be slow to use and prevent completion from functioning. An algorithm that is maximally permissive would get many things wrong, including showing incorrect colorization and completion.
I don't believe the current approach will meet all of our goals long term. We are aware of IDE features that are currently broken, such as signature help, and the path to enabling these features with the current approach is very challenging. The proposal here is to consider replacing this hand-tuned system with smarter reuse of our actual Razor parser.
Problems with the current approach
Changing projection buffers is expensive
When we redraw the projection buffer boundaries, this causes a number of other parts of the editor (C# and HTML) to do work. Currently we don't do any differential updates in the case of a rejected edit; everything is updated.
Correctness is ad-hoc
This is an example of a bug that occurs when we're not pessimistic enough.
Pessimism breaks IntelliSense
This is an example of a bug that occurs when we're too pessimistic.
We're duplicating the code
The more we improve the fidelity of partial parsing without addressing the design issues, the more functionality we duplicate.
Proposed solution
The proposed solution is to keep the core idea of a reactive parser that updates buffers, but to use the real Razor parser and apply a diff to determine whether to accept or reject an edit. That means we need to build ideas like provisional changes, and regions that should always reject a change, into the actual syntax tree.
What this looks like end-to-end (a sketch follows the list):
- Add provisional data to the syntax tree
- Add rejected data to the syntax tree
- Run the real parser, not the partial parser
- Apply a diff to the syntax tree and determine whether the delta matches the edit
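Here is what that flow could look like in code. Everything below is invented for illustration: the EditPolicy flags, AnnotatedNode, and TreeDiff (sketched later in the costing section) are not the actual Razor types; a real design would hang this data off the actual Razor syntax tree.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical annotations carried on syntax tree nodes.
[Flags]
public enum EditPolicy
{
    None = 0,
    AlwaysReject = 1,      // e.g. inside a directive: any edit must redraw buffers
    AllowProvisional = 2,  // e.g. after '.': defer buffer updates for completion
}

public sealed class AnnotatedNode
{
    public int Start { get; set; }
    public int Length { get; set; }
    public EditPolicy Policy { get; set; }
    public List<AnnotatedNode> Children { get; } = new List<AnnotatedNode>();
}

public static class ReactiveParser
{
    // Runs on each edit: consult annotations, reparse with the real parser, diff.
    public static EditResult OnEdit(
        string newText,
        AnnotatedNode oldTree,
        TextChange change,
        Func<string, AnnotatedNode> parse)
    {
        // A node annotated AlwaysReject forces a buffer update; no diff needed.
        var owner = FindInnermost(oldTree, change);
        if ((owner.Policy & EditPolicy.AlwaysReject) != 0)
        {
            return EditResult.Rejected;
        }

        // Run the actual Razor parser; no duplicated special-case logic.
        var newTree = parse(newText);

        // Accept only if the trees differ solely within the edited span
        // (see the TreeDiff sketch in the costing section below).
        return TreeDiff.DeltaMatchesEdit(oldTree, newTree, change)
            ? EditResult.Accepted
            : EditResult.Rejected;
    }

    private static AnnotatedNode FindInnermost(AnnotatedNode node, TextChange change)
    {
        foreach (var child in node.Children)
        {
            if (change.Position >= child.Start &&
                change.Position + change.OldLength <= child.Start + child.Length)
            {
                return FindInnermost(child, change);
            }
        }

        return node;
    }
}
```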
Prioritization and costing
We should consider carefully what we're trading off here. On one hand, expanding the set of partial parser cases is expensive, ad hoc, and error-prone. On the other hand, a big investment is risky and time-consuming, and doesn't deliver any user value unless we actually address the problems identified here.
Our prioritization may change based on a few other planned investments:
- If we needed to build a 'component mode' for the Razor parser (Blazor), we would need to retrofit the Razor parser with a deeper understanding of HTML, which would amount to almost a rewrite.
- If we were working on VS Code, we might want a language design that is less "chatty", in which case we would care deeply about the performance cost of partial parsing.
- If we found more significant IDE features that don't work correctly, that might raise the priority.
The biggest cost, and the biggest value, of the current Razor parser system is the tests.
Our tests are critical: they are the best 'spec' we have for the Razor language, and the tests for the actual parser are fairly thorough. If we chose to invest here, the first step would be to modernize these tests and reduce maintenance cost by moving the baselines into a serialization format. Serialized baselines make it easy to inspect deltas in the parser's behavior, and to automate evolution of the parser and syntax design.
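As a sketch of what a serialized-baseline test could look like (the harness, the SerializeTree helper, and the GENERATE_BASELINES switch are all hypothetical; the assertion uses xUnit):

```csharp
using System;
using System.IO;
using Xunit;

// Hypothetical baseline harness; names are invented for illustration.
public abstract class ParserBaselineTest
{
    protected void AssertMatchesBaseline(string razorSource, string baselineName)
    {
        var tree = Parse(razorSource);
        var actual = SerializeTree(tree); // e.g. one line per node: kind + span

        var baselinePath = Path.Combine("Baselines", baselineName + ".txt");
        if (Environment.GetEnvironmentVariable("GENERATE_BASELINES") == "true")
        {
            // Regenerate the checked-in baseline instead of asserting.
            File.WriteAllText(baselinePath, actual);
            return;
        }

        // A failure here is an inspectable, diffable behavior delta.
        Assert.Equal(File.ReadAllText(baselinePath), actual);
    }

    protected abstract object Parse(string source);
    protected abstract string SerializeTree(object tree);
}
```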
We should additionally improve our coverage with editor tests, by building a framework for exercising the parser in motion, applying sequences of edits rather than parsing a single document (see the sketch below). This could look similar to what Roslyn has. Currently our testing of the partial parser is ad-hoc, and adding new tests is tedious.
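A test in such a framework might read like the following sketch, reusing the invented EditDispatcher and Span types from the summary above; the scenario and classifications are illustrative, not the real partial parser's behavior:

```csharp
using System.Collections.Generic;
using System.Linq;
using Xunit;

public class ParserInMotionTests
{
    // A trivial concrete span for the test: accepts any edit it contains.
    private sealed class PermissiveSpan : Span
    {
        public override bool CanAcceptChange(TextChange change) => true;
    }

    // Apply a sequence of edits and record how each one was classified.
    private static List<EditResult> Replay(EditDispatcher dispatcher, params TextChange[] edits)
        => edits.Select(e => dispatcher.Classify(e, completionActive: false)).ToList();

    [Fact]
    public void EditSequence_ClassifiesEachStep()
    {
        // "@DateTime.Now": the C# expression occupies positions 1 through 13.
        var dispatcher = new EditDispatcher(new Span[]
        {
            new PermissiveSpan { Start = 1, Length = 12 },
        });

        var steps = Replay(dispatcher,
            new TextChange(position: 13, oldLength: 0, newText: "2"),   // inside the C# span
            new TextChange(position: 0, oldLength: 0, newText: "<p>")); // outside any span

        Assert.Equal(EditResult.Accepted, steps[0]);
        Assert.Equal(EditResult.Rejected, steps[1]);
    }
}
```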
These investments could happen at any time without risk or substantial diversion of our resources.
I think only after doing these improvements would we have the confidence to do this work in earnest.
As for the cost of rewriting the partial parser to use the actual parser, I think it is probably about 3 weeks for a single engineer. This includes getting rid of the concept of an 'edit handler' and replacing it with a more semantic notion on each syntax tree node. We also need to account for the cases that are currently treated as provisional in the syntax tree, so that we can still behave correctly in those cases. And we need to write a diff algorithm to determine how the syntax tree differs and whether to reject the edit (sketched below).
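As an illustration of how simple a first version of that diff could be, the following flattens both trees (using the hypothetical AnnotatedNode from the proposal sketch) to leaf spans, and accepts only if every difference falls inside the edited region:

```csharp
using System.Collections.Generic;

// Hypothetical diff: accept only if the trees differ solely by the edit itself.
public static class TreeDiff
{
    public static bool DeltaMatchesEdit(AnnotatedNode oldTree, AnnotatedNode newTree, TextChange change)
    {
        var oldSpans = Flatten(oldTree);
        var newSpans = Flatten(newTree);
        if (oldSpans.Count != newSpans.Count)
        {
            return false; // spans were added or removed: reject
        }

        int delta = change.NewText.Length - change.OldLength;
        int editEnd = change.Position + change.OldLength;

        for (int i = 0; i < oldSpans.Count; i++)
        {
            var o = oldSpans[i];
            var n = newSpans[i];

            if (o.Start + o.Length <= change.Position)
            {
                // Entirely before the edit: must be unchanged.
                if (n.Start != o.Start || n.Length != o.Length) return false;
            }
            else if (o.Start >= editEnd)
            {
                // Entirely after the edit: same span, shifted by the net delta.
                if (n.Start != o.Start + delta || n.Length != o.Length) return false;
            }
            else
            {
                // The span containing the edit may only grow or shrink by the delta.
                if (n.Start != o.Start || n.Length != o.Length + delta) return false;
            }
        }

        return true;
    }

    private static List<AnnotatedNode> Flatten(AnnotatedNode node)
    {
        var leaves = new List<AnnotatedNode>();
        void Visit(AnnotatedNode n)
        {
            if (n.Children.Count == 0) leaves.Add(n);
            else n.Children.ForEach(Visit);
        }
        Visit(node);
        return leaves;
    }
}
```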