Comments (7)
tabs are forced to be treated as four spaces
In a CST would it be encoded as four spaces?
Wouldn't it be preserved as a tab?
My understanding of CSTs is that they always reflect the literal text on the page.
from micromark.
I think it would be present as a tab, so it’ll reflect the literal text of the page indeed, but there needs to be an “extra” thing that’s part of the indented code as its indent.
Could maybe be an {type: 'indent', value: ' ', size: 4}
(a node with only the one space, but with the extra size from the previous block quote tab?) 🤷‍♂️
from micromark.
I think it would be present as a tab, so it’ll reflect the literal text of the page indeed, but there needs to be an “extra” thing that’s part of the indented code as its indent
🤔 let's split this off into another discussion.
from micromark.
It’s a block quote marker, followed by a tab (tabs are forced to be treated as four spaces).
The first “virtual space” of the tab is part of the block quote marker. The remaining three “virtual spaces” are part of the indent of the indented code.
One extra real space, and you’ve got a code indent of four spaces, making it a proper indented code, in a block quote.
I'd tend to expect the indentation to always be in another node.
The spaces/tabs are needed to make it clear that > is in fact a block quote and not plain text.
But I'm not sure they are truly a part of the blockquote; they may be their own indent node.
That indent node could be a child of blockquote, but would still be its own node.
from micromark.
The spaces/tabs are needed to make it clear that > is in fact a block quote and not plain text.
Not quite, one space after > is optional. ># heading is a heading in a block quote.
That indent node could be a child of blockquote, but would still be its own node.
Right! I’m thinking the nodes would look something like this: blockQuote[line[marker, indent, content]], where content is indentedCode[line[indent, content]].
The problem I’m foreseeing is what the values of indent in the block quote line and indent in the indented code line are.
from micromark.
Some more thoughts: the current attempt of micromark (the one checked in) is, I now believe, incorrect. It assumes Markdown can be parsed in blocks (which may work in an AST).
Markdown is instead line-based. I’m working on a new proposal, and I think virtual spacing should work something like this:
A line is made up of block continuation and block openings: >␉␠i is a blockquote marker token, followed by a whitespace token. Because the whitespace token is big enough, the rest of the line is part of indented code.
While tokenizing, this “big enough” can be accomplished by using a “conceptual” (non-real) character: a “Virtual space”.
Say we have >␉␉i. The characters fed into the state machine are then: >, ␉, VS, VS, ␉, VS, VS, VS, i. Every whitespace character, real or virtual, has a size of 1, so this whitespace token has a size of 7.
The blockquote “uses” 1 of that and the indented code uses 4, leaving the last two characters (VS, VS). Those are still part of the whitespace token, but something that compiles to HTML can use them as a prefix before the content: <blockquote><pre><code>␠␠i…
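To make the feeding step concrete, here is a minimal sketch of how a line could be expanded into real characters plus virtual spaces. It assumes tab stops every 4 columns and uses -1 as the virtual-space marker; the names feed, whitespaceSize, TAB_SIZE and VIRTUAL_SPACE are hypothetical and not part of micromark's API.

```javascript
// Hypothetical helpers for illustration only; not micromark's API.
const TAB_SIZE = 4 // Assumed tab stop width.
const VIRTUAL_SPACE = -1 // Assumed marker for a non-real ("virtual") character.

// Expand a line into the units fed to the state machine: every real
// character, plus virtual spaces after each tab up to the next tab stop.
function feed(line) {
  const units = []
  let column = 0
  for (const char of line) {
    units.push(char)
    column++
    if (char === '\t') {
      while (column % TAB_SIZE !== 0) {
        units.push(VIRTUAL_SPACE)
        column++
      }
    }
  }
  return units
}

// Count the whitespace units (real or virtual) that follow the first unit,
// which is assumed to be the block quote marker.
function whitespaceSize(units) {
  let size = 0
  for (let index = 1; index < units.length; index++) {
    const unit = units[index]
    if (unit !== '\t' && unit !== ' ' && unit !== VIRTUAL_SPACE) break
    size++
  }
  return size
}

const units = feed('>\t\ti')
// units: ['>', '\t', -1, -1, '\t', -1, -1, -1, 'i']
// whitespaceSize(units): 7
```

The first tab starts at column 1 and so only contributes two virtual spaces, while the second starts on a tab stop and contributes three, which is where the total of 7 comes from.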
This does not affect tokens. If we’re doing something similar to mdast, the whitespace token should look something like: {type: 'whitespace', value: '\t\t', characters: [9, -1, -1, 9, -1, -1, -1], size: 7, used: 5}
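As a rough sketch of how such a token could be consumed, the snippet below takes the token shape from the comment above and derives two things from it: the prefix a compiler to HTML might emit, and the literal source text. The helper names contentPrefix and literal are hypothetical, and -1 is assumed to mean a virtual space.

```javascript
// Token shape copied from the comment; -1 is assumed to mark a virtual space.
const token = {
  type: 'whitespace',
  value: '\t\t',
  characters: [9, -1, -1, 9, -1, -1, -1],
  size: 7,
  used: 5
}

// Hypothetical helper: columns not used by the block quote or the indented
// code become a prefix of spaces before the content.
function contentPrefix(token) {
  return ' '.repeat(token.size - token.used)
}

// Hypothetical helper: dropping virtual spaces recovers the literal text,
// so the tabs themselves are preserved.
function literal(token) {
  const real = token.characters.filter((code) => code !== -1)
  return String.fromCharCode(...real)
}

const prefix = contentPrefix(token) // Two spaces.
const source = literal(token) // Same as token.value: two tabs.
```

Keeping size and used as separate fields means the literal value never has to be rewritten: the tree still holds the two tabs, and only the compiler decides what to do with the leftover columns.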
from micromark.
This is now defined and solved in CMSM, by defining virtual spaces and content prefixes.
from micromark.