Comments (15)
> About terminology, I'd say that "IR" is the genus and "AST" is one species.
ok, thank you.
RE the bigger question, some things to consider:
- I would say that Pandoc solves the problem of translating between so many different input and output forms by defining an IR that is syntax-independent and more or less a semantic superset of those syntaxes. Then the question becomes whether representing paragraphs that span block quotes (or other elements, see below) is universal or common enough to warrant complicating Pandoc's IR.
- An old W3C www-html list discussion: Re: Lists within Paragraphs. An excerpt:
I think this is part of a bigger problem. Paragraph's can't contain block level elements. At first this seems to make a lot of sense. But it doesn't work in many instances. For example often block level mathematical formulas occur in paragraphs. If we consider x + y = z as such an example, we see that in this case this paragraph is the still the same one, but we have a block level element in it.
- The HTML spec's ultimate answer admits that paragraphs might logically span block elements, but holds that this logical notion is not what a paragraph means in HTML:
List elements (in particular, ol and ul elements) cannot be children of p elements. When a sentence contains a bulleted list, therefore, one might wonder how it should be marked up.
For instance, this fantastic sentence has bullets relating to
- wizards,
- faster-than-light travel, and
- telepathy,
and is further discussed below.
The solution is to realize that a paragraph, in HTML terms, is not a logical concept, but a structural one. In the fantastic example above, there are actually five paragraphs as defined by this specification: one before the list, one for each bullet, and one after the list.
The markup for the above example could therefore be:
<p>For instance, this fantastic sentence has bullets relating to</p>
<ul>
 <li>wizards,
 <li>faster-than-light travel, and
 <li>telepathy,
</ul>
<p>and is further discussed below.</p>
Authors wishing to conveniently style such "logical" paragraphs consisting of multiple "structural" paragraphs can use the div element instead of the p element.
Thus for instance the above example could become the following:
<div>For instance, this fantastic sentence has bullets relating to
<ul>
 <li>wizards,
 <li>faster-than-light travel, and
 <li>telepathy,
</ul>
and is further discussed below.</div>
This example still has five structural paragraphs, but now the author can style just the div instead of having to consider each part of the example separately.
- Allowing paragraphs to span/nest block elements provides, I think, a cleaner and more consistent solution to "tight lists". For example, the following would be a tight list because each list item contains exactly a single element (a paragraph):
- para 1
- para 2
  - a
  - b
  - c
- para 3
The current CommonMark solution has flaws, as can be seen by comparing
- item one
- item two
  # a heading
  more text
- item three
with
- item one
- item two
  a heading
  ---------
  more text
- item three
Both should be treated as loose lists, since the second item in each contains a sequence of blocks, but CommonMark's determination is based on the presence or absence of blank lines in the source, not on logical structure.
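To make the structural rule concrete, here is a minimal sketch of deciding tightness from the parsed tree rather than from blank lines. The Block type and the is_tight function are hypothetical names invented for this illustration; they are not part of CommonMark, djot, or Pandoc, and the sketch assumes a model in which a paragraph may own nested blocks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical block node: a kind plus any blocks nested inside it."""
    kind: str  # e.g. "para", "heading", "list"
    children: List["Block"] = field(default_factory=list)


def is_tight(items: List[List[Block]]) -> bool:
    """A list is tight when every item holds exactly one block, a paragraph."""
    return all(len(item) == 1 and item[0].kind == "para" for item in items)


# "para 2" owns the nested a/b/c list, so every item is still one paragraph.
tight_example = [
    [Block("para")],
    [Block("para", [Block("list")])],
    [Block("para")],
]

# The heading example: the second item is a sequence of three blocks.
loose_example = [
    [Block("para")],
    [Block("para"), Block("heading"), Block("para")],
    [Block("para")],
]

print(is_tight(tight_example))
print(is_tight(loose_example))
```

Under this rule both heading variants come out loose regardless of how the source happens to be spaced, because only the number and kind of blocks in each item is consulted.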
I hope this is helpful. Please let me know if you've had enough! It just happens to be a question I've been trying to tackle myself.
from djot.
Re. @bpj's suggestion about indenting: would this cause a problem with putting lists between paragraphs? That is, with a list you may (and typically do) indent the list marker. Is there a difference between a list that's its own paragraph vs a list that's in the midst of a paragraph?
from djot.
Why not indentation for an embedded blockquote? That seems the most intuitive to me.
from djot.
I mean that if what follows the blockquote is a continuation, then the blockquote is indented; an indented blockquote is one that is embedded in a paragraph.
paragraph
  > blockquote inside paragraph
rest of paragraph (continuation)
vs.
first paragraph
> blockquote after paragraph
another paragraph
I hope that is clearer.
from djot.
Thinking about a syntax for, "anyhow, as I was saying", I was going to suggest "...", as in:
The boat ride took us through the everglades.
> It was one of those "airboats" with the giant propeller.
... We saw a lot of birds but no alligators.
But that causes a pretty big indent, and "..." already automatically gets you a "…" in djot, so it might cause problems when the author wants an actual ellipsis.
The leading underscore is ok, but it also makes me think of italics.
Since "and" is at least somewhat close to "anyhow, as I was saying", maybe &?
The boat ride took us through the everglades.
> It was one of those "airboats" with the giant propeller.
& We saw a lot of birds but no alligators.
I like that one because,
- & is not currently used for any other djot syntax,
- I can read it as "and" ("and as I was saying") and it kinda works. :)
- the glyph itself also looks somewhat like any other alphabet letter, and so is not as distracting (does not stand out so much on the page) as the _, which I think is a desirable characteristic for this bit of markup.
from djot.
Including lists in paragraphs is an important use case for the kind of writing that I do, at least, and neither the suggestion of a leading _ nor the suggestion of indentation works well for this case.
I would need to distinguish between all of the following:
A:
Some text:
\begin{itemize}
\item A
\item B
\end{itemize}
And more text
B:
Some text:

\begin{itemize}
\item A
\item B
\end{itemize}
And more text
C:
Some text:
\begin{itemize}
\item A
\item B
\end{itemize}

And more text
D:
Some text:

\begin{itemize}
\item A
\item B
\end{itemize}

And more text
Leading underscore works to distinguish A from C. But not to distinguish A from B, nor C from D. It catches part of the A/D distinction.
Indentation doesn't work for any of them.
An alternative design is a convention that there's a div for "multi-paragraphs" that contain multiple block elements. It's ugly but accurate:
::: {.paragraph}
Some text:
* A
* B
And more text
:::
would denote option A.
This would be tool-specific, but that's perhaps OK: I think the need for this kind of thing tends to arise in long-form scientific writing more than in smaller, casual documents, so having a Googleable solution like this is acceptable. It also remains compatible with the various ASTs out there.
from djot.
One could use a single dot on a line as a "connector" that says: the following normally-block-level thing is to be considered as part of the current paragraph. Then your A is
Some text
.
- A
- B
.
more text
and your B is
Some text

- A
- B
.
more text
and so on. Of course, this would require figuring out an AST model that actually permits this sort of thing. And some (most?) output formats just won't allow a list or a block quote to be part of a paragraph: in HTML for example, a p element can only contain "phrasing content."
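As a rough illustration of how the dot connector could be read, the sketch below groups blocks into logical paragraphs, without committing to any particular AST. It assumes that blank lines separate blocks and that a line containing only "." both ends the current block and attaches the next block to the same logical paragraph. The function name group_logical_paragraphs is made up for this example and is not part of djot.

```python
from typing import List


def group_logical_paragraphs(source: str) -> List[List[str]]:
    """Return logical paragraphs, each a list of block texts."""
    groups: List[List[str]] = []  # logical paragraphs collected so far
    current: List[str] = []       # lines of the block being read
    join_next = False             # True when a "." connector preceded the block

    def flush() -> None:
        nonlocal current, join_next
        if not current:
            return
        block = "\n".join(current)
        if join_next and groups:
            groups[-1].append(block)   # continue the previous logical paragraph
        else:
            groups.append([block])     # start a new logical paragraph
        current = []
        join_next = False

    for line in source.splitlines():
        stripped = line.strip()
        if stripped == ".":
            flush()
            join_next = True
        elif stripped == "":
            flush()
        else:
            current.append(line)
    flush()
    return groups


option_a = "Some text\n.\n- A\n- B\n.\nmore text"
option_b = "Some text\n\n- A\n- B\n.\nmore text"

print(group_logical_paragraphs(option_a))
print(group_logical_paragraphs(option_b))
```

With these inputs, option A yields a single group of three blocks, while option B yields two groups: the opening text on its own, then the list joined with the trailing text. How such groups would map onto an output format like HTML is the open question raised above.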
from djot.
The question is not about the syntax of the block quote, but about how to mark what follows it as either a new paragraph or a continuation of the previous one.
from djot.
Yes, got it now.
from djot.
Does djot's AST support block elements nested within a paragraph?
from djot.
We wouldn't need the AST to support block elements as children of a paragraph. It would be sufficient just to be able to mark the following content as "not a new paragraph."
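A minimal sketch of what such a marker might look like, assuming a flat block sequence in which a paragraph carries a boolean flag. The Para and BlockQuote classes, the continuation field, and the "continuation" class name in the output are all hypothetical; none of them is djot's or Pandoc's actual data model.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class Para:
    text: str
    continuation: bool = False  # hypothetical flag: "not a new paragraph"


@dataclass
class BlockQuote:
    children: List[Para]


Block = Union[Para, BlockQuote]


def render(blocks: List[Block]) -> str:
    """Render a flat block list to HTML, tagging continuation paragraphs."""
    out = []
    for block in blocks:
        if isinstance(block, BlockQuote):
            out.append("<blockquote>\n" + render(block.children) + "\n</blockquote>")
        else:
            cls = ' class="continuation"' if block.continuation else ""
            out.append(f"<p{cls}>{escape(block.text)}</p>")
    return "\n".join(out)


doc = [
    Para("The boat ride took us through the everglades."),
    BlockQuote([Para('It was one of those "airboats" with the giant propeller.')]),
    Para("We saw a lot of birds but no alligators.", continuation=True),
]

print(render(doc))
```

The tree stays flat, so the block quote is a sibling of both paragraphs, and a stylesheet could use the class to suppress paragraph spacing or first-line indentation on the continuation.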
from djot.
@jgm,
> We wouldn't need the AST to support block elements as children of a paragraph. It would be sufficient just to be able to mark the following content as "not a new paragraph."
I understand. Would you mind answering a related, long-standing question I've had about terminology?
Is there a distinction between an abstract syntax tree and an intermediate representation? Since djot parses to an AST, and since you are proposing a new djot syntax for paragraph continuation, your suggested approach above makes sense. But if instead you needed to model a general abstraction of structured text, independent of any specific syntax, such as the "AST" at Pandoc's core, then it might be better to represent it as a single paragraph with a nested block quote, yes? And whether or not that is the better representation of this specific case, would you agree that there is nonetheless a difference between an AST and an IR, and that the core data structure of Pandoc is better characterized as an IR?
from djot.
It might make more sense conceptually to allow a block quote to be a child of a paragraph. But this would make the interface with Pandoc's types more complicated. I don't know what is best.
About terminology, I'd say that "IR" is the genus and "AST" is one species.
from djot.
Just commenting to second @bpj's proposed use of indentation for this.
from djot.
AsciiDoc uses the plus sign (+) as a so-called list continuation: https://docs.asciidoctor.org/asciidoc/latest/syntax-quick-reference/#ex-complex
from djot.