Comments (11)
i think supporting inserting characters by codepoint is a good thing—especially with invisible or confusable characters it can be useful. i think HTML entity names are not so good; many of them are essentially legacy and the coverage is not necessarily complete or well‐thought‐out.
i don’t like the XML/HTML entity reference syntax because it makes the decimal form of codepoints `&#NNNN;` easier to type than the hexadecimal form `&#xNNNN;`. hexadecimal makes much more sense for unicode and i’m not sure that it even makes sense for decimal codepoints to be supported.
why not extend the emoji syntax to allow arbitrary characters by unicode codepoint, like `:U+2764:`? perhaps even multiple characters could be included, such as `:U+2764.FE0E:` (`.` is commonly used in unicode documentation for delimiting sequences of codepoints). emoji are already a kind of entity reference, after all.
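A minimal sketch of how the proposed `:U+2764.FE0E:` form could be expanded, assuming Lua 5.3+ for the `utf8` library. The function name `expand_codepoints` is hypothetical and not part of djot; it works on plain strings rather than on a djot AST, and it leaves any reference it cannot interpret untouched.

```lua
-- Hypothetical helper (not part of djot): expand :U+XXXX: references,
-- optionally with several dot-separated codepoints, e.g. :U+2764.FE0E:
local function expand_codepoints(text)
  return (text:gsub(":U%+([%x%.]+):", function(body)
    local chars = {}
    -- Append a dot so every segment, including the last, ends with one.
    for hex in (body .. "."):gmatch("(.-)%.") do
      local cp = tonumber(hex, 16)
      -- Keep the original text if a segment is empty, too long,
      -- beyond U+10FFFF, or a surrogate.
      if #hex > 6 or not cp or cp > 0x10FFFF
          or (cp >= 0xD800 and cp <= 0xDFFF) then
        return nil
      end
      chars[#chars + 1] = utf8.char(cp)
    end
    return table.concat(chars)
  end))
end
```

Returning `nil` from the replacement function makes `gsub` keep the matched text, so malformed references such as `:U+2764..FE0E:` pass through unchanged.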
from djot.
> entities are essential for comfortably writing mixed-language texts, e.g. when mixing right-to-left and left-to-right languages, as is common in e.g. the United Arab Emirates, Qatar, etc.
Can you explain a bit more why entities help with this? (E.g. give an example?)
@uvtc the bigger concern is invisible characters, for example variation selectors, right‐to‐left and left‐to‐right marks, ligation marks (zero‐width joiner and zero‐width non‐joiner), characters which allow breaks (zero‐width space) and prevent them (word joiner), “shy” hyphens, etc…… in some text editors it may be possible to inspect whether these characters are present (CotEditor for example is very good), but in others it may not, and regardless simply having those characters written out in the text is often much easier to handle.
as an example, the codepoint `U+3402` 㐂 has five different registered variations, which may be indicated by appending the variation selectors `U+E0100..U+E0104`. the font you are using when composing your document is not necessarily going to be the same one that you use when rendering it, so it may not support all of the different variants. it would be very useful to be able to write `㐂{:U+E0102:}` to explicitly indicate the third variant, because (depending on fonts etc) the composed form 㐂󠄂 might not look any different than the character without any variations applied.
similar arguments extend to things like wanting to type `no‐{:U+2060:}break` to add a word‐joiner to suppress line breaking, etc…
as for having to remember the unicode codepoints as opposed to the names, i think many people probably would prefer writing `{:U+E0102:}` rather than `{:variation_selector-19:}`; it’s much shorter and easier to skim over in a line of text. in any case, supporting the latter would require unicode character name lookups, which would make implementation a little bit more difficult.
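To illustrate the point about invisible characters being easier to handle when written out, here is a rough sketch that goes in the opposite direction: it rewrites a handful of invisible codepoints as `{:U+XXXX:}` references. It assumes Lua 5.3+ and valid UTF-8 input (`utf8.codes` raises an error otherwise). The names `invisible_ranges` and `reveal_invisibles` are hypothetical, and the range list is only a small, incomplete selection.

```lua
-- Hypothetical, deliberately incomplete list of invisible codepoints.
local invisible_ranges = {
  { 0x00AD, 0x00AD },   -- soft hyphen
  { 0x200B, 0x200F },   -- zero-width space, ZWNJ, ZWJ, LRM, RLM
  { 0x2060, 0x2060 },   -- word joiner
  { 0xFE00, 0xFE0F },   -- variation selectors 1-16
  { 0xE0100, 0xE01EF }, -- variation selectors 17-256
}

local function is_invisible(cp)
  for _, range in ipairs(invisible_ranges) do
    if cp >= range[1] and cp <= range[2] then
      return true
    end
  end
  return false
end

-- Replace each listed invisible character with a visible {:U+XXXX:} form.
local function reveal_invisibles(text)
  local out = {}
  for _, cp in utf8.codes(text) do
    if is_invisible(cp) then
      out[#out + 1] = string.format("{:U+%04X:}", cp)
    else
      out[#out + 1] = utf8.char(cp)
    end
  end
  return table.concat(out)
end
```

A tool like this could convert pasted text into the explicit form before editing, so the source never depends on what the editor or font chooses to display.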
If so, use the Lua 5.3 style with braces, `\u{123}`, so that one need only type as few digits as necessary.
That said, I think `:0x14b:` and `:331:`, and hopefully `{:0x14b:}` and `{:331:}`, would be a reasonable syntax as an extension of the existing emoji syntax (which IMO should include `{:emoji:}`), since it might allow processors to support custom names: `:entity:`, `{:Unicode name:}` or whatever.
As there are escapes already, why not add unicode escapes as supported in many programming languages? Along the lines of \u1234
I do not care about the syntax here, but would like to point out that entities are essential for comfortably writing mixed-language texts, e.g. when mixing right-to-left and left-to-right languages, as is common in e.g. the United Arab Emirates, Qatar, etc.
So any solution you come up with here has to be readable (and comfortable to write) for characters that change the text direction, etc.
Is the purpose for supporting entities to let you put in unicode characters when you're unable to insert the actual unicode character into your source? (That is, you know the character you want but cannot copy/paste it into your content file? Is it common to know the codepoint but not be able to copy/paste the character in?)
`:U+2192:` (for "→") is pretty syntax, and symmetric with emoji syntax, but not very readable (unless you happen to know that 2192 means "→"). Those HTML entities are more readable, `:&rarr:` (and potentially easier to remember), though I agree with @marrus-sh about their problems.
I didn't realize that the list of djot-supported emojis was so large. Seems like adding 10 or 20 commonly-used readable unicode char names like `:right-arrow:` wouldn't be too crazy, would it?
@jgm sorry for the delay - yes, the intent is mostly what @marrus-sh wrote above. Namely to make visible all those characters (incl. future ones) which change or influence overall "style", "form", "layout", "paragraphing", etc.
See my idea in #112 of generalizing the syntax currently used for emojis.
The idea would be that `:smile:` is parsed as, say, `special text="smile"`.
If you use emojis, you can use this syntax for them with a filter that inserts the emoji character proper to the alias. But you could just as easily use a different filter to associate whatever unicode string you like.
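As a rough illustration of the filter idea, the sketch below resolves `:alias:` references against a user-supplied table. It works on plain strings and does not use any real djot filter API; the names `aliases` and `expand_symbols` and the table entries are assumptions made up for the example. It needs Lua 5.3+ for the `\u{...}` string escapes.

```lua
-- Hypothetical alias table; a different "filter" would simply
-- supply a different table.
local aliases = {
  smile = "\u{1F604}",
  ["right-arrow"] = "\u{2192}",
  wj = "\u{2060}", -- word joiner
}

-- Replace :name: with the string registered for name, if any.
-- When gsub is given a table, the capture is used as the key, and
-- a missing key leaves the original text in place.
local function expand_symbols(text, alias_table)
  return (text:gsub(":([%w_+-]+):", alias_table))
end
```

Because unknown aliases fall through unchanged, text such as `10:30:45` is left alone unless someone registers `30` as an alias.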
I have written a simple Pandoc filter which replaces codepoint escapes like `:0x14b:`, `:331:` in strings with characters.
Gotcha: a literal colon (`:`) next to a digit must be escaped as `:58:`/`:0x3a:`!
    -- Pandoc Lua filter: replace codepoint escapes such as :0x14b: or :331:
    -- with the characters they denote. Needs Lua 5.3+ for the utf8 library.
    local char = utf8.char
    local pat = '(%:(%w+)%:)'

    local function subst (match, id)
      -- If we can numify it, it is probably a codepoint!
      local cp = tonumber(id)
      if cp then
        -- If the codepoint is out of range, char throws a scarcely helpful error!
        local ok, res = pcall(char, cp)
        if ok then
          return res
        end
        error("Failed to convert " .. tostring(match) .. " to a character:\n\t" .. tostring(res))
      end
      return match
    end

    function Str (str)
      -- The extra parentheses drop gsub's second return value (the match count).
      return pandoc.Str((str.text:gsub(pat, subst)))
    end
It could easily be ported to a djot filter using my pure-Lua char function from #44 (comment)