Comments (7)
Actually, you’ve uncovered (at least) three bugs:
(1) I haven’t made it prominent enough that Content MathML
is not yet supported in any realistic sense;
(2) the “Superscript” shouldn’t generate an empty element,
of course, but probably some more specific element (?)…
but then, that doesn’t really solve that much anyway;
and the real underlying issue is
(3) LaTeXML’s parser isn’t really smart enough to know
whether a superscript is a power.
Dealing with the last issue is a longer-term prospect,
particularly because DLMF keeps me pretty well sidetracked.
I’ve speculated on various ways to approach the problem:
(a) some sort of type analysis to figure out what a superscript
(or, even worse, a subscript) could mean; (b) a richer
declarative language to say what a superscript on a particular
symbol could mean; and (c) some heuristic mode along the
lines of “do the most likely reasonable thing”;
and most likely some combination of these.
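To make options (b) and (c) a bit more concrete, here is a minimal Python sketch of how they might combine: a declarative table says what a superscript on a particular base means, and a heuristic default applies when nothing is declared. The table entries, the meaning names (`transpose`, `inverse`, `power`, `superscript`) and the function `script_meaning` are all hypothetical illustrations, not part of LaTeXML.

```python
from typing import Dict, Optional, Tuple

# Hypothetical declarative table for option (b): what a superscript on a
# particular base symbol means, keyed by (base, script).
DECLARED_SUPERSCRIPTS: Dict[Tuple[str, str], str] = {
    ("A", "T"): "transpose",  # e.g. an author-declared matrix A
    ("f", "-1"): "inverse",   # e.g. an author-declared function f
}


def script_meaning(base: str, script: str, base_meaning: Optional[str] = None) -> str:
    """Return a meaning for base^script: declared first, then a heuristic default."""
    # (b) An explicit declaration for this base/script pair wins.
    declared = DECLARED_SUPERSCRIPTS.get((base, script))
    if declared is not None:
        return declared
    # (c) Heuristic: a base with no known meaning is treated as a plain
    # variable, so the "most likely reasonable thing" is a power.
    if base_meaning is None:
        return "power"
    # Otherwise leave the script uninterpreted rather than guess.
    return "superscript"
```

For instance, `script_meaning("x", "2")` falls through to the heuristic and yields `"power"`, while a base that already carries a meaning (say a sum operator) is left as a plain `"superscript"`. Option (a), type analysis, would replace the crude `base_meaning is None` test with real knowledge about the base.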
I guess I should leave this open as a reminder,
but I’m not likely to make much progress in the short term.
Thanks for your comments, in any case.
from latexml.
Dear Bruce,
I have been dispatched to help patch the postprocessing enough to give [MathWebSearch](http://www.mathweb.org/mws/) some meaningful input to run on. This error report came from Stefan, who is currently developing that very application.
Since, as you point out, a long-term solution would be a tough nut to crack, would you have a preference for a short-term “patch-up” that would let us get some helpful temporary results?
I’m currently investigating this (or trying to) and will follow up with more comments later this week.
Hi Deyan;
This sounds like a good idea.
A good first, or early, step would be to characterize the
cases where latexml either generates bad markup (e.g. empty elements)
or doesn’t have enough information (at least sub- and superscripts).
Note that a superscript can easily be “converted” to a power by
adding a meaning to the XMTok. But also note that the limits
of sums and integrals currently (I think) end up as plain
sub- and superscripts without any extra annotation, even though
in principle we know more about them.
Ultimately, it’s a question of finding insufficiently characterized
tokens, and inferring, guessing or defaulting some sort of meaning
for each instance.
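One way to start that characterization is a small pass over the XMath tree that counts script operators still lacking a `meaning`, keeping sum/integral limits separate from everything else. This Python sketch assumes a simplified, un-namespaced XMath fragment loosely modelled on LaTeXML output (script operator first, then the base, then the script); real documents use the LaTeXML namespace and may nest scripts differently. The `LIMIT_ROLES` set and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, un-namespaced XMath fragment (an assumption, not verbatim
# LaTeXML output) for x^2 and \sum_{i}^{n}.
SAMPLE = """
<XMath>
  <XMApp>
    <XMTok role="SUPERSCRIPTOP"/>
    <XMTok role="UNKNOWN">x</XMTok>
    <XMTok role="NUMBER" meaning="2">2</XMTok>
  </XMApp>
  <XMApp>
    <XMTok role="SUPERSCRIPTOP"/>
    <XMApp>
      <XMTok role="SUBSCRIPTOP"/>
      <XMTok role="SUMOP" meaning="sum">&#x2211;</XMTok>
      <XMTok role="UNKNOWN">i</XMTok>
    </XMApp>
    <XMTok role="UNKNOWN">n</XMTok>
  </XMApp>
</XMath>
"""

SCRIPT_ROLES = {"SUBSCRIPTOP", "SUPERSCRIPTOP"}
LIMIT_ROLES = {"SUMOP", "INTOP"}  # assumed: bases whose scripts are really limits


def base_role(node):
    """Role of the innermost base, descending through nested applications."""
    while node.tag == "XMApp" and len(node) > 1:
        node = node[1]
    return node.get("role")


def characterize(root):
    """Count meaning-less script operators, grouped by role and kind of base."""
    counts = Counter()
    for app in root.iter("XMApp"):
        if len(app) < 2:
            continue
        op = app[0]
        if op.tag == "XMTok" and op.get("role") in SCRIPT_ROLES and op.get("meaning") is None:
            kind = "limit" if base_role(app[1]) in LIMIT_ROLES else "other"
            counts[(op.get("role"), kind)] += 1
    return counts
```

Running `characterize(ET.fromstring(SAMPLE.strip()))` reports one unannotated superscript on an ordinary base plus a subscript and a superscript that are really limits of the sum, which are exactly the two groups that need different defaults.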
Remembering these problems, I just added semantic markup for all exponentials in sTeX in my GenCS slides. But of course this is no solution to the general case :-).
So, coming back to this.
Bruce, how would you conceive a first temporary patch taking the following shape:
- As a temporary compromise, we assign the meaning “ambiguous-sub/superscript” to any XMTok with the role “SUBSCRIPTOP” or “SUPERSCRIPTOP”.
- Next, we design a subroutine that handles the ambiguous script case based on a very small context: the next token, i.e. the one the script is applied to.
- In this sub, so far I can think of one meaningful default. Whenever the succeeding symbol has no (or an unknown) meaning and role (both!), we can “assume” (cough) we are seeing a variable and default the script to a “power” (double-cough).
Now, obviously, this will be wrong in a lot of cases, but it will be correct in the most widespread one and is certainly better than the current broken markup.
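The proposal above can be sketched in Python roughly as follows. It assumes the same simplified, un-namespaced XMath shape (script operator, then base token, then script), treats a missing `role` or `role="UNKNOWN"` as "unknown", and uses the hypothetical function name `default_scripts`; only the meaning strings come from the proposal itself.

```python
import xml.etree.ElementTree as ET

AMBIGUOUS = "ambiguous-sub/superscript"  # the temporary compromise meaning


def default_scripts(root):
    """Mark script operators as ambiguous, then default some superscripts to power."""
    for app in root.iter("XMApp"):
        if len(app) < 2:
            continue
        op, base = app[0], app[1]
        if op.tag != "XMTok" or op.get("role") not in ("SUBSCRIPTOP", "SUPERSCRIPTOP"):
            continue
        # Step 1: mark as ambiguous, never overwriting a meaning already supplied.
        if op.get("meaning") is None:
            op.set("meaning", AMBIGUOUS)
        # Step 2: look only at the token the script is applied to.
        base_unknown = (
            base.tag == "XMTok"
            and base.get("meaning") is None
            and base.get("role") in (None, "UNKNOWN")
        )
        # Step 3: an unknown base is "assumed" to be a variable, so an
        # ambiguous superscript on it becomes a power; subscripts stay ambiguous.
        if op.get("role") == "SUPERSCRIPTOP" and op.get("meaning") == AMBIGUOUS and base_unknown:
            op.set("meaning", "power")
    return root
```

On `x^2` this yields a power, while `x_i`, `\sin^2` (whose base has a meaning) and scripted sums keep the ambiguous meaning, which matches the "correct in the most widespread case, harmless elsewhere" intent.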
OTOH, you mentioned that such heuristics belong at the LaTeXML level rather than in postprocessing. So I have a second version of this proposal, following this scenario:
- There is some scary MathGrammar/MathParser interaction (and parser generation) that I don’t yet understand, which could possibly do the defaulting. It seems to be the right place, but meddling with the grammar seems mind-boggling.
- If the heuristic check is done there, we can keep the current look of scripts and just attach the meaning “power” to the ones matching the criteria. Postprocessing would then already work the way we want, right?
I know I am discussing a rather isolated case with a single-case heuristic fix, yet that’s what I can see myself doing as a starting point. Writing code inside the modules still looks quite challenging to me, although I am comfortable writing self-contained subroutines here and there.
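If the parser (or some fix-up pass) does attach `meaning="power"`, the postprocessing side only needs to branch on that attribute. Here is a rough Python sketch of that branch producing Content MathML. `<apply>`, `<power/>`, `<ci>`, `<cn>` and `<csymbol>` are standard Content MathML elements, but the function names, the `cd="ambiguous"` fallback and the `superscript`/`subscript` symbol names are assumptions for illustration; this is not LaTeXML's actual postprocessor, and it only handles flat (non-nested) tokens.

```python
import xml.etree.ElementTree as ET


def token_to_cmml(tok):
    """Very rough token conversion: numbers become <cn>, everything else <ci>."""
    el = ET.Element("cn" if tok.get("role") == "NUMBER" else "ci")
    el.text = tok.text
    return el


def script_to_cmml(app):
    """Convert <XMApp>(script-op, base, script) into a Content MathML <apply>."""
    op, base, script = app[0], app[1], app[2]
    apply_el = ET.Element("apply")
    if op.get("meaning") == "power":
        # A known meaning maps directly onto the Content MathML <power/> operator.
        ET.SubElement(apply_el, "power")
    else:
        # Anything still uninterpreted falls back to an opaque, non-empty symbol.
        sym = ET.SubElement(apply_el, "csymbol", {"cd": "ambiguous"})
        sym.text = "superscript" if op.get("role") == "SUPERSCRIPTOP" else "subscript"
    apply_el.append(token_to_cmml(base))
    apply_el.append(token_to_cmml(script))
    return apply_el
```

The design point is that all the guessing stays upstream: the converter never inspects the base to decide what a script means, it only trusts (or ignores) the `meaning` attribute.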
I've fixed the sub/superscript bug by generating an explicit symbol for it.
That obviously doesn't solve the long-term problem,
but hopefully it gives some room to play...
I started using an imaginary cd "latexml" until
we can come up with a reasonable mapping between
latexml "meanings" and OpenMath CDs (and a customizable mechanism for it).
rev 952.
Correction: I used cd="ambiguous" for this symbol
(but cd="latexml" for most others...).
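The interim scheme and the hoped-for customizable mapping could look something like this Python sketch: user-supplied overrides first, then a built-in table, then the catch-all `cd="latexml"`. The table entries, the meaning names `superscript`/`subscript` and the function `csymbol_cd` are hypothetical; `arith1` in the tests and usage is a real OpenMath content dictionary, used here only as an example override.

```python
from typing import Dict, Optional

# Catch-all "imaginary" content dictionary for meanings nobody has mapped yet.
DEFAULT_CD = "latexml"

# Hypothetical built-in meaning -> OpenMath CD table; a real one would be far larger.
BUILTIN_CDS: Dict[str, str] = {
    "superscript": "ambiguous",
    "subscript": "ambiguous",
}


def csymbol_cd(meaning: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Pick the cd attribute for a <csymbol>: overrides, then built-ins, then default."""
    if overrides and meaning in overrides:
        return overrides[meaning]
    return BUILTIN_CDS.get(meaning, DEFAULT_CD)
```

With no overrides, `csymbol_cd("plus")` falls back to `"latexml"`; passing `{"plus": "arith1"}` redirects it to a real OpenMath CD, which is the kind of per-user customization the mapping would need.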