Comments (7)

brucemiller commented on July 22, 2024

Actually, you’ve uncovered (at least) 3 bugs:
(1) I haven’t made it prominent enough that Content MathML
is not, in any realistic sense, yet supported;
(2) The “Superscript” shouldn’t generate an empty ,
of course, but probably a (?)…
but then, that doesn’t really solve that much anyway;
and the real underlying issue is
(3) LaTeXML’s parser isn’t really smart enough to know
whether the superscript is a power.

Dealing with the last issue is a longer term prospect,
particularly because DLMF keeps me pretty well sidetracked.
I’ve speculated on various ways to approach the problem:
(a) some sort of type-analysis to figure out what a superscript
(or even worse, a subscript) could mean; (b) some richer
declarative language to say what a superscript on a particular
symbol could mean; and (c) some heuristic mode along the
lines of “do the most likely reasonable thing”;
and most likely some combination.

I guess I should leave this open as a reminder,
but I’m not likely to make much progress in the short term.
Thanks for your comments, in any case.

from latexml.

dginev commented on July 22, 2024

Dear Bruce,

I have been dispatched to try and help patch the postprocessing to an extent that would allow some meaningful input for [MathWebSearch](http://www.mathweb.org/mws/) to run on. This error report came from Stefan who is currently developing that very application.

Since you point out that a long-term solution would be a tough nut to crack, would you have a preference for a short-term “patch up” that would allow us to get some helpful temporary results?

I’m currently investigating this (or trying to) and will follow up with more comments later this week.

brucemiller commented on July 22, 2024

Hi Deyan;
This sounds like a good idea.
A good first, or early, step would be to characterize the
cases where latexml is either generating bad markup (eg empty ),
or doesn’t have enough information (at least sub and superscripts).

Note that a superscript can easily be “converted” to a power by
adding a meaning to the XMTok. But also note that the limits
of sums and integrals currently (I think) end up as simple
sub & superscripts without any extra annotation, even though
we in principle know more about them.

Ultimately, it’s a question of finding insufficiently characterized
tokens, and inferring, guessing or defaulting some sort of meaning
for each instance.
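
A minimal sketch of that first characterization step, written in Python rather than LaTeXML's own Perl. It assumes a simplified, namespace-free stand-in for the XMath tree; the sample markup and the function names `undercharacterized_scripts` and `report` are illustrative only and are not part of LaTeXML.

```python
import xml.etree.ElementTree as ET

# Assumption: a simplified, namespace-free imitation of XMath output.
# Real LaTeXML documents use an XML namespace and carry more attributes.
SAMPLE = """
<XMath>
  <XMApp>
    <XMTok role="SUPERSCRIPTOP"/>
    <XMTok role="UNKNOWN">x</XMTok>
    <XMTok role="NUMBER" meaning="2">2</XMTok>
  </XMApp>
  <XMApp>
    <XMTok role="SUBSCRIPTOP"/>
    <XMTok role="UNKNOWN">a</XMTok>
    <XMTok role="UNKNOWN">i</XMTok>
  </XMApp>
  <XMApp>
    <XMTok role="SUPERSCRIPTOP" meaning="power"/>
    <XMTok role="UNKNOWN">y</XMTok>
    <XMTok role="NUMBER" meaning="3">3</XMTok>
  </XMApp>
</XMath>
"""

SCRIPT_ROLES = {"SUBSCRIPTOP", "SUPERSCRIPTOP"}


def undercharacterized_scripts(xml_text):
    """Return script-operator tokens that carry no meaning attribute."""
    root = ET.fromstring(xml_text.strip())
    return [
        tok
        for tok in root.iter("XMTok")
        if tok.get("role") in SCRIPT_ROLES and not tok.get("meaning")
    ]


def report(xml_text):
    """List the roles of the under-characterized script tokens, in document order."""
    return [tok.get("role") for tok in undercharacterized_scripts(xml_text)]


if __name__ == "__main__":
    print(report(SAMPLE))
```

The third script in the sample already has a meaning, so only the first two are reported. Running such a scan over a corpus would give a rough count of how often scripts are left without annotation.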

kohlhase commented on July 22, 2024

Remembering these problems, I just added semantic markup for all exponentials in sTeX in my GenCS slides. But of course this is no solution to the general case :-).

dginev commented on July 22, 2024

So, coming back to this.

Bruce, how would you conceive a first temporary patch taking the following shape:
- As a temporary compromise, we assign the meaning “ambiguous-sub/superscript” to any XMTok having a role “SUB/SUPERSCRIPTOP”.
- Next, design a subroutine that handles the ambiguous script case, based on a very small context – the next token it is applied to.
- In this sub, so far I could think of the following meaningful default. Whenever the succeeding symbol has no/unknown meaning and role (both!), then we can “assume” (cough) we are seeing a variable and default the script to be a “power” (double-cough).

Now, obviously, this will be wrong in a lot of cases, but it will be correct in the most widespread one and is certainly better than the current broken markup.
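
The three steps above could be sketched roughly as follows, in Python rather than Perl and on a simplified, namespace-free stand-in for the XMath tree. Everything here is an assumption made for illustration: the helper names, the split of the “ambiguous-sub/superscript” label into two separate strings, treating a role of `UNKNOWN` as “no role”, and applying the “power” default to superscripts only.

```python
import xml.etree.ElementTree as ET

# Assumption: the single label from the proposal is split into one marker per role.
AMBIGUOUS = {
    "SUPERSCRIPTOP": "ambiguous-superscript",
    "SUBSCRIPTOP": "ambiguous-subscript",
}


def is_uncharacterized(tok):
    """True when a token has neither a meaning nor a known role (both!)."""
    return not tok.get("meaning") and tok.get("role") in (None, "", "UNKNOWN")


def default_script_meanings(root):
    """Mark unannotated scripts as ambiguous, then default the easy case to 'power'."""
    for app in root.iter("XMApp"):
        children = list(app)
        if len(children) < 2:
            continue
        op, base = children[0], children[1]
        role = op.get("role")
        if op.tag != "XMTok" or role not in AMBIGUOUS or op.get("meaning"):
            continue  # not a script operator, or already annotated
        op.set("meaning", AMBIGUOUS[role])
        # Very small context: only the token the script is applied to.
        if role == "SUPERSCRIPTOP" and base.tag == "XMTok" and is_uncharacterized(base):
            op.set("meaning", "power")
    return root


# Hypothetical sample input: x^2, f^2 with f known to be a function, and a_i.
doc = ET.fromstring(
    "<XMath>"
    '<XMApp><XMTok role="SUPERSCRIPTOP"/><XMTok role="UNKNOWN">x</XMTok>'
    '<XMTok role="NUMBER" meaning="2">2</XMTok></XMApp>'
    '<XMApp><XMTok role="SUPERSCRIPTOP"/><XMTok role="FUNCTION">f</XMTok>'
    '<XMTok role="NUMBER" meaning="2">2</XMTok></XMApp>'
    '<XMApp><XMTok role="SUBSCRIPTOP"/><XMTok role="UNKNOWN">a</XMTok>'
    '<XMTok role="UNKNOWN">i</XMTok></XMApp>'
    "</XMath>"
)
default_script_meanings(doc)
meanings = [app[0].get("meaning") for app in doc.iter("XMApp")]

if __name__ == "__main__":
    print(meanings)
```

Only the script on the unknown symbol x is defaulted to “power”; the other two keep the ambiguous marker, so a later pass (or a human) can still find them.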

OTOH, you mentioned these heuristics are to be made on the LaTeXML level and not during postprocessing. So, I have a second version of this proposal, following this scenario:
- There is some scary MathGrammar and MathParser interaction generation I don’t yet understand, that could possibly do the defaulting. It seems to be the right place, but meddling with the grammar seems mind boggling.
- If the heuristic check is done there, then we can keep the current look of scripts, just putting a meaning “power” to the ones matching the criteria. Postprocessing will already work the way we want it having this, right?

I know I am discussing a rather isolated case with a single case heuristic fix, yet that’s what I can see myself doing as a starting point. Writing code inside of the modules is still looking quite challenging in my eyes, although I am comfortable with writing contained subroutines here and there.

brucemiller commented on July 22, 2024

I've fixed the sub/superscript bug by generating

That obviously doesn't solve the long term problem,
but hopefully gives some room to play...

I started using an imaginary cd "latexml" until
we can come up with a reasonable mapping between
latexml "meanings" and OM cd's (& a customizable mechanism).

rev 952.

brucemiller commented on July 22, 2024

correction: I used cd="ambiguous" for this symbol,
(but cd="latexml" for most others...)
