
miogatto's People

Contributors

babyygemperor, delta-river, musashi1729, wtsnjp

miogatto's Issues

A few extensions for the jumping functions

These are a few small update requests for #59.

  • Make the button with jQuery UI for consistency
  • When no mi is selected
    • next: select the first unannotated mi
    • prev: select the last unannotated mi
  • Add keyboard shortcuts
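The selection rules above can be sketched as a small function. This is only an illustration of the logic, written in Python for brevity; the actual client code is TypeScript, and the name `jump_target` and the list-of-booleans representation are assumptions, not part of MioGatto.

```python
from typing import Optional, Sequence


def jump_target(
    annotated: Sequence[bool], current: Optional[int], direction: str
) -> Optional[int]:
    """Return the index of the unannotated mi to jump to, or None.

    annotated[i] is True when the i-th mi (in document order) already has
    a concept assigned. current is the index of the selected mi, or None
    when nothing is selected.
    """
    n = len(annotated)
    if direction == "next":
        # With no selection, start from the first mi in the document.
        start = 0 if current is None else current + 1
        candidates = range(start, n)
    elif direction == "prev":
        # With no selection, start from the last mi in the document.
        start = n - 1 if current is None else current - 1
        candidates = range(start, -1, -1)
    else:
        raise ValueError(f"unknown direction: {direction!r}")

    for i in candidates:
        if not annotated[i]:
            return i
    return None  # nothing left to annotate in that direction
```

Returning `None` when no unannotated mi remains lets the caller keep the current selection unchanged.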

when selecting some mathematical objects, the main page moves

Paper ID: 1711.09576

When different mathematical objects (such as 'q' and 's') in the 6th line of the second paragraph below are selected, the appearance of the second paragraph changes.

image

image

What is more, after repeatedly selecting different objects about 10 times (e.g., (select q -> reload ->) select s, q, s, q, s, q, s, q, s, q), the page starts moving whenever a different object in the line is selected.

arity and args_type not popped up

Currently, only the description of a concept pops up when the mouse pointer is over the text.
It would be more helpful if the other information about the concept (arity and args_type) were shown as well.

Argument type not reset

When adding a new concept, if you set the argument type to something, click Cancel instead of OK, and then try to add a concept again via New, the argument type setting is not reset and keeps its previous value.
image

New feature: special edit/view mode for math concept dictionaries

We need another mode for viewing and editing the math concept dictionaries. This should be served at a path other than the index /, e.g., /mcdict.

Requested spec:

  • List all the items in the dictionary for a document
    • It should be categorized by the type of math identifier
    • It should also show the number of occurrences and SoGs associated with each concept
  • All the contents of the dictionary items can be edited on the screen

Relevant SoGs not highlighted

When using the option "Highlight only relevant SoGs", sometimes SoGs are not highlighted even if the corresponding mi is selected.
This seems to happen when the SoGs of different mis overlap.

Incomplete img path

Problems:

  • For some figures, the img path is incomplete; it lacks the filename

The following is an example from 1605.02821:
image

Causes:

  • Sometimes, the output of LaTeXML contains an img tag with an empty string as its src attribute [example from 2003.10641]
    • the cause of this error is still unknown
  • The current tools/preprocess.py does not detect such errors, resulting in incomplete img paths
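A check for this condition could look roughly like the following. It is a minimal sketch using only the Python standard library; the class and function names are hypothetical and this is not the logic currently in tools/preprocess.py.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class EmptySrcFinder(HTMLParser):
    """Collect (line number, id) for every img tag with a missing or empty src."""

    def __init__(self) -> None:
        super().__init__()
        self.hits: List[Tuple[int, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> also reach this method,
        # because the default handle_startendtag delegates to it.
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        if src is None or src.strip() == "":
            line_number = self.getpos()[0]
            self.hits.append((line_number, attr_map.get("id")))


def find_empty_img_src(html: str) -> List[Tuple[int, Optional[str]]]:
    """Return the locations of img tags whose src is missing or empty."""
    finder = EmptySrcFinder()
    finder.feed(html)
    finder.close()
    return finder.hits
```

A preprocess step could call `find_empty_img_src` on the LaTeXML output and print a warning for each hit instead of silently writing an incomplete path.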

Unnecessary trailing spaces in preprocess output

If you reprocess the source with the latest tools.preprocess, an unnecessary trailing space may be introduced, for example, in the span element of an email address.

MWE

python -m tools.preprocess 1609.06038.html

Running this produces the following diff (modified to mask personal information and emphasize the trailing space):

diff --git a/sources/1609.06038.html b/sources/1609.06038.html
index 0ff1c29..3d4d6f7 100644
--- a/sources/1609.06038.html
+++ b/sources/1609.06038.html
@@ -17,19 +17,19 @@
 <span class="ltx_personname">
 Firstname Lastname
 <br class="ltx_break"/>University of ***
-<br class="ltx_break"/><span id="id1.1.id1" class="ltx_text ltx_font_typewriter">***@***</span>
+<br class="ltx_break"/>***@***␣

Texts in \fbox cannot be added as sogs

Currently the "gd_word" class is not given to the texts (words) in \fbox, thus making it impossible to add them as sources of grounding.

Solution

Since the text spans in the \fbox may have the "ltx_inline-block" class, the "gd_word" class can be given to those texts in a similar way to footnotes [source].
However, we have to be careful not to break the word IDs of previous annotations.

Figure and Caption Misalignment

In the paper 1808.02342, the figures and the captions are not correctly aligned.
The order of the figures is correct, but the following problems remain:

  • the captions of the figures are incorrect (it looks like the caption for Fig. 2 is skipped; thus, for Fig. n (n > 1), the caption shown is the one for Fig. (n-1))
  • Fig. 15 is completely missing from where it should be

Add frontend UI to select data files

Currently, users need to select data files via the command line options of the server. We might want to select or change the data files with the Web UI instead of the command line options.

No button to go back to index page from edit_mcdict page

While there is a button to move to edit_mcdict page from index page, there is no button to go back to index page from edit_mcdict page.

This feature may not be necessary because such transition can be realized by simply clicking the "back button" of the browser.
If there is no need to implement the feature, this issue can be closed.

Ensure consistency between source and annotation files

We will add metadata to the annotation files (anno and mcdict) to identify the source HTML that served as input during the preprocess that generated them.

There are two possible policies:

  • Embed a timestamp of the preprocess execution into both the source HTML and the annotation files
  • Take the MD5 hash of the source HTML during preprocess execution and embed it in the annotation file.

The second method has the advantage that if the source HTML is the same, the metadata embedded will not change no matter how many times the preprocess is executed. However, there is a possibility of inconsistency if the preprocess logic changes. At the very least, when applying an operation that changes the structure of the source HTML (e.g., patch_source), a hash should be taken of the HTML content after the operation.
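The second policy can be sketched as follows. This is a minimal illustration, not the implemented format: the metadata key `_source_md5` and both function names are assumptions chosen for the example.

```python
import hashlib


def source_md5(html_bytes: bytes) -> str:
    """Return the hex MD5 digest of the source HTML content."""
    return hashlib.md5(html_bytes).hexdigest()


def is_consistent(html_bytes: bytes, annotation: dict) -> bool:
    """Check that an annotation file was generated from this source HTML.

    "_source_md5" is a hypothetical metadata key, not an existing field
    of the anno or mcdict files.
    """
    return annotation.get("_source_md5") == source_md5(html_bytes)


# Embedding the hash at preprocess time (illustrative data only).
source = b"<html><body>example</body></html>"
anno = {"_source_md5": source_md5(source), "mi_anno": {}}
```

Hashing the bytes as stored on disk, after any structural operation such as patch_source, means the same source always yields the same metadata regardless of how many times the preprocess runs.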

Filename convention for the data files

MioGatto's current specification embeds important information in the filenames of source HTMLs and data files, and if the filename does not conform to these rules, it will not work properly. We are considering changing the specification so that important information, such as paper ID, is embedded in the file's content rather than in the filename, and the filename can be freely determined.

We are also considering changing the specification to specify the data file type with _anno or _mcdict. We do not rule out merging these files into a single file.

This issue is closely related to #74.

`\log` with different mathvariant

Currently, there are two different identifiers for \log. When \log is used in a theorem environment, its mathvariant is "roman", whereas in other environments its mathvariant is "default", resulting in different identifiers.

The following example is from 2004.08500:
image

highlighting sources of groundings

problem

Currently, when the same text span is added as a source of grounding for different mathematical objects, the colors representing the corresponding concepts are painted over one another.

Thus, it is difficult to check the sources of grounding.

solution?

Highlight only the text spans corresponding to the selected mathematical object or concept.

No args_type for `\langle P(f) \rangle`

In the following example (A4.Ex4.m1.2.2.1.1.1.2 from 1805.08522), the args_type of P may include \langle, (, ) and \rangle.
However, there is currently no args_type for \langle and \rangle. Furthermore, simply adding \langle won't be sufficient because \langle appears on the left side of P.

image

Scrolling problem

When scrolling the annotation candidates box, sometimes the whole page scrolls instead of the candidates in the box.

error with latexmlc preload command referenced in the README

Hello,

When I run the latexmlc command described in the MioGatto README I get an error, regardless of the input file content.
This might be a question for https://github.com/brucemiller/LaTeXML but I think latexmlc is operating correctly.

I have

latexmlc --VERSION
latexmlc (LaTeXML version 0.8.2)

I can create a file (the specific content does not matter)

echo "hello" > file.tex

This command completes successfully:

latexmlc --format=html5 --pmml --cmml --mathtex --nodefaultresources --dest=out.html file.tex

The expected out.html file is created by latexmlc. That indicates latexmlc is working.

However, when I run the command in the MioGatto README I get an error:

latexmlc --preload=[nobibtex,ids,mathlexemes,localrawstyles]latexml.sty --format=html5 --pmml --cmml --mathtex --nodefaultresources --dest=out.html file.tex 

(Loading /usr/share/perl5/LaTeXML/Package/TeX.pool.ltxml...
(Loading /usr/share/perl5/LaTeXML/Package/eTeX.pool.ltxml... 0.01 sec)
(Loading /usr/share/perl5/LaTeXML/Package/pdfTeX.pool.ltxml... 0.01 sec) 0.22 sec)
(Loading /usr/share/perl5/LaTeXML/Package/latexml.sty.ltxml...
Error:undefined:\default@ds The token T_CS[\default@ds] is not defined.
	/usr/share/perl5/LaTeXML/Package/latexml.sty.ltxml line 47
	Defining it now as <ltx:ERROR/>
	In Core::Stomach[@0x55e076033e50] /usr/share/perl5/LaTeXML/Package/latexml.sty.ltxml line 47
	 <= Core::Gullet[@0x55e076033568] <= Core::Stomach[@0x55e076033e50] <= Core::Gullet[@0x55e076033568] <= ...
 0.02 sec)

Initialization complete: 1 error; 1 undefined macro[\default@ds]. Aborting.

(Loading /usr/share/perl5/LaTeXML/Package/TeX.pool.ltxml...
(Loading /usr/share/perl5/LaTeXML/Package/eTeX.pool.ltxml... 0.01 sec)
(Loading /usr/share/perl5/LaTeXML/Package/pdfTeX.pool.ltxml... 0.01 sec) 0.22 sec)
(Loading /usr/share/perl5/LaTeXML/Package/latexml.sty.ltxml...
Error:undefined:\default@ds The token T_CS[\default@ds] is not defined.
	/usr/share/perl5/LaTeXML/Package/latexml.sty.ltxml line 47
	Defining it now as <ltx:ERROR/>
	In Core::Stomach[@0x55e0760306e8] /usr/share/perl5/LaTeXML/Package/latexml.sty.ltxml line 47
	 <= Core::Gullet[@0x55e076030508] <= Core::Stomach[@0x55e0760306e8] <= Core::Gullet[@0x55e076030508] <= ...
 0.02 sec)

Initialization complete: 1 error; 1 undefined macro[\default@ds]. Aborting.
Initialization failed.
Error! Did not write file out.html

I looked at https://github.com/brucemiller/LaTeXML/blob/master/lib/LaTeXML/Package/latexml.sty.ltxml but don't understand the code well enough to understand the cause of this problem.

I suspect the cause is a missing .sty file? When I try just one input .sty I get "missing file" at the end

latexmlc --preload=nobibtexlatexml.sty --format=html5 --pmml --cmml --mathtex --nodefaultresources --dest=out.html file.tex 

(Loading /usr/share/perl5/LaTeXML/Package/TeX.pool.ltxml...
(Loading /usr/share/perl5/LaTeXML/Package/eTeX.pool.ltxml... 0.01 sec)
(Loading /usr/share/perl5/LaTeXML/Package/pdfTeX.pool.ltxml... 0.01 sec) 0.22 sec)
Warning:missing_file:nobibtexlatexml Can't find package nobibtexlatexml
	at Anonymous String; line 0 col 0
	Anticipate undefined macros or environments
	search paths are /scratch

latexmlc (LaTeXML version 0.8.2)
processing started Fri May 27 22:33:13 2022

(Digesting TeX file...
(Processing content /scratch/file.tex... 0.01 sec) 0.02 sec)
(Building...
(Loading compiled schema /usr/share/perl5/LaTeXML/resources/RelaxNG/LaTeXML.model... 0.02 sec). 0.07 sec)
(Rewriting... 0.00 sec)
(Finalizing... 0.00 sec)
Conversion complete: 1 warning; 1 missing file[nobibtexlatexml.sty].

(post-processing...
(Scan out.html processing... 0.00 sec)
(CrossRef out.html processing... 0.00 sec)
(XSLT[using LaTeXML-html5.xsl] out.html processing... 0.02 sec)
(Writer out.html processing... 0.01 sec) 0.03 sec)
Post-processing complete: No obvious problems
processing finished Fri May 27 22:33:13 2022
Status:conversion:1 
1 warning; 1 missing file[nobibtexlatexml.sty]
No obvious problems
Wrote out.html

For the latexmlc command in the MioGatto README, is there a nobibtexlatexml.sty file being referenced?

I also found that DeclareOption('nobibtex', sub { AssignValue('NO_BIBTEX' => 1, 'global'); }); is in LaTeXML/lib/LaTeXML/Package/latexml.sty.ltxml, so that means I'm not missing a .sty file.

If this question should be posted to https://github.com/brucemiller/LaTeXML I will do that.

text span selection problem when adding source of grounding

Paper ID: 1711.09576

  • s2.p3 last sentence "The RNN-acceptor"->
    image
  • s2.p7.l7 "PyTorch framework"->
    image
  • s2.p7.l7 " " (space between PyTorch and framework)->(result differs sometimes)
    image
    image
    image
    (some parts of the incorrectly selected span do not allow "delete source" <- footnote)
    (and sometimes page position moves)
  • S2.p3.13.m13.2.2.4.2 ("returns an accept or reject decision" -> as below)
    image

Algorithm block appears in the middle of main text

Concern:

  • currently, an algorithm block defined as follows is included among the targets of annotation
  • this happens in 1107.4466 and 1905.11006 (the following example is from 1905.11006)
\begin{algorithm}
    \begin{algorithmic}
        ...
    \end{algorithmic}
\end{algorithm}

image

  • besides, the conversion of LaTeXML seems to be incomplete

image

Jump to current mi function not implemented

Currently, there is no option/function for jumping back to the currently selected mi.

The function is necessary for the following reasons:

  • the position of the main page sometimes moves unexpectedly after some operations ← already fixed
  • it makes it easier to resume annotation after looking back at previous sections

Mathvariant problem for mi with more than one characters

Problem

Mis that look the same may have different underlying identifiers, because identifiers are currently distinguished by their text and their mathvariant.

Current situation

  • Currently, for most identifiers, the mathvariant attribute is not given
  • Even if \mathrm is used for mis with more than one character, mathvariant is not set (which means "default" in MioGatto)
  • But in some specific environments or situations (e.g., section titles, the Theorem environment), the mathvariant attribute is set to "normal" ("roman" in MioGatto)
  • For mis with more than one character, MathML renders them in roman by default

What needs to be fixed?

  • *_mcdict.json
    • remove the "default" entries for mis with more than one character
    • or merge the "default" entry with "roman"
      • Note that *_anno.json needs modification when entries in *_mcdict.json are merged
  • lib/util.py get_mi2idf
    • This function also needs special handling for mis with more than one character whose mathvariant is either empty or roman
  • client/index.ts get_idf

There may be other possible fixes, but note that just modifying get_mi2idf is not enough, because mis that used to have the roman mathvariant may be assigned inconsistent concept_ids.
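The normalization described above might be sketched like this. The function name `normalize_variant` is hypothetical and this is not the actual body of get_mi2idf; it only illustrates treating a multi-character mi with no mathvariant the same as one marked roman.

```python
from typing import Optional


def normalize_variant(text: str, mathvariant: Optional[str]) -> str:
    """Map an mi's text and raw mathvariant attribute to a variant name.

    Assumptions taken from the issue text: a missing attribute means
    "default", and the MathML value "normal" is called "roman".
    """
    if mathvariant is None or mathvariant == "":
        variant = "default"
    elif mathvariant == "normal":
        variant = "roman"
    else:
        variant = mathvariant

    # MathML renders multi-character mi upright by default, so "default"
    # and "roman" look identical for them and are merged here.
    if len(text) > 1 and variant == "default":
        variant = "roman"
    return variant
```

With this rule, `normalize_variant("log", None)` and `normalize_variant("log", "normal")` both give `"roman"`, while a single-character mi with no attribute stays `"default"`. Existing *_mcdict.json and *_anno.json files would still need to be migrated, as noted above.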

New feature: jump to next/previous unannotated mi

This is somewhat related to #27. Add a new button to the sidebar that, when clicked, moves both the selection and focus to the next unannotated mi after the currently selected mi. The same goes for the back action.

Figure environments with more than one figure

Problem:

Currently, when there is more than one figure in a figure environment, tools/preprocess.py causes two problems.

The following example is from 2010.00710:

  1. adding the span "Effect of the number of neighbors" in the caption of Figure 2 as a source of grounding also adds the span "Effect of datastore size on the" in the caption of Figure 3
  2. no img path is written for either Figure 2 or Figure 3

image

Cause:

  1. the gd_words of the captions of Figures 2 and 3 have the same ids because both figcaptions share the same parent (and its id) when applying embed_word_span_tags [corresponding part of preprocess.py] [corresponding part of the source html]
  2. the img path is added only when there is exactly one figcaption in a figure environment [corresponding part of preprocess.py]
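One way to avoid the id collision in cause 1 is to keep a single running counter across all figcaptions that share a parent. The sketch below assumes a hypothetical id format `<parent id>.w<n>` and a hypothetical function name; the real id scheme in embed_word_span_tags may differ, and any change must not break the word IDs of existing annotations.

```python
from typing import List


def assign_word_ids(parent_id: str, captions: List[List[str]]) -> List[List[str]]:
    """Assign ids to the words of several captions under one parent element.

    The counter is shared across captions, so two figcaptions with the
    same parent never receive the same word id.
    """
    counter = 0
    ids_per_caption: List[List[str]] = []
    for words in captions:
        ids = []
        for _ in words:
            counter += 1
            ids.append(f"{parent_id}.w{counter}")  # hypothetical id format
        ids_per_caption.append(ids)
    return ids_per_caption
```

Restarting the counter for each caption would reproduce the reported behavior, where selecting words in one caption also selects the words with the same ids in the other.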

Description edit

It would make the work easier if the description of a newly added concept could be edited, so I would appreciate it if this feature could be added.
