latex2unicode's People

Contributors

aesakamar, atcold, dweiss, florian-beetz, i10416, linusdietz, siedlerchr, tobiasdiez, tomtung

latex2unicode's Issues

Problems in setting up latex2unicode

Thank you for providing this quite nice looking library!

For the JabRef reference manager (https://github.com/JabRef/jabref), we are currently looking for a replacement for our custom internal conversion from LaTeX to Unicode. This library seems well suited, so I gave it a try. Unfortunately, I get incorrect results in some cases, and I wonder whether I made a mistake in the setup.

I added latex2unicode as a compile dependency in Gradle (which by itself works fine). From the Gradle file:

compile 'com.github.tomtung:latex2unicode_2.11:0.1-SNAPSHOT'

I can use DefaultLatexToUnicodeConverter in our code, but it fails to perform a correct conversion in some very simple cases. For instance, the following JUnit test:

    @Test
    public void testUmlaut() {
        assertEquals("ä", DefaultLatexToUnicodeConverter.convert("{\\\"{a}}"));
    }

results in

org.junit.ComparisonFailure: 
Expected :ä
Actual   :"a

Now, I would be very surprised if the conversion of something as simple as an umlaut did not work. It also works fine in your web interface. Might this be a problem with string escaping in Java? In the above example, I had to use three backslashes: the first escapes the second, and the third escapes the ". If this is the problem, is there a way to work around it on the latex2unicode side? Or will I have to manually unescape Java strings before I pass them to latex2unicode? From the README, I expected escaping not to be a problem. Otherwise, could this be related to the encoding (UTF-8 in my case)?

Any hints are welcome!

Regards
Jörg

Refs JabRef/jabref#2465
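
As a quick check on the escaping question in this issue, the following minimal sketch prints what the Java literal from the JUnit test actually contains at runtime. The class name EscapeDemo and the helper literal() are made up for illustration. It only shows what the Java compiler produces from the literal and says nothing about what latex2unicode does with it.

```java
class EscapeDemo {
    // The string literal from the JUnit test in this issue, unchanged.
    static String literal() {
        return "{\\\"{a}}";
    }

    public static void main(String[] args) {
        String s = literal();
        // The compiler turns \\ into one backslash and \" into one quote.
        System.out.println(s);          // prints {\"{a}}
        System.out.println(s.length()); // prints 7
    }
}
```

The runtime string is the seven characters of the LaTeX input, so no manual unescaping of the Java string should be needed before it is passed on.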

Incorrect display of LaTeX string with TeX commands

This issue is related to the use of latex2unicode in the JabRef project and hence is possibly beyond the scope of this project. (issue JabRef/jabref#2651)

Author strings that contain special characters (such as Č) in citations exported by APS journals include TeX code of the form \ifmmode ... \else ... \fi{} so that the entry displays correctly whether the environment is math mode or normal text mode. For example:
author = {\ifmmode \check{C}\else \v{C}\fi{}ernot\'{\i}k, Ond\ifmmode \check{r}\else \v{r}\fi{}ej and Hammerer, Klemens}

Both the online demo of latex2unicode (which uses version 0.2) and the development version of JabRef (which, as far as I can tell, uses the latest version of latex2unicode) display this incorrectly: each special character appears twice instead of once, e.g. Č Černotı́k, Ond ř řej and Hammerer, Klemens, when it should be Černotı́k, Ondřej and Hammerer, Klemens.
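
One possible workaround on the caller's side is to keep only the text-mode branch of each conditional before the string is converted. The sketch below is an assumption about how that could look; it is not part of latex2unicode or JabRef. The class IfmmodeStripper and the method keepTextBranch are hypothetical names, and the regular expression does not handle nested conditionals.

```java
import java.util.regex.Pattern;

class IfmmodeStripper {
    // Matches \ifmmode <math branch> \else <text branch> \fi and an optional
    // trailing {}. The lookaheads stop \fi from matching a longer command name.
    private static final Pattern IFMMODE = Pattern.compile(
        "\\\\ifmmode(?![A-Za-z])\\s*(.*?)\\\\else(?![A-Za-z])\\s*(.*?)\\\\fi(?![A-Za-z])(?:\\{\\})?",
        Pattern.DOTALL);

    // Replaces each conditional with its text-mode branch (capture group 2).
    static String keepTextBranch(String latex) {
        return IFMMODE.matcher(latex).replaceAll("$2");
    }

    public static void main(String[] args) {
        String author = "\\ifmmode \\check{C}\\else \\v{C}\\fi{}ernot\\'{\\i}k, "
            + "Ond\\ifmmode \\check{r}\\else \\v{r}\\fi{}ej and Hammerer, Klemens";
        System.out.println(keepTextBranch(author));
        // prints \v{C}ernot\'{\i}k, Ond\v{r}ej and Hammerer, Klemens
    }
}
```

The result contains each accented character only once and can then be handed to the converter as ordinary text-mode LaTeX.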

Haskell port

Hey there!

Recently I did a little survey of existing "latex to unicode" converters. As far as I can tell, latex2unicode is by far the best out there, due to genuinely parsing the input instead of doing a naive find-and-replace.

As such, I really wanted to use it, but for reasons that aren't relevant here I can't really use Scala on my system. The result ... I've created a Haskell port! It's likely to diverge from latex2unicode, but at the very least the first commit is a roughly one-to-one port.

Wanted to drop a comment here and say:

  1. Thanks for the fabulous work

  2. Here's the port, in case this is useful info to anyone

Cheers =)

Option to display (unprocessed) unknown LaTeX commands

I am reporting this issue, which concerns the use of latex2unicode in JabRef since version 4.0, after a discussion with the JabRef development team.

For a long time I have been using my own LaTeX commands in JabRef fields, for instance \prl for the journal, which I expand to Phys. Rev. Lett. when compiling in LaTeX. As \prl is not recognized as a standard LaTeX command, latex2unicode ignores it and removes it. As a result, my table of entries no longer shows the fields that contain private LaTeX commands. For example, I have completely lost the overview of the journals.

The developers of JabRef suggested that an option may be added to latex2unicode to specify how an unknown LaTeX command is processed, which is the option I also thought of. The minimum could be "Ignore" and "Display", the latter rendering either \prl or prl. This would be very helpful, possibly beyond JabRef.
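
To make the request concrete, here is a minimal sketch of what such an option could look like. Everything in it is hypothetical: latex2unicode does not define the enum UnknownCommandPolicy or the method render, and the names are chosen only to mirror the "Ignore" and "Display" choices described above.

```java
class UnknownCommandDemo {
    // Hypothetical policy names; not part of latex2unicode.
    enum UnknownCommandPolicy { IGNORE, DISPLAY_WITH_BACKSLASH, DISPLAY_NAME_ONLY }

    // Renders an unknown command, given its name without the leading backslash.
    static String render(String name, UnknownCommandPolicy policy) {
        switch (policy) {
            case DISPLAY_WITH_BACKSLASH:
                return "\\" + name;
            case DISPLAY_NAME_ONLY:
                return name;
            default:
                // IGNORE: drop the command entirely.
                return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(render("prl", UnknownCommandPolicy.DISPLAY_WITH_BACKSLASH)); // prints \prl
        System.out.println(render("prl", UnknownCommandPolicy.DISPLAY_NAME_ONLY));      // prints prl
    }
}
```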

Convert unicode to latex

Hi Tom,

This is just an idea for a feature. How about supporting the reverse direction of the conversion in this library: from Unicode text to LaTeX-escaped text?

Since you have the maps for LaTeX to unicode conversion, it might be possible to support the reverse direction without too much effort (I hope). We need this conversion in JabRef as well and would be glad to rely on latex2unicode!
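
The idea of reusing the existing maps can be sketched as inverting a LaTeX-to-Unicode table and looking up each character of the input. The table below is a tiny made-up sample, not latex2unicode's real data, and the class ReverseMapDemo with its methods is hypothetical. A real implementation would also have to decide which LaTeX form to prefer when several map to the same character; this sketch simply keeps the first one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReverseMapDemo {
    // A tiny illustrative LaTeX-to-Unicode table (assumed sample entries).
    static Map<String, String> latexToUnicode() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("\\alpha", "\u03B1");  // Greek small letter alpha
        m.put("\\\"{a}", "\u00E4");  // a with diaeresis
        m.put("\\\"{o}", "\u00F6");  // o with diaeresis
        m.put("\\v{C}", "\u010C");   // C with caron
        return m;
    }

    // Swaps keys and values; the first LaTeX form seen for a character wins.
    static Map<String, String> invert(Map<String, String> forward) {
        Map<String, String> reverse = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : forward.entrySet()) {
            reverse.putIfAbsent(e.getValue(), e.getKey());
        }
        return reverse;
    }

    // Replaces every character found in the inverted table by its LaTeX form.
    static String toLatex(String text) {
        Map<String, String> reverse = invert(latexToUnicode());
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            String latex = reverse.get(ch);
            if (latex == null) {
                out.append(ch);
            } else {
                out.append(latex);
                // A command ending in a letter gets {} so that a following
                // letter is not read as part of the command name.
                if (Character.isLetter(latex.charAt(latex.length() - 1))) {
                    out.append("{}");
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLatex("J\u00F6rg")); // prints J\"{o}rg
    }
}
```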

If you are not interested in implementing this you can just close the issue.

Regards
Jörg

Keep unknown commands

Hi @tomtung
You recently introduced the feature that unknown commands are preserved.
However, parameters in braces are taken out of their braces, and empty braces are stripped.
We discussed this at the JabRef dev call and concluded that it would be beneficial if unknown commands were simply kept exactly as they are.
I added a few tests that show the intended behavior.
Could you help us in bringing this into Scala code?
Or in general, do you agree with our thoughts?

Best regards,

Stefan and the JabRef team

  test("Unknown commands") {
    LaTeX2Unicode.convert("\\this \\is \\alpha test") shouldBe "\\this \\is α test"
    LaTeX2Unicode.convert("\\unknown command") shouldBe "\\unknown command"
    LaTeX2Unicode.convert("\\unknown{} empty params") shouldBe "\\unknown{} empty params"
    LaTeX2Unicode.convert("\\unknown{cmd}") shouldBe "\\unknown{cmd}"
  }
