
fluent-langneg-rs's People

Contributors

atouchet, cmyr, desiderantes, ember-rose, emilio, fhoehle, klosspeter, zbraniecki

fluent-langneg-rs's Issues

Deterministic results

At the moment we have a class of non-deterministic results when a request for en matches two regions (GB and CA for example). I'd like to find a way to make it deterministic if only to remove the papercut when writing tests.
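One possible tie-break, sketched under assumptions: collect every available tag that shares the requested language and sort the ties, so the result no longer depends on the order in which candidates were visited. The helper name `matches_for_language` and the use of plain string tags are hypothetical, and sorting lexicographically is only one of several reasonable policies (preferring the order of the available list would be another).

```rust
/// Hypothetical helper, not part of the crate: returns every available tag
/// whose language subtag equals `requested`, in sorted order so that ties
/// between regions (for example GB and CA) always come out the same way.
fn matches_for_language<'a>(requested: &str, available: &[&'a str]) -> Vec<&'a str> {
    let mut hits: Vec<&'a str> = available
        .iter()
        .copied()
        // Compare only the first subtag (the language) of each available tag.
        .filter(|tag| tag.split('-').next() == Some(requested))
        .collect();
    // Sorting makes the output independent of the input order.
    hits.sort_unstable();
    hits.dedup();
    hits
}
```

With this policy a test can assert on a fixed order regardless of how the available locales were listed.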

Switch negotiation APIs to accept an iterator.

Feedback on reddit from Quxxy:

Just a quick bit of feedback: you generally shouldn't be taking Vec<_> as an argument like you are with negotiate_languages. Unless you need to own the elements (which you don't appear to), or you need resizable storage (which you don't appear to), you should be taking &[_] instead.

It's like asking for an IKEA shelf specifically, when really any brand will do.

Edit: for bonus points: if all you ever do is iterate over the elements once, you could also take an iterator for maximum client-side flexibility.

https://www.reddit.com/r/rust/comments/74rv6r/fluent_locale_library_for_language_tag/do13yto/
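A minimal sketch of the suggestion, assuming a simplified stand-in for the negotiation entry point: `filter_supported` is a hypothetical function, not the crate's real signature. It borrows the available list as a slice and accepts anything iterable for the requested list.

```rust
/// Hypothetical stand-in for a negotiation function. `requested` can be any
/// iterable of string-like items; `available` is only borrowed as a slice.
fn filter_supported<'a, I>(requested: I, available: &[&'a str]) -> Vec<&'a str>
where
    I: IntoIterator,
    I::Item: AsRef<str>,
{
    let mut result = Vec::new();
    for req in requested {
        // Exact-match lookup only; real negotiation is more involved.
        if let Some(hit) = available.iter().copied().find(|a| *a == req.as_ref()) {
            if !result.contains(&hit) {
                result.push(hit);
            }
        }
    }
    result
}
```

Callers can then pass a `&Vec<String>`, an array of `&str`, or a lazily produced iterator without first building a `Vec`.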

Incorrect existing likely subtags

This issue is related to #9, and I want to try to figure it out while making a PR.

First, there are two locales for which the existing logic is simply wrong. "cs" results in "cs-CS", while the region should be CZ. Similarly, "sr" should result in "sr-Cyrl-RS" rather than "sr-Cyrl-SR".

A deeper issue is that this logic is inconsistent about whether it adds a script or not. It feels to me like the right thing to do is add a script. However, this makes three of the negotiate tests fail. I feel that if any downstream logic is dependent on the script not being present (as is the case now), it is pretty fragile. However, fixing this feels like a bit of a yak-shave, so I'm filing this issue asking for advice.

I'll also prepare a minimally invasive PR for likely subtags.
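To make the two corrections concrete, here is a small sketch of a table that always returns a script. The function names `likely_subtags` and `maximize` are hypothetical, and the "Latn" script for "cs" is an assumption added for illustration; the issue itself only states the region CZ for "cs" and the full tag "sr-Cyrl-RS" for "sr".

```rust
/// Hypothetical lookup table covering only the two entries discussed above.
/// Returns (language, script, region).
fn likely_subtags(language: &str) -> Option<(&'static str, &'static str, &'static str)> {
    match language {
        // Region is CZ, not CS. The script "Latn" is an assumption.
        "cs" => Some(("cs", "Latn", "CZ")),
        // Region is RS, not SR.
        "sr" => Some(("sr", "Cyrl", "RS")),
        _ => None,
    }
}

/// Builds the maximized tag, always including the script.
fn maximize(language: &str) -> Option<String> {
    likely_subtags(language).map(|(l, s, r)| format!("{}-{}-{}", l, s, r))
}
```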

No way to own the result of negotiate_languages()

Pardon me if this is a Rust newbie question (I am one), but I'm struggling to implement this library in my app.

Very roughly I want to do some negotiation up front and decide on a language fallback stack, then retain that in an immutable struct for the lifetime of the app (a CLI tool). I don't have any problem with this config struct otherwise, it's working and I can even put some language information in it. For example I can use this library's accepted_languages::parse() and get an owned result back (Vec<LanguageIdentifier>) which I can easily keep in my struct. The issue I have is there seems to be no way to use negotiate_languages() and get back something owned by the calling function. It always returns Vec<&LanguageIdentifier> (which I can't retain in my struct).

Shouldn't there be a built in method that returns something that can be owned by the parent scope?
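One common workaround, sketched with stand-in types: clone the borrowed results into an owned `Vec`. `LangId`, `negotiate`, `Config`, and `build_config` below are all hypothetical and only mimic the borrowed return shape described above. The approach assumes the real `LanguageIdentifier` implements `Clone`.

```rust
/// Hypothetical stand-in for LanguageIdentifier.
#[derive(Clone, Debug, PartialEq)]
struct LangId(String);

/// Stand-in with the borrowed return shape described in the issue:
/// the result points into `available`.
fn negotiate<'a>(requested: &[LangId], available: &'a [LangId]) -> Vec<&'a LangId> {
    available.iter().filter(|a| requested.contains(*a)).collect()
}

/// Long-lived configuration that owns its fallback stack.
struct Config {
    fallback: Vec<LangId>,
}

fn build_config(requested: &[LangId], available: &[LangId]) -> Config {
    // `cloned()` turns each `&LangId` into an owned `LangId`, so the result
    // no longer borrows from `available`.
    let fallback = negotiate(requested, available).into_iter().cloned().collect();
    Config { fallback }
}
```

The cost is one clone per negotiated locale, which is usually negligible when done once at startup.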

Ideas for improving performance

I'm interested in a very high performance representation of locales for skribo. I think what fluent-locale has is a good base, but have some ideas how to make it more performant, both in speed and in object size.

The main cost is likely the allocation of the many small String objects in a locale. There are existing tiny string implementations (tendril, inlinable_string, iString), but I think it's possible to do better by specializing to the needs of bcp47. Most of these strings are in the ballpark of 16 bytes each, and much of the cost is the need to spill to allocation when the strings get big. In bcp47, most of the subtags have a small, fixed maximum size.

I've prototyped a "tinystr" that uses a NonZeroU32 as its backing store, and thus takes 4 bytes, even when used as an option. It also uses SIMD-like math to verify ASCII and no NUL bytes. I'm happy to PR that into this repo, or make a separate crate (there are a number of file formats that use 4 byte tags, and this would be good for those). Use of this string type would probably not be a huge code change, as it doesn't fundamentally change the architecture, just the representation. There is unsafe code, but I think it should be possible to review it to get good confidence.
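A rough sketch of the idea, not the prototype itself: the type name `Tiny4` and its methods are hypothetical, and this version uses only safe code. It packs up to four ASCII bytes into a `NonZeroU32` and checks for non-ASCII and NUL bytes with word-wide bit math.

```rust
use std::num::NonZeroU32;

/// Hypothetical 4-byte string: 1 to 4 ASCII bytes, no NUL, packed little-endian.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Tiny4(NonZeroU32);

impl Tiny4 {
    fn new(s: &str) -> Option<Tiny4> {
        let bytes = s.as_bytes();
        if bytes.is_empty() || bytes.len() > 4 {
            return None;
        }
        let mut buf = [0u8; 4];
        buf[..bytes.len()].copy_from_slice(bytes);
        let word = u32::from_le_bytes(buf);
        // Any byte with its high bit set is not ASCII.
        if word & 0x8080_8080 != 0 {
            return None;
        }
        // Every byte is now <= 0x7f, so adding 0x7f per byte cannot carry and
        // sets the high bit of exactly the non-zero bytes.
        let nonzero = (word + 0x7f7f_7f7f) & 0x8080_8080;
        // The low `len` bytes must all be non-zero (no embedded NUL).
        let expected = 0x8080_8080u32 >> (8 * (4 - bytes.len()));
        if nonzero != expected {
            return None;
        }
        NonZeroU32::new(word).map(Tiny4)
    }

    /// Length in bytes, recovered from the number of zero high bytes.
    fn len(self) -> usize {
        4 - (self.0.get().leading_zeros() as usize / 8)
    }

    fn to_owned_string(self) -> String {
        let bytes = self.0.get().to_le_bytes();
        bytes[..self.len()].iter().map(|&b| b as char).collect()
    }
}
```

Because zero is never a valid value, `Option<Tiny4>` can use it to represent `None` and stays at 4 bytes.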

A more aggressive optimization is to use an enum between a fast-path and a general-case representation. The fast path would be optional 4 byte tiny strings for language, script, and region. The general case would be a boxed struct similar to the current one, but with an 8 byte tiny string for language and variant, and 4 byte tiny strings for the other subtags. This enum is 16 bytes on both 32 and 64 bit platforms.
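The layout described above might look roughly like this. All names are hypothetical, the tiny strings are represented directly by `NonZeroU32` and `NonZeroU64` for brevity, and the exact field set of the general case is an assumption.

```rust
use std::mem::size_of;
use std::num::{NonZeroU32, NonZeroU64};

// Stand-ins for the 4-byte and 8-byte tiny strings.
type Tiny4 = NonZeroU32;
type Tiny8 = NonZeroU64;

/// General case: heap-allocated, handles longer languages and variants.
#[allow(dead_code)]
struct General {
    language: Option<Tiny8>,
    script: Option<Tiny4>,
    region: Option<Tiny4>,
    variants: Vec<Tiny8>,
}

/// Fast path stores three optional 4-byte subtags inline;
/// everything else goes behind a Box.
#[allow(dead_code)]
enum Locale {
    Fast {
        language: Option<Tiny4>,
        script: Option<Tiny4>,
        region: Option<Tiny4>,
    },
    General(Box<General>),
}

/// Reports the in-memory size of the enum on the current target.
fn locale_size() -> usize {
    size_of::<Locale>()
}
```

The fast variant needs 12 bytes of payload, so with a discriminant the enum fits in 16 bytes; `locale_size()` makes it easy to check that on a given target.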

I'm posting an issue to get a sense of how welcome these changes are, and also whether tinystr should be its own crate or just a source file in fluent-locale.

Likely subtags

I need "likely subtags" for script-aware fallback. ICU has an implementation.

I actually have this pretty well implemented. The question is whether it belongs in fluent-locale-rs or whether it should be in skribo. I estimate that it's in the ballpark of 50k of code and data; I could probably get it down a little.

Also, I haven't implemented the deprecated subtags (for example, the conversion of "sh" to "sr_Latn"). I suspect I won't miss them for text rendering, but other applications might want them (for example, so that hyphenation can handle "no-NO"). If I submitted a PR, would you want these?

A "no" answer is fine - it'll just live in skribo.

Update the code to use 1.24

Rust 1.24 got released and it brings two goodies for us:

  • AsciiExt on char
  • stable rustfmt-preview.

I'd like to start using both in fluent-locale-rs ASAP.
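As an illustration of the first item, the ASCII helpers can be called directly on `char` without importing `std::ascii::AsciiExt`. The two functions below are hypothetical and use deliberately simplified subtag rules; they are not the crate's actual validation code.

```rust
/// Simplified check: a language subtag is 2 or 3 ASCII letters.
fn is_language_subtag(s: &str) -> bool {
    s.len() >= 2 && s.len() <= 3 && s.chars().all(|c| c.is_ascii_alphabetic())
}

/// Simplified check: a region subtag is 2 ASCII letters, canonically uppercase.
fn canonical_region(s: &str) -> Option<String> {
    if s.len() == 2 && s.chars().all(|c| c.is_ascii_alphabetic()) {
        Some(s.to_ascii_uppercase())
    } else {
        None
    }
}
```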
