projectfluent / fluent-langneg-rs
Library for language and locale identifier negotiation.
Home Page: https://projectfluent.org/
License: Apache License 2.0
Other Fluent crates use the LanguageIdentifier from unic-langid, whereas fluent-langneg uses the one from icu_locid, so the result of negotiate_languages cannot be passed to the other crates.
At the moment, a request for en gives non-deterministic results when it matches two regions (GB and CA, for example). I'd like to find a way to make it deterministic, if only to remove the papercut when writing tests.
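One way to get a stable answer is to make the tie-break explicit. This is a minimal sketch under the assumption that the non-determinism comes from iterating an unordered collection (such as a HashMap or HashSet); I haven't confirmed that is the cause. Both helpers are hypothetical and compare only the language subtag.

```rust
/// True if the first (language) subtag of `tag` equals `requested`,
/// ignoring ASCII case.
fn language_matches(tag: &str, requested: &str) -> bool {
    tag.split('-')
        .next()
        .map_or(false, |lang| lang.eq_ignore_ascii_case(requested))
}

/// Tie-break by the caller's order: scanning a slice instead of an
/// unordered set means "en" against [en-GB, en-CA] always yields en-GB.
fn pick_in_input_order<'a>(requested: &str, available: &[&'a str]) -> Option<&'a str> {
    available
        .iter()
        .copied()
        .find(|tag| language_matches(tag, requested))
}

/// Order-independent alternative: among all matches, take the
/// lexicographically smallest tag.
fn pick_smallest<'a>(requested: &str, available: &[&'a str]) -> Option<&'a str> {
    available
        .iter()
        .copied()
        .filter(|tag| language_matches(tag, requested))
        .min()
}

fn main() {
    let available = ["en-GB", "en-CA", "fr"];
    println!("{:?}", pick_in_input_order("en", &available));
    println!("{:?}", pick_smallest("en", &available));
}
```

Input order respects the app's own preference list; the lexicographic rule is arbitrary but stays stable even if the caller's list is built from an unordered source.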
Feedback on Reddit from Quxxy:
Just a quick bit of feedback: you generally shouldn't be taking Vec<_> as an argument like you are with negotiate_languages. Unless you need to own the elements (which you don't appear to), or you need resizable storage (which you don't appear to), you should be taking &[_] instead.
It's like asking for an IKEA shelf specifically, when really any brand will do.
Edit: for bonus points: if all you ever do is iterate over the elements once, you could also take an iterator for maximum client-side flexibility.
https://www.reddit.com/r/rust/comments/74rv6r/fluent_locale_library_for_language_tag/do13yto/
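A sketch of what the suggested signature could look like: the available locales are borrowed as a slice and the requested locales are accepted as anything iterable. The negotiate function here is hypothetical and does only exact, case-insensitive matching; it is not the crate's real negotiate_languages.

```rust
/// Borrow `available` as a slice and accept `requested` as any iterator of
/// string-like items (Vec, array, or a lazy iterator).
fn negotiate<'a, I, S, T>(requested: I, available: &'a [T]) -> Vec<&'a T>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
    T: AsRef<str>,
{
    let mut result: Vec<&'a T> = Vec::new();
    for req in requested {
        let req = req.as_ref();
        for avail in available {
            // Exact, case-insensitive match only; skip entries already chosen.
            let already = result.iter().any(|r| std::ptr::eq(*r, avail));
            if !already && avail.as_ref().eq_ignore_ascii_case(req) {
                result.push(avail);
            }
        }
    }
    result
}

fn main() {
    let available = vec!["en-US".to_string(), "fr".to_string()];
    // A Vec, an array iterator, and a one-shot iterator all work.
    println!("{:?}", negotiate(vec!["fr", "de"], &available));
    println!("{:?}", negotiate(["EN-us"].iter(), &available));
    println!("{:?}", negotiate(std::iter::once("fr"), &available));
}
```

Because nothing is taken by value, the caller keeps ownership of both lists and can reuse them after negotiating.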
The LICENSE file doesn't set the copyright holder. It still contains the placeholder value:
Copyright {yyyy} {name of copyright owner}
https://github.com/projectfluent/fluent-langneg-rs/blob/master/LICENSE#L189C4-L189C46
Could you please fix that and release a version with that fix included?
This issue is related to #9, and I want to try to figure it out while making a PR.
First, there are two locales for which the existing logic is simply wrong. "cs" results in "cs-CS", while the region should be CZ, and similarly for "sr", should be "sr-Cyrl-RS" rather than "sr-Cyrl-SR".
A deeper issue is that this logic is very inconsistent about whether it adds a script. It feels to me like the right thing to do is always to add one. However, doing so makes three of the negotiate tests fail. I feel that any downstream logic that depends on the script being absent (as is the case now) is pretty fragile. Still, fixing this feels like a bit of a yak-shave, so I'm filing this issue to ask for advice.
I'll also prepare a minimally invasive PR for likely subtags.
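To make the two corrections concrete, here is a heavily truncated likely-subtags table. The entries match CLDR (Czech is Latin script in Czechia, CZ; Serbian defaults to Cyrillic in Serbia, RS), but the table layout and the add_likely_subtags helper are hypothetical. The helper always emits a script, which is the consistent behavior argued for above.

```rust
/// Truncated likely-subtags data: (language, script, region).
/// A real table would be generated from CLDR's likelySubtags data.
const LIKELY_SUBTAGS: &[(&str, &str, &str)] = &[
    ("cs", "Latn", "CZ"), // not "cs-CS"
    ("en", "Latn", "US"),
    ("sr", "Cyrl", "RS"), // not "sr-Cyrl-SR"
];

/// Expand a bare language to language-Script-REGION, always including
/// the script so callers see one consistent shape.
fn add_likely_subtags(lang: &str) -> Option<String> {
    LIKELY_SUBTAGS
        .iter()
        .find(|(l, _, _)| *l == lang)
        .map(|(l, script, region)| format!("{}-{}-{}", l, script, region))
}

fn main() {
    for lang in ["cs", "sr", "en", "xx"] {
        println!("{} -> {:?}", lang, add_likely_subtags(lang));
    }
}
```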
Pardon me if this is a Rust newbie question (I am one), but I'm struggling to implement this library in my app.
Very roughly, I want to do some negotiation up front, decide on a language fallback stack, and then retain that in an immutable struct for the lifetime of the app (a CLI tool). I don't otherwise have any problem with this config struct; it works, and I can even put some language information in it. For example, I can use this library's accepted_languages::parse() and get an owned result back (Vec<LanguageIdentifier>), which I can easily keep in my struct. The problem is that there seems to be no way to call negotiate_languages() and get back something owned by the calling function. It always returns Vec<&LanguageIdentifier>, which I can't retain in my struct.
Shouldn't there be a built-in method that returns something the parent scope can own?
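Without a dedicated method, the borrowed results can be cloned into an owned Vec, assuming the identifier type implements Clone. In this sketch LangId and negotiate are hypothetical stand-ins that mirror the borrowing shape described above.

```rust
/// Stand-in for LanguageIdentifier; any Clone type works the same way.
#[derive(Clone, Debug, PartialEq)]
struct LangId(String);

/// Hypothetical borrowing negotiator: returns references into `available`.
fn negotiate<'a>(requested: &[LangId], available: &'a [LangId]) -> Vec<&'a LangId> {
    requested
        .iter()
        .filter_map(|r| available.iter().find(|a| *a == r))
        .collect()
}

/// App config that owns its fallback stack.
struct Config {
    fallback: Vec<LangId>,
}

impl Config {
    fn new(requested: &[LangId], available: &[LangId]) -> Self {
        // `.cloned()` turns each `&LangId` into an owned `LangId`, so the
        // struct no longer borrows from `available`.
        let fallback = negotiate(requested, available).into_iter().cloned().collect();
        Config { fallback }
    }
}

fn main() {
    let available = vec![LangId("en-US".into()), LangId("fr".into())];
    let config = Config::new(&[LangId("fr".into())], &available);
    drop(available); // the config outlives the data it was negotiated from
    println!("{:?}", config.fallback);
}
```

Cloning a short fallback list once at startup is cheap compared with the lifetime gymnastics needed to store the references.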
I'm interested in a very high performance representation of locales for skribo. I think what fluent-locale has is a good base, but have some ideas how to make it more performant, both in speed and in object size.
The main cost is likely the allocation of the many small String objects in a locale. There are existing tiny string implementations (tendril, inlinable_string, iString), but I think it's possible to do better by specializing to the needs of BCP 47. Most of these strings are in the ballpark of 16 bytes each, and much of the cost is the need to spill to allocation when the strings get big. In BCP 47, most of the subtags have a small, fixed maximum size.
I've prototyped a "tinystr" that uses a NonZeroU32 as its backing store, and thus takes 4 bytes, even when used as an option. It also uses SIMD-like math to verify ASCII and no NUL bytes. I'm happy to PR that into this repo, or make a separate crate (there are a number of file formats that use 4 byte tags, and this would be good for those). Use of this string type would probably not be a huge code change, as it doesn't fundamentally change the architecture, just the representation. There is unsafe code, but I think it should be possible to review it to get good confidence.
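A minimal sketch of the idea, not the prototype itself: up to four ASCII bytes are packed little-endian into a NonZeroU32, so Option<TinyStr4> can use zero as its None niche and stay 4 bytes. The ASCII check masks all four bytes in one operation; the NUL check is a plain byte scan here for clarity.

```rust
use std::fmt;
use std::num::NonZeroU32;

/// Hypothetical 4-byte ASCII string backed by a NonZeroU32.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TinyStr4(NonZeroU32);

impl TinyStr4 {
    fn new(s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        // Must be 1..=4 bytes with no interior NUL.
        if bytes.is_empty() || bytes.len() > 4 || bytes.contains(&0) {
            return None;
        }
        let mut buf = [0u8; 4];
        buf[..bytes.len()].copy_from_slice(bytes);
        let word = u32::from_le_bytes(buf);
        // SWAR check: any byte with its high bit set is non-ASCII.
        if word & 0x8080_8080 != 0 {
            return None;
        }
        NonZeroU32::new(word).map(TinyStr4)
    }
}

impl fmt::Display for TinyStr4 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let buf = self.0.get().to_le_bytes();
        // Unused trailing bytes are zero padding.
        let len = buf.iter().position(|&b| b == 0).unwrap_or(4);
        f.write_str(std::str::from_utf8(&buf[..len]).expect("constructed from ASCII"))
    }
}

fn main() {
    println!("size_of::<Option<TinyStr4>>() = {}", std::mem::size_of::<Option<TinyStr4>>());
    println!("{}", TinyStr4::new("Latn").unwrap());
    println!("{:?}", TinyStr4::new("toolong"));
}
```

This version needs no unsafe code; a faster implementation might skip the UTF-8 re-check in Display, which is where careful review of unsafe blocks would come in.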
A more aggressive optimization is to use an enum between a fast-path and a general-case representation. The fast path would be optional 4 byte tiny strings for language, script, and region. The general case would be a boxed struct similar to the current one, but with an 8 byte tiny string for language and variant, and 4 byte tiny strings for the other subtags. This enum is 16 bytes on both 32 and 64 bit platforms.
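The fast-path/general-case split might be laid out roughly as below. The type and helper names are hypothetical, the tiny-string constructors skip validation, and Rust does not guarantee enum layout, so the test only checks that the enum fits in 16 bytes rather than asserting an exact size.

```rust
use std::mem::size_of;
use std::num::{NonZeroU32, NonZeroU64};

// Minimal tiny-string stand-ins (no validation).
#[derive(Clone, Copy, Debug, PartialEq)]
struct TinyStr4(NonZeroU32);
#[derive(Clone, Copy, Debug, PartialEq)]
struct TinyStr8(NonZeroU64);

/// Hypothetical helpers: pack up to 4 or 8 bytes, truncating longer input.
fn t4(s: &str) -> Option<TinyStr4> {
    let mut buf = [0u8; 4];
    let n = s.len().min(4);
    buf[..n].copy_from_slice(&s.as_bytes()[..n]);
    NonZeroU32::new(u32::from_le_bytes(buf)).map(TinyStr4)
}

fn t8(s: &str) -> Option<TinyStr8> {
    let mut buf = [0u8; 8];
    let n = s.len().min(8);
    buf[..n].copy_from_slice(&s.as_bytes()[..n]);
    NonZeroU64::new(u64::from_le_bytes(buf)).map(TinyStr8)
}

/// General case: 8-byte language and variant, 4-byte script and region.
#[derive(Debug)]
struct GeneralLocale {
    language: TinyStr8,
    script: Option<TinyStr4>,
    region: Option<TinyStr4>,
    variant: Option<TinyStr8>,
}

#[derive(Debug)]
enum Locale {
    /// Fast path: three optional 4-byte subtags, no allocation.
    Fast {
        language: Option<TinyStr4>,
        script: Option<TinyStr4>,
        region: Option<TinyStr4>,
    },
    /// Everything else sits behind one pointer.
    General(Box<GeneralLocale>),
}

impl Locale {
    fn is_fast(&self) -> bool {
        matches!(self, Locale::Fast { .. })
    }
}

fn main() {
    let fast = Locale::Fast { language: t4("en"), script: None, region: t4("US") };
    let general = Locale::General(Box::new(GeneralLocale {
        language: t8("en").unwrap(),
        script: None,
        region: t4("US"),
        variant: t8("posix"),
    }));
    println!("size_of::<Locale>() = {}", size_of::<Locale>());
    println!("{:?} fast={}", fast, fast.is_fast());
    println!("{:?} fast={}", general, general.is_fast());
}
```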
I'm posting an issue to get a sense of how welcome these changes are, and also whether tinystr should be its own crate or just a source file in fluent-locale.
Please re-export unic_langid::{LanguageIdentifier, LanguageIdentifierError}, as they are included in the public API.
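The requested change amounts to a pub use at the crate root. In this sketch the unic_langid module is a hypothetical local stand-in so the example compiles without the dependency; in the real crate the module would not exist and the pub use would name the external crate directly. parse_one is likewise hypothetical.

```rust
// Hypothetical stand-in for the external `unic_langid` crate.
mod unic_langid {
    #[derive(Debug, Clone, PartialEq)]
    pub struct LanguageIdentifier(pub String);
    #[derive(Debug, PartialEq)]
    pub struct LanguageIdentifierError;
}

// Re-exporting types that appear in public signatures lets downstream
// users name them without a separate (possibly version-mismatched)
// dependency on unic-langid.
pub use unic_langid::{LanguageIdentifier, LanguageIdentifierError};

/// Hypothetical public function whose signature mentions both types.
pub fn parse_one(s: &str) -> Result<LanguageIdentifier, LanguageIdentifierError> {
    if s.is_empty() {
        Err(LanguageIdentifierError)
    } else {
        Ok(LanguageIdentifier(s.to_string()))
    }
}

fn main() {
    println!("{:?}", parse_one("en-US"));
    println!("{:?}", parse_one(""));
}
```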
Our current strategy uses an empty field as a wildcard, which results in empty locales matching all other locales. That means that, unless specifically blocked (which we did in 0.4.1), x-testing matches everything.
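The sketch below contrasts the wildcard rule described above with a stricter one. It assumes, per the issue, that x-testing ends up with no language subtag; the Loc struct and both matcher functions are hypothetical simplifications with only language and region.

```rust
/// Simplified locale: `None` means the subtag is absent.
#[derive(Debug)]
struct Loc<'a> {
    language: Option<&'a str>,
    region: Option<&'a str>,
}

/// Rule described in the issue: an absent field on either side acts as a
/// wildcard, so a locale with no language matches every other locale.
fn matches_naive(a: &Loc, b: &Loc) -> bool {
    fn field(x: Option<&str>, y: Option<&str>) -> bool {
        match (x, y) {
            (Some(x), Some(y)) => x.eq_ignore_ascii_case(y),
            _ => true, // empty on either side = wildcard
        }
    }
    field(a.language, b.language) && field(a.region, b.region)
}

/// Stricter sketch: the language must be present and equal on both sides;
/// only optional subtags (here, region) may act as wildcards.
fn matches_strict(a: &Loc, b: &Loc) -> bool {
    match (a.language, b.language) {
        (Some(x), Some(y)) if x.eq_ignore_ascii_case(y) => {}
        _ => return false,
    }
    match (a.region, b.region) {
        (Some(x), Some(y)) => x.eq_ignore_ascii_case(y),
        _ => true,
    }
}

fn main() {
    let testing = Loc { language: None, region: None };
    let en_us = Loc { language: Some("en"), region: Some("US") };
    println!("naive: {}", matches_naive(&testing, &en_us));
    println!("strict: {}", matches_strict(&testing, &en_us));
}
```

With the strict rule, no special-case block for x-testing is needed, because an empty language can never match.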
I need "likely subtags" for script-aware fallback. ICU has an implementation.
I actually have this pretty well implemented. The question is whether it belongs in fluent-locale-rs or whether it should be in skribo. I estimate that it's in the ballpark of 50k of code and data; I could probably get it down a little.
Also, I haven't implemented the deprecated subtags (for example, the conversion of "sh" to "sr_Latn"). I suspect I won't miss them for text rendering, but other applications might want them (for example, so that hyphenation can handle "no-NO"). If I submitted a PR, would you want these?
A "no" answer is fine - it'll just live in skribo.
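Deprecated-subtag handling is essentially a lookup and splice on the language subtag. This sketch uses a hypothetical one-entry alias table with the "sh" example from above (written in BCP 47 form, "sr-Latn"); a real table would come from CLDR alias data. It does not handle a tag that already carries a script.

```rust
/// Hypothetical alias table for deprecated language subtags.
const LANGUAGE_ALIASES: &[(&str, &str)] = &[("sh", "sr-Latn")];

/// Replace a deprecated language subtag, keeping the remaining subtags.
/// Limitation: "sh-Cyrl" would become "sr-Latn-Cyrl".
fn canonicalize(tag: &str) -> String {
    let mut parts = tag.splitn(2, '-');
    let lang = parts.next().unwrap_or("");
    let rest = parts.next();
    let lang = LANGUAGE_ALIASES
        .iter()
        .find(|(old, _)| old.eq_ignore_ascii_case(lang))
        .map(|(_, new)| *new)
        .unwrap_or(lang);
    match rest {
        Some(rest) => format!("{}-{}", lang, rest),
        None => lang.to_string(),
    }
}

fn main() {
    for tag in ["sh", "sh-RS", "en-US"] {
        println!("{} -> {}", tag, canonicalize(tag));
    }
}
```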
Rust 1.24 got released and it brings two goodies for us:
I'd like to start using both in fluent-locale-rs ASAP.