tc39 / ecma402 Goto Github PK
View Code? Open in Web Editor NEWStatus, process, and documents for ECMA 402
Home Page: https://tc39.es/ecma402/
License: Other
Status, process, and documents for ECMA 402
Home Page: https://tc39.es/ecma402/
License: Other
Cloned from whatwg/html#171
All operating systems allow user to select time format. Usually between 12h, 24h or follow automatic.
Firefox OS is currently providing navigator.mozHour12 that can take true/false/undefined values and that nicely fits into Intl API:
(new Date()).toLocaleString(navigator.languages, {
hour12: navigator.mozHour12,
hour: 'numeric',
minute: 'numeric'
});
When "use default" is used, mozHour is undefined and Intl uses the default hour12 value for the given locale. If mozHour is specified to either true or false, Intl API follows that.
There's also an event associated with it: timeformatchange that is dipatched on window.
We're not yet sure if it should be part of Ecma 402 or HTML spec because of events, but I'm opening the issue here to start the conversation in case the HTML group will throw it here.
I can see us wanting to design the API to handle more than one preference or just follow @ehsan's proposal from the HTML issue:
[NoInterfaceObject, Exposed=(Window,Worker)]
interface NavigatorSystemHour12 {
readonly attribute boolean systemHour12;
};
Navigator implements NavigatorSystemHour12;
WorkerNavigator implements NavigatorSystemHour12;
UPDATE:
DateTimeFormat and NumberFormat are design to provide an opaque string as an output that is not meant to be manipulated by the consumer.
Unfortunately, that is impossible if one needs to format the output and that currently forces very dirty solutions that may break internationalization.
Example use case, is formatting minutes token to be smaller or different color than hours in hour:minute string.
var formatter = Intl.DateTimeFormat(navigator.languages, {
hour: 'numeric',
minute: 'numeric'
});
var string = formatter.format(new Date());
element.innerHTML = string; // we want to display hour token with bold font.
Modern user interfaces do that all the time - examples are Material Design in Android, modern Windows Mobile UI and PS4.
My proposal is to allow for token format strings to be available as option for the formatter. It could look like this:
var formatter = Intl.DateTimeFormat(navigator.languages, {
hour: 'numeric',
minute: 'numeric',
format: {
hour: '<strong>$&</strong>' // similar to String.replace(hour, '<strong>$&</strong>')
}
});
var string = formatter.format(new Date()); // return '<strong>14</strong>:12' in pl
For non HTML outputs, the formatting may involve console formatting characters for terminal UI apps.
CLDR contains the details to abbreviate large numbers, e.g.:
"decimalFormats-numberSystem-latn": {
"standard": "#,##0.###",
"long": {
"decimalFormat": {
"1000-count-one": "0 thousand",
"1000-count-other": "0 thousand",
"10000-count-one": "00 thousand",
"10000-count-other": "00 thousand",
"100000-count-one": "000 thousand",
"100000-count-other": "000 thousand",
"1000000-count-one": "0 million",
"1000000-count-other": "0 million",
"10000000-count-one": "00 million",
"10000000-count-other": "00 million",
"100000000-count-one": "000 million",
"100000000-count-other": "000 million",
"1000000000-count-one": "0 billion",
"1000000000-count-other": "0 billion",
"10000000000-count-one": "00 billion",
"10000000000-count-other": "00 billion",
"100000000000-count-one": "000 billion",
"100000000000-count-other": "000 billion",
"1000000000000-count-one": "0 trillion",
"1000000000000-count-other": "0 trillion",
"10000000000000-count-one": "00 trillion",
"10000000000000-count-other": "00 trillion",
"100000000000000-count-one": "000 trillion",
"100000000000000-count-other": "000 trillion"
}
},
"short": {
"decimalFormat": {
"1000-count-one": "0K",
"1000-count-other": "0K",
"10000-count-one": "00K",
"10000-count-other": "00K",
"100000-count-one": "000K",
"100000-count-other": "000K",
"1000000-count-one": "0M",
"1000000-count-other": "0M",
"10000000-count-one": "00M",
"10000000-count-other": "00M",
"100000000-count-one": "000M",
"100000000-count-other": "000M",
"1000000000-count-one": "0B",
"1000000000-count-other": "0B",
"10000000000-count-one": "00B",
"10000000000-count-other": "00B",
"100000000000-count-one": "000B",
"100000000000-count-other": "000B",
"1000000000000-count-one": "0T",
"1000000000000-count-other": "0T",
"10000000000000-count-one": "00T",
"10000000000000-count-other": "00T",
"100000000000000-count-one": "000T",
"100000000000000-count-other": "000T"
}
}
},
@rxaviers proposed this feature a while ago IIRC, but I could find the thread, so I'm posting it here to formalize the proposal.
This proposal goes hand-to-hand with the pluralization (#34) since it will have to compute what's the pluralization token to choose the right format (In the CLDR data you will see that for english we have a lot of *-count-one
and *-count-other
).
The initial proposal could be to add one more configuration to specify either:
a) best-fit
for the abbreviation, which means we choose the biggest matching decimal from the segments
b) the decimal reference to force to use a particular formatting option (e.g.: 10000
, which could produce 1234K
);
new Intl.NumberFormat('en', { compact: 'best-fit' }).format(1234000); // 1.2M
new Intl.NumberFormat('en', { compact: '4-digits' }).format(1234000); // 1234K
$12M
.new Intl.NumberFormat()
1M
vs 1.2M
1M
vs 1 Million
(this will probably require another configuration to specify long
vs short
.The ECMAScript Internationalization API Specification describes a number of abstract operations to process BCP 47 language tags. Some of them might be useful to expose as public API:
Unit formatting is similar to number formatting, but it involves one more units.
The goal is to produce strings like "5 MB/s", "10 KB", "35 °C" etc.
I believe the most relevant for the Web audience areas would be:
and maybe:
I would suggest that we limit ourselves first to just short version of units and use compound patterns to produce related units (like speed out of length and duration units).
In the future we may leave a window open to add 'variant' option for 'short'/'long' etc.
Proposed API:
var f = Intl.UnitFormat(navigator.languages, {
units: [
{type: 'digital', match: 'bestFit'}
]
});
f.format(1024); // would return '1 KB' in en-US
var f = Intl.UnitFormat(navigator.languages, {
units: {
{type: 'digital', unit: 'bestFit'},
{type: 'length', unit: 'second'}
}
});
f.format(1024); // would return '1 KB/s' in en-US
var f = Intl.UnitFormat(navigator.languages, {
units: {
{type: 'length', unit: 'bestFit'},
}
});
f.format(2048); // would return '2 km' in en-US
Open questions
ResolveLocale cannot accept some valid calendars in a u-ca-
extension, because it only considers a key
subtag followed by zero or one type
subtags. RFC 6067 allows for "zero or more" type
subtags to make the keyword
.
In the current spec the following calendar identifiers are incorrectly matched:
ethiopic-amete-alem
is an alias for ethioaa
, but will be matched as ethiopic
.islamic-civil
replaces the deprecated islamicc
, but will be matched as islamic
.islamic-umalqura
, islamic-tbla
, and islamic-rgsa
will each be matched as islamic
.The 'length' property of the DateTime Format function returned from get Intl.DateTimeFormat.prototype.format changed from 0 to 1 in ECMA-402, 2nd ed.
In ECMA-402, 1st ed:
a. Let F be a Function object, with internal properties set as specified for built-in functions in ES5, 15, or successor, and the length property set to 0, that takes the argument date and performs the following steps:
In ECMA-402, 2nd ed:
a. Let F be a new built-in function object as defined in 12.3.4.
b. The value of F’s length property is 1.
The change is not listed in Annex B, so I'm not sure if it was intentional.
(V8 and SpiderMonkey still return 0, test262 also tests for 0 (https://github.com/tc39/test262/blob/master/test/intl402/DateTimeFormat/prototype/format/12.3.2_1_a_L15.js). Only JavaScriptCore already returns 1.)
As part of the effort to align with EWM, we want to expose more internals, low level APIs. The resolved pattern for dates is something V8 is already exposing, where it matches the CLDR pattern used to format the date. E.g.:
(new Intl.DateTimeFormat("en-US")).resolved.pattern // should return "M/d/y"
From the user perspective, having access to the CLDR pattern helps to interoperate with other systems and languages, including the DOM. E.g.: specifically the <input/>
tag where the pattern can be specified.
( Placeholder, TBD )
See also:
UPDATE: This proposal has advanced to stage 1, details here: https://github.com/caridy/intl-plural-rules-spec
cloned from: https://groups.google.com/forum/#!topic/javascript-globalization/3nFDf5al5hU
It would be very helpful for all l10n libraries to get CLDR plural forms into Intl API.
The proposed API could look like this:
var cardinal = new Intl.PluralFormat(‘en’, {style: ‘cardinal’});
console.log(cardinal.format(0)); // “other”
console.log(cardinal.format(1)); // “one”
console.log(cardinal.format(2)); // “other”
var ordinal = new Intl.PluralFormat(‘en’, {style: ‘ordinal’});
console.log(ordinal.format(11)); // “one”
console.log(ordinal.format(22)); // “two”
console.log(ordinal.format(33)); // “few”
console.log(ordinal.format(44)); // “other”
There are two major areas where relative time/date formatting could use standardization:
Good example are timers - representing time like "31 hours, 15 minutes and 23 seconds" as "31:15:23".
All of those patterns are available in CLDR and it would enable web authors to provide better user interfaces if we could incorporate this kind of formatting either into DateTimeFormat or into a separate RelativeDateTimeFormat.
Mozilla and Microsoft use a convention of pseudo-locales to help developers ensure their user interfaces are localizable. Two pseudo-locales we use are:
You can see an example screenshot of Firefox OS using qps-ploc here: https://bug900182.bmoattachments.org/attachment.cgi?id=8418596
And Microsoft docs here: https://msdn.microsoft.com/en-us/library/windows/desktop/dd319106%28v=vs.85%29.aspx
It would be helpful if Intl API followed the convention and recognized pseudolocales for Numbers and DateTime formatting.
Why not support any arbitrary code inside template strings expressions?
This is not supported i.e:
`${ if (a == 2) {
console.log('hi');
} }`
And with more complex code, like if, if-else, for loops, etc..
It would be nice to develop a template engine that uses this native approach instead of a string manipulation.
Currently, DateTimeFormat is always displaying dayperiod token if hour12 is set. This is not always matching UX design.
We have encountered multiple scenarios in which the hour12 setting and dayperiod displaying are controlled separately, including MacOS Date&Time Settings.
I'd like to get dayperiod as another token for DateTimeFormat options as a boolean flag where 'true' means to display it if hour12 is true, and 'false' means not to display it even when hour12 is true.
Since there's little progress with deciding on what to do with durations, and ICU's API doesn't seem too defined about it, I'd like to start a discussion about possible choices. It seems to me that we have four:
The common use cases are:
The formatting differs from time formatter, because the UI will define the range of units to be displayed, like h:m:s
, m:s.S
and the number of digits per unit (m:ss
vs mm:ss
or mm:ss.SSS
vs mm:ss.SS
)
NumberFormat
would be convenient for that reason - it would benefit from similar API of defining minimumIntegerDigits
and maximumIntegerDigits
for each component, and the value itself may be just a number of milliseconds.
Fitting it into UnitFormat
seems to be natural fit for CLDR Unit Elements [0] together with compound units
, unit sequences
and coordinate units
, but may be hard and requiring us to bend over backward to accommodate for the options that will be required.
Custom DurationFormat
seems to be the easiest choice from the API design perspective, but it increases the number of objects we specify. It's been the choice I made so far for Firefox OS patterns, but I'll be happy to migrate our code once we reach a consensus here.
RuleBasedNumberFormat
is what ICU seems to be using now for duration patterns, but it's a pretty big API most similar to Mozilla's proprietary Date.prototype.toLocaleFormat
and I'm not sure if we want to go there.
The API I use in DurationFormat
looks like this:
var f = Intl.DurationFormat(locales, {
minUnit: 'second', // millisecond | second | minute | hour
maxUnit: 'hour' // millisecond | second | minute | hour
});
f.format(i); // 52:34:51
Thougths?
[0] http://www.unicode.org/reports/tr35/tr35-general.html#Unit_Elements
The 1st Edition of ECMA 402 specified the [[Call]] behavior for Intl
constructors; e.g. Intl.DateTimeFormat.call(this [, locales [, options]])
to return the this
context object that was passed-in. In the 2nd Edition, the [[Call]] behavior no longer states that the passed-in context object should be retuned. This change is a potential compatibility hazard.
As the developer and maintainer of the popular FormatJS i18n libraries, I've begun receiving issues from developers testing Chrome Canary (49) that their dates and numbers were failing for format, causing an Error to be thrown.
The high-level framework integration libs that are part of FormatJS — react-intl, ember-intl, handlebars-intl — are used by many web apps, including many of Yahoo's web apps. All these libraries use the underlying intl-format-cache which memoizes the Intl
constructors because they are expensive to create. The memoization technique essentially does the following:
function constructIntlInstance(IntlConstructor) {
return function () {
var args = Array.prototype.slice.call(arguments);
var instance = Object.create(IntlConstructor.prototype);
IntlConstructor.apply(instance, args);
return instance;
};
}
Note: That this code depends on the following invariant:
var instance = Object.create(IntlConstructor.prototype);
instance === IntlConstructor.call(instance); // true
It is dependent on the the Intl
constructors being .call()
-able and the returning the context object passed-in. This .call()
behavior for the Intl
constructors is supported in all ECMA 402 1st Edition implementations.
After receiving an issue report about this code causing an Error to be thrown Chrome Canary (49), I dug in and found this recent V8 change which updates V8's implementation to match ECMA 402 2nd Edition, thus removing the code that returns the passed-in context object when the Intl
constructors are .call()
-ed.
Today, I've released [email protected]
which changes the memoization implementation to make sure the [[Construct]] behavior always happens by invoking the Intl
constructors with new
. Essentially doing the following:
function constructIntlInstance(IntlConstructor) {
return (...args) => new IntlConstructor(...args);
}
In ES5 an equivalent would be:
function constructIntlInstance(IntlConstructor) {
return function () {
var args = Array.prototype.slice.call(arguments);
return new (Function.prototype.bind.apply(IntlConstructor, [null].concat(args)))();
};
}
While this issue is now "fixed" in intl-format-cache, developers must upgrade their dependencies and re-deploy their apps. I will help to communicate this change, but I'm worried that removing the 1st Edition [[Call]] behavior will break many apps/sites 😞
How should we move forward to prevent end-users from having broken experiences?
Edited based on @rwaldron's feedback.
A common use case for Collator is to sort long lists of terms, and a common UX pattern for long lists of words is to group them by letters and potentially show a sidebar letter scrolling.
Collator allows for different levels of sensitivity when comparing, but that doesn't help with grouping. We can sort "żaba" into words starting with "z", but we can't identify that after removing diacritics and reducing it to a base form it is "zaba" (and that's what Collator is using with sensitivity = base
).
Could we expose to users?
It seems like there's no way with the current specifications to know the first day of the week for a given locale.
I understand ISO 8601 does consider Monday as the first working day: maybe this is the reason. However there are some locales (AFAIK Canada, U.S. and Mexico) where Sunday should be shown as first day of the week - eg in calendars or date pickers. Is there any plan to make available this information?
Errors look like:
[2015-09-29T14:37:25.218Z] Warning: can't find clause, production, note or example with id #table-1
...
Using eg. <emu-clause id="table-1">
:
Error: Clause doesn't have header: <emu-clause id="table-1">
But we don't want a header here and tables aren't:
Using eg.
<emu-example id="table-1">
<figure>
<figcaption>Table 1 — Collator options settable through both extension keys and options properties</figcaption>
<table class="real-table">
...
</table>
</figure>
</emu-example>
Looks more like what we want, but includes a generated <figcaption>
that we don't want:
Maybe @bterlson has advice? Maybe we need an <emu-figure>
?
<emu-figure id="table-1">
<figcaption>...</figcaption>
<table>...</table>
</emu-figure>
Note that <figure id=...>
has same error.
There's a lot of data related to Language names, Timezone names, Script names and Region names contained in CLDR that would allow many App Settings to be easier to develop.
Many popular applications contain some combination of "language selector" and "timezone selector" in their UI. We use it in Firefox OS and Firefox desktop, but most of popular webapps like Gmail, Facebook or Twitter do the same.
It would be awesome to tap into CLDR resources and expose the ability to get localized versions of those tokens. We'll want to do this for Firefox OS as part of the 'mozIntl' API and I think it would make sense to standardize it.
Open questions:
Notice the issue in andyearnshaw/Intl.js#84
Actually I have three proposals:
Number.prototype.toLocaleString()
and Intl.NumberFormat
. There is already currencyDisplay
option but it can be only name
, symbol
or code
and there is no way to display alternative symbol.From the algo described in "2.6.2 Elements availableFormats, appendItems", it seems that the requested options should have a leading role when formatting the best match pattern. Below is the relevant section:
http://unicode.org/reports/tr35/tr35-dates.html#availableFormats_appendItems
Once a skeleton match is found, the corresponding pattern is used, but with adjustments. Consider the following dateFormatItem:
<dateFormatItem id="yMMMd">d MMM y</dateFormatItem>
If this is the best match for yMMMMd, pattern is automatically expanded to produce the pattern "d MMMM y" in response to the request. Of course, if the desired behavior is that a request for yMMMMd should produce something other than "d MMMM y", a separate dateFormatItem must be present, for example:
<dateFormatItem id="yMMMMd">d 'de' MMMM 'de' y</dateFormatItem>
402 does not consider the requested options as part of the resolvedOptions, instead, it only uses the required options to find the best match. In algo 12.1.1 InitializeDateTimeFormat (dateTimeFormat, locales, options), in step 26.c.1, Set dateTimeFormat.[[<prop>]] to p.
, it stores the output option in an internal slot (e.g.: "long" for "{month}") from the format returned from best match algo, the proposal is to overruling that with the value of property "month" from requested options if exist, as described in unicode documentation cited above.
This issue is the major source of problems in Intl.js (Intl polyfill issues: 125, 117, 124, 145 and co.).
For reference, see: http://tc39.github.io/ecma262/#sec-error-message
The incorrect text is here: https://github.com/tc39/ecma402/blob/master/spec/collator.html#L129
The 1.0 specification is available
http://www.ecma-international.org/ecma-402/1.0/
but there is no html version for 2.0
http://www.ecma-international.org/ecma-402/2.0/
Thanks for putting the current draft (3.0) on GitHub pages 👍
(referencing and deeplinking specs is just nicer with html)
There are already two languages for specifying options to a locale:
"en-US"
) plus an object with further options ({ tz: "Asia/Jerusalem" }
)"en-US-u-tz-jerusalm"
)Each has some advantages:
An idea here: resolvedOptions()
shouldn't return just a plain old object, but an instance of a special class whose toString()
method returns the BCP47 locale for those options! This way, we get the best of all worlds. The output of resolvedOptions()
can also then be used as the first (and only) argument to constructors like Intl.DateTimeFormat
.
These Intl.Locale
objects can be used generally as the way of representing locale information within ECMAScript. HTML could include an Array of relevant complex Locales (not just simple locales like "en-US") in navigator.locales
, which would be usable either as a string or to get these higher-level properties; detailed user preferences could be reflected in the Locale which would be difficult to reflect in the simple navigator.language. Locales would still have own properties for things that resolvedOptions()
currently includes as own properties, and would therefore probably be web-compatible as an upgrade. We probably couldn't just replace navigator.language
with a Locale for web compatibility reasons, but there's no reason that the string first argument to formatters can't be more flexible (if it isn't already) by continuing to call ToString() and then parsing the BCP47.
(To clarify, this isn't my idea but was something a group of us came up with yesterday.)
UPDATE: Latest Proposal - Nov 2015
I started working on a spec and polyfill for Intl.Locale API.
I should have some proposal for this next week. Currently I have five functions on it:
The rationale for those five are:
this exposes the core function used by all Intl formatters that allows people to write their own custom formatters or polyfills.
This is basic language negotiation function for all localization libraries. It takes requestedLocales list (navigator.languages or custom), availableLocales (app provided locales), and a fallback defaultLocale (default app locale, not default host env locale) and returns the list of prioritized available locales that best matches user requested locales.
Simple function allowing l10n libraries to set html.dir based on their negotiated locale.
@caridy - One question I'd like to get help with is about using Set
instead of Array
. In all cases where we carry a list of locales, we always want it deduplicated and Set
does just that. It would be sweet to use it instead of an Array
, but it's API seems to be significantly less user friendly than that of the Array
- it's impossible to just retrieve locales[0]
, it's not easy to map
etc.
Should I stick to Array
or try to use Set
?
2020 Update: See #342 for further discussion.
Currently the spec has no APIs for parsing dates or numbers. Supposedly the response used to be "use a datepicker" when asking about date parsing. I'm wondering if that changed.
A common scenario is to have a list of items (Array in JS) which has to be formatted in a given language. CLDR provides all the patterns for that.
Example use cases:
Proposed API:
var f = new Intl.ListFormat(navigator.languages, {
'type': 'regular', // regular or duration
'variant': 'regular', // regular, short, narrow
});
f.format(['John', 'Amy', 'Nick']); // "John, Amy and Nick" in en-US
this would also supersede current Array.prototype.toLocaleString
with ability to build better localized lists.
I can imagine this not having it's own object and only extending Array.prototype.toLocaleString with options.
Standardize Intl.v8BreakIterator
.
Backpointers:
Update 1 (Sept 26th, 2016):
Collator is enabling users to build locale-aware sorted list of words. Common UI scenario like Contacts list, or Music app list is to group the list by character and then provide a sidebar grouping selector.
I can see two APIs that could facilitate that and I'm not sure which one would be better:
var input = ['abcd', 'foo', 'γβς', 'أبجدية عربية'];
var ret = [
['a', ['abcd']],
['f', ['foo']],
['γ', ['γβς']],
['أبجدية عربية']
]
In this scenario, the words are sorted by alphabet, the alphabets are sorted by language preferences, and then you receive a nested list of your terms with grouping characters.
The settings would be similar to collator - sensitivity, maybe max number of groups (which would give a..c
on one hand, and aa...ac
, ad...ag
on the other)
I like it because it's simple and works well for simple scenarios, but I feel it doesn't scale well.
The problem with this input is that it operates on strings and requires the user to later compare the result back to his original database. If there are 10000 song titles, then operating on the result of this API in order to get the song entry that is associated with a given name is going to be very confusing.
One other limitation of this approach is that with very big databases, a common practice is to avoid having to load the whole database into memory. So instead of that, we could combine two different techniques:
In that scenario, Intl API is called when the entry is being added to the database and it receives an indexable version of the string with things like sensitivity taken into account.
Then, another API provides the list of group names [a..z][ا...غ]
etc. so that the UI can show options. Then it's on the user and his database to be able to load just entries from a visible/selected range.
I like the latter API because it feels more scalable and leads to better performance.
I'm looking for opinions, prior work or ideas
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.