Right now READMEs on GitHub are not internationalised, even though they could be.


Specification for i18n about standard-readme HOT 20 CLOSED

richardlitt commented on May 18, 2024 3

Specification for i18n

from standard-readme.

Comments (20)

RichardLitt commented on May 18, 2024 2

"so: if you open the spec to all languages, all languages will likely be used."

I don't think this is true. It's up to each project to decide which languages should be used. If I write a spec saying all languages are possible for standard-readme, I won't wake up tomorrow to 7000 translations from all human languages.

"That would suggest translators are not essential members of the project... heresy?"

Sure they are. But only if the project has multilingual users or support.

"BCP_47 will likely be a little too loose to be effective but is generally more inviting. Limiting the languages would be a "bolder choice" that might not be favored by people but could result in a nicer infrastructure."

I think that sums it up for me. I am going to go with BCP_47. We can revisit this later if we need to.


sotayamashita commented on May 18, 2024

@martinheidegger I agree, so we need to decide which language code to use.


martinheidegger commented on May 18, 2024

I tend towards ISO 639-1 (even though ISO 639-3 would be more complete) because as-a-readme it should try to find a balance of effort/usefulness. I think any person that can read "Cantonese" can read "Chinese" (at least to my knowledge).


sotayamashita commented on May 18, 2024

@RichardLitt Please add the discussion label.


RichardLitt commented on May 18, 2024

Done. Thanks @sotayamashita.

I agree about this issue; thank you so much for making it!

Can you point me to some repos which already do translations? I would love to see how others have done it, already, before deciding on a new standard.

Why wouldn't you lean towards ISO 639-3? What would be the downsides?


martinheidegger commented on May 18, 2024

Downsides to ISO 639-3:

Uncommon in usage

People usually know ISO 639-1 (en, ja, etc.) but have not heard of the more extended forms. Mistakes and irritation are to be expected.

Usefulness questionable

It makes sense to translate content into more than one language because the major languages all have millions of people who speak them. Several of the ISO 639-3 languages are dialects spoken in addition to one of the ISO 639-1 languages. Translations into those languages provide little or no value.

Growth of Maintenance cost

Maintaining translations is a pain in the ass. A restriction to 184 languages at least does a little to keep the translations from growing overboard. (tbh. I wonder if it wouldn't make sense to restrict the list to languages spoken by 50 million people or more: https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers - 25 languages would qualify)


RichardLitt commented on May 18, 2024

Those are fair points. I think limiting to 184 languages is fine.

Would it be possible to specify both language codes? Is there a self-describing language code - as in, can it easily be made clear that we are using ISO 639-1 as opposed to ISO 639-3?


martinheidegger commented on May 18, 2024

ISO 639-1 are two letter codes, ISO 639-3 are three-letter codes. Usually specifying "two letter code" implies that you use ISO 639-1.

I am not quite sure if this is what you asked but: it is possible to support either language code:

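// iso639_1Check and iso639_3Check are placeholder validators, not real library functions.
if (characterCodePart.length === 2) {
  iso639_1Check(characterCodePart) // two letters: ISO 639-1
} else if (characterCodePart.length === 3) {
  iso639_3Check(characterCodePart) // three letters: ISO 639-3
} else {
  throw new Error(`Unrecognized language code: ${characterCodePart}`)
}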


RichardLitt commented on May 18, 2024

Ah. So, it's very easy to tell the difference, then. In that case, why not add something saying: "Use ISO 639-1 if you can. If you can't, using ISO 639-3 is also valid." Is there a need to eliminate one for the sake of the other?


martinheidegger commented on May 18, 2024

See my reasoning above. I still somehow think the top 25 would be enough.


RichardLitt commented on May 18, 2024

Maybe I am not being clear about where I am confused.

"Even if they could be. I.e README.ja.md could show the readme in japanese. I think it would be a step in the right direction to specify this format as well."

What I am seeing is that we add this to the format:

If you have i18n for your READMEs, the standard is to name your files accordingly: README.ja.md, where ja is the ISO 639-1 code. All 639-1 codes are valid; if your language falls outside of ISO 639-1, then you may use ISO 639-3, which has three letters. For instance, README.ask.md for Askunu. However, if your language has both an ISO 639-1 and an ISO 639-3 code, default to the ISO 639-1 code. So, for instance, README.en.md instead of README.eng.md.

This is what I am considering adding to the spec. For the linter, if the code is two letters, we can check whether it is an ISO 639-1 language. If it is three, then we check it against ISO 639-3 - as your code suggests. I don't see a feasible way to limit the languages to 25, or why we would even want to.

Does this sound alright? What am I missing from your understanding of intentionally excluding ISO 639-3 languages?
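The linter check described above could be sketched roughly as follows. This is only an illustration: `classifyReadmeName` is a hypothetical helper, not part of standard-readme, and it only looks at the shape of the code (two or three lowercase letters). A real linter would still need to look the code up in the actual ISO 639-1 and ISO 639-3 lists.

```javascript
// Hypothetical helper (not part of standard-readme): pull the language code
// out of a translated README file name such as README.ja.md and say which
// ISO 639 part its length implies.
function classifyReadmeName(fileName) {
  // README.<code>.md, where <code> is two or three lowercase ASCII letters.
  const match = /^README\.([a-z]{2,3})\.md$/.exec(fileName)
  if (!match) return null // not a translated README name in this scheme

  const code = match[1]
  return {
    code,
    // Length alone tells the two parts apart; it does not prove the code exists.
    standard: code.length === 2 ? 'ISO 639-1' : 'ISO 639-3'
  }
}

console.log(classifyReadmeName('README.ja.md'))
console.log(classifyReadmeName('README.ask.md'))
console.log(classifyReadmeName('README.md'))
```

The plain `README.md` is deliberately reported as `null` here, on the assumption that the untranslated file keeps its usual name.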


wooorm commented on May 18, 2024

Maybe this can be more permissive by allowing BCP-47 tags? E.g., de, en-GB, nl-BE, and the like. That would open up regions as well, and it allows both 639-1 and 639-3 (preferring the shortest) too.


RichardLitt commented on May 18, 2024

I'm finding the spec a bit hard to parse. I'm also not sure I want to live in a world where README.en.md and README.en-GB.md are two READMEs I need to keep updated. But it does seem to have the best ratification elsewhere - it's used by a lot of other computing standards [See wiki]. So, I am for that.

"Use the appropriate IETF tag" seems to be a pretty fine thing to say; it allows us to not worry about conforming to one ISO variant over another, and it puts the burden on the translator to know what tag they should use in their README version (which they should already know, anyway).
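As a rough sketch of what checking an IETF (BCP 47) tag in a file name might look like, the built-in `Intl.getCanonicalLocales` can be used to reject malformed tags and normalise casing. `readmeLanguageTag` is a hypothetical helper name, and this only checks that the tag is well-formed; it does not verify that every subtag is actually registered.

```javascript
// Hypothetical helper: extract <tag> from README.<tag>.md and canonicalise it.
// Returns null when the file name or the tag is malformed.
function readmeLanguageTag(fileName) {
  const match = /^README\.(.+)\.md$/.exec(fileName)
  if (!match) return null
  try {
    // Throws a RangeError for structurally invalid tags (e.g. underscores).
    return Intl.getCanonicalLocales(match[1])[0]
  } catch (err) {
    return null
  }
}

console.log(readmeLanguageTag('README.de.md'))
console.log(readmeLanguageTag('README.en-gb.md'))
console.log(readmeLanguageTag('README.en_GB.md'))
```

Relying on the platform's own tag parser keeps the burden off the spec: the linter would not need to carry its own copy of the ISO lists just to validate the shape of a tag.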


wooorm commented on May 18, 2024

Having BCP47 also allows for different currencies; comma or full-stop as number separators; multiple scripts (Chinese, some Slavic languages, etc)!
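To illustrate the point about number separators and scripts, here is a small sketch using the standard `Intl` API. The helper names `formatNumberFor` and `scriptOf` are made up for this example, and the output assumes a JavaScript runtime that ships full locale data.

```javascript
// Format the same number under different BCP 47 tags to show that the tag
// carries more than just the language (grouping and decimal separators).
function formatNumberFor(tag, value) {
  return new Intl.NumberFormat(tag).format(value)
}

// Read the script subtag out of a tag, e.g. Traditional vs Simplified Chinese.
function scriptOf(tag) {
  return new Intl.Locale(tag).script
}

console.log(formatNumberFor('en-US', 1234567.5))
console.log(formatNumberFor('de-DE', 1234567.5))
console.log(scriptOf('zh-Hant-TW'))
console.log(scriptOf('zh-Hans-CN'))
```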


RichardLitt commented on May 18, 2024

Well, that's sold me. Others?


martinheidegger commented on May 18, 2024

I wrote above this:

"I wonder if it wouldn't make sense to restrict the list to languages spoken by 50 million people or more"

Using BCP47 would go straight against this.

Of the following two options:

A

B

I am going with B) ー Godspeed.


RichardLitt commented on May 18, 2024

@martinheidegger I'm really sorry man; I'm still a bit confused why you feel that we need to restrict languages. That's why we keep talking past each other.

What would restricting languages functionally mean? Is this not just about naming the README.md files?


martinheidegger commented on May 18, 2024

Okay, not trying to win the argument here. Just trying to convey my point: At the core there is one question: "Why does GitHub not support internationalization?" And my guess on this is: because multiple languages would split the community and hurt the effectiveness of open source code.

With this in mind, the counter question becomes "Why would you even try to support different languages in open source?". The only answer I can come up with is: "incompetency" (sort of): Not every person in the world speaks/writes/reads English. (See EPI)

By providing different translations we accept that and try to accommodate people who don't speak English, but that is an effort and comes at a cost. To justify that cost, translators should focus on the largest number of people who cannot deal with English, because more translations mean more effort and less work on the open-source code itself, which I don't think is good to encourage.

"allows for different currencies; comma or full-stop"

To me this just means we can put more effort into a place where the effort doesn't help much, if at all.


RichardLitt commented on May 18, 2024

I understand all of that; thank you so much for laying it out clearly.

I agree about the cost, about why GitHub doesn't support i18n, and about focusing on the largest number of people.

I am curious about this possible point: If I speak a language that is uncommon, what is to stop me naming my README using my language code and doing my own work of translating the README? Is there a high cost for people who are not the translators? Because if there isn't, then I don't think it's a bad thing to support i18n for all possible languages - buy-in would be the responsibility of the translators and language communities, not the project. Limiting languages to the top 25 most spoken would be detrimental towards their efforts, I think. Do you understand what I am getting at? I may be misinformed! Please let me know if so. I agree that, as a translator, if I spoke three languages, I should focus on the more common one, but I don't see a problem with also translating the other one if I wanted.

Regarding currencies, commas, and the like; those may be important, and I don't see a problem with using a standard that does away with any possible bike-shedding. That's the point of standard-readme in the end, too.


martinheidegger commented on May 18, 2024

"Is there a high cost for people who are not the translators? Because if there isn't, then I don't think it's a bad thing to support i18n for all possible languages."

The problem here, I think, is that you can't really separate the translators' spec from the writers' spec. Also, the translators need to work from some base. So: if you open the spec to all languages, all languages will likely be used.

"buy-in would be the responsibility of the translators and language communities, not the project."

That would suggest translators are not essential members of the project... heresy? 😛

"... rd that does away with any possible bike-shedding."

Just for a complete picture: It is possible to limit the "25" languages to "25 BCP 47 codes". In other words: "en" could automatically stand for "en_UK".

There is an argument for and against limitation. BCP_47 will likely be a little too loose to be effective but is generally more inviting. Limiting the languages would be a "bolder choice" that might not be favored by people but could result in a nicer infrastructure.
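For a concrete picture of the "limited list of BCP 47 codes" option, a minimal sketch could compare only the primary language subtag against an allowlist. Everything here is an assumption for illustration: `ALLOWED_LANGUAGES` is a made-up short list (the thread never fixes the actual 25), and `isAllowedTag` is a hypothetical helper. Note that well-formed BCP 47 tags use hyphens, and the region code for the United Kingdom is GB, so British English is written en-GB.

```javascript
// Illustrative allowlist only; the real list of languages is not decided here.
const ALLOWED_LANGUAGES = new Set(['en', 'ja', 'zh', 'es', 'de'])

// Hypothetical helper: accept any well-formed tag whose primary language
// subtag is on the allowlist, so "en" and "en-GB" are both covered by "en".
function isAllowedTag(tag) {
  try {
    return ALLOWED_LANGUAGES.has(new Intl.Locale(tag).language)
  } catch (err) {
    return false // not a well-formed BCP 47 tag
  }
}

console.log(isAllowedTag('en'))
console.log(isAllowedTag('en-GB'))
console.log(isAllowedTag('ask'))
```

With this shape, opening the spec up later would only mean dropping the allowlist check, since the tags themselves would already be BCP 47.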
