Comments (20)
so: if you open the spec to all languages, all languages will likely be used.
I don't think this is true. It's up to each project to decide which languages should be used. If I write a spec saying all languages are possible for standard-readme, I won't wake up tomorrow to 7000 translations from all human languages.
That would suggest translators are not essential members of the project... heresy?
Sure they are. But only if the project has multilingual users or support.
BCP 47 will likely be a little too loose to be effective but is generally more inviting. Limiting the languages would be a "bolder choice" that might not be favored by people but could result in a nicer infrastructure.
I think that sums it up for me. I am going to go with BCP_47. We can revisit this later if we need to.
from standard-readme.
@martinheidegger I agree with it, so we need to decide which language code to use.
I tend towards ISO 639-1 (even though ISO 639-3 would be more complete) because as-a-readme it should try to find a balance of effort/usefulness. I think any person that can read "Cantonese" can read "Chinese" (at least to my knowledge).
@RichardLitt Please add the discussion label.
Done. Thanks @sotayamashita.
I agree about this issue; thank you so much for making it!
Can you point me to some repos which already do translations? I would love to see how others have done it, already, before deciding on a new standard.
Why wouldn't you lean towards ISO 639-3? What would be the downsides?
Downsides to ISO 639-3:
Uncommon in usage
People usually know ISO 639-1 (en, ja, etc.) but have not heard of the longer forms. Mistakes and irritation are to be expected.
Usefulness questionable
It makes sense to translate content into more than one language because each of the major languages has millions of speakers. Several of the ISO 639-3 languages are dialects spoken in addition to one of the ISO 639-1 languages. Translations into those languages provide little or no value.
Growth of maintenance cost
Maintaining translations is a pain in the ass. A restriction to 184 languages at least does a little to keep the number of translations from growing out of hand. (tbh, I wonder if it wouldn't make sense to restrict the list to languages spoken by 50 million people or more: https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers - 25 languages would qualify)
Those are fair points. I think limiting to 184 languages is fine.
Would it be possible to specify both language codes? Is there a self-describing language code - as in, can it easily be made clear that we are using ISO 639-1 as opposed to ISO 639-3?
ISO 639-1 are two letter codes, ISO 639-3 are three-letter codes. Usually specifying "two letter code" implies that you use ISO 639-1.
I am not quite sure if this is what you asked but: it is possible to support either language code:
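if (characterCodePart.length === 2) {
  // Two letters: treat it as an ISO 639-1 code
  iso639_1Check(characterCodePart)
} else if (characterCodePart.length === 3) {
  // Three letters: treat it as an ISO 639-3 code
  iso639_3Check(characterCodePart)
} else {
  // Any other length fits neither standard
  throw watIsDat();
}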
Ah. So, it's very easy to tell the difference, then. In that case, why not add something saying: "Use ISO 639-1 if you can. If you can't, using ISO 639-3 is also valid." Is there a need to eliminate one for the sake of the other?
see my reasoning above. i still somehow think the top 25 would be enough
Maybe I am not being clear about where I am confused.
Even if they could be. I.e README.ja.md could show the readme in japanese. I think it would be a step in the right direction to specify this format as well.
What I am seeing is that we add this to the format:
If you have i18n for your READMEs, the standard is to name your files accordingly: README.ja.md, where ja is the ISO 639-1 code. All 639-1 codes are valid; if your language falls outside of ISO 639-1, then you may use ISO 639-3, which has three letters. For instance, README.ask.md for Askunu. However, if your language has both an ISO 639-1 and an ISO 639-3 code, default to the ISO 639-1 code. So, for instance, README.en.md instead of README.eng.md.
This is what I am considering adding to the spec. For the linter, we can check, if the code is two letters, whether it is an ISO 639-1 language. If it is three, then it is checked against ISO 639-3 - as your code suggests. I don't see a feasible way to limit the languages to 25, or why we would even want to.
Does this sound alright? What am I missing from your understanding of intentionally excluding ISO 639-3 languages?
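A minimal sketch of the linter check proposed above, assuming the rule is exactly "two letters means ISO 639-1, three letters means ISO 639-3, and prefer the two-letter code when both exist". The name `lintReadmeName` and both lookup tables are hypothetical and hold only a handful of sample codes; a real linter would load the full ISO 639 registries.

```javascript
// Hypothetical sample tables, NOT the full registries.
const ISO_639_1 = new Set(["en", "ja", "de"]);
// ISO 639-3 code -> its ISO 639-1 equivalent, or null when none exists.
const ISO_639_3 = new Map([
  ["eng", "en"],
  ["jpn", "ja"],
  ["deu", "de"],
  ["ask", null],
]);

// Checks a translated README file name of the form README.<code>.md.
function lintReadmeName(filename) {
  const match = /^README\.([a-z]+)\.md$/.exec(filename);
  if (!match) {
    return { ok: false, reason: "not of the form README.<code>.md" };
  }
  const code = match[1];
  if (code.length === 2) {
    return ISO_639_1.has(code)
      ? { ok: true }
      : { ok: false, reason: `unknown ISO 639-1 code "${code}"` };
  }
  if (code.length === 3) {
    if (!ISO_639_3.has(code)) {
      return { ok: false, reason: `unknown ISO 639-3 code "${code}"` };
    }
    const shortCode = ISO_639_3.get(code);
    // Prefer the two-letter code whenever the language has one.
    return shortCode === null
      ? { ok: true }
      : { ok: false, reason: `use README.${shortCode}.md instead` };
  }
  return { ok: false, reason: "language code must be two or three letters" };
}

console.log(lintReadmeName("README.eng.md").reason);
```

The sketch only looks at translated files; the plain README.md would need to be skipped before calling it.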
Maybe this can be more permissive by allowing BCP-47 tags? E.g., de, en-GB, nl-BE, and the like. That would open up regions as well, and it allows both 639-1 and 639-3 (preferring the shortest) too.
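As a rough illustration of what accepting BCP 47 tags could look like in a linter, the sketch below relies on the standard `Intl.getCanonicalLocales` function, which throws a `RangeError` for structurally invalid tags. The function name `parseReadmeTag` and the file-name pattern are assumptions for illustration, not part of any spec.

```javascript
// Returns the canonical BCP 47 tag embedded in README.<tag>.md,
// or null when the name does not match or the tag is malformed.
function parseReadmeTag(filename) {
  const match = /^README\.([A-Za-z0-9-]+)\.md$/.exec(filename);
  if (!match) {
    return null;
  }
  try {
    // Canonicalizes casing (e.g. "en-gb" becomes "en-GB") and
    // throws RangeError when the tag is not well-formed.
    return Intl.getCanonicalLocales(match[1])[0];
  } catch (err) {
    if (err instanceof RangeError) {
      return null;
    }
    throw err;
  }
}

console.log(parseReadmeTag("README.en-gb.md")); // "en-GB"
console.log(parseReadmeTag("README.nl-BE.md")); // "nl-BE"
```

Note that this checks only the shape of the tag. A well-formed but unregistered tag would still pass, so a stricter linter would also need a registry lookup.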
I'm finding the spec a bit hard to parse. I'm also not sure I want to live in a world where README.en.md and README.en-GB.md are two READMEs I need to keep updated. But it does seem to have the best ratification elsewhere - it's used by a lot of other computing standards [See wiki]. So, I am for that.
"Use the appropriate IETF tag" seems to be a pretty fine thing to say; it allows us to not worry about conforming to one ISO variant over another, and it puts the burden on the translator to know what tag they should use in their README version (which they should already know, anyway).
Having BCP47 also allows for different currencies; comma or full-stop as number separators; multiple scripts (Chinese, some Slavic languages, etc)!
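To make the point about separators, currencies, and scripts concrete, here is a small sketch using the standard `Intl.NumberFormat` and `Intl.Locale` APIs. The helper names are hypothetical, and the localized output assumes a JavaScript runtime that ships full ICU locale data (the default in current Node.js releases).

```javascript
// Formats a number with the separators conventional for the tag.
function formatNumber(tag, value) {
  return new Intl.NumberFormat(tag).format(value);
}

// Formats an amount in the given ISO 4217 currency for the tag.
function formatCurrency(tag, currency, value) {
  return new Intl.NumberFormat(tag, { style: "currency", currency }).format(value);
}

// Returns the script subtag of a BCP 47 tag, or undefined if absent.
function scriptOf(tag) {
  return new Intl.Locale(tag).script;
}

console.log(formatNumber("en-US", 1234.5)); // "1,234.5"
console.log(formatNumber("de-DE", 1234.5)); // "1.234,5"
console.log(formatCurrency("en-GB", "GBP", 1234.5)); // "£1,234.50"
console.log(scriptOf("zh-Hant-TW")); // "Hant"
```

None of this affects how README files are named; it only shows what information a full tag carries beyond the language.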
Well, that's sold me. Others?
I wrote above this:
I wonder if it wouldn't make sense to restrict the list to languages spoken by 50 million people or more
Using BCP47 would go straight against this.
Of the following two options:
A
or
B
I am going with B). Godspeed.
@martinheidegger I'm really sorry man; I'm still a bit confused why you feel that we need to restrict languages. That's why we keep talking past each other.
What would restricting languages functionally mean? Is this not just about naming the README.md files?
Okay, not trying to win the argument here. Just trying to convey my point: At the core there is one question: "Why does GitHub not support internationalization?" And my guess is: because multiple languages would split the community and reduce the effectiveness of open source code.
With this in mind, the counter question becomes "Why would you even try to support different languages in open source?". The only answer I can come up with is: "incompetency" (sort of): Not every person in the world speaks/writes/reads English. (See EPI)
By providing different translations we accept that and try to accommodate people who don't speak English, but that is an effort and comes at a cost. To justify that cost, to make it worth it, translators should focus on the largest number of people who cannot deal with English. More translations mean more effort and less work on the open-source code itself, which I don't think is good to encourage.
allows for different currencies; comma or full-stop
To me this just means that we can put more effort into a place where the effort doesn't help much, if at all.
I understand all of that; thank you so much for laying it out clearly.
I agree about the cost, about why GitHub doesn't support i18n, and about focusing on the largest number of people.
I am curious about this possible point: If I speak a language that is uncommon, what is to stop me naming my README using my language code and doing my own work of translating the README? Is there a high cost for people who are not the translators? Because if there isn't, then I don't think it's a bad thing to support i18n for all possible languages - buy-in would be the responsibility of the translators and language communities, not the project. Limiting languages to the top 25 most spoken would be detrimental towards their efforts, I think. Do you understand what I am getting at? I may be misinformed! Please let me know if so. I agree that, as a translator, if I spoke three languages, I should focus on the more common one, but I don't see a problem with also translating the other one if I wanted.
Regarding currencies, commas, and the like; those may be important, and I don't see a problem with using a standard that does away with any possible bike-shedding. That's the point of standard-readme in the end, too.
Is there a high cost for people who are not the translators? Because if there isn't, then I don't think it's a bad thing to support i18n for all possible languages.
The problem here, I think, is that you can't really separate the translators' spec from the writers' spec. Also, the translators need to work from some base. So: if you open the spec to all languages, all languages will likely be used.
- buy-in would be the responsibility of the translators and language communities, not the project.
That would suggest translators are not essential members of the project... heresy?
... a standard that does away with any possible bike-shedding.
Just for a complete picture: It is possible to limit the "25" languages to "25 BCP 47 codes". In other words: "en" could automatically stand for "en-GB".
There is an argument for and against limitation. BCP 47 will likely be a little too loose to be effective but is generally more inviting. Limiting the languages would be a "bolder choice" that might not be favored by people but could result in a nicer infrastructure.
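The idea of limiting the list to a fixed set of BCP 47 codes could be sketched as an allow-list keyed on the primary language subtag, so that any regional variant of an allowed language passes. The list below is a hypothetical placeholder, since the thread never fixes which 25 languages would qualify, and `isAllowedTag` is an illustrative name. The sketch uses the standard `Intl.Locale` constructor, which throws a `RangeError` for malformed tags.

```javascript
// Hypothetical placeholder allow-list; the actual 25 languages
// are not specified anywhere in this discussion.
const ALLOWED_LANGUAGES = new Set(["en", "zh", "es", "ja", "de"]);

// True when the tag is well-formed and its primary language
// subtag is on the allow-list, regardless of region or script.
function isAllowedTag(tag) {
  let locale;
  try {
    locale = new Intl.Locale(tag);
  } catch (err) {
    if (err instanceof RangeError) {
      return false; // malformed tag
    }
    throw err;
  }
  return ALLOWED_LANGUAGES.has(locale.language);
}

console.log(isAllowedTag("en-GB")); // true
console.log(isAllowedTag("nl-BE")); // false
```

Under this design a project would still decide separately whether README.en.md and README.en-GB.md may coexist.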