python-babel / babel

The official repository for Babel, the Python Internationalization Library

Home Page: http://babel.pocoo.org/

License: BSD 3-Clause "New" or "Revised" License

Python 99.35% JavaScript 0.59% Makefile 0.06% HTML 0.01%

hacktoberfest cldr gettext i18n l10n

babel's People

Contributors

dasich sharoonthomas nickretallack keitheis gjo rtoskit yoloseem matthewwilkes bacher09 s0undt3ch hongquan bsimpson63 nandoflorestan andreebrazeau mardiros sjord ithacadream geoffroyr allanlei javacruft gfronza sfx aodag h4ck3rm1k3 jonathanrrogers shnwang entequak xlevus prmtl dmbasso joernhees buma dpgaspar genba anru hanteng astaric cedricmessiant afedotov89 tescalada saisrk keoven foosel atomd-zz voleg mauritsvanrees handshake cojocarumdl lukasbalaga regisb abdelouahabb tubaman imclab benselme rrafal kvesteri sander2012 st4lk neutralord mbirtwell merchise-autrement vesauimonen ashokkumar2016 jrfern thanatos rwdxll jespino basis remyroy jtuulos stuartleigh jacobsvante urwithajit9 remunj88 coldricksotk tesfay79 maicotimmerman erickwilder jadsn jeremydw sils philiptzou exg77 jun66j5 sys-git lepistone craigloftus tommydroptables drloboto gitter-badger moreati gutsy terryjbates josephbreihan smn imankulov fschulze dee42 upman third-repo

babel's Issues

To add ca_VALENCIA.dat

ca_VALENCIA.dat is here: https://www.dropbox.com/sh/p138bff6j8l8mwn/Sv0ckU8ACY

Remove 2.4 and 2.5 Support

Remove support for old Python versions and remove hacks previously in place for those.

messages not extracted from folders starting with _ or .

Unfortunately I have folders like _m containing templates I would like to be translated. Right now pybabel is ignoring those. I can patch this but was wondering if there is a known reason for this before doing so.

auto_comments lost when updating a message

This is a copy of old ticket 228.

I have some utilities which convert po files to xls and back again. I noticed that when merging translations from the xls file into the catalog, parts of the auto comments are lost. Our comments tend to be very long (5-15 lines of text), but they are definitely important.

I can reproduce this with a simple bit of code:

import sys
from babel.messages.pofile import read_po
from babel.messages.pofile import write_po

catalog=read_po(open(sys.argv[1]))
for msg in catalog:
    if not msg.id:
        continue
    msg.string=msg.string
output=open(sys.argv[2], "wt")
write_po(output, catalog)
output.close()

If I do not touch msg.string the auto comments are not lost.

Test failure with enabled warnings

1 test fails when warnings are enabled (e.g. by setting the PYTHONWARNINGS="d" environment variable).

$ PYTHONWARNINGS="d" py.test-3.3
==================================================================== test session starts ====================================================================
platform linux -- Python 3.3.4 -- pytest-2.5.1
collected 327 items 

tests/test_core.py ..............................................
tests/test_dates.py ............................................................
tests/test_localedata.py ......
tests/test_numbers.py .............................
tests/test_plural.py ........
tests/test_support.py ..........................
tests/test_util.py ..
tests/messages/test_catalog.py .....................................
tests/messages/test_checkers.py ......
tests/messages/test_extract.py ....................................
tests/messages/test_frontend.py ...................F............
tests/messages/test_jslexer.py .
tests/messages/test_mofile.py ...
tests/messages/test_plurals.py .
tests/messages/test_pofile.py ..................................

========================================================================= FAILURES ==========================================================================
______________________________________ CommandLineInterfaceTestCase.test_compile_catalog_with_more_than_2_plural_forms ______________________________________

self = <tests.messages.test_frontend.CommandLineInterfaceTestCase testMethod=test_compile_catalog_with_more_than_2_plural_forms>

    def test_compile_catalog_with_more_than_2_plural_forms(self):
        po_file = self._po_file('ru_RU')
        mo_file = po_file.replace('.po', '.mo')
        try:
            self.cli.run(sys.argv + ['compile',
                '--locale', 'ru_RU', '--use-fuzzy',
                '-d', self._i18n_dir()])
            assert os.path.isfile(mo_file)
            self.assertEqual("""\
    compiling catalog %r to %r
>   """ % (po_file, mo_file), sys.stderr.getvalue())
E   AssertionError: "compiling catalog '/tmp/babel/tests/messages/data/project/i18n/ru_RU/LC_MESSAGE [truncated]... != "compiling catalog '/tmp/babel/tests/messages/data/project/i18n/ru_RU/LC_MESSAGE [truncated]...
E     compiling catalog '/tmp/babel/tests/messages/data/project/i18n/ru_RU/LC_MESSAGES/messages.po' to '/tmp/babel/tests/messages/data/project/i18n/ru_RU/LC_MESSAGES/messages.mo'
E   + /tmp/babel/babel/messages/frontend.py:794: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
E   +   write_mo(outfile, catalog, use_fuzzy=options.use_fuzzy)

tests/messages/test_frontend.py:1051: AssertionError
=========================================================== 1 failed, 326 passed in 5.86 seconds ============================================================

default_locale() fails on 'C.UTF-8'

babel.core.default_locale doesn't handle 'C.UTF-8' as a special case and parse_locale('C.UTF-8') returns ('c', None, None, None) which breaks things.
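
One possible way to treat such values is sketched below as a standalone helper, not as Babel's actual code: strip the codeset and modifier, then map the POSIX placeholder names to a real locale identifier. The helper name normalize_posix_locale and the fallback value en_US_POSIX are assumptions made for illustration.

```python
def normalize_posix_locale(value, fallback="en_US_POSIX"):
    """Map POSIX placeholder locales such as 'C.UTF-8' to a usable identifier.

    Hypothetical helper: the name and the fallback value are assumptions.
    """
    # Drop the codeset ('.UTF-8') and any modifier ('@euro').
    name = value.split(".", 1)[0].split("@", 1)[0]
    if name in ("C", "POSIX"):
        return fallback
    return name


print(normalize_posix_locale("C.UTF-8"))
print(normalize_posix_locale("de_DE.UTF-8"))
```

With a step like this in front of parse_locale, the string 'C.UTF-8' would never reach the parser as a language code.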

Would you guys like to take ownership of django-babel

I've ported BabelDjango to github:

https://github.com/graingert/django-babel

Would you guys like to take over ownership?

Make Locale objects immutable

Currently babel creates locale objects left and right. That's pretty pointless and the better idea would be to cache them through the metaclass and perform some locking there. This should also make Locale.parse a bit faster for the common case.

In addition to that we should probably add a get_locale function that is an alias for Locale.parse which has been the preferred interface for a really long time now as people really like the short identifiers.
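
A rough sketch of the caching idea, in Python 3 syntax and using a stand-in class rather than Babel's Locale: the metaclass keeps one instance per set of constructor arguments and guards the cache with a lock. The names CachedMeta and SimpleLocale are hypothetical, and the get_locale shown here only mirrors the proposed alias.

```python
import threading


class CachedMeta(type):
    """Metaclass that returns one shared instance per constructor arguments."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._cache = {}
        cls._cache_lock = threading.Lock()

    def __call__(cls, *args):
        # Positional, hashable arguments only, to keep the cache key simple.
        with cls._cache_lock:
            try:
                return cls._cache[args]
            except KeyError:
                instance = super().__call__(*args)
                cls._cache[args] = instance
                return instance


class SimpleLocale(metaclass=CachedMeta):
    """Stand-in for a locale object; not Babel's Locale class."""

    def __init__(self, language, territory=None):
        self.language = language
        self.territory = territory


def get_locale(language, territory=None):
    """Short alias, in the spirit of the proposed get_locale function."""
    return SimpleLocale(language, territory)


print(get_locale("de", "DE") is get_locale("de", "DE"))
```

Sharing instances like this is only safe if the objects are never mutated afterwards, which is why caching and immutability belong together.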

babel.cfg keywords?

I'm finding it hard to locate documentation on how to specify keywords in my babel.cfg. I have a config file that looks like this:

[python: **.py]
encoding = utf-8
[jinja2: **/templates/**.html]
encoding = utf-8
extensions=jinja2.ext.autoescape,jinja2.ext.with_
[javascript: .tmp/**.js]
encoding = utf-8
keywords = translate, ifPlural

All messages are extracted as expected except javascript. I've tried just setting one keyword in the config, changing the key from keywords to extract_messages, and keyword but none seem to work. I've checked what keywords get passed to babel.messages.extract.extract_javascript (by simply doing sys.exit(keywords) at the start of the function) and they never appear. However, doing pybabel extract -k translate -k ifPlural works just fine.

Issues on the edgewall.org tracker

The previous tracker has tons of open issues: http://babel.edgewall.org/query
Are they still relevant? Should they be migrated?

Not all locales have plural_form

get_currency_name and format_timedelta assume that plural_form exists:

import babel
locale = babel.core.Locale.parse('asm', sep='-')
babel.numbers.format_currency(100, 'USD', locale=locale)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/reddit/babel/babel/numbers.py", line 282, in format_currency
    return pattern.apply(number, locale, currency=currency)
  File "/home/reddit/babel/babel/numbers.py", line 659, in apply
    get_currency_name(currency, value, locale))
  File "/home/reddit/babel/babel/numbers.py", line 48, in get_currency_name
    plural_form = loc.plural_form(count)
  File "/home/reddit/babel/babel/core.py", line 740, in plural_form
    return self._data['plural_form']
  File "/home/reddit/babel/babel/localedata.py", line 189, in __getitem__
    orig = val = self._data[key]
KeyError: 'plural_form'

import babel
from datetime import timedelta
locale = babel.core.Locale.parse('asm', sep='-')
babel.dates.format_timedelta(timedelta(days=3), locale=locale)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/reddit/babel/babel/dates.py", line 779, in format_timedelta
    plural_form = locale.plural_form(value)
  File "/home/reddit/babel/babel/core.py", line 740, in plural_form
    return self._data['plural_form']
  File "/home/reddit/babel/babel/localedata.py", line 189, in __getitem__
    orig = val = self._data[key]
KeyError: 'plural_form'

Please support 'Language' header field of PO files

First of all my apologies, I'm neither a developer nor a native speaker of English, please excuse all the inaccuracies below.

In our project exelearning.net we use pybabel. I'm the coordinator of the translation teams, and some months ago found out that msgfmt had started producing warnings that a header field was missing in our PO files (see bug ticket https://forja.cenatic.es/tracker/?func=detail&atid=883&aid=1905&group_id=197). I searched the changelog of gettext and found out that in 2010 a new header field had been introduced, Language.

Documentation about the new header (from /usr/share/doc/gettext/NEWS)

Version 0.18 - May 2010

(...)
PO file format:
There is a new field 'Language' in the header entry. It denotes the language
code (plus optional country code) for the PO file. This field can be used
by automated tools, such as spell checkers. It is expected to be more
reliable than looking at the file name or at the 'Language-Team' field in
the header entry.
msgmerge, msgcat, msgen have a new option --lang that allows to specify
this field. Additionally, msgmerge fills in this new field by looking at
the 'Language-Team' field (if the --lang option is not given).

Does this gettext issue have anything to do with pybabel? Well, yes. Even if I added the field by hand to all our PO files, it disappeared every time we ran 'pybabel extract' and 'pybabel update'.

Pedro Peña prepared a patch for babel 1.3 that solved the problem for us

diff --git a/babel/messages/catalog.py b/babel/messages/catalog.py
index 501763b..e26e8f0 100644
--- a/babel/messages/catalog.py
+++ b/babel/messages/catalog.py
@@ -349,7 +349,10 @@ class Catalog(object):
         else:
             headers.append(('Language-Team', self.language_team))
         if self.locale is not None:
+            headers.append(('Language', str(self.locale)))
             headers.append(('Plural-Forms', self.plural_forms))
+        else:
+            headers.append(('Language', 'LANGUAGE'))
         headers.append(('MIME-Version', '1.0'))
         headers.append(('Content-Type',
                         'text/plain; charset=%s' % self.charset))

Please check the patch and add it to your sources, or else find a different way to support the not-so-new header field.

PluralRule.parse() can't accept a dict

From what I understand, the order of the rules defined in CLDR's plurals.xml matters. So the order must be maintained, and the rules can't be sorted by tag name. See the example here:

http://unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules

Especially this part: "Also note that a modulus is applied to n in the last rule, thus its condition holds for 119, 219, 319...". This means that the "zero" rule there must be evaluated before the rules for "one" and "few"; that is, the compiled "zero" rule must not be evaluated last, as could happen if the rules were sorted.

And this all means that PluralRule.parse() can't accept a dict at all, since order matters.
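
To illustrate why order matters, here is a minimal sketch of first-match evaluation over an ordered list of rules. The two rules are hypothetical and deliberately overlapping; they are not taken from CLDR, and evaluate_rules is not Babel's API.

```python
def evaluate_rules(rules, n):
    """Return the tag of the first rule whose condition holds for n.

    `rules` is an ordered sequence of (tag, predicate) pairs; 'other' is the
    fallback when no rule matches.
    """
    for tag, predicate in rules:
        if predicate(n):
            return tag
    return "other"


# Hypothetical rules whose conditions overlap for 119, 219, 319, ...
ordered = [
    ("zero", lambda n: n % 100 == 19),
    ("one", lambda n: n % 10 == 9),
]
# The same rules after sorting by tag name, which loses the original order.
by_name = sorted(ordered, key=lambda rule: rule[0])

print(evaluate_rules(ordered, 119))
print(evaluate_rules(by_name, 119))
```

The two calls disagree for 119, so any input format that cannot preserve order (such as a plain dict) may change the result.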

The function get_display_name doesn't work on python3.2

Because https://github.com/mitsuhiko/babel/blob/master/babel/core.py#L371

Traceback (most recent call last):
  File "/home/agent/project/frigoglass/rdb/.tox/py32/lib/python3.2/site-packages/nose/failure.py", line 38, in runTest
    raise self.exc_val.with_traceback(self.tb)
...
  File "/home/agent/project/frigoglass/rdb/.tox/py32/lib/python3.2/site-packages/babel/__init__.py", line 20, in <module>
    from babel.core import UnknownLocaleError, Locale, default_locale, \
  File "/home/agent/project/frigoglass/rdb/.tox/py32/lib/python3.2/site-packages/babel/core.py", line 371
    retval += ' (%s)' % u', '.join(details)
                            ^
SyntaxError: invalid syntax

pybabel extract --no-wrap feature doesn't work (pybabel 1.2)

Using --no-wrap on the pybabel command line tool doesn't seem to work. The msgstr strings in my messages.po files are still split across several lines. --width also seems to be ignored: I tried a width of 10000 and it made no difference.

Example command:

pybabel extract --mapping=./pybabel.cfg --no-wrap --output=./locale/messages.pot ./

Locales lacking pluralisation rules?

Pluralisation rules are implicitly used by a number of babel operations, but there seem to be locales which have no pluralisation rules at all.

As a result, these error out when using any babel operation needing plural rules e.g.

>>> format_timedelta(timedelta(hours=2), locale='aa')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "babel/dates.py", line 779, in format_timedelta
    plural_form = locale.plural_form(value)
  File "babel/core.py", line 740, in plural_form
    return self._data['plural_form']
  File "babel/localedata.py", line 189, in __getitem__
    orig = val = self._data[key]
KeyError: 'plural_form'
>>> format_timedelta(timedelta(hours=2), locale='uz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "babel/dates.py", line 779, in format_timedelta
    plural_form = locale.plural_form(value)
  File "babel/core.py", line 740, in plural_form
    return self._data['plural_form']
  File "babel/localedata.py", line 189, in __getitem__
    orig = val = self._data[key]
KeyError: 'plural_form'

In CLDR23 (and babel) there are locales which have "empty" pluralisation rules (an empty pluralRules tag in CLDR data, a PluralRule([]) in Babel). I think it would make sense that this be used as a default value for locale then overridden with the locale's actual pluralisation rules if any. This would avoid the error.

Plus CLDR24 apparently removed the empty pluralRule entirely (although it added plural rules for a number of locales, maybe all of them, it still has a comment stating "if locale is known to have no plurals, there are no rules"), so that would be necessary in the future to avoid blowing up on such locales without plurals.

Alternatively, Locale.plural_form could be altered to return a default value of PluralRule([]) if the locale's data has no plural_form key.
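
A minimal sketch of that second option, using a plain dict in place of the loaded locale data and a stand-in class in place of PluralRule([]). It assumes an empty rule set maps every number to 'other'; the names EmptyPluralRule and plural_form_for are hypothetical.

```python
class EmptyPluralRule:
    """Stand-in for an empty rule set: every number maps to 'other'."""

    def __call__(self, n):
        return "other"


def plural_form_for(locale_data):
    """Return the locale's plural rule, or an empty rule if none is stored.

    `locale_data` is a plain dict standing in for the loaded locale data.
    """
    return locale_data.get("plural_form", EmptyPluralRule())


rule = plural_form_for({})  # locale data without a 'plural_form' key
print(rule(2))
```

Callers such as format_timedelta would then receive a usable rule instead of a KeyError.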

Add Support for CLDR 24 Pluralization Rules

CLDR completely changed how it handles pluralization in 24. The following big questions are raised by it:

do we still support the old ones?
how is this supposed to work, and why does German have an "other" rule now? This has knock-on effects for users.
can we fudge the old behavior somehow?
what changes does supporting decimal places require in the API?

http://www.unicode.org/reports/tr35/tr35-numbers.html#Samples

python-format regexp is incorrect

Copied from old ticket 319.

The regular expression used to check if something is a python-format string is not correct: it thinks the sentence "This is a text about 50% of all users" is a python-format string which is not true.

It also fails to identify new-style python format strings using the curly-brace style notation used by string.format().
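
For the curly-brace case, one option that avoids a regular expression is the standard library's string.Formatter, which can parse a template and report its replacement fields. The sketch below is only an illustration of that approach, not Babel's checker; the function name has_brace_fields is hypothetical.

```python
from string import Formatter


def has_brace_fields(text):
    """Return True if `text` contains at least one str.format() replacement field."""
    try:
        return any(field is not None
                   for _, field, _, _ in Formatter().parse(text))
    except ValueError:
        # Unbalanced braces: not a valid str.format() template.
        return False


print(has_brace_fields("Hello {name}"))
print(has_brace_fields("This is a text about 50% of all users"))
```

Escaped braces such as "{{literal}}" are not reported as fields, so they are not mistaken for format placeholders.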

Add support for new Pluralization Rules

The new CLDR dataset has more expressive pluralization rules than Babel currently supports. Add support for those.

Python 3 support?

Hi!

There is python 3 support declared on pypi, but when I tried to install Babel today in virtualenv with python 3.2 using pip, it was not working.
Syntax errors during installation and same for importing the package.

In babel's documentation I haven't found anything regarding python 3 support.

Does this mean that the flag on PyPI is incorrect?

Thanks,
Josh

babel.localedata.load("en") fails saying "No such file root.dat"

This:

import babel
babel.localedata.load("en")

fails with:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../babel/localedata.py", line 99, in load
    data = load(parent).copy()
  File ".../babel/localedata.py", line 101, in load
    fileobj = open(filename, 'rb')
IOError: [Errno 2] No such file or directory: '.../babel/localedata/root.dat'

after either python setup.py develop or python setup.py install in a Python 2.7.5 virtualenv.

Babel needs to support Fuzzy Locales

Right now Babel only supports a locale if there is a .xml file for it in the CLDR. This is questionable since certain locales just don't have data but need to be differentiated. Right now this for instance practically affects all variants.

You cannot create a de_DE_1999 even though that is a valid and useful locale. The way ICU deals with this is that it allows a locale object to always be generated and then internally finds the most appropriate data files for it.

I temporarily added some hacky workaround that does likely subtag search for locales that people use but don't have data files any more. For instance zh_TW now automatically expands to zh_Hant_TW which does exist. While that is a separate issue by itself that algorithm is specified for both minimizing and maximizing a locale identifier and we should use that one to find appropriate data files.
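
As a rough illustration of "always create the locale, then find the best data file", the sketch below walks a simple truncation chain. This is not the likely-subtags algorithm mentioned above (it would not turn zh_TW into zh_Hant_TW); the function names and the set of available data files are assumptions.

```python
def fallback_chain(identifier, sep="_"):
    """Yield progressively shorter locale identifiers, ending with 'root'."""
    parts = identifier.split(sep)
    while parts:
        yield sep.join(parts)
        parts.pop()
    yield "root"


def find_data_locale(identifier, available):
    """Return the first identifier in the chain that has data available."""
    for candidate in fallback_chain(identifier):
        if candidate in available:
            return candidate
    return None


# Hypothetical set of identifiers for which data files exist.
available = {"root", "de", "de_DE"}
print(list(fallback_chain("de_DE_1999")))
print(find_data_locale("de_DE_1999", available))
```

Here de_DE_1999 stays a distinct locale identifier while its data is taken from de_DE.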

Order also the locations in the message header with --sort-output

I have recently started using --sort-output, to be nice with the version control tool.

I notice that the list of locations in the header of each message is not sorted, and seems not "stable", that is, from one pybabel update execution to the next the order may vary, resulting in pointless changes in the VC, for example:

-#: …/assets/js/data/MetaData.js:503
 #: …/templates/extjs-l10n.mako:14
 #: …/templates/extjs-l10n.mako:28
 #: …/templates/extjs-l10n.mako:33
 #: …/templates/extjs-l10n.mako:172
+#: …/assets/js/data/MetaData.js:503

I think it would be reasonable to assume that when --sort-output is given, also that list should be sorted.
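
A minimal sketch of the requested ordering, assuming each location is a (filename, lineno) pair: sort by filename first, then numerically by line number. The helper name sorted_locations is hypothetical, and the paths are the elided ones from the example above.

```python
def sorted_locations(locations):
    """Sort (filename, lineno) pairs by filename, then numerically by line."""
    return sorted(locations, key=lambda loc: (loc[0], loc[1]))


locations = [
    ("…/templates/extjs-l10n.mako", 14),
    ("…/templates/extjs-l10n.mako", 28),
    ("…/templates/extjs-l10n.mako", 33),
    ("…/templates/extjs-l10n.mako", 172),
    ("…/assets/js/data/MetaData.js", 503),
]
for filename, lineno in sorted_locations(locations):
    print("#: %s:%d" % (filename, lineno))
```

Because the key compares line numbers as integers, line 172 sorts after line 33 rather than before it, and the output no longer depends on the order in which the files were scanned.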

parse_encoding forced encoding crashes with UnicodeEncodeError

I've seen in this commit: 48cba25 some Python 3 support changes.

But in util.py I've found those changes a bit problematic. Let me show you one change there:

            try:
                import parser
                parser.suite(line1.decode('latin-1'))
            except (ImportError, SyntaxError):

This expects the first line to be decodable with latin-1, but it might raise a UnicodeEncodeError if it cannot be decoded.

Maybe the code should check for that exception too?

            try:
                import parser
                parser.suite(line1.decode('latin-1'))
            except (ImportError,UnicodeEncodeError, SyntaxError):

python-format flag incorrectly forced on all extractors

This is a copy of old ticket 318.

During extraction the results from an extractor plugin are used to create a message which is then added to the catalog. When the Message instance is created it can set the python-format flag if the text matches a specific regular expression. That means that a text like this:

This is a text about 50% of all users.

will be flagged as a python-format string even if it does not originate from Python code, or comes from a piece of Python code that will never use formatting. This is currently breaking our translations in a pretty bad way.

As far as I can see the extractor is the only thing that should decide the flags for a message. The Message class itself should never try to guess flags or force them upon plugins, especially since extractors have no way to override this behaviour.

Add Tox and Travis Support

Make sure that babel tests through tox and travis.

Trac-1.0.1 I18nDateFormatTestCase tests fail with >=Babel-1.0

These work for me with 0.9.6.

======================================================================
ERROR: test_i18n_parse_date_date (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 945, in test_i18n_parse_date_date
    datefmt.parse_date(u'2010-8-28', tz, zh_CN))
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/datefmt.py", line 466, in parse_date
    date=text, hint=hint), _('Invalid Date'))
TracError: "2010-8-28" is an invalid date, or the date format is not known. Try "y\u5e74M\u6708d\u65e5" instead.

======================================================================
ERROR: test_i18n_parse_date_datetime (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 880, in test_i18n_parse_date_datetime
    tz, zh_CN))
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/datefmt.py", line 466, in parse_date
    date=text, hint=hint), _('Invalid Date'))
TracError: "2010-8-28 \u4e0b\u534801:45:56" is an invalid date, or the date format is not known. Try "y\u5e74M\u6708d\u65e5" instead.

======================================================================
ERROR: test_i18n_parse_date_datetime_meridiem (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 906, in test_i18n_parse_date_datetime_meridiem
    zh_CN))
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/datefmt.py", line 466, in parse_date
    date=text, hint=hint), _('Invalid Date'))
TracError: "2011-2-22 \u4e0a\u53480:45:56" is an invalid date, or the date format is not known. Try "y\u5e74M\u6708d\u65e5" instead.

======================================================================
ERROR: test_i18n_parse_date_roundtrip (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 957, in test_i18n_parse_date_roundtrip
    actual = datefmt.parse_date(formatted, tz, locale)
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/datefmt.py", line 466, in parse_date
    date=text, hint=hint), _('Invalid Date'))
TracError: "2010/8/28 \u4e0b\u53481:45:56" is an invalid date, or the date format is not known. Try "y/M/d" instead.

======================================================================
FAIL: test_format_compatibility (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 974, in test_format_compatibility
    datefmt.format_datetime(t, '%x %X', tz, en_US))
AssertionError: 'Aug 28, 2010 1:45:56 PM' != u'Aug 28, 2010, 1:45:56 PM'

======================================================================
FAIL: test_i18n_date_hint (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 794, in test_i18n_date_hint
    datefmt.get_date_format_hint(ja))
AssertionError: 'yyyy/MM/dd' != u'y/MM/dd'

======================================================================
FAIL: test_i18n_datetime_hint (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 767, in test_i18n_datetime_hint
    in ('MMM d, yyyy h:mm:ss a', 'MMM d, y h:mm:ss a'))
AssertionError: False is not true

======================================================================
FAIL: test_i18n_format_date (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 733, in test_i18n_format_date
    datefmt.format_date(t, tzinfo=tz, locale=zh_CN))
AssertionError: u'2010-8-7' != u'2010\u5e748\u67087\u65e5'
- 2010-8-7
+ 2010\u5e748\u67087\u65e5


======================================================================
FAIL: test_i18n_format_datetime (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 694, in test_i18n_format_datetime
    locale=en_US))
AssertionError: 'Aug 28, 2010 1:45:56 PM' != u'Aug 28, 2010, 1:45:56 PM'

======================================================================
FAIL: test_i18n_format_time (trac.util.tests.datefmt.I18nDateFormatTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/www-apps/trac-1.0.1/work/Trac-1.0.1/trac/util/tests/datefmt.py", line 756, in test_i18n_format_time
    datefmt.format_time(t, tzinfo=tz, locale=zh_CN))
AssertionError: u'\u4e0b\u534801:45:56' != u'\u4e0b\u53481:45:56'
- \u4e0b\u534801:45:56
?   -
+ \u4e0b\u53481:45:56

Python version check in setup.py

Trac uses Babel, and the latest version of Trac supports Python 2.5. For a Trac instance running Python <= 2.5, a messy traceback can result from installing Babel. Would you be willing to add a Python version check in setup.py, as shown here?
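
A check of this kind could look roughly like the sketch below, placed near the top of setup.py. The minimum version (2, 6) and the wording of the message are assumptions, and check_python_version is a hypothetical helper name.

```python
import sys

MINIMUM_PYTHON = (2, 6)  # assumption: the oldest version still supported


def check_python_version(version_info=None, minimum=MINIMUM_PYTHON):
    """Return an error message if the interpreter is too old, else None."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) < minimum:
        return "Babel requires Python %d.%d or newer; found %d.%d." % (
            minimum + tuple(version_info[:2]))
    return None


message = check_python_version()
if message is not None:
    # Exit with a short, readable message instead of a traceback.
    sys.exit(message)
```

The check itself has to stick to syntax that the old interpreters can still parse, otherwise they fail with a SyntaxError before the message is ever printed.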

Make it work with latest CLDR

Make Babel work with the latest CLDR data sets.

Test failures with Jython

Jython does not support "utf_8" encoding in header comment in source code, but supports "utf-8":

$ python2.7 -c 'eval("# coding=utf-8\nNone")'
$ jython2.7 -c 'eval("# coding=utf-8\nNone")'
$ python2.7 -c 'eval("# coding=utf_8\nNone")'
$ jython2.7 -c 'eval("# coding=utf_8\nNone")'
  File "<string>", line 1
SyntaxError: Unknown encoding: utf_8

This patch fixes Jython-specific test failures:

--- babel/util.py
+++ babel/util.py
@@ -80,7 +80,7 @@
                 raise SyntaxError(
                     "python refuses to compile code with both a UTF8 "
                     "byte-order-mark and a magic encoding comment")
-            return 'utf_8'
+            return 'utf-8'
         elif m:
             return m.group(1).decode('latin-1')
         else:

Test failures with non-English locales

There are some test failures with non-English locales. I use Babel 1.3.
The following output is with pl_PL.UTF-8 locale:

$ py.test-2.7
/usr/lib64/python2.7/site-packages/_pytest/assertion/oldinterpret.py:3: DeprecationWarning: The compiler package is deprecated and removed in Python 3.x.
  from compiler import parse, ast, pycodegen
============================= test session starts ==============================
platform linux2 -- Python 2.7.6 -- pytest-2.3.5
collected 344 items

babel/__init__.py .
babel/_compat.py .
babel/core.py .
babel/dates.py F
babel/localedata.py .
babel/numbers.py .
babel/plural.py .
babel/support.py .
babel/util.py .
babel/localtime/__init__.py .
babel/localtime/_unix.py .
babel/localtime/_win32.py .
babel/messages/__init__.py .
babel/messages/catalog.py .
babel/messages/checkers.py .
babel/messages/extract.py .
babel/messages/frontend.py .
babel/messages/jslexer.py .
babel/messages/mofile.py .
babel/messages/plurals.py .
babel/messages/pofile.py .
tests/test_core.py ..............................................
tests/test_dates.py ............................................................
tests/test_localedata.py ......
tests/test_numbers.py ..........F................
tests/test_plural.py .......
tests/test_support.py ..........................
tests/test_util.py ..
tests/messages/test_catalog.py ....................................
tests/messages/test_checkers.py ......
tests/messages/test_extract.py ....................................
tests/messages/test_frontend.py ................................
tests/messages/test_jslexer.py .
tests/messages/test_mofile.py ...
tests/messages/test_plurals.py .
tests/messages/test_pofile.py ..................................

=================================== FAILURES ===================================
__________________________________ [doctest] ___________________________________
727 
728     >>> format_timedelta(timedelta(hours=23), threshold=0.9, locale='en_US')
729     u'1 day'
730     >>> format_timedelta(timedelta(hours=23), threshold=1.1, locale='en_US')
731     u'23 hours'
732 
733     In addition directional information can be provided that informs
734     the user if the date is in the past or in the future:
735 
736     >>> format_timedelta(timedelta(hours=1), add_direction=True)
Expected:
    u'In 1 hour'
Got:
    u'Za 1 godzin\u0119'

/tmp/Babel-1.3/babel/dates.py:736: DocTestFailure
____________________________ test_get_currency_name ____________________________

    def test_get_currency_name():
>       assert numbers.get_currency_name('USD', 'en_US') == u'US dollars'

tests/test_numbers.py:176: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

currency = 'USD', count = 'en_US', locale = 'pl_PL'

    def get_currency_name(currency, count=None, locale=LC_NUMERIC):
        """Return the name used by the locale for the specified currency.

        >>> get_currency_name('USD', locale='en_US')
        u'US Dollar'

        .. versionadded:: 0.9.4

        :param currency: the currency code
        :param count: the optional count.  If provided the currency name
                      will be pluralized to that number if possible.
        :param locale: the `Locale` object or locale identifier
        """
        loc = Locale.parse(locale)
        if count is not None:
>           plural_form = loc.plural_form(count)

babel/numbers.py:47: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <PluralRule 'one: n is 1, few: n mod 10 in 2..4 and n mod 100 not in 12..14, many: n is not 1 and n mod 10 in 0..1 or n mod 10 in 5..9 or n mod 100 in 12..14'>
n = 'en_US'

    def __call__(self, n):
        if not hasattr(self, '_func'):
            self._func = to_python(self)
>       return self._func(n)

babel/plural.py:105: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

n = 'en_US'

>   ???

<rule>:2: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

a = 'en_US', b = 10

    def cldr_modulo(a, b):
        """Javaish modulo.  This modulo operator returns the value with the sign
        of the dividend rather than the divisor like Python does:

        >>> cldr_modulo(-3, 5)
        -3
        >>> cldr_modulo(-3, -5)
        -3
        >>> cldr_modulo(3, 5)
        3
        """
        reverse = 0
        if a < 0:
            a *= -1
            reverse = 1
        if b < 0:
            b *= -1
>       rv = a % b
E       TypeError: not all arguments converted during string formatting

babel/plural.py:247: TypeError
===================== 2 failed, 342 passed in 4.76 seconds =====================

$ py.test-3.3
============================= test session starts ==============================
platform linux -- Python 3.3.3 -- pytest-2.3.5
collected 323 items

tests/test_core.py ..............................................
tests/test_dates.py ............................................................
tests/test_localedata.py ......
tests/test_numbers.py ..........F................
tests/test_plural.py .......
tests/test_support.py ..........................
tests/test_util.py ..
tests/messages/test_catalog.py ....................................
tests/messages/test_checkers.py ......
tests/messages/test_extract.py ....................................
tests/messages/test_frontend.py ...................F............
tests/messages/test_jslexer.py .
tests/messages/test_mofile.py ...
tests/messages/test_plurals.py .
tests/messages/test_pofile.py ..................................

=================================== FAILURES ===================================
____________________________ test_get_currency_name ____________________________

    def test_get_currency_name():
>       assert numbers.get_currency_name('USD', 'en_US') == u'US dollars'

tests/test_numbers.py:176: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

currency = 'USD', count = 'en_US', locale = 'pl_PL'

    def get_currency_name(currency, count=None, locale=LC_NUMERIC):
        """Return the name used by the locale for the specified currency.

        >>> get_currency_name('USD', locale='en_US')
        u'US Dollar'

        .. versionadded:: 0.9.4

        :param currency: the currency code
        :param count: the optional count.  If provided the currency name
                      will be pluralized to that number if possible.
        :param locale: the `Locale` object or locale identifier
        """
        loc = Locale.parse(locale)
        if count is not None:
>           plural_form = loc.plural_form(count)

babel/numbers.py:47: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <PluralRule 'one: n is 1, few: n mod 10 in 2..4 and n mod 100 not in 12..14, many: n is not 1 and n mod 10 in 0..1 or n mod 10 in 5..9 or n mod 100 in 12..14'>
n = 'en_US'

    def __call__(self, n):
        if not hasattr(self, '_func'):
            self._func = to_python(self)
>       return self._func(n)

babel/plural.py:105: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

n = 'en_US'

>   ???

<rule>:2: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

a = 'en_US', b = 10

    def cldr_modulo(a, b):
        """Javaish modulo.  This modulo operator returns the value with the sign
        of the dividend rather than the divisor like Python does:

        >>> cldr_modulo(-3, 5)
        -3
        >>> cldr_modulo(-3, -5)
        -3
        >>> cldr_modulo(3, 5)
        3
        """
        reverse = 0
>       if a < 0:
E       TypeError: unorderable types: str() < int()

babel/plural.py:242: TypeError
 CommandLineInterfaceTestCase.test_compile_catalog_with_more_than_2_plural_forms 

self = <tests.messages.test_frontend.CommandLineInterfaceTestCase testMethod=test_compile_catalog_with_more_than_2_plural_forms>

    def test_compile_catalog_with_more_than_2_plural_forms(self):
        po_file = self._po_file('ru_RU')
        mo_file = po_file.replace('.po', '.mo')
        try:
            self.cli.run(sys.argv + ['compile',
                '--locale', 'ru_RU', '--use-fuzzy',
                '-d', self._i18n_dir()])
            assert os.path.isfile(mo_file)
            self.assertEqual("""\
    compiling catalog %r to %r
>   """ % (po_file, mo_file), sys.stderr.getvalue())
E   AssertionError: "compiling catalog '/tmp/Babel-1.3/tests/messages/data/project/i18n/ru_RU/LC_MES [truncated]... != "compiling catalog '/tmp/Babel-1.3/tests/messages/data/project/i18n/ru_RU/LC_MES [truncated]...
E     compiling catalog '/tmp/Babel-1.3/tests/messages/data/project/i18n/ru_RU/LC_MESSAGES/messages.po' to '/tmp/Babel-1.3/tests/messages/data/project/i18n/ru_RU/LC_MESSAGES/messages.mo'
E   + /tmp/Babel-1.3/babel/messages/frontend.py:794: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
E   +   write_mo(outfile, catalog, use_fuzzy=options.use_fuzzy)

tests/messages/test_frontend.py:1051: AssertionError
===================== 2 failed, 321 passed in 3.48 seconds =====================

Babel 1.0+ doesn't support `zh_CN`

Babel 1.0+ currently doesn't accept zh_CN, but Trac requires that locale, and other applications probably do too. See http://trac.edgewall.org/ticket/11258.

Also, Trac expects str(Locale.parse('zh_TW')) to be 'zh_TW'. However, Babel 1.0+ automatically expands it to 'zh_Hant_TW'.

Babel unable to parse sphinx-generated pot

I am using Babel==1.3 and Sphinx==1.1.3.

I use sphinx-build -b gettext to generate pot files.

You can see an example pot generated here

I attempt to initialize a po file for this template:

$ pybabel init -i copyright.pot -l de -d '.'

And I get this error:

Traceback (most recent call last):
  File "/home/ivo/.virtualenvs/ots/beginners/bin/pybabel", line 9, in <module>
    load_entry_point('Babel==1.3', 'console_scripts', 'pybabel')()
  File "/home/ivo/.virtualenvs/ots/beginners/lib/python2.7/site-packages/babel/messages/frontend.py", line 1151, in main
    return CommandLineInterface().run(sys.argv)
  File "/home/ivo/.virtualenvs/ots/beginners/lib/python2.7/site-packages/babel/messages/frontend.py", line 665, in run
    return getattr(self, cmdname)(args[1:])
  File "/home/ivo/.virtualenvs/ots/beginners/lib/python2.7/site-packages/babel/messages/frontend.py", line 1010, in init
    catalog = read_po(infile, locale=options.locale)
  File "/home/ivo/.virtualenvs/ots/beginners/lib/python2.7/site-packages/babel/messages/pofile.py", line 211, in read_po
    _add_message()
  File "/home/ivo/.virtualenvs/ots/beginners/lib/python2.7/site-packages/babel/messages/pofile.py", line 164, in _add_message
    catalog[msgid] = message
  File "/home/ivo/.virtualenvs/ots/beginners/lib/python2.7/site-packages/babel/messages/catalog.py", line 618, in __setitem__
    self.mime_headers = _parse_header(message.string).items()
  File "/home/ivo/.virtualenvs/ots/beginners/lib/python2.7/site-packages/babel/messages/catalog.py", line 384, in _set_mime_headers
    value, tzoffset, _ = re.split('([+-]\d{4})$', value, 1)
ValueError: need more than 1 value to unpack
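The last frame of the traceback suggests the date header had no trailing UTC offset, so re.split returned a single element and the three-way unpack failed. A minimal sketch of a tolerant split follows. The helper name split_tzoffset is hypothetical and this is not Babel's actual fix:

```python
import re

def split_tzoffset(value):
    """Split a trailing +HHMM / -HHMM offset from a header date string.

    Returns (date_part, offset), where offset is None if the value
    carries no offset. Hypothetical helper, for illustration only.
    """
    parts = re.split(r'([+-]\d{4})$', value, maxsplit=1)
    if len(parts) == 3:
        # With a capturing group, a match yields [before, offset, ''].
        return parts[0].rstrip(), parts[1]
    # No offset present: re.split returns the input unchanged in a 1-item list.
    return value, None

with_offset = split_tzoffset('2014-01-23 14:20+0100')
without_offset = split_tzoffset('2014-01-23 14:20')
```

The point is only that the unpack has to cope with a one-element result; how a missing offset should be interpreted (UTC, local time, error) is a separate decision.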

cldr cannot be built without global.dat present

Specifically:

del babel\global.dat
python scripts\download_import_cldr.py
J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git>python scripts\download_import_cldr.py
Local copy 'J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git\cldr\core-23.1.zip' not found
Downloading 'core-23.1.zip'
 ========================================================================== 100%
Extracting CLDR to 'J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git\cldr'
Traceback (most recent call last):
  File "J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git\scripts\import_cldr.py", line 27, in <module>
    from babel import dates, numbers
  File "J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git\scripts\..\babel\dates.py", line 28, in <module>
    from babel.util import UTC, LOCALTZ
  File "J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git\scripts\..\babel\util.py", line 278, in <module>
    from babel import localtime
  File "J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git\scripts\..\babel\localtime\__init__.py", line 21, in <module>
    from babel.localtime._win32 import _get_localzone
  File "J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git\scripts\..\babel\localtime\_win32.py", line 13, in <module>
    tz_names = get_global('windows_zone_mapping')
  File "J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git\scripts\..\babel\core.py", line 53,
in get_global
    _raise_no_data_error()
  File "J:\dev\workspace\pythonxy-xy-27\src\python\babel\__workspace\babel-git\scripts\..\babel\core.py", line 25,
in _raise_no_data_error
    raise RuntimeError('The babel data files are not available. '
RuntimeError: The babel data files are not available. This usually happens because you are using a source checkout
from Babel and you did not build the data files.  Just make sure to run "python setup.py import_cldr" before installing the library.
Traceback (most recent call last):
  File "scripts\download_import_cldr.py", line 104, in <module>
    main()
  File "scripts\download_import_cldr.py", line 100, in main
    common_path])
  File "C:\Python27\lib\subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['C:\\Python27\\python.exe', 'J:\\dev\\workspace\\pythonxy-xy-27\\src\\python\\babel\\__workspace\\babel-git\\scripts\\import_cldr.py', 'J:\\dev\\workspace\\pythonxy-xy-27\\src\\python\\babel\\__workspace\\babel-git\\cldr\\common']' returned non-zero exit status 1

accessing data files at runtime with pkg_resources

It seems data files (CLDR) are currently accessed with a plain open().
This assumes the library is not bundled later on (e.g., into a packages.zip as recommended on appengine or an egg).

http://pythonhosted.org/setuptools/setuptools.html#accessing-data-files-at-runtime suggests to access data files at runtime using pkg_resources.
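To illustrate the difference, here is a small sketch that reads a data file out of a zipped package. It uses pkgutil.get_data from the standard library rather than pkg_resources (a deliberate substitution so the example has no third-party dependency; pkg_resources offers resource_stream and resource_string for the same purpose). The package name, file name and contents are made up for the demo:

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway zipped package. All names here are hypothetical.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, 'bundle.zip')
with zipfile.ZipFile(archive, 'w') as bundle:
    bundle.writestr('zipped_demo_pkg/__init__.py', '')
    bundle.writestr('zipped_demo_pkg/localedata/root.dat', b'demo-bytes')

# Make the archive importable, as a packages.zip or egg would be.
sys.path.insert(0, archive)

def load_data(package, resource):
    """Read a package data file through the import system.

    Unlike a plain open(), this also works when the package lives
    inside a zip archive.
    """
    return pkgutil.get_data(package, resource)

payload = load_data('zipped_demo_pkg', 'localedata/root.dat')
```

A plain open() on a path derived from the package's __file__ would fail here, because that path points inside bundle.zip rather than at a real file.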

Outdated Command Line Interface Docs

The documentation for the command line interface is heavily outdated. It was copied and pasted in at some point, but it really should be generated from the API.

Pofile parser can't handle multi-line obsolete IDs or Strings

Any multi-line obsolete message just comes out as "".

It should be obvious why, when you look at this line. Every line of an obsolete string begins with a #, but every time the script sees a # it throws away all its state variables assuming it is probably starting a new message now, or accumulating occurrences and comments about one.

Here's a test case:

import babel.messages.pofile as pofile
from StringIO import StringIO

buffer = StringIO("""
#~ msgid ""
#~ "ID"
#~ msgstr ""
#~ "String"
""")

catalog = pofile.read_po(buffer)
# catalog.obsolete maps each msgid to its Message object
assert list(catalog.obsolete.keys()) == ['ID']
assert [message.string for message in catalog.obsolete.values()] == ['String']
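For comparison, a minimal sketch of a parser that keeps its state across "#~" lines instead of resetting on every "#". It is not Babel's parser: parse_obsolete is a hypothetical helper, it ignores msgctxt and plural forms, and it approximates PO string unescaping with ast.literal_eval:

```python
import ast

def parse_obsolete(lines):
    """Collect (msgid, msgstr) pairs from obsolete '#~' lines.

    Continuation lines ('#~ "..."') are appended to whichever field
    was opened last, instead of being treated as a new comment.
    """
    entries = []
    current = None   # dict holding the entry being accumulated
    field = None     # 'msgid' or 'msgstr'
    for raw in lines:
        line = raw.strip()
        if not line.startswith('#~'):
            continue
        body = line[2:].strip()
        keyword, _, rest = body.partition(' ')
        if keyword == 'msgid':
            current = {'msgid': '', 'msgstr': ''}
            entries.append(current)
            field = 'msgid'
        elif keyword == 'msgstr':
            field = 'msgstr'
        else:
            rest = body  # a bare quoted continuation line
        rest = rest.strip()
        if current is not None and field and rest.startswith('"'):
            # Assumption: PO escapes are close enough to Python's here.
            current[field] += ast.literal_eval(rest)
    return [(entry['msgid'], entry['msgstr']) for entry in entries]

pairs = parse_obsolete([
    '#~ msgid ""',
    '#~ "ID"',
    '#~ msgstr ""',
    '#~ "String"',
])
```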

Extractor: Support translation string not being the first argument

I'm working on a project that uses jed to translate strings in my JavaScript. Its ifPlural function takes the translation string as its second argument. Would it be possible to support this? One could make the lexer pick the first argument that is a string, but that feels like guesswork. Perhaps a better way would be to add new syntax like this:

$ pybabel -k translate -k ifPlural:2

Where :2 is the position of the argument that contains the string to translate. If left out the first function argument is used.
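A rough sketch of how such a "name:position" spec could be interpreted. Both function names are hypothetical and this is not how pybabel parses its -k option; it only illustrates the proposal:

```python
def parse_keyword_spec(spec):
    """Turn 'ifPlural:2' into ('ifPlural', 2) and 'translate' into ('translate', 1)."""
    name, separator, position = spec.partition(':')
    # Without an explicit position, fall back to the first argument.
    return name, int(position) if separator else 1

def pick_message(call_args, position):
    """Return the argument at the given 1-based position, or None if absent."""
    index = position - 1
    return call_args[index] if 0 <= index < len(call_args) else None

name, position = parse_keyword_spec('ifPlural:2')
message = pick_message(['count', 'One item', 'Many items'], position)
```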

Allow some form of Babel supplied locales if not supplied by CLDR

From http://babel.edgewall.org/ticket/258:
"If CLDR does not provide some locale, e.g. Asturian (#254), then we should provide a mechanism that allows the locale identifier to be used as well as to be overriden with user data to support said locale."

In my case, I'm trying to add support for Aymara, whose locales were recently submitted to glibc for inclusion ( http://sourceware.org/bugzilla/show_bug.cgi?id=14828 ). I was surprised to learn that Babel doesn't use the glibc locales but a different data source, CLDR.

It would be great if I could simply use the glibc localedata.

Add support for Python 3.3

Babel should support Python 3.3.

Port Documentation to Sphinx

Make Babel docs available through Sphinx.

Convert Doctests to Unit Tests

There are a ton of doctests and they are a pain to maintain (and even more so to port to Python 3). We should turn them into actual unit tests.
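As a rough illustration of the conversion, the cldr_modulo doctest visible in the traceback above could become plain test functions like the ones below. The function body is reconstructed from the traceback; the final sign-restoring lines are an assumption, since the traceback cuts off at the modulo:

```python
def cldr_modulo(a, b):
    """Modulo that keeps the sign of the dividend (Java-style)."""
    reverse = 0
    if a < 0:
        a *= -1
        reverse = 1
    if b < 0:
        b *= -1
    rv = a % b
    # Assumed ending: restore the sign of the dividend.
    if reverse:
        rv *= -1
    return rv

# The three doctest examples, rewritten as ordinary test functions
# that a runner such as py.test can collect.
def test_negative_dividend():
    assert cldr_modulo(-3, 5) == -3

def test_negative_dividend_and_divisor():
    assert cldr_modulo(-3, -5) == -3

def test_positive_operands():
    assert cldr_modulo(3, 5) == 3
```

Unlike doctests, these do not depend on the repr of the result, which is one of the things that differs between Python 2 and 3 (for example the u'' prefix on strings).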

specify a minimum pytz version

Babel 1.0 depends on pytz without specifying a minimum version. Pytz uses date-based versioning with letters. Pip 1.4 treats all version strings containing letters as prereleases and will not install them unless asked to. The upshot is that installing the latest Babel with the latest pip fails unless some version of pytz is installed manually first.

One common way to work around this issue is to have the pytz dependency specified with a minimum version which contains a letter, for example "pytz>=0a".

Unicode bug on Windows

After compiling a PO file with pybabel, ü appears as Ã¼ in the MO file. It also appears in the PO file again after updating.

The PO file before compiling and updating:

# Dutch (Netherlands) translations for PROJECT.
# Copyright (C) 2014 ORGANIZATION
# This file is distributed under the same license as the PROJECT project.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2014.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PROJECT VERSION\n"
"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
"POT-Creation-Date: 2014-01-23 14:20+0100\n"
"PO-Revision-Date: 2014-01-23 14:19+0100\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: nl_NL <[email protected]>\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 1.3\n"

#: foo/web/templates/control_panel/new_map.html:102
msgid "Uploaded Data"
msgstr "Geüploade gegevens"

The commands executed:

> pybabel compile -f -d ./foo/web/translations
> pybabel update -d ./foo/web/translations -i ./foo/web/translations/messages.pot --no-fuzzy-matching

The same PO file after executing these commands:

# Dutch (Netherlands) translations for PROJECT.
# Copyright (C) 2014 ORGANIZATION
# This file is distributed under the same license as the PROJECT project.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2014.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PROJECT VERSION\n"
"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
"POT-Creation-Date: 2014-01-23 14:20+0100\n"
"PO-Revision-Date: 2014-01-23 14:19+0100\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: nl_NL <[email protected]>\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 1.3\n"

#: foo/web/templates/control_panel/new_map.html:102
msgid "Uploaded Data"
msgstr "GeÃ¼ploade gegevens"

I'm using pybabel-script.py 1.3 with Python 3.3.2 on Windows 7 64-bit. I have no idea where the source of the problem lies. Any help is appreciated.
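The garbled output is what you get when UTF-8 bytes are decoded with a single-byte Windows code page. The sketch below reproduces the symptom. It shows one plausible mechanism (a file opened without an explicit encoding on Windows), not a confirmed diagnosis of where pybabel goes wrong:

```python
# "Geüploade gegevens", written with an escape so the snippet is encoding-safe.
original = u'Ge\u00fcploade gegevens'

# The PO file stores the text as UTF-8 ...
utf8_bytes = original.encode('utf-8')

# ... but if those bytes are decoded as cp1252 (a common Windows default),
# the two bytes of the u-umlaut turn into two separate characters.
garbled = utf8_bytes.decode('cp1252')

# Reversing the wrong step recovers the text, which supports the theory.
repaired = garbled.encode('cp1252').decode('utf-8')
```

If this is the cause, the usual remedy is to open the files in binary mode or with an explicit encoding instead of relying on the platform default.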

Update Catalog always discards old obsolete strings.

If you have a *.po file with some obsolete messages in it and you run the update_catalog command, no matter what options you pass, those obsolete messages will go away. It's really misleading that options like "--previous" and "--ignore_obsolete" exist if they don't preserve your obsolete messages.

I've isolated the problem. It is this line. Removing this line fixes the problem.

Rewrite gettext support

Right now the gettext module from the stdlib is used. This is nice in theory but it has one big problem: it's inconsistent to use between 2.x and 3.x which makes it really bloody annoying to deal with. I came across that when looking into some of the test failures.

I believe the best plan is to make gettext and ugettext do the same thing on 3.x and to add a mode for 2.x that makes gettext behave like ugettext. Aside from that, we can also use this to clean up the madness with the null translations.

More than that, we might actually start using po and mo files interchangeably with catalogs.
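A minimal sketch of the first half of that plan, for Python 3 only: a translations class on which the u-prefixed names are plain aliases of the regular ones. The class name is hypothetical and this is not Babel's implementation:

```python
import gettext

class UnifiedTranslations(gettext.NullTranslations):
    """Null translations exposing both naming schemes (Python 3 only).

    On Python 3 the stdlib gettext()/ngettext() already return str, so
    the u-prefixed methods can simply delegate to them.
    """

    def ugettext(self, message):
        return self.gettext(message)

    def ungettext(self, singular, plural, n):
        return self.ngettext(singular, plural, n)

translations = UnifiedTranslations()
single = translations.ugettext('hello')
plural = translations.ungettext('apple', 'apples', 2)
```

The 2.x mode described above (making gettext return unicode) would need the opposite delegation and is not shown here.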

Subtag Expansion does not take Defaults into Account

#37 exposes a bug in how subtags are expanded in our very basic implementation. Essentially, locales that are only defined through the language (like zh_CN) do not expand.

This is problematic because some of the most common locales are affected by this.

Change maintainers?

I didn't see any topic about this on the mailing list. Is this really the official repository?

Remove Locale Aliases

Babel currently has a manually maintained list of locale aliases. This is inconsistently used throughout the system. For instance negotiate_locale will expand de to de_DE but few other functions are doing that.

The correct solution based on the CLDR would be to remove the alias map entirely and perform likely subtag expansion. Because that would not expand de to de_DE, a new parameter should be added that controls whether to expand to the minimum (de) or the maximum (de_Latn_DE, though that one is equivalent to de_DE, so the latter should be shown).
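A simplified sketch of what maximizing via likely subtags could look like. The three-entry table stands in for the CLDR likelySubtags data, the function name is hypothetical, and the real algorithm performs more lookups (language plus script, language plus territory, and the "und" fallbacks) than this does:

```python
# Tiny stand-in for CLDR's likely-subtags table.
LIKELY_SUBTAGS = {
    'de': 'de_Latn_DE',
    'zh': 'zh_Hans_CN',
    'zh_TW': 'zh_Hant_TW',
}

def maximize(identifier, table=LIKELY_SUBTAGS):
    """Expand a locale identifier to language_Script_TERRITORY."""
    if identifier in table:
        return table[identifier]
    parts = identifier.split('_')
    language = parts[0]
    if language not in table:
        return identifier  # nothing known about this language
    _, script, territory = table[language].split('_')
    # Subtags given explicitly override the likely defaults.
    for part in parts[1:]:
        if len(part) == 4:
            script = part.title()
        elif len(part) == 2:
            territory = part.upper()
    return '_'.join((language, script, territory))

expanded = [maximize(code) for code in ('de', 'zh_CN', 'zh_TW')]
```

Minimizing would be the reverse step: drop every subtag that maximizing the shorter identifier would add back anyway.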

This is related to #30.

Please add support for Territory -> Currency mapping

Hi,

It would be great if Babel could support territory -> currency mapping as detailed at: http://unicode.org/cldr/charts/supplemental/detailed_territory_currency_information.html

Supported currencies per territory and historical territory currency data are contained within supplementalData.xml

Many Thanks,
Scott
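For anyone picking this up, a rough sketch of reading that mapping with the standard library. The inline XML is an abbreviated sample modeled on the currencyData section of supplementalData.xml and should be treated as illustrative rather than authoritative. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative sample in the layout of supplementalData.xml.
SAMPLE = """
<supplementalData>
  <currencyData>
    <region iso3166="DE">
      <currency iso4217="EUR" from="1999-01-01"/>
      <currency iso4217="DEM" from="1948-06-20" to="2002-02-28"/>
    </region>
    <region iso3166="US">
      <currency iso4217="USD" from="1792-01-01"/>
    </region>
  </currencyData>
</supplementalData>
"""

def territory_currencies(xml_text):
    """Map each territory code to a list of (currency, from, to) tuples."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for region in root.iter('region'):
        mapping[region.get('iso3166')] = [
            (cur.get('iso4217'), cur.get('from'), cur.get('to'))
            for cur in region.findall('currency')
        ]
    return mapping

def current_currencies(mapping, territory):
    """Currencies with no end date, i.e. still in use."""
    return [code for code, start, end in mapping.get(territory, []) if end is None]

mapping = territory_currencies(SAMPLE)
german_now = current_currencies(mapping, 'DE')
```

Entries that carry a "to" attribute are the historical ones, so the same structure covers both parts of the request.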