elixir-gettext / expo
Low-level Elixir parser for GNU Gettext files (PO, POT, MO).
License: Apache License 2.0
Hello! When switching from the parsers inside Gettext to using Expo, I noticed that the typespec for msgctxt is incorrect.
Line 18 in e03b873
It should be:
@type msgctxt :: [String.t(), ...]
When a message has a msgctxt, it will be a list of strings, not a single string, just like the msgid and msgstr typespecs above. This can be verified elsewhere in this repo in several tests (example). I've provided an example (using v0.4.1) here as well:
msgid "single without context"
msgstr "without"
msgctxt "context single"
msgid "single with context"
msgstr "with"
msgid "singular form without context"
msgid_plural "plural form without context"
msgstr[0] "one without"
msgstr[1] "some without"
msgctxt "context plural"
msgid "singular form with context"
msgid_plural "plural form with context"
msgstr[0] "one with"
msgstr[1] "some with"
iex(22)> Expo.PO.parse_file!("context.po")
%Expo.Messages{
headers: [],
messages: [
#Expo.Message.Singular<
msgid: ["single without context"],
msgstr: ["without"],
msgctxt: nil,
comments: [],
extracted_comments: [],
flags: [],
previous_messages: [],
references: [],
obsolete: false,
...
>,
#Expo.Message.Singular<
msgid: ["single with context"],
msgstr: ["with"],
msgctxt: ["context single"],
comments: [],
extracted_comments: [],
flags: [],
previous_messages: [],
references: [],
obsolete: false,
...
>,
#Expo.Message.Plural<
msgid: ["singular form without context"],
msgid_plural: ["plural form without context"],
msgstr: %{0 => ["one without"], 1 => ["some without"]},
msgctxt: nil,
comments: [],
extracted_comments: [],
flags: [],
previous_messages: [],
references: [],
obsolete: false,
...
>,
#Expo.Message.Plural<
msgid: ["singular form with context"],
msgid_plural: ["plural form with context"],
msgstr: %{0 => ["one with"], 1 => ["some with"]},
msgctxt: ["context plural"],
comments: [],
extracted_comments: [],
flags: [],
previous_messages: [],
references: [],
obsolete: false,
...
>
],
top_comments: [],
file: "context.po"
}
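Because msgctxt is either nil or a non-empty list of strings, calling code that wants a single binary has to handle both shapes. A minimal sketch, assuming a hypothetical helper module (ContextSketch and to_binary/1 are illustration names, not part of Expo):

```elixir
defmodule ContextSketch do
  # Hypothetical helper, not part of Expo's API.
  # Mirrors the proposed typespec: nil | [String.t(), ...]

  # A message without a context keeps nil.
  def to_binary(nil), do: nil

  # A message with a context carries a list of string parts; join them.
  def to_binary(parts) when is_list(parts), do: IO.iodata_to_binary(parts)
end

IO.inspect(ContextSketch.to_binary(nil))
IO.inspect(ContextSketch.to_binary(["context single"]))
```

The same joining approach would apply to msgid and msgstr, which are lists as well.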
Thanks!
Based on: https://github.com/jshmrtn/expo/tree/performance_comparisor/performance_test
read.exs
Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 46.77 GB
Elixir 1.13.3
Erlang 24.3.3
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s
Benchmarking Expo.Parser.Mo.parse ...
Benchmarking Expo.Parser.Po.parse ...
Benchmarking Gettext.PO.parse_string ...
Name                             ips     average  deviation      median      99th %
Expo.Parser.Mo.parse          525.98     1.90 ms    ±24.75%     1.81 ms     2.80 ms
Gettext.PO.parse_string       116.38     8.59 ms    ±10.98%     8.77 ms    10.80 ms
Expo.Parser.Po.parse           62.41    16.02 ms    ±13.50%    15.61 ms    23.78 ms
Comparison:
Expo.Parser.Mo.parse 525.98
Gettext.PO.parse_string 116.38 - 4.52x slower +6.69 ms
Expo.Parser.Po.parse 62.41 - 8.43x slower +14.12 ms
Memory usage statistics:
Name Memory usage
Expo.Parser.Mo.parse 1.57 MB
Gettext.PO.parse_string 10.35 MB - 6.59x memory usage +8.78 MB
Expo.Parser.Po.parse 45.78 MB - 29.12x memory usage +44.21 MB
**All measurements for memory usage were the same**
write.exs
Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 46.77 GB
Elixir 1.13.3
Erlang 24.3.3
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s
Benchmarking Expo.Composer.Mo.compose ...
Benchmarking Expo.Composer.Po.compose ...
Benchmarking Gettext.PO.dump ...
Name                             ips     average  deviation      median      99th %
Expo.Composer.Mo.compose     2354.75     0.42 ms    ±19.23%     0.40 ms     0.84 ms
Gettext.PO.dump               148.69     6.73 ms    ±17.47%     6.54 ms     9.87 ms
Expo.Composer.Po.compose      136.89     7.30 ms    ±34.08%     6.64 ms    16.90 ms
Comparison:
Expo.Composer.Mo.compose 2354.75
Gettext.PO.dump 148.69 - 15.84x slower +6.30 ms
Expo.Composer.Po.compose 136.89 - 17.20x slower +6.88 ms
Memory usage statistics:
Name Memory usage
Expo.Composer.Mo.compose 0.50 MB
Gettext.PO.dump 3.59 MB - 7.19x memory usage +3.10 MB
Expo.Composer.Po.compose 3.81 MB - 7.63x memory usage +3.31 MB
**All measurements for memory usage were the same**
The comparison is based on the following gettext file and its .mo counterpart: https://github.com/jshmrtn/hygeia/blob/4f08c2b68f5de8cad6a84b9d4a0b01be63a7c32c/priv/gettext/de/LC_MESSAGES/default.po
It contains 6'355 lines of PO content for 1'398 translations plus the header.
Support restructuring of message strings.
Right now, the only option is to preserve the line splitting:
.po (read & write) should produce an identical file
.mo does not contain line information, everything is one single string
rebalance_strings function on translation struct
rebalance_strings function on all translations (including headers)
rebalance_strings applies to:
  msgid
  msgid_plural
  msgstr
  headers
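To make the idea concrete, here is a rough sketch of what rebalancing a list of strings could look like. It assumes a hypothetical module name (RebalanceSketch), a hypothetical max_width parameter, and that breaks are only allowed after spaces; none of this is taken from Expo itself.

```elixir
defmodule RebalanceSketch do
  # Hypothetical helper, not part of Expo's API.
  # Joins the existing chunks and greedily re-splits them so that each chunk
  # stays within `max_width` characters where possible. A single word longer
  # than `max_width` is kept whole.
  def rebalance(strings, max_width \\ 80) do
    words =
      strings
      |> IO.iodata_to_binary()
      |> String.split(" ")

    # Re-attach the separating space to every word except the last one,
    # so that joining the result reproduces the original text exactly.
    {last, init} = List.pop_at(words, -1)
    tokens = Enum.map(init, &(&1 <> " ")) ++ [last]

    tokens
    |> Enum.reduce([], fn
      token, [] ->
        [token]

      token, [current | rest] ->
        if String.length(current) + String.length(token) <= max_width do
          [current <> token | rest]
        else
          [token, current | rest]
        end
    end)
    |> Enum.reverse()
  end
end

IO.inspect(RebalanceSketch.rebalance(["hello ", "beautiful ", "world"], 20))
```

Whatever the splitting rule ends up being, the concatenation of the chunks should stay unchanged, since that is the actual message.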
Evaluating the index for every call is very expensive. Therefore, a macro PluralForms.compile_index/1 should be provided that converts the plural expression into Elixir AST.
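As an illustration of the general technique only, the sketch below turns a tiny expression tree into Elixir AST at compile time and defines an index/1 function from it. Everything here is an assumption: the module names, the hand-written tuple format standing in for a parsed plural expression, and the set of supported operators. It is not how Expo implements or would implement compile_index/1.

```elixir
defmodule PluralSketch.Compiler do
  # Hypothetical sketch, not Expo's implementation. The "plural expression"
  # is a small hand-written tuple tree, not a parsed Plural-Forms header.

  # The variable `n` must be the same AST node in the function head and body.
  defp n_var, do: Macro.var(:n, __MODULE__)

  defp to_quoted(:n), do: n_var()
  defp to_quoted(int) when is_integer(int), do: int

  defp to_quoted({:!=, left, right}) do
    quote do: unquote(to_quoted(left)) != unquote(to_quoted(right))
  end

  defp to_quoted({:if, condition, on_true, on_false}) do
    quote do
      if unquote(to_quoted(condition)),
        do: unquote(to_quoted(on_true)),
        else: unquote(to_quoted(on_false))
    end
  end

  defmacro compile_index(tree) do
    # Evaluate the macro argument once, at compile time, to get the tuple tree.
    {tree, _binding} = Code.eval_quoted(tree, [], __CALLER__)
    body = to_quoted(tree)

    quote do
      def index(unquote(n_var())) when is_integer(unquote(n_var())), do: unquote(body)
    end
  end
end

defmodule PluralSketch.English do
  require PluralSketch.Compiler

  # Stands in for "nplurals=2; plural=(n != 1);"
  PluralSketch.Compiler.compile_index({:if, {:!=, :n, 1}, 1, 0})
end

IO.inspect(PluralSketch.English.index(1))
IO.inspect(PluralSketch.English.index(5))
```

The point of the design is that the expression tree is walked once during compilation, so each call to index/1 runs plain compiled Elixir code.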
== Compilation error in file lib/pleroma/web/gettext.ex ==
** (Expo.PO.SyntaxError) priv/gettext/en_test/LC_MESSAGES/static_pages.po:16: unexpected token: "#" (codepoint U+0023)
    (expo 0.1.0) lib/expo/po.ex:171: Expo.PO.parse_file!/2
    (gettext 0.21.0) lib/gettext/compiler.ex:504: Gettext.Compiler.compile_po_file/5
    (gettext 0.21.0) lib/gettext/compiler.ex:449: Gettext.Compiler.compile_unified_po_file/4
    (elixir 1.11.4) lib/enum.ex:1411: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.11.4) lib/enum.ex:1411: Enum."-map/2-lists^map/1-0-"/2
    (gettext 0.21.0) expanding macro: Gettext.Compiler.__before_compile__/1
    lib/pleroma/web/gettext.ex:5: Pleroma.Web.Gettext (module)
Full log: https://git.pleroma.social/pleroma/pleroma/-/jobs/227931
Permalink to priv/gettext/en_test/LC_MESSAGES/static_pages.po:16: https://git.pleroma.social/pleroma/pleroma/-/blob/2a244b391d8c1d9d8e960532758110928cb5ef7c/priv/gettext/en_test/LC_MESSAGES/static_pages.po#L16
If parsing from a .po or .pot file, add line information to the struct.
I believe the current Gettext parser (and the Gettext 'standard') support multiline messages. Currently they do not parse:
iex(4)> Expo.Parser.Po.parse """
...(4)> msgid "hello
...(4)> beautiful"
...(4)> msgstr "ciao
...(4)> bella"
...(4)> """
{:error,
{:parse_error, "did not expect newline inside string",
"\nbeautiful\"\nmsgstr \"ciao\nbella\"\n", 1}}
Ubuntu 18.04 images are no longer supported: https://github.blog/changelog/2022-08-09-github-actions-the-ubuntu-18-04-actions-runner-image-is-being-deprecated-and-will-be-removed-by-12-1-22/
This causes our CI to fail: https://github.com/elixir-gettext/expo/actions/runs/4853307131
I think the idea was to test the oldest possible version combination and the newest one.
Do we want to make sure 21.3 can install on Ubuntu 20.04, or should we just raise the minimum requirements?
Reported upstream: erlef/setup-beam#175
Hi, I just stumbled over a problem where multi-line strings don't work in plural messages.
Multi-line strings in singular messages, as well as single-line strings in plural messages, work fine:
msgid "a"
msgstr "This is a"
"multi line string"
msgid "b"
msgid_plural "b_plural"
msgstr[0] "single line"
msgstr[1] "single line"
iex(7)> Expo.PO.parse_file!("good.po")
%Expo.Messages{
headers: [],
messages: [
#Expo.Message.Singular<
msgid: ["One participation request for event %{title} to process"],
msgstr: ["a", "a"],
msgctxt: nil,
comments: [],
extracted_comments: [],
flags: [],
previous_messages: [],
references: [],
obsolete: false,
...
>,
#Expo.Message.Plural<
msgid: ["One participation request for event %{title} to process"],
msgid_plural: ["One participation request for event %{title} to process"],
msgstr: %{0 => ["a"], 1 => ["a"]},
msgctxt: nil,
comments: [],
extracted_comments: [],
flags: [],
previous_messages: [],
references: [],
obsolete: false,
...
>
],
top_comments: [],
file: "good.po"
}
But the combination of a plural message and a multi-line string does not:
msgid "a"
msgid_plural "a_plural"
msgstr[0] "single line"
msgstr[1] "This is a"
"multi line string"
iex(8)> Expo.PO.parse_file!("bad.po")
** (Expo.PO.SyntaxError) bad.po:5: syntax error before: "multi line string"
(expo 0.4.0) lib/expo/po.ex:171: Expo.PO.parse_file!/2
iex:8: (file)
But from my understanding, both files should be valid .po files? Plus the necessary headers ofc.
I already tried to look into the parsing logic, but realised that Elixir is still pretty new to me.
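For reference, the expected behaviour can be sketched independently of Expo's parser: a bare quoted line is a continuation of the preceding keyword, whether that keyword is msgstr or msgstr[N]. The module below is a simplified illustration under that assumption (ContinuationSketch is a made-up name, it ignores comments and escape sequences, and it is not how Expo parses PO files):

```elixir
defmodule ContinuationSketch do
  # Hypothetical illustration, not Expo's parser.
  # Groups each `keyword "string"` line with the bare `"string"` lines after it.
  def fold(lines) do
    lines
    |> Enum.map(&classify/1)
    |> Enum.reduce([], fn
      {:keyword, keyword, string}, acc ->
        [{keyword, [string]} | acc]

      {:continuation, string}, [{keyword, strings} | rest] ->
        [{keyword, strings ++ [string]} | rest]

      # Anything else (blank lines, stray continuations) is skipped here.
      _other, acc ->
        acc
    end)
    |> Enum.reverse()
  end

  defp classify(line) do
    # Matches msgid, msgid_plural, msgstr, msgstr[0], msgctxt, ...
    case Regex.run(~r/^(msg[a-z_]+(?:\[\d+\])?)\s+"(.*)"$/, line) do
      [_, keyword, string] ->
        {:keyword, keyword, string}

      nil ->
        case Regex.run(~r/^"(.*)"$/, line) do
          [_, string] -> {:continuation, string}
          nil -> :other
        end
    end
  end
end

bad_po = [
  ~s(msgid "a"),
  ~s(msgid_plural "a_plural"),
  ~s(msgstr[0] "single line"),
  ~s(msgstr[1] "This is a"),
  ~s("multi line string")
]

IO.inspect(ContinuationSketch.fold(bad_po))
```

Under this reading, the second plural form ends up with two string parts, the same way a multi-line singular msgstr does.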