expo's Issues

Typespec for `msgctxt` is incorrect

Hello! When switching from the parsers inside Gettext to using Expo, I noticed that the typespec for msgctxt is incorrect.

@type msgctxt :: String.t()

It should be:

  @type msgctxt :: [String.t(), ...]

When a message has a msgctxt, it will be a list of strings, not a single string -- just like the msgid and msgstr typespecs above. This can be verified elsewhere in this repo in several tests (example). I've provided an example (using v0.4.1) here as well:

msgid "single without context"
msgstr "without"

msgctxt "context single"
msgid "single with context"
msgstr "with"

msgid "singular form without context"
msgid_plural "plural form without context"
msgstr[0] "one without"
msgstr[1] "some without"

msgctxt "context plural"
msgid "singular form with context"
msgid_plural "plural form with context"
msgstr[0] "one with"
msgstr[1] "some with"

iex(22)> Expo.PO.parse_file!("context.po")
%Expo.Messages{
  headers: [],
  messages: [
    #Expo.Message.Singular<
      msgid: ["single without context"],
      msgstr: ["without"],
      msgctxt: nil,
      comments: [],
      extracted_comments: [],
      flags: [],
      previous_messages: [],
      references: [],
      obsolete: false,
      ...
    >,
    #Expo.Message.Singular<
      msgid: ["single with context"],
      msgstr: ["with"],
      msgctxt: ["context single"],
      comments: [],
      extracted_comments: [],
      flags: [],
      previous_messages: [],
      references: [],
      obsolete: false,
      ...
    >,
    #Expo.Message.Plural<
      msgid: ["singular form without context"],
      msgid_plural: ["plural form without context"],
      msgstr: %{0 => ["one without"], 1 => ["some without"]},
      msgctxt: nil,
      comments: [],
      extracted_comments: [],
      flags: [],
      previous_messages: [],
      references: [],
      obsolete: false,
      ...
    >,
    #Expo.Message.Plural<
      msgid: ["singular form with context"],
      msgid_plural: ["plural form with context"],
      msgstr: %{0 => ["one with"], 1 => ["some with"]},
      msgctxt: ["context plural"],
      comments: [],
      extracted_comments: [],
      flags: [],
      previous_messages: [],
      references: [],
      obsolete: false,
      ...
    >
  ],
  top_comments: [],
  file: "context.po"
}
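
For calling code that has to cope with both shapes seen in the output above (nil, or a non-empty list of strings), a minimal sketch could look like the following. The module and function names are hypothetical and not part of Expo; joining the parts with no separator assumes the list holds the consecutive string pieces of one context.

```elixir
defmodule MsgctxtExample do
  # Hypothetical helper for illustration only; not part of Expo.
  # Mirrors the value shapes shown in the parse output: nil or a non-empty list.
  @type msgctxt :: [String.t(), ...] | nil

  @spec join(msgctxt) :: String.t() | nil
  def join(nil), do: nil
  # The list is treated as pieces of one string and concatenated as-is.
  def join([_ | _] = parts), do: IO.iodata_to_binary(parts)
end
```

With this, `MsgctxtExample.join(["context single"])` gives back the plain string, and a message without context stays `nil`.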

Thanks!

Performance Comparison with Gettext

⚠️ Currently, this library is not performance optimized at all.

Based on: https://github.com/jshmrtn/expo/tree/performance_comparisor/performance_test

read.exs
Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 46.77 GB
Elixir 1.13.3
Erlang 24.3.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s

Benchmarking Expo.Parser.Mo.parse ...
Benchmarking Expo.Parser.Po.parse ...
Benchmarking Gettext.PO.parse_string ...

Name                              ips        average  deviation         median         99th %
Expo.Parser.Mo.parse           525.98        1.90 ms    ±24.75%        1.81 ms        2.80 ms
Gettext.PO.parse_string        116.38        8.59 ms    ±10.98%        8.77 ms       10.80 ms
Expo.Parser.Po.parse            62.41       16.02 ms    ±13.50%       15.61 ms       23.78 ms

Comparison: 
Expo.Parser.Mo.parse           525.98
Gettext.PO.parse_string        116.38 - 4.52x slower +6.69 ms
Expo.Parser.Po.parse            62.41 - 8.43x slower +14.12 ms

Memory usage statistics:

Name                       Memory usage
Expo.Parser.Mo.parse            1.57 MB
Gettext.PO.parse_string        10.35 MB - 6.59x memory usage +8.78 MB
Expo.Parser.Po.parse           45.78 MB - 29.12x memory usage +44.21 MB

**All measurements for memory usage were the same**

write.exs
Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 46.77 GB
Elixir 1.13.3
Erlang 24.3.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s

Benchmarking Expo.Composer.Mo.compose ...
Benchmarking Expo.Composer.Po.compose ...
Benchmarking Gettext.PO.dump ...

Name                               ips        average  deviation         median         99th %
Expo.Composer.Mo.compose       2354.75        0.42 ms    ±19.23%        0.40 ms        0.84 ms
Gettext.PO.dump                 148.69        6.73 ms    ±17.47%        6.54 ms        9.87 ms
Expo.Composer.Po.compose        136.89        7.30 ms    ±34.08%        6.64 ms       16.90 ms

Comparison: 
Expo.Composer.Mo.compose       2354.75
Gettext.PO.dump                 148.69 - 15.84x slower +6.30 ms
Expo.Composer.Po.compose        136.89 - 17.20x slower +6.88 ms

Memory usage statistics:

Name                        Memory usage
Expo.Composer.Mo.compose         0.50 MB
Gettext.PO.dump                  3.59 MB - 7.19x memory usage +3.10 MB
Expo.Composer.Po.compose         3.81 MB - 7.63x memory usage +3.31 MB

**All measurements for memory usage were the same**

The comparison is based on the following Gettext file and its .mo counterpart: https://github.com/jshmrtn/hygeia/blob/4f08c2b68f5de8cad6a84b9d4a0b01be63a7c32c/priv/gettext/de/LC_MESSAGES/default.po

It contains 6'355 lines of po content for 1'398 translations + the header.

String Formatting

Support restructuring of message strings.

Right now, the only option is to preserve the line splitting:

  • .po (read & write) should produce an identical file
  • .mo does not contain line information, everything is one single string

TODO

  • Introduce rebalance_strings function on translation struct
  • Introduce rebalance_strings function on all translations (including headers)

Behaviour of rebalance_strings

  • Fields
    • msgid
    • msgid_plural
    • msgstr
    • headers
  • Split at newlines and put every line in its own string
  • Split words at maxlength?
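
As a rough sketch of the "split at newlines" bullet only: the function below takes the list of strings of a single field, joins them, and puts every line in its own string, keeping the trailing newline on each line. `rebalance_strings` does not exist yet, so the module name, the signature, and the handling of the empty string are all assumptions; splitting words at a maximum length is left out.

```elixir
defmodule RebalanceSketch do
  # Hypothetical sketch; no such function exists in Expo yet.
  # Input and output are the string lists used for msgid / msgstr fields.
  @spec rebalance_strings([String.t()]) :: [String.t()]
  def rebalance_strings(strings) do
    # Join the existing pieces, then split on newlines.
    {last, init} =
      strings
      |> IO.iodata_to_binary()
      |> String.split("\n")
      |> List.pop_at(-1)

    # Every piece except the last one was followed by a newline; restore it.
    lines = Enum.map(init, &(&1 <> "\n"))

    case {lines, last} do
      # Empty content stays a single empty string (assumption).
      {[], ""} -> [""]
      # Content ended with a newline: no trailing empty string.
      {_, ""} -> lines
      {_, rest} -> lines ++ [rest]
    end
  end
end
```

For example, `RebalanceSketch.rebalance_strings(["Hello\nWor", "ld"])` would return `["Hello\n", "World"]`.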

Fails to parse lines starting with `#~ ##`

== Compilation error in file lib/pleroma/web/gettext.ex ==
** (Expo.PO.SyntaxError) priv/gettext/en_test/LC_MESSAGES/static_pages.po:16: unexpected token: "#" (codepoint U+0023)
    (expo 0.1.0) lib/expo/po.ex:171: Expo.PO.parse_file!/2
    (gettext 0.21.0) lib/gettext/compiler.ex:504: Gettext.Compiler.compile_po_file/5
    (gettext 0.21.0) lib/gettext/compiler.ex:449: Gettext.Compiler.compile_unified_po_file/4
    (elixir 1.11.4) lib/enum.ex:1411: Enum."-map/2-lists^map/1-0-"/2
    (elixir 1.11.4) lib/enum.ex:1411: Enum."-map/2-lists^map/1-0-"/2
    (gettext 0.21.0) expanding macro: Gettext.Compiler.__before_compile__/1
    lib/pleroma/web/gettext.ex:5: Pleroma.Web.Gettext (module)

Full log: https://git.pleroma.social/pleroma/pleroma/-/jobs/227931

Permalink to priv/gettext/en_test/LC_MESSAGES/static_pages.po:16: https://git.pleroma.social/pleroma/pleroma/-/blob/2a244b391d8c1d9d8e960532758110928cb5ef7c/priv/gettext/en_test/LC_MESSAGES/static_pages.po#L16

Support multiline msgid and msgstr

I believe the current Gettext parser (and the Gettext 'standard') supports multiline messages. Expo currently fails to parse them:

iex(4)> Expo.Parser.Po.parse """
...(4)> msgid "hello            
...(4)> beautiful"              
...(4)> msgstr "ciao            
...(4)> bella"
...(4)> """
{:error,
 {:parse_error, "did not expect newline inside string",
  "\nbeautiful\"\nmsgstr \"ciao\nbella\"\n", 1}}

Ubuntu 18.04 no longer supported in CI

Ubuntu 18.04 images are no longer supported: https://github.blog/changelog/2022-08-09-github-actions-the-ubuntu-18-04-actions-runner-image-is-being-deprecated-and-will-be-removed-by-12-1-22/

This causes our CI to fail: https://github.com/elixir-gettext/expo/actions/runs/4853307131

I think the idea was to test the oldest possible version combination and the newest one.

Do we want to make sure 21.3 can be installed on Ubuntu 20.04, or should we just raise the minimum requirements?

Multi line strings for plural messages in PO files lead to syntax error

Hi, I just stumbled over this problem, where multi line strings don't work in plural messages.

Multi line strings in singular messages, as well as single line strings in plural messages, work fine:

msgid "a"
msgstr "This is a"
"multi line string"

msgid "b"
msgid_plural "b_plural"
msgstr[0] "single line"
msgstr[1] "single line"

iex(7)> Expo.PO.parse_file!("good.po")
%Expo.Messages{
  headers: [],
  messages: [
    #Expo.Message.Singular<
      msgid: ["One participation request for event %{title} to process"],
      msgstr: ["a", "a"],
      msgctxt: nil,
      comments: [],
      extracted_comments: [],
      flags: [],
      previous_messages: [],
      references: [],
      obsolete: false,
      ...
    >,
    #Expo.Message.Plural<
      msgid: ["One participation request for event %{title} to process"],
      msgid_plural: ["One participation request for event %{title} to process"],
      msgstr: %{0 => ["a"], 1 => ["a"]},
      msgctxt: nil,
      comments: [],
      extracted_comments: [],
      flags: [],
      previous_messages: [],
      references: [],
      obsolete: false,
      ...
    >
  ],
  top_comments: [],
  file: "good.po"
}

But the combination of plural message and multi line string does not:

msgid "a"
msgid_plural "a_plural"
msgstr[0] "single line"
msgstr[1] "This is a"
"multi line string"

iex(8)> Expo.PO.parse_file!("bad.po")
** (Expo.PO.SyntaxError) bad.po:5: syntax error before: "multi line string"
    (expo 0.4.0) lib/expo/po.ex:171: Expo.PO.parse_file!/2
    iex:8: (file)

But from my understanding, both files should be valid .po files (plus the necessary headers, of course)?

I already tried to look into the parsing logic, but realised that Elixir is still pretty new to me.
