lukasgeiter / gettext-extractor

A flexible and powerful Gettext message extractor with support for JavaScript, TypeScript, JSX and HTML.

License: MIT License

TypeScript 99.37% HTML 0.33% JavaScript 0.30%

gettext extractor typescript po-files i18n l10n translation

gettext-extractor's Introduction

Gettext Extractor

A flexible and powerful Gettext message extractor with support for JavaScript, TypeScript, JSX and HTML

It works by running your files through a parser and then using the AST (Abstract Syntax Tree) to find and extract translatable strings from your source code. All extracted strings can then be saved as a .pot file to act as a template for translation files.

Unlike many of the alternatives, this library is highly configurable and is designed to work with most existing setups.

For the full documentation check out the Github Wiki.

Installation

Note: This package requires Node.js version 6 or higher.

Yarn

yarn add gettext-extractor

NPM

npm install gettext-extractor

Getting Started

Let's start with a code example:

const { GettextExtractor, JsExtractors, HtmlExtractors } = require('gettext-extractor');

let extractor = new GettextExtractor();

extractor
    .createJsParser([
        JsExtractors.callExpression('getText', {
            arguments: {
                text: 0,
                context: 1
            }
        }),
        JsExtractors.callExpression('getPlural', {
            arguments: {
                text: 1,
                textPlural: 2,
                context: 3
            }
        })
    ])
    .parseFilesGlob('./src/**/*.@(ts|js|tsx|jsx)');

extractor
    .createHtmlParser([
        HtmlExtractors.elementContent('translate, [translate]')
    ])
    .parseFilesGlob('./src/**/*.html');

extractor.savePotFile('./messages.pot');

extractor.printStats();

A detailed explanation of this code example and much more can be found in the Github Wiki.

Contributing

From reporting a bug to submitting a pull request: every contribution is appreciated and welcome. Report bugs, ask questions and request features using Github issues. If you want to contribute to the code of this project, please read the Contribution Guidelines.

gettext-extractor's Issues

Can not extract comments when assigning to variable

I'm having difficulty extracting comments above a gettext extraction when assigning it to a variable first.

Here is a breaking test:

Modifying https://github.com/lukasgeiter/gettext-extractor/blob/master/tests/js/extractors/comments.test.ts

    const LEADING_LINE_AND_TRAILING_LINE = `
            // Leading line comment
            getText('Foo'); // Trailing line comment
        `;

    const LEADING_LINE_AND_TRAILING_LINE = `
            // Leading line comment
            var myTranslation = getText('Foo'); // Trailing line comment
        `;

breaks the test.

Allow #| comments

Gettext allows the following comments:

#| msgctxt previous-message-context
#| msgid previous-untranslated-string-singular
#| msgid_plural previous-untranslated-string-plural

It would be nice if these would be supported for custom extractors.

Why are escaped character later unescaped?

gettext-extractor/src/html/utils.ts

Line 20 in cea9a47

// Un-escape characters that get escaped by parse5

I am trying to debug an issue where I basically can't match translation keys for HTML with HTML entities like &amp; in it. It brought me to the above line of code, which seems problematic.

In my use case I'm translating an element at runtime using element.innerHTML. Using innerText is not practical because some translations may actually require the HTML to be part of the translation, as with a hyperlink.

As a result the innerHTML has the entity as &amp; but the key is forced to be & by the extractor, so they can never match.

Is this intended? Could it be made an optional capability instead?

trimWhiteSpace only trims newlines at the start of content

The documentation for the trimWhiteSpace option of HtmlExtractor indicates:

If set to true, white space at the very beginning and at the end of the content will get removed

The actual implementation, below, only trims newline characters from the start of string, but all whitespace characters from the end.

if (options.trimWhiteSpace) {
   content = content.replace(/^\n+|\s+$/g, '');
}

I believe that should be content.replace(/^\s+|\s+$/g, '').
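
The difference between the two patterns is easiest to see side by side. The sketch below is plain JavaScript: trimCurrent and trimProposed are hypothetical names that wrap the expression quoted above and the proposed one, and the sample string is made up for illustration.

```javascript
// Hypothetical wrappers for illustration; neither is part of gettext-extractor.

// Pattern quoted from the current implementation:
// strips only newlines at the start, but any white space at the end.
function trimCurrent(content) {
    return content.replace(/^\n+|\s+$/g, '');
}

// Proposed pattern: strips any white space at both ends.
function trimProposed(content) {
    return content.replace(/^\s+|\s+$/g, '');
}

// Leading spaces, a newline and a tab, then trailing white space.
const sample = '  \n\tHello world \n ';

console.log(JSON.stringify(trimCurrent(sample)));
console.log(JSON.stringify(trimProposed(sample)));
```

With this input the first call keeps the leading spaces, newline and tab, because the string does not start with a newline; the second call returns only the text.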

Will .parseFilesGlob accept an array of strings?

It seems it only validates a non empty string ... but I could be wrong.
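
Whether parseFilesGlob accepts an array is not settled here. One way to sidestep the question, sketched below on the assumption that each call simply adds more files to the same parser, is to call it once per pattern. parseAllGlobs is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: call parseFilesGlob once per pattern.
// `parser` is anything with a parseFilesGlob(pattern) method, such as the
// object returned by extractor.createJsParser([...]).
function parseAllGlobs(parser, patterns) {
    for (const pattern of patterns) {
        parser.parseFilesGlob(pattern);
    }
    return parser;
}

// Possible usage (commented out because it needs the library installed):
// parseAllGlobs(extractor.createJsParser([/* extractors */]), [
//     './src/**/*.@(ts|tsx)',
//     './lib/**/*.js'
// ]);
```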

Cannot extract the multiline comments because regex does not seem to match

Hi,

I'm trying to extract a multi-line comment that looks like this:

/* TRANSLATORS:
    Line 1
    Line 2
*/
pgettext('context', 'key')

Using the following options for comments:

{
    // take the comment from the preceding line
    otherLineLeading: true,

    // run the comment through regex
    regex: /TRANSLATORS\:\s*(.*)/m
};

It works for single line comments but not for multiline ones. Any idea how to tackle this?

Same happens if I remove the regex option and only leave the otherLineLeading: true
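
One detail on the regex side, independent of the library: in JavaScript the m flag only changes how ^ and $ behave, and . still stops at a newline, so (.*) can capture at most one line. The sketch below is plain JavaScript and assumes the comment body reaches the regex as a single string, which this thread does not confirm. Since the problem also appears without the regex option, this is unlikely to be the whole story.

```javascript
// Assumed comment body, i.e. the text between /* and */ in the example above.
const comment = ' TRANSLATORS:\n    Line 1\n    Line 2\n';

// Pattern from the report: `.` does not match newlines, even with the m flag.
const original = /TRANSLATORS\:\s*(.*)/m;

// Variant that lets the capture group span lines.
const spanning = /TRANSLATORS:\s*([\s\S]*)/;

const firstLineOnly = original.exec(comment)[1];
const allLines = spanning.exec(comment)[1]
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

console.log(firstLineOnly); // only the first line is captured
console.log(allLines);      // both lines are captured
```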

support for ngx-translate

The last release of ngx-translate-extract for extracting translatables using ngx-translate is almost two years old, so I thought maybe this project can fill the gap.

The extraction of HTML, TypeScript & JavaScript should work out of the box; the "only" feature missing is the "pipe" pattern:

<table>
    <thead>
    <tr>
        <th>{{'BOOKINGS.passengers' | translate}}</th>
        <th>{{'BOOKINGS.children' | translate:params}}</th>
        <th [title]="'BOOKINGS.bookingDate' | translate">#</th>
    </tr>
    </thead>
</table>

("translate" is the getText keyword for extracting messages, param is a TypeScript variable)

What do you think? Probably requires a new HTMLExtractor...
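
To make the request concrete, here is a rough sketch of what matching the pipe pattern could look like. It is a plain regular expression over the template text, not an HTML-aware extractor and not an existing gettext-extractor feature; extractPipeKeys is a hypothetical name, and the pipe name translate is hard-coded.

```javascript
// Hypothetical helper: find quoted strings that are piped into `translate`.
// It will miss keys built from expressions and knows nothing about HTML.
function extractPipeKeys(template) {
    // (['"])                  opening quote
    // ((?:\\.|(?!\1)[^\\])*)  string body, allowing backslash escapes
    // \1\s*\|\s*translate\b   closing quote, then the pipe and its name
    const pattern = /(['"])((?:\\.|(?!\1)[^\\])*)\1\s*\|\s*translate\b/g;
    const keys = [];
    for (const match of template.matchAll(pattern)) {
        keys.push(match[2]);
    }
    return keys;
}

const template = [
    "<th>{{'BOOKINGS.passengers' | translate}}</th>",
    "<th>{{'BOOKINGS.children' | translate:params}}</th>",
    "<th [title]=\"'BOOKINGS.bookingDate' | translate\">#</th>"
].join('\n');

console.log(extractPipeKeys(template));
```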

Provide content options for HTML attribute extractor

Hi Lukas,

First of all, thanks for your great piece of software; it works like a charm. I'd like to suggest one enhancement that would avoid the need for custom extractors if it were built into the current implementation, especially because all the required tools are already there, as they're used in HtmlExtractors.elementContent.

Would it be possible to perform content normalization on extracted attribute values as well? Even if I work around the missing support in HtmlExtractors.elementAttribute by using HtmlExtractors.elementContent, I'm still missing the content options when dealing with the textPlural attribute. For this reason it would be great to have these options there as well.

Furthermore, some possibility of content sanitization would be useful as well. I had the case where JSX expressions in attributes are ending up as {'Message'} in the POT string. Offering some kind of callback in the extractor options would provide the flexibility to act on such cases.

What's your take on that? Thanks in advance!

Problems extracting messages from a Vue app

Hi!

We are trying to extract messages from our app, which is built with Vue.js. We are using vue-i18n to handle the translations. Instead of passing a message ID like homepage.welcome_message to the translator function, we're passing the message itself to translate, in Spanish.

Our first attempt was to extract messages directly from our .vue single file components.

The parser successfully extracted some of the messages from the <template> part (probably because of the similarity with JSX templates?). In the templates, translation functions are called as {{$t('message')}}. However, the extractor failed to match the functions from the <script> part.

Our second attempt was to run the message extractor against the built file (using Webpack). In the built JS file, the translation functions from the templates are compiled to the following method call:

_vm.$t('message')

These ones are correctly matched, by using the factory for method calls as follows:

JsExtractors.methodCall('_vm', '$t', {
    arguments: { text: 0 }
})

However, there are a couple of cases where we cannot match the generated method calls:

_vue2.default.t('message')

which is the compiled call from our source files' Vue.t('message'), and

this.$t('message')

which are the calls to the translator function inside our component instances.

Is there a way to use the built-in extractor function creator factories to correctly parse instances of _vue2.default.t (or, as a generalisation <obj1>.<obj2>.<method>), and this.$t (or this.<method>)?

Thanks!

Some issues with parse5

After updating to version 3.3.1 I get these errors when trying to build (they don't occur with version 3.2.1).

node_modules/gettext-extractor/dist/html/parser.d.ts(4,35): error TS2694: Namespace '"path-to-project/node_modules/parse5/lib/index"' has no exported member 'DefaultTreeNode'.
node_modules/gettext-extractor/dist/html/parser.d.ts(5,39): error TS2694: Namespace '"path-to-project/node_modules/parse5/lib/index"' has no exported member 'DefaultTreeTextNode'.
node_modules/gettext-extractor/dist/html/parser.d.ts(6,38): error TS2694: Namespace '"path-to-project/node_modules/parse5/lib/index"' has no exported member 'DefaultTreeElement'.

The module where I'm using it has this import:

import { GettextExtractor, JsExtractors } from 'gettext-extractor';

I'm using TypeScript version 2.8.3 and building with this command:

tsc --module commonjs

Unable to parse dynamic attributes from *.vue files

I'm parsing *.vue single file components and it works pretty well, except for a small issue. I'm not able to parse the translations from dynamic properties. See example below:

<template>
    <div style="padding: 24px;">
        <h1 :title="c_title">Translations</h1>
        <p>TRMSave -> {{ $t("Save") }}</p>
        <p>Message from script: {{ message }}</p>
        <button :title="$t('Quotation')" class="btn btn--is-primary">{{ $t('Invoice') }}</button>
    </div>
</template>

<script>

export default {
    data() {
        return {
            message: ""
        };
    },
    computed: {
        c_title() {
            return this.$t('Close'); 
        }	
    },
    created() {
        this.message = this.$t('All messages')
    }
};
</script>

In this example everything works except for :title="$t('Quotation')", which is just not recognized.

Is there any workaround for the issue?

Add async support

The extraction is a bit slow to include in a build pipeline. It might be good to allow the execution to be async.

String templates are not parsed

Thank you for the module you've created.
I have a problem with generating a .pot file for the code like
i18nPlural(issuesAmount, `${issuesAmount} issue`, `${issuesAmount} issues`).

i18nPlural is implemented with the node-gettext

No plural form generated.
Please advise.

Full source code of the extractor

const { GettextExtractor, JsExtractors } = require('gettext-extractor');
const fs = require('fs');

const dir = './translations';
if (!fs.existsSync(dir)){
  fs.mkdirSync(dir);
}
const extractor = new GettextExtractor();
extractor
  .createJsParser([
    JsExtractors.callExpression('i18n', {
      arguments: {
        text: 0,
        context: 1,
      },
    }),
    JsExtractors.callExpression('i18nPlural', {
      arguments: {
        text: 1,
        textPlural: 2,
        context: 3,
      },
    }),
  ])
  .parseFilesGlob('./src/**/!(*.spec).js');
extractor.savePotFile('./translations/default.pot');
extractor.printStats();

HtmlExtractors returns more characters than expected

First, thanks for creating a wonderful library! I ran into a bug where HtmlExtractors creates a message with excess characters when HTML is used within a prop. The following script returns more than the expected: Some good text. <a href="example.com">Learn more</a>.

import { GettextExtractor, HtmlExtractors } from 'gettext-extractor';

  const markupExtractor = new GettextExtractor();
  markupExtractor
    .createHtmlParser([HtmlExtractors.elementContent('[translated]', {})])
    .parseFilesGlob('**/*.js', undefined, {});

  markupExtractor.getMessages().forEach((message) => {
    console.log(message.text);
  });

The parsed file

const Text = ({ children }) => <div>{children}</div>;
const Container = ({ children, secondaryText }) => (
  <div>
    {children}
    {secondaryText}
  </div>
);

const Parent = () => {
  return (
    <Container
      secondaryText={
        <Text translated>
          Some good text. <a href="example.com">Learn more</a>.
        </Text>
      }
      maxlength={25}
    />
  );
};

no strings found on keybase/client project

I'm trying to use your tool to extract all text strings from the Keybase client.

They have several files that match your supported extensions.

mathieu:keybase-client :-) (master) $ find shared/ -name "*.ts" | wc -l 
158
mathieu:keybase-client :-) (master) $ find shared/ -name "*.js" | wc -l 
29
mathieu:keybase-client :-) (master) $ find shared/ -name "*.jsx" | wc -l 
0
mathieu:keybase-client :-) (master) $ find shared/ -name "*.tsx" | wc -l 
1789
mathieu:keybase-client :-) (master) $ find shared/ -name "*.html" | wc -l 
0
mathieu:keybase-client :-) (master) $

but the extractor, run from the REPL, could not find any...

here is the full output

mathieu:keybase-client :-) (master) $ node
Welcome to Node.js v12.21.0.
Type ".help" for more information.
> const { GettextExtractor, JsExtractors, HtmlExtractors } = require('gettext-extractor');
undefined
> 
> let extractor = new GettextExtractor();
undefined
> 
> extractor.createJsParser([
...         JsExtractors.callExpression('getText', { arguments: { text: 0, context: 1 } }),
...         JsExtractors.callExpression('getPlural', { arguments: { text: 1, textPlural: 2, context: 3 } })
...     ]).parseFilesGlob('./shared/**/*.@(ts|js|tsx|jsx)');
JsParser {
  builder: CatalogBuilder {
    stats: {
      numberOfMessages: 0,
      numberOfPluralMessages: 0,
      numberOfMessageUsages: 0,
      numberOfContexts: 0,
      numberOfParsedFiles: 1965,
      numberOfParsedFilesWithMessages: 0
    },
    contexts: {}
  },
  extractors: [ [Function], [Function] ],
  stats: {
    numberOfMessages: 0,
    numberOfPluralMessages: 0,
    numberOfMessageUsages: 0,
    numberOfContexts: 0,
    numberOfParsedFiles: 1965,
    numberOfParsedFilesWithMessages: 0
  }
}
> extractor.createHtmlParser([ HtmlExtractors.elementContent('translate, [translate]') ]).parseFilesGlob('./shared/**/*.html');
HtmlParser {
  builder: CatalogBuilder {
    stats: {
      numberOfMessages: 0,
      numberOfPluralMessages: 0,
      numberOfMessageUsages: 0,
      numberOfContexts: 0,
      numberOfParsedFiles: 1965,
      numberOfParsedFilesWithMessages: 0
    },
    contexts: {}
  },
  extractors: [ [Function] ],
  stats: {
    numberOfMessages: 0,
    numberOfPluralMessages: 0,
    numberOfMessageUsages: 0,
    numberOfContexts: 0,
    numberOfParsedFiles: 1965,
    numberOfParsedFilesWithMessages: 0
  }
}
> 
> extractor.savePotFile('./messages.pot');
undefined
> 
> extractor.printStats();

     0 messages extracted
  -----------------------------
     0 total usages
  1965 files (0 with messages)
     0 message contexts

undefined
>

anything else that I can try?
Thanks

How can I use this to extract untranslated strings from the XREngine repo?

https://github.com/XRFoundation/XREngine
Please give me a cheat sheet for using your app:
commands and paths, please.

Typescript errors

I just added gettext-extractor to my project, but I get the following complaints from typescript when I try to build:

node_modules/gettext-extractor/dist/html/parser.d.ts(4,35): error TS2694: Namespace '"/Users/jwalton/benbria/loop/node_modules/parse5/lib/index"' has no exported member 'DefaultTreeNode'.
node_modules/gettext-extractor/dist/html/parser.d.ts(5,39): error TS2694: Namespace '"/Users/jwalton/benbria/loop/node_modules/parse5/lib/index"' has no exported member 'DefaultTreeTextNode'.
node_modules/gettext-extractor/dist/html/parser.d.ts(6,38): error TS2694: Namespace '"/Users/jwalton/benbria/loop/node_modules/parse5/lib/index"' has no exported member 'DefaultTreeElement'.

Extract comments from HTML

Hi, I'm looking at using this project with vue-gettext. That project supports comments like:

<translate translate-comment="My comment for translators">Foo</translate>

I'd like to extract this comment, I'd imagine I'd configure like this:

HtmlExtractors.elementContent('translate', {
    attributes: {
        textPlural: 'translate-plural',
        context: 'translate-context',
        comment: 'translate-comment',
    }
})

Is this possible / feasible?

Add support for empty-attribute selectors

This adds support for [someattr=""] style selectors.

diff --git a/node_modules/gettext-extractor/dist/html/selector.js b/node_modules/gettext-extractor/dist/html/selector.js
index e290d12..cdb1d0b 100644
--- a/node_modules/gettext-extractor/dist/html/selector.js
+++ b/node_modules/gettext-extractor/dist/html/selector.js
@@ -120,7 +120,7 @@ class ElementSelector {
             if (elementAttributeValue === null) {
                 return false;
             }
-            if (attribute.value) {
+            if (attribute.value !== undefined) {
                 switch (attribute.operator) {
                     case '^=':
                         if (elementAttributeValue.slice(0, attribute.value.length) !== attribute.value) {

This issue body was partially generated by patch-package.

Replace SyntaxKind by ScriptKind in the wiki API Reference

Hi !
I wanted to propose a PR but I couldn't fork the wiki. Anyway, it's only a small change request; SyntaxKind does not list JS or TS, ScriptKind does. Here's the diff below.

--- a/API Reference.md
+++ b/API Reference.md
@@ -296,12 +296,12 @@ The enum values can be imported from the typescript package. The available opt

const ts = require('typescript');

-ts.SyntaxKind.Unknown; // = 0
-ts.SyntaxKind.JS; // = 1
-ts.SyntaxKind.JSX; // = 2
-ts.SyntaxKind.TS; // = 3
-ts.SyntaxKind.TSX; // = 4
-ts.SyntaxKind.External; // = 5
+ts.ScriptKind.Unknown; // = 0
+ts.ScriptKind.JS; // = 1
+ts.ScriptKind.JSX; // = 2
+ts.ScriptKind.TS; // = 3
+ts.ScriptKind.TSX; // = 4
+ts.ScriptKind.External; // = 5

Add message does not allow null values

The addMessage method on the Extractor class does not allow adding messages where some fields are null, but the definition of the IMessage interface allows null values for text, textPlural and context.

Order of messages in generated POT file

The generated POT file's messages are sorted alphabetically (i.e. msgid "a" is above msgid "b")

In other implementations such as Python's Babel (http://babel.pocoo.org) messages are sorted by source file path. For example

base/a.py
base/b.py
base/a/a.py
base/a/b.py

Note that files in 'higher' directories are kept at the top and subdirectories come afterwards.

When messages are present in multiple files babel seems to select the directory with the least subdirectories as the sort key.

This makes translators' lives easier in tools that rely on the order of messages in the PO files, as similar messages are grouped together.

Can we have an option to set the sort order of the generated POT file?
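
The ordering described above can be approximated with a small comparator. This is a plain JavaScript sketch of "files in a directory first, then its subdirectories", checked only against the four example paths; it is not Babel's actual algorithm and not an existing option of this library. comparePaths is a hypothetical name.

```javascript
// Hypothetical comparator: within each directory, files sort before
// subdirectories; otherwise segments are compared alphabetically.
function comparePaths(a, b) {
    const aParts = a.split('/');
    const bParts = b.split('/');
    const shared = Math.min(aParts.length, bParts.length);
    for (let i = 0; i < shared; i++) {
        const aIsFile = i === aParts.length - 1;
        const bIsFile = i === bParts.length - 1;
        // A file at this level comes before anything nested deeper.
        if (aIsFile !== bIsFile) {
            return aIsFile ? -1 : 1;
        }
        if (aParts[i] !== bParts[i]) {
            return aParts[i] < bParts[i] ? -1 : 1;
        }
    }
    return 0;
}

const paths = ['base/a/b.py', 'base/b.py', 'base/a/a.py', 'base/a.py'];
console.log(paths.slice().sort(comparePaths));
```

Sorting messages would then mean picking one reference per message as its sort key, for example the shallowest one as Babel seems to do.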

Disable line numbers

Hi,

Is it possible to disable the output of line numbers in references field? The idea behind is to reduce the amount of noise in gettext catalogues because our codebase changes a lot but translations normally remain the same, so when running the gettext-extractor we get a dirty git because of the line numbers change.
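
Until there is an option for this, one possible stopgap is to post-process the generated file. The sketch below is plain JavaScript and assumes references are written as "#: path:line" entries separated by spaces, with no spaces inside the paths. stripLineNumbers is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: drop ":<line>" from reference comments in a POT string
// and collapse references that become identical.
function stripLineNumbers(pot) {
    return pot
        .split('\n')
        .map((line) => {
            if (!line.startsWith('#: ')) {
                return line; // only reference comments are touched
            }
            const files = line
                .slice(3)
                .split(/\s+/)
                .filter(Boolean)
                .map((reference) => reference.replace(/:\d+$/, ''));
            return '#: ' + [...new Set(files)].join(' ');
        })
        .join('\n');
}

const pot = '#: src/a.js:12 src/a.js:40 src/b.js:3\nmsgid "Hi"\nmsgstr ""';
console.log(stripLineNumbers(pot));
```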

extracting comments not working in assignment

Hi,
It seems that extracting comments does not work properly if the calleeName is used in an assignment, though not in every case.
This is what I get:

    let foo;

    function f(){
        foo=/* global foo leading not working */ lan.gettext('global foo leading not working');
        foo= lan.gettext('global foo trailing not working'); /* global foo trailing not working */

        let local_foo=/* local foo leading not working */ lan.gettext('local foo leading not working');
        let local_foo1= lan.gettext('local foo trailing working'); /* local foo trailing working */
    }

How to solve this?
Thank you

Emanuele

Comments don't get extracted

Hi Lukas!
This is a great utility, but could you please tweak the algorithm for extracting comments a bit?
E.g. in this situation:

<Text style={styles.listitemtextbold} numberOfLines={1}>
  {/* Translators: 'grade' in the meaning of a school class, e.g. 'sixth grade'. */
  gettext('Select grade')}
</Text>

comments don't get extracted at all.
If I put them on the same line, it works, but eslint / prettier doesn't like it and starts complaining (underlining)...
I think you need to include comments which are on the line above, if no comments found on the same line.
Thanks!

Cannot extract context strings from identifiers

Hi,

thanks for your great work! I encountered a smallish problem where the extractor fails to extract context strings that are not string literals from js/ts files. I like to put all my context strings into a typescript const enum to keep them dry.

Would it be feasible for your parser to resolve identifiers that point to stringifyable const and enum values in context arguments?

Can not add multiline strings with JsExtractor

When using \n in a string, the \n gets removed in the generated .pot file. Many po editors generate something like the following when there is a multiline string as msgid:

msgid ""
"line 1\n"
"line 2"
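
For reference, the layout shown above can be produced with a few lines of plain JavaScript. This is only a sketch of the PO convention (an empty first string followed by one quoted string per line), not how gettext-extractor or its PO serializer works internally; formatMsgid and escapePo are hypothetical names.

```javascript
// Hypothetical helper: escape backslashes, double quotes and newlines
// for use inside a PO string.
function escapePo(text) {
    return text
        .replace(/\\/g, '\\\\')
        .replace(/"/g, '\\"')
        .replace(/\n/g, '\\n');
}

// Hypothetical helper: format a msgid, one quoted string per source line.
function formatMsgid(text) {
    if (!text.includes('\n')) {
        return `msgid "${escapePo(text)}"`;
    }
    // Split after each newline so the "\n" stays with its own line.
    const segments = text.split(/(?<=\n)/);
    return ['msgid ""', ...segments.map((s) => `"${escapePo(s)}"`)].join('\n');
}

console.log(formatMsgid('line 1\nline 2'));
```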

Can it support flow-type/babel ?

Hi, I'm using the gettext-extractor to implement the i18n in a Flow+JSX project. It's pretty good and very useful!

Until now, all my files work well with gettext-extractor, except this one below:

// @flow
import { type SearchApp } from '../../components/AppSearchSelector/types';

const getAppSearchSelectorOptions = (
    searchTerm: string,
    callback: (
        error: ?{},
        {
            options: Array<SearchApp>,
        }
    ) => *
) => {};

const platforms = {
    test: { id: 'ios', name: gettext('this is test') },
};

For this file, the gettext-extractor can not extract the translations text this is test, and also doesn't throw any error.

For now I can work around it by changing Array<SearchApp> to Array<{ name: string }>, or by deleting the line error: ?{},. Then it works well and the text gets picked up.
Is there something wrong with parsing the syntax ?{} or Array<SearchApp>?

So is there any chance of supporting Flow + JSX files? It would also be better to emit an error or warning when some content cannot be parsed.

Thanks a lot !

Create a CLI to complement your lib

Just tried your lib, it works perfectly for my use case with TypeScript/node-gettext 👍

Your lib is great, but it would be even better if there was a CLI with it that use some json config (either in a standalone .gettext-extract or directly embedded in package.json).

Are you open for a PR on this? Or do you prefer if I create a separate project for the CLI?

Thanks again for the great work.

illegal operation on a directory

Copied the example code, unmodified on one line.
running it in node REPL
on the keybase/client code base
read/write access to all files, fresh checkout
node v12.21.0

anything else that I can try?
thanks

> extractor.createJsParser([ JsExtractors.callExpression('getText', { arguments: { text: 0, context: 1 } }), JsExtractors.callExpression('getPlural', { arguments: { text: 1, textPlural: 2, context: 3 } }) ]).parseFilesGlob('./src/**/*.@(ts|js|tsx|jsx)');

Uncaught Error: EISDIR: illegal operation on a directory, read
    at Object.readSync (fs.js:568:3)
    at tryReadSync (fs.js:353:20)
    at Object.readFileSync (fs.js:390:19)
    at JsParser.parseFile (/home/mathieu/projects/opensource/node_modules/gettext-extractor/dist/parser.js:57:29)
    at JsParser.parseFilesGlob (/home/mathieu/projects/opensource/node_modules/gettext-extractor/dist/parser.js:65:18) {
  errno: -21,
  syscall: 'read',
  code: 'EISDIR'
}

[Bug?|Question] Function name inside calleeName using callExpression

When configuring the extractor to look for the following:

JsExtractors.callExpression('context.getGt().gettext')

It fails to find code defined as:

new Error(context.getGt().gettext('Hello'));

Is this supported? Or is there something I need to write to escape the parenthesis.

savePotFile() headers order

I tried to add some headers using the savePotFile(fileName, [headers]) method. It works well, but the "Content-Type: text/plain; charset=UTF-8\n" header that is added automatically ends up at the bottom, whereas it should be the first header.

Should be simple to fix :)

Better exception message

When the extractor tries to extract from a line that contains a syntax error, for example

translate.instant('test1' + 'text2')

, it fails with following error, which is alright:

.\node_modules\gettext-extractor\dist\js\extractors\factories\callExpression.js:93
    let concatenated = ts.createStringLiteral('');
                                       ^
TypeError: ts.createStringLiteral is not a function

However, the stacktrace does not contain any relevant information about the file and line number, where the extraction failed.

Would it be possible to include this information somehow, so the developer knows where the error was made? Maybe by adding a try/catch block.

Thanks in advance.

Support for Handlebar syntax. {{#translate}} Hello world {{/translate}}

Hello, I have successfully implemented your extractor. Great work btw! 👍
Unfortunately I can't use element tags in my HTML template because I use Handlebars for translations. Is it possible to use createHtmlParser for a partial string rather than an element? In my case I have to scan for {{#translate}} Hello world {{/translate}}
I could fork your repository and try to add a scan for this, but could you point me in the right direction for where to make the adjustment?

How do I wrap strings for translation?

I guess I have to wrap strings that should be extracted in some way ( __("Translate this.") or the like ), right?

What will this extractor look for? Or where in the configuration can I set this, if at all?

Can't extract comments from some situations

Hi, thanks for this project -

I have some situations where comments are not properly extracted. I don't have an exact test case, but it's something like:

multiple map key assignments

var x = {
   "foo": translate("foo"), /// TRANSLATORS: description of foo
   "bar": translate("bar"), /// TRANSLATORS: description of bar
   "baz": translate("baz"), /// TRANSLATORS: description of baz
}

In this case only the first comment would be extracted and matched.

Some long concatenation

var x = "foo" +
   /// TRANSLATORS: comments for bar
   translate("bar") + 
   /// TRANSLATORS: comments for baz
   translate("baz") +
   "foo"
;

In this case none of the comments are extracted and matched.

I read the discussion in #4 (comment) so I guess these constructs have some different representation in the AST that is not understood.

In my codebase I can work around this by extracting to a local variable first before building the longer construct. The comments can be extracted in this way, but, it's less clear.

Can't extract from angular template component input attributes

I have the following angular template:

<div [myAngularInput]="__('Click to Import')">{{__('Import')}}</div>

The 'Import' string is successfully extracted using your lib, but the 'Click to Import' text is ignored. I assume this happens because the latter is not enclosed in double curly braces. Is there a way we could make this work?

Thanks in advance.

This is my core extractor code:

const ALLOWED_METHODS = ['[this].__', '__', '[this].translate.getInstant', 'TranslateService.getInstant'];

let extractor = new GettextExtractor();

extractor
    .createJsParser([
        JsExtractors.callExpression(ALLOWED_METHODS, {
            arguments: {
                text: 0,
                context: 1
            }
        }),
        JsExtractors.callExpression(ALLOWED_METHODS, {
            arguments: {
                text: 1,
                textPlural: 2,
                context: 3
            }
        })
    ])
    .parseFilesGlob('./src/app/**/*.@(js|jsx|ts|html)');

Support template literals

Template literals are an integral part of modern JS (as well as TS). I have seen you use them in your codebase, yet they are not recognized as strings when using the extractor.

// Not parsed
gt.gettext(`Starts on ${startsOn}`)

  0 messages extracted
  --------------------------
  0 total usages
  0 files (0 with messages)
  0 message contexts

// Parsed
gt.gettext('Starts on ${startsOn}')

   1 message extracted
  -----------------------------
   1 total usage
  18 files (1 with messages)
   1 message context (default)

Obviously template literals are very valuable in translatable strings as they allow translators to move parameters around as needed in their locale.

In my opinion this is a critical feature to support. I'm open to working on a PR for this if needed.
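
Some background on why this is hard for a static extractor: a template literal with substitutions is evaluated before the translation function is called, so the source code contains no fixed msgid to extract. A common workaround, sketched below in plain JavaScript, is to keep a plain string with named placeholders as the msgid and fill them in after translation. The {name} placeholder syntax, the interpolate helper and the stand-in gettext function are all assumptions for illustration, not part of gettext-extractor.

```javascript
// Hypothetical helper: replace {name} placeholders with values from params.
// Unknown placeholders are left as they are.
function interpolate(message, params) {
    return message.replace(/\{(\w+)\}/g, (match, key) =>
        Object.prototype.hasOwnProperty.call(params, key) ? String(params[key]) : match
    );
}

// Stand-in for a real translation lookup; it returns the msgid unchanged.
const gettext = (msgid) => msgid;

// The msgid is a plain string literal, so a static extractor can pick it up.
const label = interpolate(gettext('Starts on {startsOn}'), { startsOn: '2024-05-01' });
console.log(label); // prints "Starts on 2024-05-01"
```

Translators can still move {startsOn} around within the sentence, since the substitution happens on the translated string.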

gettext-extractor fails when glob matches a directory

I had a directory named chart.js which caused gettext-extractor to try to incorrectly read it as a file. I had to console.log every matched fileName to find the culprit. Do you think it may make sense to add special handling for this case?

node:fs:756
  handleErrorFromBinding(ctx);
  ^

Error: EISDIR: illegal operation on a directory, read
    at Object.readSync (node:fs:756:3)
    at tryReadSync (node:fs:437:20)
    at Object.readFileSync (node:fs:483:19)
    at JsParser.parseFile (/Users/federicobond/code/signatura-connect/node_modules/gettext-extractor/dist/parser.js:58:29)
    at JsParser.parseFilesGlob (/Users/federicobond/code/signatura-connect/node_modules/gettext-extractor/dist/parser.js:66:18)
    at file:///Users/federicobond/code/signatura-connect/js-gettext.mjs:21:4
    at ModuleJob.run (node:internal/modules/esm/module_job:193:25) {
  errno: -21,
  syscall: 'read',
  code: 'EISDIR'
}
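
Until the library handles this case, a possible workaround is to filter the matched paths before parsing. The sketch below uses only Node's fs module; onlyFiles is a hypothetical helper, and the list of paths is assumed to come from whatever glob library you already use.

```javascript
const fs = require('fs');

// Hypothetical helper: keep only paths that point at regular files,
// so a directory named like "chart.js" is skipped instead of read.
function onlyFiles(paths) {
    return paths.filter((candidate) => {
        try {
            return fs.statSync(candidate).isFile();
        } catch (err) {
            return false; // missing or unreadable paths are skipped too
        }
    });
}
```

The remaining paths could then be passed one at a time to the parser's parseFile method, which appears in the stack trace above.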

Issues with latest pofile

I just deleted my package-lock.json and reinstalled everything, so I'm picking up the latest gettext-extractor and pofile, and I'm getting these errors in my build:

node_modules/gettext-extractor/dist/extractor.d.ts(1,25): error TS2497: This module can only be referenced with ECMAScript imports/exports by turning on the 'allowSyntheticDefaultImports' flag and referencing its default export.
node_modules/gettext-extractor/dist/extractor.d.ts(23,36): error TS2702: 'pofile' only refers to a type, but is being used as a namespace here.
node_modules/gettext-extractor/dist/extractor.d.ts(24,53): error TS2702: 'pofile' only refers to a type, but is being used as a namespace here.
node_modules/gettext-extractor/dist/extractor.d.ts(25,58): error TS2702: 'pofile' only refers to a type, but is being used as a namespace here.

No extraction sometimes

I posted this in the Svelte-specific fork, but quickly realised I get the same result with the bare library. Some strings do not get extracted as expected, so I've tried to narrow it down to the following:

<script lang="ts">
    import { __ } from '$lib/i18n';
</script>

<!-- These will be extracted: -->
<div>{@html __('More Settings')}</div>
<button type="button" aria-label="{__('I agree')}">{__('I agree')}</button>

<!-- These won't: -->
{__('Settings')}
<div>{__('Other Settings')}</div>
<button type="button">{__('I still agree')}</button>

Parser and config:

import { GettextExtractor, JsExtractors } from 'gettext-extractor';

let extractor = new GettextExtractor();

extractor
	.createJsParser([
		JsExtractors.callExpression('__', {
			arguments: {
				text: 0
			}
		})
	])
	.parseFilesGlob('src/**/Test.svelte');

extractor.savePotFile('./src/translations/source.pot');
extractor.printStats();

Did I miss something in the config?
Thanks in advance for looking into this!

(feature/advice) Parse HTML from JS strings

In my software, I have template literals from which I generate HTML components. Example:

new Gui(`<span><i18n>Hello</i18n>, %{name}!</span>`, { name: "Lukas" })

However, this "Hello" won't get extracted. So for now I do:

new Gui(`<span>${__("Hello")}, %{name}!</span>`, { name: "Lukas" })

However, this is a bad solution, as translations can now inject HTML and executable scripts.

Is there any other way to make those translatable strings extractable?
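
One way to reduce the injection risk while keeping the call-expression form is to escape the translated text before it is interpolated. The following is only a sketch: `escapeHtml`, the `catalog` object and the `__` lookup are hypothetical stand-ins, not part of gettext-extractor.

```javascript
// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(value) {
    const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
    return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical translation lookup with a deliberately hostile translation.
const catalog = { Hello: '<b onclick="x()">Hallo</b>' };
const __ = (msgid) => catalog[msgid] || msgid;

// The translated text is escaped, so its markup is rendered as plain text.
const template = `<span>${escapeHtml(__('Hello'))}, %{name}!</span>`;
console.log(template);
// <span>&lt;b onclick=&quot;x()&quot;&gt;Hallo&lt;/b&gt;, %{name}!</span>
```

Because `__('Hello')` is still an ordinary call expression, a call-expression extractor should still be able to pick it up.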

Empty line comments are not properly extracted

Hi,

As per your suggestion in #22 and the docs, I use multiple single-line comments prefixed with a keyword, in my case TRANSLATORS:.

However, empty lines are not extracted properly. Consider this:

// TRANSLATORS: Product price label
// TRANSLATORS:
// TRANSLATORS: Available placeholder:
// TRANSLATORS: {currency} - the runtime substitution for the price currency
// TRANSLATORS: {amount}   - the runtime substation for the amount

Extracted as:

#. Product price label
#. TRANSLATORS:
#. Available placeholder:
#. {currency} - the runtime substitution for the price currency
#. the runtime substation for the amount

I would expect the second line in the generated POT to be an empty comment, i.e. just #.

The regex I use: /^TRANSLATORS:\s*(.*)$/.
Running it manually via match works fine:

> "TRANSLATORS:".match(/^TRANSLATORS:\s*(.*)$/)
[ 'TRANSLATORS:',
  '', // <-- empty string - ALL GOOD
  index: 0,
  input: 'TRANSLATORS:',
  groups: undefined ]
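
For reference, here is a small sketch of what applying that regex to each comment line separately would produce. It only illustrates the expected result and does not reproduce how the library handles comments internally.

```javascript
const pattern = /^TRANSLATORS:\s*(.*)$/;

const commentLines = [
    'TRANSLATORS: Product price label',
    'TRANSLATORS:',
    'TRANSLATORS: Available placeholder:',
];

// Map each comment line to its captured text; null marks a non-matching line.
const extracted = commentLines.map((line) => {
    const match = line.match(pattern);
    return match ? match[1] : null;
});

console.log(extracted);
// [ 'Product price label', '', 'Available placeholder:' ]
```

The second entry is an empty string, which would correspond to a bare #. line in the POT output.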

Incomplete plural form returned from toPotString

This is more of an assumption, because I don't have a deep understanding of the POT file structure, but it seems that part of the plural msgstr is omitted.

Looking at the code, this seems to come from a third-party library. Still, what I would expect from:

gettext("One new message", "{{n}} new messages", 1);

is the following:

#: tests/e2e/fixtures/js/view.jsx:22
msgid "One new message"
msgid_plural "{{n}} new messages"
msgstr[0] ""
msgstr[1] ""

instead I receive:

#: tests/e2e/fixtures/js/view.jsx:22
msgid "One new message"
msgid_plural "{{n}} new messages"
msgstr[0] ""

The difference is fairly minor, and applications such as Poedit seem to handle it fine. However, when converting the file to JSON with i18next-gettext-converter, the plural form is not included in the output. If the file looks like the first example when it is run through the converter, it does work.

As a workaround, I run the output generated by gettext-extractor through pofile, which adds the missing values. This suggests that the extra values should be there (and that does make sense).

pofile has the added benefit of including some headers as well.
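
If pulling in pofile just for this is not an option, the missing line could also be added with plain string processing. This is a rough sketch that assumes a two-form (singular/plural) template like the one above; `padPluralForms` is a hypothetical helper, not a gettext-extractor function.

```javascript
// Add an empty msgstr[1] after every plural entry that only has msgstr[0].
function padPluralForms(pot) {
    const lines = pot.split('\n');
    const out = [];
    let inPlural = false;
    for (let i = 0; i < lines.length; i++) {
        const line = lines[i];
        out.push(line);
        if (line.startsWith('msgid_plural ')) {
            inPlural = true; // the current entry has a plural form
        } else if (line.startsWith('msgid ')) {
            inPlural = false; // a new entry starts
        }
        if (inPlural && line.startsWith('msgstr[0] ')) {
            const next = lines[i + 1] || '';
            if (!next.startsWith('msgstr[1] ')) {
                out.push('msgstr[1] ""');
            }
            inPlural = false;
        }
    }
    return out.join('\n');
}
```

For example, `padPluralForms(extractor.getPotString())` could be written to disk instead of calling `savePotFile` directly. Entries that already contain msgstr[1] are left unchanged.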

Allow #

gettext-extractor does not support private #

gettext-extractor does not support private class members declared with #, so we need to remove them.

Variable with type and without initializer causes error

This code

let a = __('hello'), b: string

throws:

node_modules/gettext-extractor/dist/js/extractors/comments.js:112
            if (lineNumber === sourceFile.getLineAndCharacterOfPosition(nodes[index + 1].getStart()).line) {
                                                                                         ^

TypeError: Cannot read properties of undefined (reading 'getStart')

And this works:

let a = __('hello'), b: string = ''

Update typescript dependency requirement

Since typescript is required in the version range 2-4, minor and patch releases of the latest TypeScript are not supported.

https://github.com/lukasgeiter/gettext-extractor/blob/master/package.json#L41

I'm not sure if it could be a devDep only, or if the range should just be widened.

Thanks.

Can I pass in an array of strings to this?

It would be nice to be able to pass an array of strings to this, as my needs are more specific than just pointing to a src directory.

I traverse modules for dependencies, and would like to be able to only extract strings from these files, as I tag them on my localizing service.
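
In the meantime, something along these lines might work as a stopgap. It is a minimal sketch that assumes the parser object exposes a `parseFile(fileName)` method. `parseFileList` is a hypothetical helper, not part of the library.

```javascript
// Feed an explicit list of file paths to a parser, one file at a time.
// `parser` is any object with a parseFile(fileName) method, assumed here to
// match what createJsParser() returns.
function parseFileList(parser, fileNames) {
    // A Set drops duplicates, e.g. modules reached via several dependency paths.
    for (const fileName of new Set(fileNames)) {
        parser.parseFile(fileName);
    }
    return parser;
}
```

Usage would look roughly like `parseFileList(extractor.createJsParser([...]), dependencyFiles)`, where `dependencyFiles` is the array produced by the module traversal.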

Character encoding issues

Hi!

First of all, this library looks very interesting, really good work!

I am looking for a tool to extract the messages of a website whose base language is Spanish. After trying to extract the messages, I noticed that there's an issue with accents:

#: src/components/layout/AppHeader.vue:123
msgid "Reg�strate"
msgstr ""

It should be:

#: src/components/layout/AppHeader.vue:123
msgid "Regístrate"
msgstr ""

The source files of the project are encoded in UTF-8.

Here's the extraction script:

import { GettextExtractor, JsExtractors } from 'gettext-extractor';

const extractor = new GettextExtractor();

extractor
    .createJsParser([
        JsExtractors.functionCall('$t', {
            arguments: {
                text: 0
            }
        }),
        JsExtractors.methodCall('Vue', 't', {
            arguments: {
                text: 0
            },
            ignoreMemberInstance: true
        })
    ])
    .parseFilesGlob('./src/**/*.@(js|vue)');

extractor.savePotFile('./i18n/js-messages.pot');

extractor.printStats();
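
The replacement character in the output typically appears when bytes that are not valid UTF-8 are decoded as UTF-8, so one thing worth ruling out is whether every affected file really is UTF-8 on disk. A small diagnostic sketch using only Node.js built-ins (`isValidUtf8` is a hypothetical helper name):

```javascript
// Return true when the given bytes decode cleanly as strict UTF-8.
function isValidUtf8(bytes) {
    try {
        // With fatal: true the decoder throws instead of inserting U+FFFD.
        new TextDecoder('utf-8', { fatal: true }).decode(bytes);
        return true;
    } catch (err) {
        return false;
    }
}
```

For instance, `isValidUtf8(require('node:fs').readFileSync('src/components/layout/AppHeader.vue'))` returning false would point to a file saved in another encoding such as Latin-1.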