
lexical-models's Introduction

Open Source Keyman lexical models

File Layout

Models are grouped into two folders:

  • release -
  • experimental -

Within each folder, models are further grouped using the template author/bcp47.[uniq]. For example, the folder structure may be:

  • release/example/en.custom/

The components must be lower case and are:

  • author: a short unique identifier, such as nrc or sil.
  • bcp47: the canonical BCP 47 tag for the model. For example km for Khmer, or en-au for Australian English.
  • uniq: an optional component that can be provided when a given language has multiple models from a single author. For example, en.custom vs en.wordlist. We recommend always using a uniquifier, even if there are no current plans to produce more than one model for a language.
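
The naming rules above can be sketched as a small validator. This is only an illustration of the layout as this README states it (group/author/bcp47.[uniq], all lower case); `parseModelFolder` and `ModelFolder` are hypothetical names, not part of the repository's tooling, and the allowed character sets are an assumption.

```typescript
// Hypothetical shape for a parsed model folder; not part of the repo tooling.
interface ModelFolder {
  group: 'release' | 'experimental';
  author: string;
  bcp47: string;
  uniq?: string;
}

// Parses "group/author/bcp47.[uniq]" and rejects anything that is not lower case.
// The permitted characters (a-z, 0-9, "_" and "-") are an assumption.
function parseModelFolder(folder: string): ModelFolder | null {
  const pattern = /^(release|experimental)\/([a-z0-9_]+)\/([a-z0-9-]+)(?:\.([a-z0-9_]+))?\/?$/;
  const match = pattern.exec(folder);
  if (match === null) {
    return null;
  }
  return {
    group: match[1] as ModelFolder['group'],
    author: match[2],
    bcp47: match[3],
    uniq: match[4], // undefined when no uniquifier is present
  };
}

console.log(parseModelFolder('release/example/en.custom/'));
```

Folders that do not follow this exact template, for example ones with an upper-case component, return null.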

Building Models

Prerequisites

  • Node.js
  • Git for your platform
  • You will need to use Git Bash or equivalent to build.

Build instructions

build.sh can be used to build all the models from the command line.

  • Common build.sh parameters:
    • -t, -test Runs tests on models
    • -b, -build Creates compiled models
    • -c, -clean Cleans intermediate and output files
    • -no-npm Skip all npm steps
    • -s Quiet build
    • [target] The specific model(s) to build, e.g. release or release/example/en.template

lexical-models's People

Contributors

anvalon, bennylin, caforbes, darcywong00, davidlrowe, dotland, dy2288, dyacob, eddieantonio, erros84, ind-nt, jahorton, jeffheath-sil, katelem24, lornasil, madskinner2, makarasok, mattgyverlee, mcdurdin, meng-heng, nnyny, postmodernenglish, rmlockwood, sapradhan, shavian-info, shiami, sku21, svarnimn, tomasbm01, victoriaq22


lexical-models's Issues

Error when building models when .kps indicates mixed path separators

The .kps seems to have hardcoded paths with backslashes in the <Files> section:

    <File>
      <Name>..\build\example.en.custom.model.js</Name>
      <Description>Lexical model example.en.custom.model.js</Description>
      <CopyLocation>0</CopyLocation>
      <FileType>.model.js</FileType>
    </File>

This fails to build on platforms with / path separators (e.g., macOS, Linux).

Validating model /Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/
Building model /Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/
fs.js:115
    throw err;
    ^

Error: ENOENT: no such file or directory, open '../source/..\build\example.crk.wordlist_wahkohtowin.model.js'
    at Object.openSync (fs.js:439:3)
    at Object.readFileSync (fs.js:344:35)
    at /Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/build/obj/tools/kmp-compiler.js:104:27
    at Array.forEach (<anonymous>)
    at KmpCompiler.buildKmpFile (/Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/build/obj/tools/kmp-compiler.js:103:27)
    at LexicalModelCompiler.compile (/Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/build/obj/tools/index.js:141:21)
    at Object.<anonymous> (/Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/build/obj/release/example/crk.wordlist_wahkohtowin/source/model.js:4:23)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
Validating model /Users/santoseadmin/Work/lexical-models/release/example/en.custom/
Building model /Users/santoseadmin/Work/lexical-models/release/example/en.custom/
fs.js:115
    throw err;
    ^

Error: ENOENT: no such file or directory, open '../source/..\build\example.en.custom.model.js'
    at Object.openSync (fs.js:439:3)
    at Object.readFileSync (fs.js:344:35)
    at /Users/santoseadmin/Work/lexical-models/release/example/en.custom/build/obj/tools/kmp-compiler.js:104:27
    at Array.forEach (<anonymous>)
    at KmpCompiler.buildKmpFile (/Users/santoseadmin/Work/lexical-models/release/example/en.custom/build/obj/tools/kmp-compiler.js:103:27)
    at LexicalModelCompiler.compile (/Users/santoseadmin/Work/lexical-models/release/example/en.custom/build/obj/tools/index.js:141:21)
    at Object.<anonymous> (/Users/santoseadmin/Work/lexical-models/release/example/en.custom/build/obj/release/example/en.custom/source/model.js:4:23)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
Validating model /Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/
Building model /Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/
fs.js:115
    throw err;
    ^

Error: ENOENT: no such file or directory, open '../source/..\build\example.en.wordlist.model.js'
    at Object.openSync (fs.js:439:3)
    at Object.readFileSync (fs.js:344:35)
    at /Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/build/obj/tools/kmp-compiler.js:104:27
    at Array.forEach (<anonymous>)
    at KmpCompiler.buildKmpFile (/Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/build/obj/tools/kmp-compiler.js:103:27)
    at LexicalModelCompiler.compile (/Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/build/obj/tools/index.js:141:21)
    at Object.<anonymous> (/Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/build/obj/release/example/en.wordlist/source/model.js:4:23)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
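
The failing path in the log above, '../source/..\build\...', suggests the backslashes from the .kps are being passed through unchanged on a platform where only / is a separator. A minimal sketch of one possible fix is to normalize the separators before joining. `toPosixPath` and `resolveKpsFile` are hypothetical helpers written for this illustration; they are not functions from kmp-compiler.js.

```typescript
// Hypothetical helper: treat every backslash in a .kps file name as a separator.
function toPosixPath(kpsPath: string): string {
  return kpsPath.replace(/\\/g, '/');
}

// Hypothetical helper: join the directory of the .kps file with a file name
// from the .kps, collapsing "." and ".." segments along the way.
function resolveKpsFile(kpsDir: string, name: string): string {
  const prefix = kpsDir.charAt(0) === '/' ? '/' : '';
  const segments = (kpsDir + '/' + toPosixPath(name)).split('/');
  const out: string[] = [];
  for (const seg of segments) {
    if (seg === '' || seg === '.') {
      continue; // skip empty and current-directory segments
    }
    if (seg === '..' && out.length > 0 && out[out.length - 1] !== '..') {
      out.pop(); // step back up one directory
    } else {
      out.push(seg);
    }
  }
  return prefix + out.join('/');
}

console.log(resolveKpsFile('../source', '..\\build\\example.en.custom.model.js'));
```

Normalizing to forward slashes is also safe on Windows for Node.js file APIs, which is why this sketch converts in one direction only.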

Deprecating gff.ti.gff_tigrinya

The gff.ti.gff_tigrinya lexical model is based on the Unilex project's wordlist for Tigrinya. Unfortunately, the contents contain many misspellings and non-Tigrinya words that come from a corpus of unknown provenance and pedigree. The contents also combine the conflicting spelling conventions of Eritrea and Ethiopia, which negatively impacts the frequency counts as well.

An approach that would better meet user expectations is to have separate wordlists for each region. PRs #216 and #217 address this directly. The gff.ti.gff_tigrinya lexicon can then be deleted from the repository, or moved into a legacy directory if there is interest in preserving it.

Remove <FollowKeyboardVersion/> from .kps files

While creating a guide for submitting lexical models to the repo, I ran into a Keyman Developer package compilation error using the "Build all" button in the Projects view.

nrc.en.mtnt.model.ts: Compiling 'C:\src\lexical-models\release\nrc\nrc.en.mtnt\source\nrc.en.mtnt.model.ts'...
nrc.en.mtnt.model.ts: Success: 'C:\src\lexical-models\release\nrc\nrc.en.mtnt\source\nrc.en.mtnt.model.ts' was compiled successfully  to 'C:\src\lexical-models\release\nrc\nrc.en.mtnt\build\nrc.en.mtnt.model.js'.
nrc.en.mtnt.model.kps: Compiling package nrc.en.mtnt.model.kps...
nrc.en.mtnt.model.kps: Fatal Error: The option "Follow Keyboard Version" is set but there are no keyboards in the package.
nrc.en.mtnt.model.kps: Failure: 'C:\src\lexical-models\release\nrc\nrc.en.mtnt\source\nrc.en.mtnt.model.kps' was not compiled successfully.

Since we decided that the package version is what gets used as the lexical model version, we should:

  1. Remove <FollowKeyboardVersion/> in all the .kps files
  2. Remove <Version> </Version> within the <LexicalModel> nodes.
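
The two cleanup steps above could be applied to many .kps files with a small script. The following is a rough sketch only: `stripKeyboardVersioning` is a hypothetical name, it works on the XML as plain text rather than with an XML parser, and the sample package layout in the usage is an assumption made for the illustration.

```typescript
// Hypothetical helper: applies both cleanup steps to the text of a .kps file.
function stripKeyboardVersioning(kps: string): string {
  // Step 1: drop <FollowKeyboardVersion/> (self-closing or empty paired form),
  // together with its indentation and line ending.
  const withoutFollow = kps.replace(
    /[ \t]*<FollowKeyboardVersion\s*(\/>|>\s*<\/FollowKeyboardVersion>)\r?\n?/g,
    ''
  );
  // Step 2: inside each <LexicalModel>...</LexicalModel> block only,
  // drop the <Version> element. Version elements elsewhere are left alone.
  return withoutFollow.replace(/<LexicalModel>[\s\S]*?<\/LexicalModel>/g, (block) =>
    block.replace(/[ \t]*<Version>[\s\S]*?<\/Version>\r?\n?/g, '')
  );
}

// Assumed, simplified package layout for illustration.
const sampleKps = [
  '<Package>',
  '  <Options>',
  '    <FollowKeyboardVersion/>',
  '  </Options>',
  '  <Info>',
  '    <Version>1.0</Version>',
  '  </Info>',
  '  <LexicalModels>',
  '    <LexicalModel>',
  '      <ID>nrc.en.mtnt</ID>',
  '      <Version>1.0</Version>',
  '    </LexicalModel>',
  '  </LexicalModels>',
  '</Package>',
].join('\n');

console.log(stripKeyboardVersioning(sampleKps));
```

A regex pass like this is fragile against unusual formatting, so the output is worth reviewing in a diff before committing.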

Configuration

Keyman Developer 12.0.58.0 stable
Latest master branch of lexical-models repo

feat(nrc.en.mtnt): Add languageUsesCasing flag

This should be a bug on https://github.com/keymanapp/lexical-models/ against the MTNT model:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['mtnt.tsv']
};

Cross-reference with keymanapp/keyman#3824.

The predictive engine will now detect the casing pattern used by the current context (when languageUsesCasing == true)...

Originally posted by @jahorton in keymanapp/keyman#4115 (comment)

[jo.wbl-cyrl-tj.wakhi_cyrillic_minimal] add casing

release/jo/jo.wbl-cyrl-tj.wakhi_cyrillic_minimal

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
};
export default source;

And, with the addition of the new property, like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
};
export default source;

This will turn on the possibility for case differentiation and use the default configuration.
Most likely this default operation will be all you need. In that case you don't need any customization.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".
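
For readers wondering what custom capitalization might involve, here is a self-contained sketch of Turkish-style casing, where dotted and dotless i must be mapped explicitly. The `CasingForm` type and `applyTurkishCasing` function are assumptions written for this illustration; the exact hook name, signature, and the example in keymanapp/keyman#3720 may differ, so consult that discussion before wiring anything into a model.

```typescript
// Assumed set of casing forms for this illustration.
type CasingForm = 'lower' | 'initial' | 'upper';

// Hypothetical casing function: handles the Turkish i/İ and ı/I pairs
// before falling back to the standard JavaScript case conversions.
function applyTurkishCasing(form: CasingForm, text: string): string {
  if (form === 'lower') {
    return text.replace(/I/g, 'ı').replace(/İ/g, 'i').toLowerCase();
  }
  if (form === 'upper') {
    return text.replace(/ı/g, 'I').replace(/i/g, 'İ').toUpperCase();
  }
  // 'initial': upper-case only the first character, leave the rest untouched.
  if (text.length === 0) {
    return text;
  }
  return applyTurkishCasing('upper', text.charAt(0)) + text.slice(1);
}

console.log(applyTurkishCasing('upper', 'istanbul'));
console.log(applyTurkishCasing('lower', 'IŞIK'));
```

Languages whose casing follows the default Unicode mappings should not need anything like this.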

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

1.1 (2021-01-31)
----------------
* Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(2) README.md will need the version number changed. Probably the copyright date (or date range) will need to change as well, for example from "(c) 2020 Acme, Inc." to "(c) 2020-2021 Acme, Inc."

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)
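
Steps (1) to (3) are mechanical text changes, so they can be sketched as two small string helpers. `prependHistoryEntry` and `extendCopyrightYear` are hypothetical names for this illustration; nothing like them is known to exist in the Keyman tooling, and the copyright pattern assumes the "(c) YYYY" or "(c) YYYY-YYYY" form used in the example above.

```typescript
// Hypothetical helper: adds a new entry at the top of HISTORY.md content,
// with an underline as long as the heading.
function prependHistoryEntry(history: string, version: string, date: string, notes: string[]): string {
  const heading = `${version} (${date})`;
  const underline = '-'.repeat(heading.length);
  const bullets = notes.map((note) => `* ${note}`).join('\n');
  return `${heading}\n${underline}\n${bullets}\n\n${history}`;
}

// Hypothetical helper: turns "(c) 2020 ..." into "(c) 2020-2021 ...", or
// updates the end of an existing range. The first year is always kept.
function extendCopyrightYear(line: string, year: number): string {
  return line.replace(/\(c\) (\d{4})(?:-(\d{4}))?/, (_match: string, first: string) =>
    Number(first) === year ? `(c) ${first}` : `(c) ${first}-${year}`
  );
}

console.log(extendCopyrightYear('(c) 2020 Acme, Inc.', 2021));
console.log(
  prependHistoryEntry('1.0 (2020-06-01)\n----------------\n* Initial release\n', '1.1', '2021-01-31', [
    "Enable use of Keyman 14's case-detection & capitalization modeling features",
  ])
);
```

The version number in the .kps file (step 4) is not covered here, since the text above recommends changing it through Keyman Developer.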

Move lexical model compiler to keyman repo

So. A few things:

  1. The merge of the model info should warn if there are mismatching version numbers, and ...
  2. The version number shouldn't be in the .model_info, and ...
  3. The canonical data is always in the source files, not the .model_info, because ...
  4. Lexical models can be built, stored and distributed outside the lexical-models repository and CI chain. Those lexical models won't have a .model_info file, because that is used only for the CI deployment. It is important we don't end up building too many dependencies on the repository structure by accident because it is likely, if we are successful, that this repo will get very large. The intent is for models in this repo to be of high release-level quality and we generally want to discourage experimental models here, because models here can be automatically installed by Keyman. This leads me to conclude that ...
  5. The compiler needs to be moved out of the lexical-models repo and into the keyman repo, and the .model_info management needs to be clearly delineated. Now, the compiler is part of the Keyman Developer toolchain. So I think what we need to do is update the build process to pull the latest stable (or $tier) version of the compiler from downloads.keyman.com (so that's something that needs to be added to the Keyman Developer CI). We could register an NPM package but I don't think we are quite ready to do that; happy to defer to wiser heads though. For consistency, then ...
  6. This same process needs to be done with the keyboards repo -- so we don't include the kmcomp compiler in the repo, rather pull it at first build (or at appropriate times) from downloads.keyman.com.

Originally posted by @mcdurdin in #14 (comment)
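
Point 1 above, warning on mismatching version numbers while treating the source files as canonical, might look something like the following sketch. `VersionSource` and `findVersionMismatches` are hypothetical names, and how the versions are actually read from each file is left out.

```typescript
// Hypothetical record of a version number found in one file.
interface VersionSource {
  file: string;
  version: string | undefined; // undefined when the file carries no version
}

// Compares every other source against the canonical one and returns one
// warning per mismatch. Sources without a version are not treated as errors.
function findVersionMismatches(canonical: VersionSource, others: VersionSource[]): string[] {
  const warnings: string[] = [];
  for (const other of others) {
    if (other.version !== undefined && other.version !== canonical.version) {
      warnings.push(
        `${other.file} has version ${other.version}, but ${canonical.file} has ${canonical.version}`
      );
    }
  }
  return warnings;
}

// File names and versions below are made up for illustration.
console.log(
  findVersionMismatches({ file: 'example.en.custom.model.kps', version: '1.1' }, [
    { file: 'example.en.custom.model_info', version: '1.0' },
  ])
);
```

Returning warnings rather than throwing keeps the merge going, which matches the "should warn" wording of the proposal.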

[newa] Newa autocomplete not working

Proper suggestions are not offered after newa wordlist 2.0 is installed. Only this suggestion ( 𑐺𑐕𑐣𑑂𑐟 ) appears when one letter is typed, regardless of which letter it is, and no suggestions appear when more than one letter is typed.
distributed

However, if I build locally and then install, suggestions are shown as expected.
local-build

this is the build that works

[LMLayer][Android] Angle quotes not showing up correctly - unicode issue?

I get the following in my keyboard. Note the diamond with question mark characters.
image
I'm not sure if this is a bug in Keyman or an issue with how the .ts file should be written.

My .ts file looks like this:
/* Gilaki wordlist ptwl1 1.0 */

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
  punctuation: {
    quotesForKeepSuggestion: {
      open: "«", close: "»"
    },
    ...
This is for lexical model: sil.glk-arab.ptwl1
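
One thing that may be worth ruling out is the encoding of the .ts file itself, since replacement characters often appear when a file is not read as UTF-8. As a diagnostic sketch only, the quotes can be written as Unicode escapes so that the source contains nothing but ASCII. Whether this resolves the display problem reported here is not known.

```typescript
// « is U+00AB and » is U+00BB; the escapes make the file encoding irrelevant.
const quotesForKeepSuggestion = {
  open: '\u00AB',
  close: '\u00BB',
};

console.log(quotesForKeepSuggestion.open + 'word' + quotesForKeepSuggestion.close);
```

If the escaped form displays correctly while the literal form does not, the file encoding is the likely cause.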

[sil.bcc-latn.upp_ptwl1] add casing

@rmlockwood
release/sil/sil.bcc-latn.upp_ptwl1

The instructions are identical to those in "[jo.wbl-cyrl-tj.wakhi_cyrillic_minimal] add casing" above: add languageUsesCasing: true to the model's .ts source file, then update the version number and copyright date in HISTORY.md, README.md, LICENSE.md, the .kps file, and any readme.htm or welcome.htm.

bug(sil_brao): word picked from the word suggestion also replace the symbols next to the letter

The behavior of the issue (Video).

Details: If a symbol stands next to the letter being used for word prediction, then when a suggested word is chosen (clicked), the symbol disappears too.
Symbol: ៗ, (), #, $, @, []...etc.
Please let me know if more information is needed.
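
The behavior described above is what one would expect if the symbol and the adjacent letters are treated as a single word. As an illustration of the idea only, here is a sketch of a word breaker that makes each listed symbol its own token, so that replacing the word being predicted would leave the symbol alone. The `Span` shape and `breakWordsAroundSymbols` are assumptions for this example; they are not taken from the sil_brao model or confirmed against Keyman's word breaker interface.

```typescript
// Assumed span shape for this illustration.
interface Span {
  start: number;
  end: number;
  length: number;
  text: string;
}

// Symbols from the report that should never be merged into a word.
const SYMBOLS = 'ៗ()#$@[]';

// Splits on whitespace and emits every symbol in SYMBOLS as its own span.
function breakWordsAroundSymbols(text: string): Span[] {
  const spans: Span[] = [];
  let wordStart = -1;
  const flush = (end: number): void => {
    if (wordStart >= 0) {
      spans.push({ start: wordStart, end, length: end - wordStart, text: text.slice(wordStart, end) });
      wordStart = -1;
    }
  };
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (/\s/.test(ch)) {
      flush(i); // whitespace ends the current word
    } else if (SYMBOLS.indexOf(ch) >= 0) {
      flush(i); // a symbol ends the current word and stands alone
      spans.push({ start: i, end: i + 1, length: 1, text: ch });
    } else if (wordStart < 0) {
      wordStart = i; // first character of a new word
    }
  }
  flush(text.length);
  return spans;
}

console.log(breakWordsAroundSymbols('ab #cd').map((span) => span.text));
```

This splits on plain whitespace and a fixed symbol list only; languages written without spaces between words would need more than this.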

Keyman apps

  • Keyman Developer
  • Keyman for iOS

Keyboard name

  • sil_brao

Keyman version

  • 16.0.144-stable (Keyman Developer)
  • 17.0.257-alpha (Keyman for iOS)

Operating system

  • Windows 10
  • iOS version 16.3.1

Keyboard version

  • 1.0

Language name

  • Brao

Additional context

Relevant issue

[gff.xan.gff_xamtanga] bug: gff.xan.gff_xamtanga.model_info seems to be missing

09:39:09   Uploading /c/BuildAgent/work/e22cfa4d1a6faf97/models/release/gff/gff.xan.gff_xamtanga/
09:39:09   Failed to locate /c/BuildAgent/work/e22cfa4d1a6faf97/models/release/gff/gff.xan.gff_xamtanga/build/gff.xan.gff_xamtanga.model_info
09:39:09   Aborting with error 1

Per https://build.palaso.org/buildConfiguration/Keyman_Models_BuildAndDeploy/384842?buildTab=log&focusLine=2037&logView=flowAware&linesState=1993, https://build.palaso.org/buildConfiguration/Keyman_Models_BuildAndDeploy/384844?buildTab=log&focusLine=0&logView=flowAware&linesState=1681

[benny_lin.id.kamus_indonesia] add casing

@bennylin
release/benny_lin/benny_lin.id.kamus_indonesia

The instructions are identical to those in "[jo.wbl-cyrl-tj.wakhi_cyrillic_minimal] add casing" above: add languageUsesCasing: true to the model's .ts source file, then update the version number and copyright date in HISTORY.md, README.md, LICENSE.md, the .kps file, and any readme.htm or welcome.htm.

[brao] bug: not working as expected when there are at least two characters to the left of the caret

Describe the bug

Screenshot_1616397167

When the left context is "[space] ឆ្រ", the words suggested in the banner are not the expected ones, i.e. they are not words beginning with ឆ្រ-.

To Reproduce

  1. Install this keyboard and lm 'brao_keyboard_and_lm.zip' on 14.0.266-beta
  2. In the text editor area, type in "ឆ្រ ឆ្រ"
  3. Keep the cursor to the right of the second string
  4. See error

Expected behavior

Words beginning with ឆ្រ should be in the suggestion.


Keyman for Android:

  • Device: Pixel 2 API 29
  • OS: Android 10
  • Keyman version: 14.0.266-beta
  • Target application: Keyman

Keyboard

  • Keyboard name: Brao (SIL)
  • Keyboard version: 1.0
  • Language name: Brao

Additional context

Regular spaces are used between words, but the model still cannot provide meaningful suggestions when there is a consonant + subscript (i.e. ឆ្រ) to the left of the caret.

The model is able to provide meaningful and expected suggestions when there are "a consonant and a diacritic/vowel" to the left of the caret, i.e. កំ, ឆា.
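
When comparing the failing and working contexts, it can help to look at the underlying code units rather than the rendered text. The sketch below lists them for a string. It assumes the subscript sequence is encoded as consonant, U+17D2 (Khmer sign coeng), consonant, which is the usual encoding but has not been checked against the text in this report. `codeUnits` is a hypothetical helper.

```typescript
// Hypothetical helper: lists the UTF-16 code units of a string as U+XXXX.
// For Khmer, which is in the Basic Multilingual Plane, these are the code points.
function codeUnits(text: string): string[] {
  const result: string[] = [];
  for (let i = 0; i < text.length; i++) {
    const hex = text.charCodeAt(i).toString(16).toUpperCase();
    result.push('U+' + ('0000' + hex).slice(-4));
  }
  return result;
}

// Assumed encodings of the contexts discussed above.
console.log(codeUnits('\u1786\u17D2\u179A')); // consonant + coeng + consonant
console.log(codeUnits('\u1780\u17C6'));       // consonant + diacritic
```

The failing context contains U+17D2 while the working ones do not, which may be a useful starting point for whoever investigates the word breaking.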

កំ >> កំឡាំង | កំប្រឹន | កំប្លីង

Screenshot_1616397619

ឆា >> ឆា | ឆាល់ | ឆារ

Screenshot_1616397835

bug(sil_jarai): khmer word prediction does not disassociate any letter after a symbol

The behavior of the issue (Video).

Details: The word predicted from typing replaced the letter AND the symbol between them. It seems that the word prediction is only associated with the letter in front of the symbol. Anything that comes after the symbol is associated with the letter in front of the symbol.
Please let me know if more information is needed.

Keyman apps

  • Keyman Developer

Keyboard name

  • sil_jarai

Keyman version

  • 16.0.144-stable

Operating system

  • Windows 10

Keyboard version

  • 1.0

Language name

  • Jarai

Additional context

Relevant issue

[nrc.en.mtnt] Revise wordlist

From a team review of the Keyman for Android UX (keymanapp/keyman#7161)

Aside from the contractions issue noted in #143, @mcdurdin notes the default English lexical-model wordlist needs the following adjustments:

Add common words such as:

  • Covid (the original wordlist gathered from reddit was pre-covid)
  • Qantas (airline) (and a number of other brands!)
  • Coronavirus

Remove these entries (along with any other typos found):

  • becasue 10
  • être 6
  • reccomend 5
  • sheild 5

Is there any value in keeping single-character entries (e.g. $ 1898)?
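
The additions and removals listed above could be applied to a tab-separated wordlist with a small script. This is a sketch under assumptions: it expects one "word<TAB>count" entry per line, `reviseWordlist` is a hypothetical name, and the counts given to the added words are placeholders, since the issue does not say what frequencies they should get.

```typescript
// Hypothetical helper: removes listed words and appends new entries.
// Each line of the input is assumed to be "word<TAB>count".
function reviseWordlist(tsv: string, remove: string[], add: Array<[string, number]>): string {
  const kept = tsv.split(/\r?\n/).filter((line) => {
    if (line.trim() === '') {
      return false; // drop blank lines
    }
    const [word = ''] = line.split('\t');
    return remove.indexOf(word) < 0;
  });
  const added = add.map(([word, count]) => `${word}\t${count}`);
  return kept.concat(added).join('\n') + '\n';
}

const original = 'the\t100\nbecasue\t10\nreccomend\t5\n';
// The count of 1 for each added word is a placeholder, not a real frequency.
console.log(reviseWordlist(original, ['becasue', 'reccomend'], [['Covid', 1], ['Qantas', 1]]));
```

Matching is exact and case-sensitive, so each misspelling has to be listed in the form in which it appears in the wordlist.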

[wyc_eth.mym-latn.me_en] add casing

release/wyc_eth/wyc_eth.mym-latn.me_en

The instructions are identical to those in "[jo.wbl-cyrl-tj.wakhi_cyrillic_minimal] add casing" above: add languageUsesCasing: true to the model's .ts source file, then update the version number and copyright date in HISTORY.md, README.md, LICENSE.md, the .kps file, and any readme.htm or welcome.htm.

What is the purpose of the .kps file?

What purpose does the .kps file serve separate from the .model_info file? The only thing I can figure is that the KMP compiler needs it as an argument :/

Can the .kps file be automatically generated? Having to change things in multiple locations (model.ts, .model_info, .kps) is tedious and error-prone.

bug(nrc.en.mtnt): `we're` not offered as a suggestion for `were`

I think this still needs to remain open for @jahorton to confirm this is resolved:

With a context of "were", I get 3 suggestions:
"were" (frequency 5309)
"weren't" (frequency 385)
"werent" (frequency 16)

"we're" has a frequency of 927 so I would expect it to appear

On my alpha build of Keyman for Android, I'm still not getting "we're" as a suggestion.

Originally posted by @darcywong00 in #143 (comment)

Reproduced in Keyman 17.0.219-alpha
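
One way to reason about this report: a trie lookup can only offer "we're" for the context "were" if both map to the same search key. The sketch below shows a key function that ignores apostrophes in addition to case and combining diacritics. `searchKey` is a hypothetical function for illustration; it does not describe how nrc.en.mtnt or the predictive engine currently derive keys, and it is not claimed to be the fix for this issue.

```typescript
// Hypothetical key function: lower-cases, strips combining diacritics,
// and drops straight and curly apostrophes.
function searchKey(term: string): string {
  return term
    .normalize('NFD')                 // split base letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')  // remove the combining marks
    .replace(/['\u2019]/g, '')        // remove ' and the right single quote
    .toLowerCase();
}

console.log(searchKey("we're") === searchKey('were'));
```

With a key like this, "were", "we're", "weren't" and "werent" would share the key prefix "were", leaving the frequency counts to decide which are shown.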

bug(sil_jarai): khmer word prediction replacing the symbols with the word predicted

The behavior of the issue (Video).

Details: The word predicted from typing replaced the letter AND the symbols in front of the letter.
Please let me know if more information is needed.

Keyman apps

  • Keyman for iPhone and iPad

Keyboard name

  • sil_jarai

Keyman version

  • 17.0.254-alpha

Operating system

  • iOS 16.3.1

Device

  • iPhone 11 Pro Max

Keyboard version

  • 1.0

Language name

  • Jarai

Additional context

Relevant issue

bug(gff.byn.gff_blin): Blin Dictionary Downloads But Does Not Load

Describe the bug

The new gff.byn.gff_blin lexicon is not set up to work with the gff_blin keyboard. The lexicon is discovered as an available dictionary, but after retrieval, it does not appear as installed for the language and the words are not offered for selection. The keyboard continues to work as if there is no connected dictionary.

Reproduce the bug

  1. Keyman > Installed languages > Bilin > Dictionary (Check for available dictionary)
  2. The messages appear: "Checking for associated...." , "Downloading dictionary..." , "Dictionary download is finished" , "Resources successfully updated!"
  3. The "Check for available dictionary" text remains, it is not replaced with the dictionary name.
  4. Going into the Keyman app editor, or in local editors, and launching the Blin keyboard, the dictionary is not loaded, the top of the screen with the word options does not appear.
  5. I've restarted Keyman, and restarted my phone, but no change. Other keyboards work fine.
  6. Steps (1) and (2) are repeated, and the results are identical.

Expected behavior

Following the "Resources successfully updated!" message, the Blin keyboard would begin to offer the terminology from the dictionary for selection.

Related issues

No response

Keyman apps

  • Keyman for Android
  • Keyman for iPhone and iPad
  • Keyman for Linux
  • Keyman for macOS
  • Keyman for Windows
  • Keyman Developer
  • KeymanWeb
  • Other - give details at bottom of form

Keyman version

16.0.138

Operating system

Android 13

Device

Samsung S22

Target application

Keyman App, any editor

Browser

No response

Keyboard name

gff_blin

Keyboard version

1.5.1

Language name

Blin

Additional context

This issue could be related to BCP 47 code processing. Both the keyboard and lexicon use language ID "byn-Ethi" , this is important because there is also a Latin convention for writing Blin endorsed by the Eritrean government (thus "byn-Latn" is possible).

The lexicon file name does not include the -Ethi part; it is gff.byn.gff_blin. Still, it is odd that the dictionary is located and downloaded but does not get associated with the keyboard afterwards.
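
To make the BCP 47 hypothesis above concrete, here is a sketch of how a tag such as "byn-Ethi" splits into a language and a script subtag, and how an exact comparison differs from a language-only comparison. `parseTag` and `sameLanguage` are hypothetical helpers that handle only these two subtags; they say nothing about how Keyman actually matches dictionaries to keyboards.

```typescript
// Hypothetical, simplified view of a BCP 47 tag: language plus optional script.
interface LanguageTag {
  language: string;
  script?: string;
}

// Parses only the first two subtags. A script subtag is exactly four letters.
function parseTag(tag: string): LanguageTag {
  const parts = tag.split('-');
  const language = (parts[0] || '').toLowerCase();
  const second = parts[1];
  if (second !== undefined && /^[A-Za-z]{4}$/.test(second)) {
    const script = second.charAt(0).toUpperCase() + second.slice(1).toLowerCase();
    return { language, script };
  }
  return { language };
}

// True when two tags share the same primary language, ignoring script.
function sameLanguage(a: string, b: string): boolean {
  return parseTag(a).language === parseTag(b).language;
}

console.log(parseTag('byn-Ethi'));
console.log('byn-Ethi' === 'byn', sameLanguage('byn-Ethi', 'byn'));
```

If one component compares full tags and another compares only the language subtag, a model could be found and downloaded yet never associated, which is the kind of mismatch the report speculates about.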

[bennylin.jv-latn.bausastra_jawa] add casing

@bennylin
release/bennylin/bennylin.jv-latn.bausastra_jawa

The instructions are identical to those in "[jo.wbl-cyrl-tj.wakhi_cyrillic_minimal] add casing" above: add languageUsesCasing: true to the model's .ts source file, then update the version number and copyright date in HISTORY.md, README.md, LICENSE.md, the .kps file, and any readme.htm or welcome.htm.

bug: khmer model custom wordbreaker issues

Describe the bug

The crash happened after this activity was done. See the crash in action:

predictive.text.crashes.mov

Keyman apps

  • Keyman for Android
  • Keyman for iPhone and iPad
  • Keyman for Linux
  • Keyman for macOS
  • Keyman for Windows
  • Keyman Developer
  • KeymanWeb
  • Other - give details at bottom of form

Keyman version

17.0.104-alpha

Operating system

iOS 16.4

Device

iPhone Pro Max Simulator

Keyboard name

sil_jarai

Keyboard version

1.0

Language name

Jarai

Additional context

https://keyman.com/keyboards/sil_jarai?bcp47=jra-khmr

[sil_kmhmu] LM only matches the last character being typed

From the get-go, I see that the prediction only matches the last character typed rather than the continuous string before it. For example, when one types ເ, the model tries to match words beginning with that character, but when the next character (ຄ) is typed, the model tries to match words beginning with ຄ, not the combination of the two (ເຄ). As a result, one may never get a suggestion for words like ເຄືອນ. Talk to me if the description is not understandable.
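
The difference between the expected and the reported behavior can be sketched as two lookup functions over a word list. This is only an illustration of the symptom, not a diagnosis of the model or of Keyman's matching code. Both function names are made up for this sketch, and the second word in the list is a hypothetical entry added for contrast.

```typescript
// Expected: match on everything typed so far in the current word.
function suggestByPrefix(context: string, wordlist: string[]): string[] {
  return wordlist.filter((word) => word.startsWith(context));
}

// Reported: only the last character typed is used for matching.
function suggestByLastChar(context: string, wordlist: string[]): string[] {
  const chars = Array.from(context);
  if (chars.length === 0) return [];
  const last = chars[chars.length - 1];
  return wordlist.filter((word) => word.startsWith(last));
}

// 'ເຄືອນ' comes from the report; 'ຄົນ' is a hypothetical contrast entry.
const sampleWords = ['ເຄືອນ', 'ຄົນ'];
const typedSoFar = 'ເຄ';

console.log(suggestByPrefix(typedSoFar, sampleWords));
console.log(suggestByLastChar(typedSoFar, sampleWords));
```

The first call can still offer ເຄືອນ after two keystrokes, while the second drops it as soon as ຄ is typed.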

The lexical model package: https://drive.google.com/file/d/1Gsz6U5Ww45AjWbfiilLdmz7qYeg0mnKz/view?usp=sharing

The associated keyboard: https://keyman.com/keyboards/sil_kmhmu

Package compilation improperly bundles resources

While the individual resources for each model are properly built at present, the .model.kmp package file itself is not. The issue: within the .model.kmp, both welcome.htm and the *.js file with the compiled model code are stored as relative paths to the individual build products rather than as copies of the actual files.

Upon trying to actually utilize the .kmps for development work in our apps, we get something like the following:

image

As you might imagine, this rapidly generates errors and results in an unusable model.

[cjp-latn] update model for automatic case selection

[gonzalez_quint_coto.cjp-latn.cabecar]
Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
};
export default source;

And, with the addition of the new property, like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
};
export default source;

This will turn on the possibility for case differentiation and use the default configuration.
Most likely this default operation will be all you need. In that case you don't need any customization.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

1.1 (2021-01-31)
----------------
* Enables use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(2) README.md will need the version number changed. Probably the copyright date (or date range) will need to change as well, for example from "(c) 2020 Acme, Inc." to "(c) 2020-2021 Acme, Inc."

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)
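
The HISTORY.md entry from step (1) can be produced mechanically. The sketch below assumes the file holds nothing but entries in the format shown above, newest first. Both function names are hypothetical; adjust the logic if your HISTORY.md starts with a title line.

```typescript
// Build one entry: a "version (date)" heading, an underline of the same
// length, and one "* ..." bullet per change.
function formatHistoryEntry(version: string, date: string, changes: string[]): string {
  const heading = `${version} (${date})`;
  const underline = '-'.repeat(heading.length);
  const bullets = changes.map((change) => `* ${change}`);
  return [heading, underline, ...bullets].join('\n');
}

// Put the new entry at the top so that the latest date comes first.
// Assumes the existing text contains only entries (no title line).
function prependHistoryEntry(history: string, entry: string): string {
  return history.trim() === '' ? `${entry}\n` : `${entry}\n\n${history}`;
}

const newEntry = formatHistoryEntry('1.1', '2021-01-31', [
  "Enable use of Keyman 14's case-detection & capitalization modeling features",
]);
console.log(prependHistoryEntry('1.0 (2020-06-01)\n----------------\n* Initial release\n', newEntry));
```

The 1.0 entry in the usage line is made-up sample data.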

feat: build with kmc instead of kmlmc

Part of the 17.0-alpha work. @mcdurdin will work on this.

TODO:

  • We need to verify against the older kmlmp output for packages.
  • Add LICENSE.md to each package (#225)
  • Create a filelist and build it with a single command for performance (#226)

chore: Updates for Keyman 15.0

I was taking a preliminary look at whether this repo will compile with Keyman 15 by making the following changes in package.json:

  "dependencies": {
    "@keymanapp/lexical-model-compiler": "^15.0.247-beta",
    "@keymanapp/models-types": "^15.0.247-beta"
  },
  "devDependencies": {
    "jszip": "^3.7.0"
  }

I had to add jszip in devDependencies (because of keymanapp/keyman#5770 ?)


The build.sh script halts on fv.bea.tsaadane.model.js

The target folder contains unexpected files:
fv.bea.tsaadane.model_info source source/fv.bea.tsaadane.model.js source/fv.bea.tsaadane.model.kmp
Aborting build
Aborting with error 999

Several of the lexical-models had warnings about duplicate words being found, but the build went to completion.
The duplicated words may not be properly encoded though?

For example:

Building model /c/src/lexical-models/release/shavian_info/shavian_info.en-shaw.readlex/
wordlist.tsv (10): Warning: 2802 duplicate word “𐑑” found in same file; summing counts
wordlist.tsv (35): Warning: 2802 duplicate word “𐑞𐑨𐑑” found in same file; summing counts
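
What "summing counts" means for a word list can be sketched as follows. This is not the compiler's code. It assumes tab-separated lines with the word in the first column and the count in the second, that lines starting with `#` are comments, and that a missing count is treated as 1. The function name is hypothetical and the counts in the usage line are made up.

```typescript
// Merge duplicate words in a TSV word list by adding their counts together.
function sumDuplicateCounts(tsv: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const line of tsv.split(/\r?\n/)) {
    // Skip blank lines and (assumed) comment lines.
    if (line.trim() === '' || line.startsWith('#')) continue;
    const [word, countText] = line.split('\t');
    const parsed = Number.parseInt(countText ?? '', 10);
    // Assumption: a missing or unreadable count is treated as 1.
    const count = Number.isNaN(parsed) ? 1 : parsed;
    totals.set(word, (totals.get(word) ?? 0) + count);
  }
  return totals;
}

const merged = sumDuplicateCounts('𐑑\t5\n𐑞𐑨𐑑\t2\n𐑑\t3\n');
console.log(merged.get('𐑑'));
```

Note that this only merges entries that are identical code point for code point; two spellings that merely look the same would stay separate.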

[jo.isk-cyrl-tj.ishkashimi_cyrillic_minimal_model] add casing

release/jo/jo.isk-cyrl-tj.ishkashimi_cyrillic_minimal_model

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
};
export default source;

And, with the addition of the new property, like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
};
export default source;

This will turn on the possibility for case differentiation and use the default configuration.
Most likely this default operation will be all you need. In that case you don't need any customization.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

1.1 (2021-01-31)
----------------
* Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(2) README.md will need the version number changed. Probably the copyright date (or date range) will need to change as well, for example from "(c) 2020 Acme, Inc." to "(c) 2020-2021 Acme, Inc."

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)
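
The copyright change described in steps (2), (3) and (5) follows one pattern, so it can be applied the same way to each file. The sketch below assumes the statement is written as "(c) YYYY" or "(c) YYYY-YYYY", as in the Acme example above. The function name is hypothetical.

```typescript
// Extend a "(c) YYYY" or "(c) YYYY-YYYY" statement so that the range ends
// in the given year. Only the first statement in the text is changed.
function updateCopyrightYears(text: string, currentYear: number): string {
  return text.replace(/\(c\) (\d{4})(?:-\d{4})?/, (_match: string, first: string) => {
    // A statement that starts in the current year needs no range.
    return Number(first) === currentYear ? `(c) ${first}` : `(c) ${first}-${currentYear}`;
  });
}

console.log(updateCopyrightYears('(c) 2020 Acme, Inc.', 2021));
```

Running the same function over README.md, LICENSE.md and any .htm files keeps the statements consistent with each other.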

bug(sil_bunong): word picked from the word suggestion also replace the symbols next to the letter

The behavior of the issue (Video).

Details: If a symbol is next to the letter for which words are being predicted, then when a word is chosen (clicked), the symbol disappears too.
Symbols: ៗ, (), #, $, @, [], etc.
Please let me know if more information is needed.

Keyman apps

  • Keyman Developer
  • Keyman for iOS

Keyboard name

  • sil_bunong

Keyman version

  • 16.0.144-stable (Keyman Developer)
  • 17.0.257-alpha (Keyman for iOS)

Operating system

  • Windows 10
  • iOS version 16.3.1

Keyboard version

  • 1.5

Language name

  • Bunong

Additional context

Relevant issue

Rebuild certain LMs to use ES3 code generation

Relates to keymanapp/keyman#7926

The lexical-model compiler update in the fix keymanapp/keyman#7297 forces ES3 code generation to support Android 5.0 devices.

This will require version bumps on all affected LMs in order to deploy an updated version built with the new version of the compiler once it lands.

The fix is available in kmlmc 16.0.128-beta

And the following lexical-models will need to be rebuilt:

  • dotland.ru.russian
  • iles.chp.indigenous_nt
  • iles.dgr.indigenous_nt
  • sil.bcc-arab.upp_ptwl1
  • sil.glk-arab.ptwl1
  • sil.kjg-laoo.ptwl1
  • wyc_eth.mym-latn.me_en

"If omitted, builds all models" is untrue for -b and -t

The usage statement ends with "If omitted, builds all models" (speaking of the optional target argument).
This is true for a bare ./build.sh or for ./build.sh -c, but fails for any combination (that I've tried) that includes either -t or -b.
For example:
$ ./build.sh -c -b
Usage: ./build.sh [-t(est)|-b(uild)|-c(lean)] [-s] [target]
-t || -test Runs tests on models
-b || -build Creates compiled models
-c || -clean Cleans intermediate and output files
-s Quiet build
target The specific model(s) to build, e.g. release or release/example/en.template
If omitted, builds all models

Rename or move example template models

I think as we head to beta we should change our example template models to use a custom BCP 47 tag such as qaa so that they don't appear as options to end users, especially as we have an English model now. We definitely need to do this for release. That means a bunch of renaming though!

I would also like to see the English model move out of 'example' and into something real.

An alternative is to move the example models out of release/ into a sample/ folder. Perhaps that's better?

If we do it in alpha, we avoid the need to have backward compat and can just do cleanup on the downloads server. Once we go to beta, we have to deprecate and have a deprecation pathway.

Himyarit Musnad

Dear all, I am building a lexicon for use with the Himyarit Musnad keyboard. I don't know how to add the files here for you. Can you help, @mcdurdin?

[Meta] Rename folders from e.g. nrc/str.sencoten to nrc/nrc.str.sencoten

For ease of dealing with folder names, and consistency with the keyboards repo and the downloads.keyman.com folder names, I'd like to rename the model folders to match the model id in its entirety. This simplifies some processes that then don't need to parse the model id in order to derive the folder name.

A cursory look at this suggests that this is a build environment change only, with no other significant side-effects.
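
Under the proposed layout, the folder can be derived from the model id with no parsing beyond reading the author prefix. The sketch below assumes ids of the form author.bcp47.uniq and the two top-level groups named in the README. The function name is hypothetical.

```typescript
// Proposed layout: <group>/<author>/<full model id>.
function modelFolder(group: 'release' | 'experimental', modelId: string): string {
  // The author is the first dot-separated component of the model id.
  const author = modelId.split('.')[0];
  return `${group}/${author}/${modelId}`;
}

console.log(modelFolder('release', 'nrc.str.sencoten'));
```

The last path component is the model id unchanged, so tools can go from folder name to model id without splitting it apart.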

Change name of repo to keymanapp/dictionaries

Obviously, this incurs MASSIVE changes to e.g., the API, and the CI, but—we've been claiming that users will be installing dictionaries and not lexical models. Just like users were able to download and publish keyboards from keymanapp/keyboards, they should be able to download and publish dictionaries from keymanapp/dictionaries.

@mcdurdin @darcywong00 @jahorton, your comments are very welcome!

[ptg.nan-latn-tw.taigipoj] add casing

@shiami
experimental/ptg/ptg.nan-latn-tw.taigipoj

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
};
export default source;

And, with the addition of the new property, like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
};
export default source;

This will turn on the possibility for case differentiation and use the default configuration.
Most likely this default operation will be all you need. In that case you don't need any customization.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

1.1 (2021-01-31)
----------------
* Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(2) README.md will need the version number changed. Probably the copyright date (or date range) will need to change as well, for example from "(c) 2020 Acme, Inc." to "(c) 2020-2021 Acme, Inc."

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)
