Issues from the keymanapp/lexical-models repository

Error when building models when .kps indicates mixed path separators

The .kps seems to have hardcoded paths with backslashes in the <Files> section:

    <File>
      <Name>..\build\example.en.custom.model.js</Name>
      <Description>Lexical model example.en.custom.model.js</Description>
      <CopyLocation>0</CopyLocation>
      <FileType>.model.js</FileType>
    </File>

This fails to build on platforms with / path separators (e.g., macOS, Linux).

    Validating model /Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/
    Building model /Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/
    fs.js:115
        throw err;
        ^

    Error: ENOENT: no such file or directory, open '../source/..\build\example.crk.wordlist_wahkohtowin.model.js'
        at Object.openSync (fs.js:439:3)
        at Object.readFileSync (fs.js:344:35)
        at /Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/build/obj/tools/kmp-compiler.js:104:27
        at Array.forEach (<anonymous>)
        at KmpCompiler.buildKmpFile (/Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/build/obj/tools/kmp-compiler.js:103:27)
        at LexicalModelCompiler.compile (/Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/build/obj/tools/index.js:141:21)
        at Object.<anonymous> (/Users/santoseadmin/Work/lexical-models/release/example/crk.wordlist_wahkohtowin/build/obj/release/example/crk.wordlist_wahkohtowin/source/model.js:4:23)
        at Module._compile (internal/modules/cjs/loader.js:689:30)
        at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
        at Module.load (internal/modules/cjs/loader.js:599:32)
    Validating model /Users/santoseadmin/Work/lexical-models/release/example/en.custom/
    Building model /Users/santoseadmin/Work/lexical-models/release/example/en.custom/
    fs.js:115
        throw err;
        ^

    Error: ENOENT: no such file or directory, open '../source/..\build\example.en.custom.model.js'
        at Object.openSync (fs.js:439:3)
        at Object.readFileSync (fs.js:344:35)
        at /Users/santoseadmin/Work/lexical-models/release/example/en.custom/build/obj/tools/kmp-compiler.js:104:27
        at Array.forEach (<anonymous>)
        at KmpCompiler.buildKmpFile (/Users/santoseadmin/Work/lexical-models/release/example/en.custom/build/obj/tools/kmp-compiler.js:103:27)
        at LexicalModelCompiler.compile (/Users/santoseadmin/Work/lexical-models/release/example/en.custom/build/obj/tools/index.js:141:21)
        at Object.<anonymous> (/Users/santoseadmin/Work/lexical-models/release/example/en.custom/build/obj/release/example/en.custom/source/model.js:4:23)
        at Module._compile (internal/modules/cjs/loader.js:689:30)
        at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
        at Module.load (internal/modules/cjs/loader.js:599:32)
    Validating model /Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/
    Building model /Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/
    fs.js:115
        throw err;
        ^

    Error: ENOENT: no such file or directory, open '../source/..\build\example.en.wordlist.model.js'
        at Object.openSync (fs.js:439:3)
        at Object.readFileSync (fs.js:344:35)
        at /Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/build/obj/tools/kmp-compiler.js:104:27
        at Array.forEach (<anonymous>)
        at KmpCompiler.buildKmpFile (/Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/build/obj/tools/kmp-compiler.js:103:27)
        at LexicalModelCompiler.compile (/Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/build/obj/tools/index.js:141:21)
        at Object.<anonymous> (/Users/santoseadmin/Work/lexical-models/release/example/en.wordlist/build/obj/release/example/en.wordlist/source/model.js:4:23)
        at Module._compile (internal/modules/cjs/loader.js:689:30)
        at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
        at Module.load (internal/modules/cjs/loader.js:599:32)
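On macOS and Linux, a backslash is an ordinary filename character. As a result, `..\build\...` is never split into folders. Below is a minimal sketch of separator-agnostic resolution. It assumes the compiler simply joins the source folder with each `<Name>`, which is what the `../source/..\build\...` path in the error suggests. `normalizeKpsPath` and `resolveKpsFile` are hypothetical helpers, not code from kmp-compiler.js.

```typescript
// Hypothetical helper (not part of kmp-compiler.js): accept either '\' or '/'
// as the separator in a .kps <Name> path and rejoin with '/'.
function normalizeKpsPath(name: string): string {
  return name.split(/[\\/]+/).join('/');
}

// Join a relative source folder with a .kps <Name>, collapsing '.' and '..'.
// Sketch only: absolute paths and Windows drive letters are not handled.
function resolveKpsFile(sourceDir: string, name: string): string {
  const parts = (normalizeKpsPath(sourceDir) + '/' + normalizeKpsPath(name)).split('/');
  const out: string[] = [];
  for (const part of parts) {
    if (part === '' || part === '.') continue;
    if (part === '..' && out.length > 0 && out[out.length - 1] !== '..') {
      out.pop(); // step back out of the previous folder
    } else {
      out.push(part);
    }
  }
  return out.join('/');
}

// Usage with the failing <Name> from the log:
console.log(resolveKpsFile('../source', '..\\build\\example.en.custom.model.js'));
// → ../build/example.en.custom.model.js
```

An alternative fix is to write forward slashes in the .kps in the first place. Windows generally accepts `/` as well, but whether every Keyman Developer version does is not confirmed here.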

Deprecating gff.ti.gff_tigrinya

The gff.ti.gff_tigrinya lexical model is based on the Unilex project's wordlist for Tigrinya. Unfortunately, it contains many misspellings and non-Tigrinya words that come from a corpus of unknown provenance and pedigree. It also combines the conflicting spelling conventions of Eritrea and Ethiopia, which skews the frequency counts.

An approach that would better meet user expectations is to have a separate wordlist for each region. PR #216 and #217 address this directly. The gff.ti.gff_tigrinya lexicon can then be deleted from the repository, or moved into a legacy directory if there is interest in preserving it.

Remove <FollowKeyboardVersion/> from .kps files

While creating a guide for submitting lexical models to the repo, I ran into a Keyman Developer package compilation error using the "Build all" button in the Projects view.

    nrc.en.mtnt.model.ts: Compiling 'C:\src\lexical-models\release\nrc\nrc.en.mtnt\source\nrc.en.mtnt.model.ts'...
    nrc.en.mtnt.model.ts: Success: 'C:\src\lexical-models\release\nrc\nrc.en.mtnt\source\nrc.en.mtnt.model.ts' was compiled successfully  to 'C:\src\lexical-models\release\nrc\nrc.en.mtnt\build\nrc.en.mtnt.model.js'.
    nrc.en.mtnt.model.kps: Compiling package nrc.en.mtnt.model.kps...
    nrc.en.mtnt.model.kps: Fatal Error: The option "Follow Keyboard Version" is set but there are no keyboards in the package.
    nrc.en.mtnt.model.kps: Failure: 'C:\src\lexical-models\release\nrc\nrc.en.mtnt\source\nrc.en.mtnt.model.kps' was not compiled successfully.

Since we decided that the package version is what gets used as the lexical model version, we should:

Remove <FollowKeyboardVersion/> in all the .kps files
Remove <Version> </Version> within the <LexicalModel> nodes.
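For a repository-wide cleanup, the two removals above could be scripted. This is a minimal regex-based sketch. It assumes each element sits on its own line, which is an assumption about how the .kps files are laid out; a real tool should use an XML parser. `cleanKps` is a hypothetical name.

```typescript
// Hypothetical cleanup of a .kps file's XML text (regex sketch, not an XML parser).
function cleanKps(xml: string): string {
  // 1. Remove the <FollowKeyboardVersion/> option line.
  let out = xml.replace(/^[ \t]*<FollowKeyboardVersion\s*\/>[ \t]*\r?\n?/gm, '');
  // 2. Inside each <LexicalModel>...</LexicalModel> node, remove <Version> lines,
  //    leaving any package-level <Version> elsewhere untouched.
  out = out.replace(/<LexicalModel>[\s\S]*?<\/LexicalModel>/g, (node) =>
    node.replace(/^[ \t]*<Version>[^<]*<\/Version>[ \t]*\r?\n?/gm, ''),
  );
  return out;
}
```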

Configuration

Keyman Developer 12.0.58.0 stable
Latest master branch of lexical-models repo

Add `bal` to `bcc` lexical-models

Yes, we should update those models to also include bal (don't remove bcc at this point to maintain compatibility with 13.0 keyboards)

Originally posted by @mcdurdin in keymanapp/keyboards#1452 (comment)

Document how to create predictive keyboard

From @eddieantonio via email: You need a TSV file with words and, optionally, counts. You can just make a two-column Excel spreadsheet with words and counts, if that’s convenient for you. Instructions do not exist yet.
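This is a sketch of reading such a file. Saving the two-column spreadsheet as tab-delimited text produces this shape: one word per line, with a tab before the optional count. The default count of 1 for a missing count is an assumption, not necessarily what the Keyman compiler does. `parseWordlist` is hypothetical.

```typescript
interface WordEntry {
  word: string;
  count: number;
}

// Hypothetical parser for a two-column wordlist: word<TAB>count (count optional).
function parseWordlist(tsv: string): WordEntry[] {
  const entries: WordEntry[] = [];
  for (const line of tsv.split(/\r?\n/)) {
    if (line.trim() === '') continue; // skip blank lines
    const columns = line.split('\t');
    const word = columns[0].trim();
    const parsed = columns.length > 1 ? parseInt(columns[1], 10) : NaN;
    // Assumed default: a word with no (or an unreadable) count is counted once.
    entries.push({ word, count: isNaN(parsed) ? 1 : parsed });
  }
  return entries;
}
```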

[nrc.en.mtnt] More contractions in suggestions

A user suggested that more contractions should be offered in the suggestion banner of the EuroLatin (SIL) keyboard.

chore: Cleanup includes/ folder?

As of stable-14.0, are the TypeScript declaration files in includes/ still needed?

LMLayer.d.ts
LMLayerWorker.d.ts
message.d.ts

If they're meant to be kept in sync with the main repo, as the readme says, I note they're out of date with common\predictive-text\build in the main repo.

Maybe because we're now using https://www.npmjs.com/package/@keymanapp/models-types?

bug: `t_red: unbound variable` in resources/util.sh

    [08:45:34][Step 3/3] /c/BuildAgent/work/25551df685ba4211/models/resources/util.sh: line 51: t_red: unbound variable
    [08:45:34][Step 3/3] /c/BuildAgent/work/25551df685ba4211/models/resources/util.sh: line 52: t_red: unbound variable

Per https://build.palaso.org/viewLog.html?buildId=312250&buildTypeId=Keyman_Models_TestPullRequests&tab=buildLog&_focus=506

feat(nrc.en.mtnt): Add languageUsesCasing flag

This should be a bug on https://github.com/keymanapp/lexical-models/ against the MTNT model:

lexical-models/release/nrc/nrc.en.mtnt/source/nrc.en.mtnt.model.ts

Lines 1 to 5 in 11b70de:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['mtnt.tsv']
    };

Cross-reference with keymanapp/keyman#3824.

The predictive engine will now detect the casing pattern used by the current context (when languageUsesCasing == true)...

Originally posted by @jahorton in keymanapp/keyman#4115 (comment)
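To illustrate what detecting "the casing pattern used by the current context" involves, here is a hypothetical sketch. The category names and the handling of mixed case are assumptions; Keyman's own implementation is not shown in this issue and may differ.

```typescript
// Hypothetical casing categories (names are assumptions for this sketch).
type CasingForm = 'lower' | 'initial' | 'upper';

// Does this string contain at least one character with case distinctions?
function hasCase(s: string): boolean {
  return s.toUpperCase() !== s.toLowerCase();
}

// Classify the casing of the partial word typed so far.
function detectCasing(word: string): CasingForm | null {
  if (!hasCase(word)) return null; // e.g. digits or an uncased script
  if (word.length > 1 && word === word.toUpperCase()) return 'upper';
  const first = word.charAt(0);
  const rest = word.substring(1);
  if (hasCase(first) && first === first.toUpperCase() && rest === rest.toLowerCase()) {
    return 'initial';
  }
  if (word === word.toLowerCase()) return 'lower';
  return null; // mixed case such as "iPh": leave suggestions unchanged
}

// Re-case a suggestion so that it matches the detected pattern.
function applyCasing(suggestion: string, form: CasingForm): string {
  if (form === 'upper') return suggestion.toUpperCase();
  if (form === 'initial') return suggestion.charAt(0).toUpperCase() + suggestion.substring(1);
  return suggestion.toLowerCase();
}
```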

[jo.wbl-cyrl-tj.wakhi_cyrillic_minimal] add casing

release/jo/jo.wbl-cyrl-tj.wakhi_cyrillic_minimal

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
    };
    export default source;

And, with the addition of the new property, like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
      languageUsesCasing: true,
    };
    export default source;

This enables case differentiation using the default configuration.
The default behavior is most likely all you need, in which case no further customization is required.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".
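Turkish is the classic case where the default mapping is wrong: dotted i uppercases to İ (U+0130), and dotless ı (U+0131) uppercases to I. Below is a hypothetical sketch of a casing function that handles this. It assumes the 'lower' / 'upper' / 'initial' forms. How such a function is attached to a model source is covered in the discussion referenced above and is not reproduced here.

```typescript
// Assumed casing forms (Keyman's exact types may differ).
type CasingForm = 'lower' | 'upper' | 'initial';

// Map i -> İ and ı -> I before the generic uppercase conversion.
function turkishUpper(text: string): string {
  return text.replace(/i/g, '\u0130').replace(/\u0131/g, 'I').toUpperCase();
}

// Map İ -> i and I -> ı before the generic lowercase conversion.
function turkishLower(text: string): string {
  return text.replace(/\u0130/g, 'i').replace(/I/g, '\u0131').toLowerCase();
}

// Hypothetical Turkish-aware casing function.
function applyTurkishCasing(casing: CasingForm, text: string): string {
  if (casing === 'upper') return turkishUpper(text);
  if (casing === 'initial') return turkishUpper(text.charAt(0)) + text.substring(1);
  return turkishLower(text);
}
```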

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

    1.1 (2021-01-31)
    ----------------
    * Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)

Move lexical model compiler to keyman repo

So. A few things:

The merge of the model info should warn if there are mismatching version numbers, and ...
The version number shouldn't be in the .model_info, and ...
The canonical data is always in the source files, not the .model_info, because ...
Lexical models can be built, stored and distributed outside the lexical-models repository and CI chain. Those lexical models won't have a .model_info file, because that is used only for the CI deployment. It is important we don't end up building too many dependencies on the repository structure by accident because it is likely, if we are successful, that this repo will get very large. The intent is for models in this repo to be of high release-level quality and we generally want to discourage experimental models here, because models here can be automatically installed by Keyman. This leads me to conclude that ...
The compiler needs to be moved out of the lexical-models repo and into the keyman repo, and the .model_info management needs to be clearly delineated. Now, the compiler is part of the Keyman Developer toolchain. So I think what we need to do is update the build process to pull the latest stable (or $tier) version of the compiler from downloads.keyman.com (so that's something that needs to be added to the Keyman Developer CI). We could register an NPM package but I don't think we are quite ready to do that; happy to defer to wiser heads though. For consistency, then ...
This same process needs to be done with the keyboards repo -- so we don't include the kmcomp compiler in the repo, rather pull it at first build (or at appropriate times) from downloads.keyman.com.

Originally posted by @mcdurdin in #14 (comment)

[newa] Newa autocomplete not working

Not getting proper suggestions after newa wordlist 2.0 is installed. I only get this suggestion (𑐺𑐕𑐣𑑂𑐟) when one letter is typed, regardless of what is typed, and no suggestions when more than one letter is typed.

However, if I build locally and then install, suggestions are shown as expected.

This is the build that works:

[LMLayer][Android] Angle quotes not showing up correctly - unicode issue?

I get the following in my keyboard. Note the diamond-with-question-mark characters (U+FFFD replacement characters).

I'm not sure if this is a bug with keyman or an issue in how the .ts file should be done.

My .ts file looks like this:
    /*Gilaki wordlist ptwl1 1.0 */

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
      punctuation: {
        quotesForKeepSuggestion: {
          open: "«", close: "»"
        },
      ...
This is for lexical model: sil.glk-arab.ptwl1
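One way to rule out a source-encoding problem is to write the guillemets as escapes, so the .ts file contains only ASCII. An example of such a problem would be the file not being saved as UTF-8. This is a diagnostic assumption, not a confirmed cause. The object shape is copied from the snippet in this issue; the standalone `punctuation` constant is hypothetical.

```typescript
// Guillemets written as Unicode escapes, independent of the file's saved encoding.
const punctuation = {
  quotesForKeepSuggestion: {
    open: '\u00AB', // « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    close: '\u00BB', // » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
  },
};
```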

[sil.bcc-latn.upp_ptwl1] add casing

@rmlockwood
release/sil/sil.bcc-latn.upp_ptwl1

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
    };
    export default source;

And, with the addition of the new property, like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
      languageUsesCasing: true,
    };
    export default source;

This enables case differentiation using the default configuration.
The default behavior is most likely all you need, in which case no further customization is required.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

    1.1 (2021-01-31)
    ----------------
    * Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)

bug(sil_brao): word picked from the word suggestions also replaces the symbols next to the letter

The behavior of the issue (Video).

Details: If a symbol is next to the letter being used to predict words, then when a suggested word is chosen (tapped), the symbol disappears too.
Symbol: ៗ, (), #, $, @, []...etc.
Please let me know if more information is needed.
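The reported behavior comes down to which part of the text left of the caret a suggestion replaces. Below is a minimal sketch. It assumes the replaceable fragment is only the trailing run of letters (`\p{L}`) and combining marks (`\p{M}`), so that adjacent symbols such as #, $ or ( are kept. `splitTrailingWord` and `applySuggestion` are hypothetical. Whether every character in the list above is a non-letter under this rule is not guaranteed.

```typescript
// Letters and combining marks (e.g. Khmer vowel signs and COENG) form a word;
// anything else (#, $, @, brackets, spaces) ends it. Needs ES2018 regex support.
const TRAILING_WORD = new RegExp('[\\p{L}\\p{M}]+$', 'u');

// Split the text left of the caret into what must be kept and what may be replaced.
function splitTrailingWord(context: string): { keep: string; fragment: string } {
  const match = TRAILING_WORD.exec(context);
  const fragment = match ? match[0] : '';
  return { keep: context.substring(0, context.length - fragment.length), fragment };
}

// Replace only the trailing fragment with the chosen suggestion.
function applySuggestion(context: string, suggestion: string): string {
  return splitTrailingWord(context).keep + suggestion;
}
```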

Keyman apps

Keyman Developer
Keyman for iOS

Keyboard name

sil_brao

Keyman version

16.0.144-stable (Keyman Developer)
17.0.257-alpha (Keyman for iOS)

Operating system

Windows 10
iOS version 16.3.1

Keyboard version

1.0

Language name

Brao

Additional context

https://keyman.com/keyboards/sil_brao?bcp47=brb-khmr

Relevant issue

#230

[gff.xan.gff_xamtanga] bug: gff.xan.gff_xamtanga.model_info seems to be missing

    09:39:09   Uploading /c/BuildAgent/work/e22cfa4d1a6faf97/models/release/gff/gff.xan.gff_xamtanga/
    09:39:09   Failed to locate /c/BuildAgent/work/e22cfa4d1a6faf97/models/release/gff/gff.xan.gff_xamtanga/build/gff.xan.gff_xamtanga.model_info
    09:39:09   Aborting with error 1

Per https://build.palaso.org/buildConfiguration/Keyman_Models_BuildAndDeploy/384842?buildTab=log&focusLine=2037&logView=flowAware&linesState=1993, https://build.palaso.org/buildConfiguration/Keyman_Models_BuildAndDeploy/384844?buildTab=log&focusLine=0&logView=flowAware&linesState=1681

[benny_lin.id.kamus_indonesia] add casing

@bennylin
release/benny_lin/benny_lin.id.kamus_indonesia

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
    };
    export default source;

And, with the addition of the new property, like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
      languageUsesCasing: true,
    };
    export default source;

This enables case differentiation using the default configuration.
The default behavior is most likely all you need, in which case no further customization is required.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

    1.1 (2021-01-31)
    ----------------
    * Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)

[brao] bug: not working as expected when there are at least two characters to the left of the caret

Describe the bug

When the left context is "[space] ឆ្រ", the words suggested in the banner are not the expected ones, i.e. they are not words beginning with ឆ្រ-.

To Reproduce

Install this keyboard and lexical model ('brao_keyboard_and_lm.zip') on 14.0.266-beta
In the text editor area, type in "ឆ្រ ឆ្រ"
Keep the cursor to the right of the second string
See error

Expected behavior

Words beginning with ឆ្រ should be in the suggestion.

Keyman for Android:

Device: Pixel 2 API 29
OS: Android 10
Keyman version: 14.0.266-beta
Target application: Keyman

Keyboard

Keyboard name: Brao (SIL)
Keyboard version: 1.0
Language name: Brao

Additional context

Regular spaces are used between words, but the model still cannot provide meaningful suggestions when there is a consonant + subscript (i.e. ឆ្រ) to the left of the caret.

The model is able to provide meaningful, expected suggestions when there is a consonant + diacritic/vowel to the left of the caret, i.e. កំ, ឆា.

កំ >> កំឡាំង | កំប្រឹន | កំប្លីង

ឆា >> ឆា | ឆាល់ | ឆារ
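The failing context differs from the working ones at the code-point level. ឆ្រ is ឆ (U+1786), COENG (U+17D2), and រ (U+179A). The working contexts កំ and ឆា are a consonant followed by a single sign. Below are two hypothetical debugging aids, not Keyman code: one inspects a context's code points, and the other is a plain prefix filter that describes the expected result.

```typescript
// List the code points of a string as U+XXXX labels.
function codePoints(s: string): string[] {
  const out: string[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    // Combine a UTF-16 surrogate pair into a single code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < s.length) {
      const low = s.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        cp = (cp - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
        i++;
      }
    }
    let hex = cp.toString(16).toUpperCase();
    while (hex.length < 4) hex = '0' + hex;
    out.push('U+' + hex);
  }
  return out;
}

// Expected behavior: every word that starts with the whole typed prefix.
function wordsWithPrefix(words: string[], prefix: string): string[] {
  return words.filter((w) => w.indexOf(prefix) === 0);
}
```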

bug(sil_jarai): Khmer word prediction does not separate letters that come after a symbol

The behavior of the issue (Video).

Details: The word predicted from typing replaced the letters AND the symbol between them. It seems that word prediction is only associated with the letter in front of the symbol: anything that comes after the symbol is treated as part of the same word as that letter.
Please let me know if more information is needed.

Keyman apps

Keyman Developer

Keyboard name

sil_jarai

Keyman version

16.0.144-stable

Operating system

Windows 10

Keyboard version

1.0

Language name

Jarai

Additional context

https://keyman.com/keyboards/sil_jarai?bcp47=jra-khmr

Relevant issue

#235

[nrc.en.mtnt] Revise wordlist

From a team review of the Keyman for Android UX (keymanapp/keyman#7161)

Aside from the contractions issue noted in #143, @mcdurdin notes the default English lexical-model wordlist needs the following adjustments:

Add common words such as:

Covid (the original wordlist gathered from Reddit was pre-COVID)
Qantas (airline) (and a number of other brands!)
Coronavirus

Remove these entries (along with any other typos found):

becasue 10
être 6
reccomend 5
sheild 5

Is there any value in keeping single-character entries (e.g. $ 1898)?
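Applying the additions and removals listed above is mechanical. Below is a hypothetical sketch over a word-to-count map. The counts given to new words are an assumption for the maintainers to choose. `word.length === 1` counts UTF-16 units, which is adequate for entries like `$`.

```typescript
type WordCounts = Record<string, number>;

// Hypothetical revision step: drop typos (and optionally single characters),
// then merge in new words with maintainer-chosen counts.
function reviseWordlist(
  counts: WordCounts,
  additions: WordCounts,
  removals: string[],
  dropSingleChars: boolean,
): WordCounts {
  const result: WordCounts = {};
  for (const word of Object.keys(counts)) {
    if (removals.indexOf(word) !== -1) continue; // e.g. "becasue", "sheild"
    if (dropSingleChars && word.length === 1) continue; // e.g. "$"
    result[word] = counts[word];
  }
  for (const word of Object.keys(additions)) {
    result[word] = (result[word] || 0) + additions[word];
  }
  return result;
}
```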

[wyc_eth.mym-latn.me_en] add casing

release/wyc_eth/wyc_eth.mym-latn.me_en

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
    };
    export default source;

And, with the addition of the new property, like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
      languageUsesCasing: true,
    };
    export default source;

This enables case differentiation using the default configuration.
The default behavior is most likely all you need, in which case no further customization is required.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

    1.1 (2021-01-31)
    ----------------
    * Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)

[nrc.en.mtnt and katelem.ann-latn.getat] Rebuild required due to compiler bug

We will need to update the two models mentioned in keymanapp/keyman#4716 after we get the updated compiler.

Originally posted by @mcdurdin in keymanapp/keyman#4718 (comment)

Should be version 14.0.265-beta or later.

Suggestions become irrelevant to the characters being typed when preceded by a quote or similar punctuation

Suggestions when typing ឝ alone:

Suggestions when typing «ឝ:

Currently the compiler requires `id` to be present in the source `.model_info`

Currently the compiler does require id to be present in the source .model_info, which is technically a bug; we should instead pull that from the folder name. I will open an issue.

This is a problem because:

The spec says that id is not required.
It's WETter than it needs to be.

Originally posted by @mcdurdin in #7 (comment)
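This is a sketch of pulling the id from the folder name, as proposed above. `resolveModelId` is hypothetical, and so is its choice to throw when an explicit id disagrees with the folder name. This is not the compiler's behavior.

```typescript
// Hypothetical view of the source .model_info: id is optional, per the spec.
interface ModelInfoSource {
  id?: string;
}

// The model id is the last folder name, e.g. release/nrc/nrc.en.mtnt/ -> nrc.en.mtnt.
function modelIdFromFolder(folder: string): string {
  const segments = folder.replace(/[\\/]+$/, '').split(/[\\/]/);
  return segments[segments.length - 1];
}

// Use the folder name; reject an explicit id only if it contradicts it (assumed policy).
function resolveModelId(info: ModelInfoSource, folder: string): string {
  const fromFolder = modelIdFromFolder(folder);
  if (info.id !== undefined && info.id !== fromFolder) {
    throw new Error(`id '${info.id}' does not match folder name '${fromFolder}'`);
  }
  return fromFolder;
}
```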

What is the purpose of the .kps file?

What purpose does the .kps file serve separate from the .model_info file? The only thing I can figure is that the KMP compiler needs it as an argument :/

Can the .kps file be automatically generated? Having to change things in multiple locations (model.ts, .model_info, .kps) is tedious and error prone.
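As a small step toward generation, the `<File>` entry for the compiled model can be derived from its path alone. This hypothetical sketch reuses the element names and values from the .kps excerpt at the top of this document. It emits forward slashes, assuming the package compiler accepts them on every platform. It does not XML-escape names.

```typescript
// Hypothetical generator for the .kps <File> entry of a compiled .model.js.
function kpsModelFileEntry(relativePath: string): string {
  const name = relativePath.split('\\').join('/'); // platform-neutral separators
  const base = name.substring(name.lastIndexOf('/') + 1);
  return [
    '<File>',
    `  <Name>${name}</Name>`,
    `  <Description>Lexical model ${base}</Description>`,
    '  <CopyLocation>0</CopyLocation>',
    '  <FileType>.model.js</FileType>',
    '</File>',
  ].join('\n');
}
```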

bug(nrc.en.mtnt): `we're` not offered as a suggestion for `were`

I think this still needs to remain open for @jahorton to confirm this is resolved:

With a context of "were", I get 3 suggestions:
"were" (frequency 5309)
"weren't" (frequency 385)
"werent" (frequency 16)

"we're" has a frequency of 927 so I would expect it to appear

On my alpha build of Keyman for Android, I'm still not getting "we're" as a suggestion.

Originally posted by @darcywong00 in #143 (comment)

Reproduced in Keyman 17.0.219-alpha
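Given the counts above, "we're" makes the top three if the search key ignores the apostrophe, which is the expectation stated in the issue. This is a hypothetical ranking sketch under that assumption. Keyman's real keying and ranking are not reproduced here.

```typescript
interface Entry {
  word: string;
  count: number;
}

// Assumed search key: lowercase, with straight and curly apostrophes removed.
function toKey(s: string): string {
  return s.toLowerCase().replace(/['\u2019]/g, '');
}

// Return the `limit` most frequent words whose key starts with the typed key.
function suggest(entries: Entry[], typed: string, limit: number): string[] {
  const key = toKey(typed);
  return entries
    .filter((e) => toKey(e.word).indexOf(key) === 0)
    .sort((a, b) => b.count - a.count)
    .slice(0, limit)
    .map((e) => e.word);
}
```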

bug(sil_jarai): Khmer word prediction replaces the symbols along with the letter when inserting the predicted word

The behavior of the issue (Video).

Details: The word predicted from typing replaced the letter AND the symbols in front of the letter.
Please let me know if more information is needed.

Keyman apps

Keyman for iPhone and iPad

Keyboard name

sil_jarai

Keyman version

17.0.254-alpha

Operating system

iOS 16.3.1

Device

iPhone 11 Pro Max

Keyboard version

1.0

Language name

Jarai

Additional context

https://keyman.com/keyboards/sil_jarai?bcp47=jra-khmr

Relevant issue

#236

"welcome.htm" not shown after installation finishes

Is it intentional that welcome.htm is not shown after installation of the LM is finished?

The welcome page does show up when installation of a keyboard is finished, though.

[meta] Tidy up build script to be able to use `set -u`

From #54 (review), we should be able to use `set -u` in build.sh, but currently there are some rough edges preventing that.

bug(gff.byn.gff_blin): Blin Dictionary Downloads But Does Not Load

Describe the bug

The new gff.byn.gff_blin lexicon does not get set up to work with the gff_blin keyboard. The lexicon is discovered as an available dictionary, but after retrieval it does not appear as installed for the language, and its words are not offered for selection. The keyboard continues to work as if no dictionary were connected.

Reproduce the bug

(1) Keyman > Installed languages > Bilin > Dictionary (Check for available dictionary)
(2) The messages appear: "Checking for associated...." , "Downloading dictionary..." , "Dictionary download is finished" , "Resources successfully updated!"
(3) The "Check for available dictionary" text remains; it is not replaced with the dictionary name.
(4) Going into the Keyman app editor, or into local editors, and launching the Blin keyboard, the dictionary is not loaded: the top of the screen with the word options does not appear.
(5) I've restarted Keyman and restarted my phone, but there is no change. Other keyboards work fine.
(6) Steps (1) and (2) are repeated, and the results are identical.

Expected behavior

Following the "Resources successfully updated!" message, the Blin keyboard would begin to offer the terminology from the dictionary for selection.

Related issues

No response

Keyman apps

Keyman version

16.0.138

Operating system

Android 13

Device

Samsung S22

Target application

Keyman App, any editor

Browser

No response

Keyboard name

gff_blin

Keyboard version

1.5.1

Language name

Blin

Additional context

This issue could be related to BCP 47 code processing. Both the keyboard and lexicon use language ID "byn-Ethi" , this is important because there is also a Latin convention for writing Blin endorsed by the Eritrean government (thus "byn-Latn" is possible).

The lexicon file name does not include the -Ethi part, it is gff.byn.gff_blin. Still, it is odd that the dictionary is located, and downloaded, it just does not get associated with the keyboard afterwards.
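To make the BCP 47 hypothesis concrete: a strict tag comparison treats "byn-Ethi" and "byn" as different, while a script-aware comparison treats them as compatible. These helpers are hypothetical. Which comparison Keyman actually uses is not stated in this issue. The parser only looks for a 4-letter script subtag and ignores likely-subtag data.

```typescript
interface LangTag {
  language: string;
  script: string | null;
}

// Minimal parse: primary language subtag plus an optional 4-letter script subtag.
function parseTag(tag: string): LangTag {
  const parts = tag.split(/[-_]/);
  let script: string | null = null;
  for (let i = 1; i < parts.length; i++) {
    if (parts[i].length === 1) break; // extensions / private use follow a singleton
    if (/^[A-Za-z]{4}$/.test(parts[i])) {
      script = parts[i].toLowerCase();
      break;
    }
  }
  return { language: parts[0].toLowerCase(), script };
}

// Strict: whole tags must match (case-insensitively).
function strictMatch(a: string, b: string): boolean {
  return a.toLowerCase() === b.toLowerCase();
}

// Relaxed: same language, and scripts agree whenever both tags name one.
function relaxedMatch(a: string, b: string): boolean {
  const ta = parseTag(a);
  const tb = parseTag(b);
  if (ta.language !== tb.language) return false;
  return ta.script === null || tb.script === null || ta.script === tb.script;
}
```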

[bennylin.jv-latn.bausastra_jawa] add casing

@bennylin
release/bennylin/bennylin.jv-latn.bausastra_jawa

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
    };
    export default source;

And, with the addition of the new property, like:

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['wordlist.tsv'],
      languageUsesCasing: true,
    };
    export default source;

This enables case differentiation using the default configuration.
The default behavior is most likely all you need, in which case no further customization is required.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

    1.1 (2021-01-31)
    ----------------
    * Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)

bug: khmer model custom wordbreaker issues

Describe the bug

The crash happened after this activity was done. See the crash in action:

predictive.text.crashes.mov

Reproduce the bug

No response

Expected behavior

No response

Related issues

No response

Keyman apps

Keyman version

17.0.104-alpha

Operating system

iOS 16.4

Device

iPhone Pro Max Simulator

Target application

No response

Browser

No response

Keyboard name

sil_jarai

Keyboard version

1.0

Language name

Jarai

Additional context

https://keyman.com/keyboards/sil_jarai?bcp47=jra-khmr

[sil_kmhmu] LM only matches the last character being typed

From the get-go, I see that the prediction only matches the last character typed rather than the whole string typed before it. For example, when one types ເ, the model tries to match words beginning with that character. When the next character (ຄ) is typed, the model tries to match words beginning with ຄ rather than the combination of the two (ເຄ). As a result, one may not be able to get a suggestion for words like ເຄືອນ at all. Talk to me if the description is not understandable.

The lexical model package: https://drive.google.com/file/d/1Gsz6U5Ww45AjWbfiilLdmz7qYeg0mnKz/view?usp=sharing

The associated keyboard: https://keyman.com/keyboards/sil_kmhmu
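One possible explanation, not confirmed in this report, is that the word breaker is splitting the Lao text between characters. In that case each keystroke starts a new "word" for the model to match. If so, a custom breaker that splits only on whitespace is one thing to try. Below is a minimal sketch. The `Span` shape (start, end, length, text) is assumed to mirror what a Keyman word breaker returns, so verify it against the @keymanapp/models-types definitions.

```typescript
// Assumed to mirror the span objects a Keyman word breaker returns;
// check the models-types package for the real definition.
interface Span {
  start: number;  // UTF-16 index of the first code unit of the word
  end: number;    // UTF-16 index just past the last code unit
  length: number; // end - start
  text: string;   // the word itself
}

// Hypothetical word breaker: a "word" is any maximal run of
// non-whitespace characters, so vowel signs and tone marks stay
// attached to the consonants around them.
function breakOnWhitespace(phrase: string): Span[] {
  const spans: Span[] = [];
  const wordPattern = /\S+/g;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(phrase)) !== null) {
    const text = match[0];
    spans.push({
      start: match.index,
      end: match.index + text.length,
      length: text.length,
      text,
    });
  }
  return spans;
}
```

With this breaker, typing ເ and then ຄ leaves ເຄ as a single current word, so prefix matching sees both characters. The trade-off is that Lao is often written without spaces. A whitespace-only breaker will therefore treat a run of unspaced words as one token.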

Use kmlmc and kmlmp in the build script

Blocked by:

Package compilation improperly bundles resources

While the individual resources for each model are properly built at present, the .model.kmp package file itself is not. The issue: inside the .model.kmp, both welcome.htm and the *.js file with the compiled model code are stored as relative paths to the individual build products, rather than as copies of the actual files themselves.

Upon trying to actually utilize the .kmps for development work in our apps, we get something like the following:

As you might imagine, this rapidly generates errors and results in an unusable model.
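A related failure earlier on this page comes from the same step. There, `..\build\...` from the .kps was joined onto `../source/` verbatim. The helper below is a minimal, hypothetical sketch and not the actual kmlmp code. It shows what the packager needs for each `<File>` entry. It normalizes the separators to find the file on disk. It then uses only the bare file name as the entry name inside the archive, whose contents should be the file's bytes. It assumes the package stores files flat, by file name.

```typescript
// Hypothetical description of one file to bundle into a .model.kmp.
interface PackageEntry {
  diskPath: string;    // where the packager should read the bytes from
  archiveName: string; // the name the file should have inside the package
}

// .kps files saved on Windows may use '\' separators, which Node on
// macOS/Linux treats as ordinary filename characters, so normalize them.
function resolvePackageEntry(kpsDir: string, nameInKps: string): PackageEntry {
  const normalized = nameInKps.replace(/\\/g, '/');
  const base = kpsDir.replace(/\\/g, '/').replace(/\/+$/, '');
  const segments = normalized.split('/');
  return {
    diskPath: `${base}/${normalized}`,
    // Assumption: the package stores files flat, by bare file name.
    archiveName: segments[segments.length - 1],
  };
}
```

The packager would then read `diskPath` (for example with Node's `fs.readFileSync`) and write those bytes under `archiveName`, rather than writing the path string itself.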

[cjp-latn] update model for automatic case selection

[gonzalez_quint_coto.cjp-latn.cabecar]
Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
};
export default source;

And, with the addition of the new property, like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
};
export default source;

This turns on case differentiation, using the default configuration.
Most likely the default behavior is all you need, in which case no further customization is required.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

1.1 (2021-01-31)
----------------
* Enables use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.
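HISTORY.md lists the newest entry first, so adding a release means inserting above the previous entries. Below is a minimal sketch of that step using hypothetical helper names. It assumes every existing entry heading starts with a version number followed by a date in parentheses, as in the example above. Anything before the first entry, such as a title, is kept in place.

```typescript
// Hypothetical helper: formats an entry like the HISTORY.md example
// above, with an underline as long as the heading.
function formatHistoryEntry(version: string, date: string, notes: string[]): string {
  const heading = `${version} (${date})`;
  const underline = '-'.repeat(heading.length);
  return [heading, underline, ...notes.map((note) => `* ${note}`), ''].join('\n');
}

// Inserts the entry just before the first existing "<version> (<date>)"
// heading, so the newest release ends up at the top of the list.
function prependHistoryEntry(history: string, entry: string): string {
  const lines = history.split('\n');
  const first = lines.findIndex((line) => /^\d+(\.\d+)* \(.+\)\s*$/.test(line));
  if (first === -1) {
    // No entries yet: append after any existing content (e.g. a title).
    return history.trim() === '' ? entry : history.replace(/\n*$/, '\n\n') + entry;
  }
  lines.splice(first, 0, ...entry.split('\n'));
  return lines.join('\n');
}
```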

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)

feat: build with kmc instead of kmlmc

Part of the 17.0-alpha work. @mcdurdin will work on this.

TODO:

We need to verify against the older kmlmp output for packages.
Add LICENSE.md to each package (#225)
Create a filelist and build it with a single command for performance (#226)

Compiler needs pre-emptive refactoring

As noted in discussion on #27:

Clocking in at over 150 lines, LexicalModelCompiler.compile() is definitely showing symptoms of the "long method" code smell.

Also, I think the switch statement might be more maintainable as a polymorphic call to a "model builder" (or something like that) that knows how to produce code for a specific model.
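A minimal sketch of that refactoring follows. It uses hypothetical names (`ModelSource`, `ModelBuilder`, `TrieModelBuilder`, `generateModelCode`) and a deliberately simplified source type. It only illustrates replacing the `switch` on the model format with a lookup plus a polymorphic call. It does not reproduce the compiler's real code generation.

```typescript
// Simplified stand-in for the real model source type.
interface ModelSource {
  format: string;
  sources: string[];
}

// Each builder knows how to emit code for exactly one model format.
interface ModelBuilder {
  readonly format: string;
  generateCode(source: ModelSource): string;
}

class TrieModelBuilder implements ModelBuilder {
  readonly format = 'trie-1.0';
  generateCode(source: ModelSource): string {
    // Placeholder output; a real builder would compile the word lists.
    return `/* trie model built from ${source.sources.join(', ')} */`;
  }
}

// Registry keyed by format; other builders are added to this list.
const builders = new Map<string, ModelBuilder>(
  [new TrieModelBuilder()].map((b): [string, ModelBuilder] => [b.format, b]),
);

// Replaces the switch: look up the builder, then call it polymorphically.
function generateModelCode(source: ModelSource): string {
  const builder = builders.get(source.format);
  if (!builder) {
    throw new Error(`Unsupported model format: ${source.format}`);
  }
  return builder.generateCode(source);
}
```

With this shape, supporting a new format means adding one class and registering it. `generateModelCode` and the long `compile()` method stay unchanged.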

chore: Updates for Keyman 15.0

I took a preliminary look at whether this repo will compile with Keyman 15 by making the following changes in package.json:

  "dependencies": {
    "@keymanapp/lexical-model-compiler": "^15.0.247-beta",
    "@keymanapp/models-types": "^15.0.247-beta"
  },
  "devDependencies": {
    "jszip": "^3.7.0"
  }

I had to add jszip to devDependencies (perhaps because of keymanapp/keyman#5770?).

The build.sh script halts on fv.bea.tsaadane.model.js

The target folder contains unexpected files:
fv.bea.tsaadane.model_info source source/fv.bea.tsaadane.model.js source/fv.bea.tsaadane.model.kmp
Aborting build
Aborting with error 999

Several of the lexical-models had warnings about duplicate words being found, but the build went to completion.
The duplicated words may not be properly encoded though?

For example:

Building model /c/src/lexical-models/release/shavian_info/shavian_info.en-shaw.readlex/
wordlist.tsv (10): Warning: 2802 duplicate word ΓÇ£≡ÉææΓÇ¥ found in same file; summing counts
wordlist.tsv (35): Warning: 2802 duplicate word ΓÇ£≡Éæ₧≡Éæ¿≡ÉææΓÇ¥ found in same file; summing counts
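The garbled characters in this log match UTF-8 bytes displayed with code page 437, the default OEM code page of US-English Windows consoles. For example, `ΓÇ£` is the three-byte UTF-8 encoding of `“` shown one byte at a time. If that is what happened, the words are probably encoded correctly and only the console output is garbled. The minimal sketch below reverses the mis-display for these two log lines. Its lookup table is deliberately partial, covering only the glyphs that appear here, so it is not a general CP437 decoder.

```typescript
// Deliberately partial CP437 table: only the glyphs in the log lines
// above, mapped back to the byte values CP437 displays them for.
const CP437_BYTES: Record<string, number> = {
  'Ç': 0x80, 'É': 0x90, 'æ': 0x91, '£': 0x9c, '¥': 0x9d,
  '₧': 0x9e, '¿': 0xa8, 'Γ': 0xe2, '≡': 0xf0,
};

// Turn each displayed glyph back into its byte, then decode those
// bytes as UTF-8 (throwing if they are not valid UTF-8).
function repairCp437Mojibake(garbled: string): string {
  const bytes: number[] = [];
  for (const ch of garbled) {
    const b = CP437_BYTES[ch];
    if (b === undefined) {
      throw new Error(`Glyph not in this partial table: ${ch}`);
    }
    bytes.push(b);
  }
  return new TextDecoder('utf-8', { fatal: true }).decode(new Uint8Array(bytes));
}

console.log(repairCp437Mojibake('ΓÇ£≡ÉææΓÇ¥')); // “𐑑”
console.log(repairCp437Mojibake('ΓÇ£≡Éæ₧≡Éæ¿≡ÉææΓÇ¥')); // “𐑞𐑨𐑑”
```

Both lines decode cleanly to Shavian text in curly quotes, which is consistent with the model being built (shavian_info.en-shaw.readlex). If so, the fix is on the console side, for example switching the console to UTF-8 with `chcp 65001`, not in wordlist.tsv.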

[jo.isk-cyrl-tj.ishkashimi_cyrillic_minimal_model] add casing

release/jo/jo.isk-cyrl-tj.ishkashimi_cyrillic_minimal_model

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
};
export default source;

And, with the addition of the new property, like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
};
export default source;

This turns on case differentiation, using the default configuration.
Most likely the default behavior is all you need, in which case no further customization is required.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

1.1 (2021-01-31)
----------------
* Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)

Build default lexical models from Unicode unilex data

https://github.com/unicode-org/unilex/tree/master/data/frequency

I reckon we could get a long way with default models. Not perfect for all languages but maybe a decent base for others to work on.

Also they appear to be TSV files -- just need to strip off a line or two at the start!
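A minimal sketch of that conversion, using a hypothetical function name, follows. It assumes each unilex frequency file starts with comment lines beginning with `#` and possibly a column-header row, followed by one word<TAB>count pair per line. Check an actual file from the unilex repository before relying on this. The output is a two-column word/count TSV, like the wordlist.tsv sources used by the trie models in this repo.

```typescript
// Hypothetical converter from a unilex-style frequency file to a
// two-column word<TAB>count list.
function unilexToWordlist(input: string): string {
  const out: string[] = [];
  for (const rawLine of input.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and '#' comment lines (the "line or two at the start").
    if (line === '' || line.startsWith('#')) continue;
    const [word, count] = line.split('\t');
    // Skip a header row (or any malformed row) whose count is not a number.
    if (!word || !/^\d+$/.test(count ?? '')) continue;
    out.push(`${word}\t${count}`);
  }
  return out.join('\n') + '\n';
}
```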

bug(sil_bunong): word picked from the word suggestions also replaces the symbols next to the letters

The behavior is shown in the attached video.

Details: If a symbol is next to the letters being used to predict words, then when a suggested word is chosen (clicked), the symbol disappears too.
Symbols: ៗ, (), #, $, @, [], etc.
Please let me know if more information is needed.

Keyman apps

Keyman Developer
Keyman for iOS

Keyboard name

sil_bunong

Keyman version

16.0.144-stable (Keyman Developer)
17.0.257-alpha (Keyman for iOS)

Operating system

Windows 10
iOS version 16.3.1

Keyboard version

1.5

Language name

Bunong

The lexical-model compiler update in the fix keymanapp/keyman#7297 forces ES3 code generation to support Android 5.0 devices.

This will require version bumps on all affected LMs in order to deploy updated versions built with the new compiler once it lands.

The fix is available in kmlmc 16.0.128-beta

And the following lexical-models will need to be rebuilt:

dotland.ru.russian
iles.chp.indigenous_nt
iles.dgr.indigenous_nt
sil.bcc-arab.upp_ptwl1
sil.glk-arab.ptwl1
sil.kjg-laoo.ptwl1
wyc_eth.mym-latn.me_en

"If omitted, builds all models" is untrue for -b and -t

The usage statement ends with "If omitted, builds all models" (speaking of the optional target argument).
This is true for a bare ./build.sh or for ./build.sh -c, but fails for any combination (that I've tried) that includes either -t or -b.
For example:
$ ./build.sh -c -b
Usage: ./build.sh [-t(est)|-b(uild)|-c(lean)] [-s] [target]
-t || -test Runs tests on models
-b || -build Creates compiled models
-c || -clean Cleans intermediate and output files
-s Quiet build
target The specific model(s) to build, e.g. release or release/example/en.template
If omitted, builds all models

Rename or move example template models

I think as we head to beta we should change our example template models to use a custom BCP47 tag such as qaa, so that they don't appear as options to end users, especially as we have an English model now. We definitely need to do this for release. That means a bunch of renaming, though!

I would also like to see the English model move out of 'example' and into something real.

An alternative is to move the example models out of release/ into a sample/ folder. Perhaps that's better?

If we do it in alpha, we avoid the need to have backward compat and can just do cleanup on the downloads server. Once we go to beta, we have to deprecate and have a deprecation pathway.

Himyarit Musnad

Dear all, I am building a lexicon for use with the Himyarit Musnad keyboard. I don't know how to add the files here for you. Can you help, @mcdurdin?

[Meta] Rename folders from e.g. nrc/str.sencoten to nrc/nrc.str.sencoten

For ease of dealing with folder names, and consistency with the keyboards repo and the downloads.keyman.com folder names, I'd like to rename the model folders to match the model id in its entirety. This simplifies some processes that then don't need to parse the model id in order to derive the folder name.

A cursory look at this suggests that this is a build environment change only, with no other significant side-effects.
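The sketch below makes the difference concrete, using hypothetical helper names. It assumes paths relative to the repo root, such as release/nrc/..., and model ids of the form author.language.name, as in nrc.str.sencoten.

```typescript
// Current layout, e.g. release/nrc/str.sencoten: the model id has to be
// reassembled from the author folder and the model folder.
function modelIdFromCurrentPath(path: string): string {
  const parts = path.split('/').filter((p) => p !== '');
  const [author, rest] = parts.slice(-2);
  return `${author}.${rest}`;
}

// Proposed layout, e.g. release/nrc/nrc.str.sencoten: the model folder
// name already is the id, so no parsing is needed.
function modelIdFromProposedPath(path: string): string {
  const parts = path.split('/').filter((p) => p !== '');
  return parts[parts.length - 1];
}
```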

Change name of repo to keymanapp/dictionaries

Obviously, this incurs MASSIVE changes to e.g., the API, and the CI, but—we've been claiming that users will be installing dictionaries and not lexical models. Just like users were able to download and publish keyboards from keymanapp/keyboards, they should be able to download and publish dictionaries from keymanapp/dictionaries.

@mcdurdin @darcywong00 @jahorton, your comments are very welcome!

[ptg.nan-latn-tw.taigipoj] add casing

@shiami
experimental/ptg/ptg.nan-latn-tw.taigipoj

Keyman version 14 has added the possibility for automatic case selection in predictive text models.
This only applies to languages with upper/lower case distinctions (Latin and Cyrillic scripts, for example).
Not only is Keyman Developer 14 required, but there needs to be a change in the lexical model source file.
There's a new property for lexical model source files that must be set in order for automatic casing to work.

    languageUsesCasing: true

It's set in the .ts file, in the same place as the format, wordBreaker and sources properties.
For example, the existing file might look like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
};
export default source;

And, with the addition of the new property, like:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
};
export default source;

This turns on case differentiation, using the default configuration.
Most likely the default behavior is all you need, in which case no further customization is required.
If you do need to control how capitalization works, please consult the discussion in keymanapp/keyman#3720 "Example for Turkish".

In addition, you'll need to change the version number and (probably) the copyright date, which will require you to update some other files. The Keyman team is looking at how to reduce the number of changes needed, but for now here's what's needed:

(1) HISTORY.md will need a new entry with the new version number and the date of the change, something like:

1.1 (2021-01-31)
----------------
* Enable use of Keyman 14's case-detection & capitalization modeling features

Normally entries in this file are ordered with the latest date at the top of the list.

(3) LICENSE.md will need the same copyright change as used in README.md.

(4) The version number needs to be changed in the .kps file. In Keyman Developer, use "Packaging" to get to the .kps file, then on the "Details" tab update the version number and (if needed) the copyright statement.

(5) If you have a copyright statement in a "readme.htm" or a "welcome.htm" file, this will need to be updated with the same copyright change used in README.md. (Since these files are covered by the copyright statement in LICENSE.md, you are free to omit the copyright statement from the individual files, which can make for less work when updating the model.)

    const source: LexicalModelSource = {
      format: 'trie-1.0',
      wordBreaker: 'default',
      sources: ['mtnt.tsv']
    };

Add common words such as:

Remove these entries (along with any other typos found):
