jzohrab / lute

DEPRECATED: LUTE (Learning Using Texts) is a self-hosted web app for learning language through reading, based on Learning with Texts (LWT)

License: The Unlicense

Shell 0.69% PHP 81.07% CSS 3.07% JavaScript 3.92% Twig 11.10% Dockerfile 0.14%

deprecated obsolete

lute's People

hugofara makeprojectgreatagain adamhebby 16amattice stnetwork rusiano iamgilgunderson diogojorge1401 disfated maxdailene vinhtam1996

lute's Issues

Giving users the option to choose whether or not to use the book feature

Is your feature request related to a problem? Please describe.

I suggest giving users the option to choose whether or not to use the book feature when they import text, e.g., for song lyrics.

Describe the solution you'd like

The default setting could be to not use the book feature, and users can click on ⚡️ to activate it when they decide to do so.

Describe alternatives you've considered

When importing a TXT file, the input will be treated as a book. When importing via copy and paste in the Text column, the input will be treated as a single page, but with the ⚡️ option still available.

Additional context

There is currently a hacky workaround: create a one-line text first, then update the text. You will see the whole text on one page, plus the ⚡ in case you want to use the book feature in the future.

Help Dockerizing Lute

Hello all, if you're reading this, perhaps you have Docker experience and can help Dockerize Lute.

I've had a few people ask about having Lute Dockerized. It's been a loooong time since I've hacked on Docker, and I don't have the time or energy to spend fiddling with it. Perhaps someone is looking for a Docker project to try out, and would be willing to contribute.

Below is some info, LMK if you need more. This is a free project, I'm not looking to make money from it, and so I can't pay anyone. Hopefully this doesn't make you feel you're being taken advantage of ... (as a dev, I used to always feel that certain types of ppl were leeching off me) ... my limited dev time is better spent working on Lute itself.

Cheers and best wishes!

Jeff

Specs for Lute:

needs Apache (due to using Apache rewrites for URLs)
PHP 8.1 (and maybe later)
MySQL 8.0.31

More notes:

Lute users install and run Lute on their own machines. I'm not running a business, I just provide this free tool for language learners to use. So, this docker installation is for people to run on their own machines ... and perhaps it would be nice to have a Docker development environment as well. Your input appreciated. Users would be expected to have the necessary bits for Docker on their machines (Docker desktop or similar, docker compose, etc)
The project uses Symfony, a PHP framework, and php composer to pull in all of the packages. I am not sure if the code and framework should be built directly into the container, or if they should be mounted from an external directory ... your input appreciated.
The database migrations are managed automagically by the project -- Lute is given root access to the MySQL so that it can create tables, etc.
The project needs a single configuration file to describe the database connection params. Maybe these can be passed in as env params to the container -- or perhaps just the file is somehow passed in?
The project also stores some downloaded images in a /public/usermedia directory, so those would need to be ... mounted? Along with the db?
Users should be able to update their Lute by running the latest container (at which point the db migrations are automatically applied), or perhaps replacing the mounted code directory with newer code (and then somehow killing the cache in the container ... might get annoying)
The project build is single-click. CI is running locally and on GitHub.

Is creating a registry for the app image on Docker Hub possible?

I think creating a registry on Docker Hub would make it easier to install and update the app as new versions are released.

What do you guys think about it? Is it hard to manage?

Backup files with timestamp and rolling settings

Is your feature request related to a problem? Please describe.

In the V2 beta version, there is currently only one backup file available, and manual backups are also rolled over by the automatic backup.

Describe the solution you'd like

It would be helpful to create backup files with timestamps and keep all manual backups because there are instances where people prefer manual backups. It would also be great if those backup files could be managed in Lute.

Additionally, we might consider keeping 2-3 auto backup files, and include one file from the end of the previous month.
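The retention idea above could be sketched roughly like this. This is only an illustration: the file naming scheme (`manual_…` / `auto_…` plus a timestamp and a `.sql.gz` suffix) and the function names are made up for the example, not anything Lute actually uses.

```python
from datetime import datetime

STAMP = "%Y%m%d_%H%M%S"
SUFFIX = ".sql.gz"  # hypothetical backup file extension


def backup_name(kind, when):
    """Build a name such as 'auto_20230304_000000.sql.gz' (kind is 'auto' or 'manual')."""
    return f"{kind}_{when.strftime(STAMP)}{SUFFIX}"


def parse_name(name):
    """Split a backup file name back into (kind, timestamp)."""
    kind, rest = name.split("_", 1)
    return kind, datetime.strptime(rest[: -len(SUFFIX)], STAMP)


def backups_to_keep(names, now, keep_auto=3):
    """Keep every manual backup, the newest `keep_auto` automatic ones,
    and the last automatic backup of the previous month."""
    parsed = [(name, *parse_name(name)) for name in names]
    keep = {name for name, kind, _ in parsed if kind == "manual"}

    # Automatic backups, newest first.
    autos = sorted((p for p in parsed if p[1] == "auto"),
                   key=lambda p: p[2], reverse=True)
    keep.update(name for name, _, _ in autos[:keep_auto])

    if now.month > 1:
        prev = (now.year, now.month - 1)
    else:
        prev = (now.year - 1, 12)
    for name, _, when in autos:
        if (when.year, when.month) == prev:
            keep.add(name)  # newest automatic backup from last month
            break
    return keep
```

Anything not in the returned set would be a candidate for deletion; manual backups are never touched.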

Describe alternatives you've considered

Add timestamps feature and move backup files manually.

Additional context

Add some way to auto-lemmatize (auto-assign parents) a book.

Summary

The current method of defining all terms can be streamlined by some form of "lemmatization", i.e., finding root terms of words.

Currently, Lute treats every word as different: eg, "blancas" and "blancos" are different, though both have the same parent term "blanco", as are "escribo" and "escribieron", though both are forms of the verb "escribir." When I first started out, I didn't mind having to manually make all of these mappings, but as I progress, I feel that's a hassle. I often want to have the parent images available for the child terms, just for my own enjoyment.

It would be nice to have an "auto-lemmatize" feature that can take a given text or book, and automatically map terms to existing parents.

Currently, the only functionality around parent terms, but a significant one in my experience, is the ability to see a bunch of sentences for a term when looking at the references. Eg. for me, the term "albergado" is linked to the parent term "albergar", and when I click on the "sentences" link of "albergado" I get an extensive list of sentences with albergar, albergaba, albergó, albergado, etc etc, which is great b/c I can see the term in my readings. In the future, I can also see this being useful for something like "create Anki cards for only parent terms, with examples of child terms" etc..

First iteration: create a mapping file outside of Lute, then import.

This iteration would be good enough for me, at present!

Lemmatizing could, at first, be handled outside of Lute, using a tool like spaCy. This could generate a mapping file of terms in a given text/book, child -> parent. See code below.
The resulting file could be imported into Lute, and mappings done. New children (status = unknown) could be created and auto-assigned to the parent, with the same status as the parent (or with status = 1, maybe).
Potentially, new parents could also be made ... but that gets into new term creation, which I'm really not sure how much I want to get into!
The lemmatization could also be applied after-the-fact to existing terms, but then things might get weird with people creating terms with a given status being mapped to parents with different status ... not sure! For the first iteration, it could just work when importing a new book, perhaps.

Sample code using spacy-stanza

This only finds lemmas that are different from the original term.

import stanza
import spacy_stanza

# Download the stanza model if necessary
# print("downloading model ...");
# stanza.download("es")

# Initialize the pipeline
nlp = spacy_stanza.load_pipeline("es")

text = """
Los acomodé contra las paredes, pensando en la comodidad y no en la estética.
"""

# with nlp.select_pipes(enable=['tok2vec', 'tagger', 'attribute_ruler', 'lemmatizer']):
doc = nlp(text)

# for token in doc:
#     print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
# print(doc.ents)
lemmatized = [token for token in doc if token.text != token.lemma_]
for token in lemmatized:
    print(token.text, token.lemma_)

Run with python3 -W ignore ex.py (when all dependencies are installed in a python venv):

Output:

Los él
acomodé acomodar
las el
paredes pared
pensando pensar
la el
la el

The lemmatizing code takes a while to load due to the extensive data, but that's ok. If people run the process outside of Lute, they'll understand the processing needs. And this is a first-pass idea anyway.

This data could be loaded into a file and then passed back to Lute for magic processing.
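As a rough sketch of that last step, the (term, lemma) pairs printed above could be de-duplicated and written out as a tab-separated file. The function names here are hypothetical, the "parent [tab] child" column order is borrowed from the parent term mapping import, and lowercasing is an assumption based on Lute downcasing terms.

```python
import csv


def build_parent_mapping(pairs):
    """pairs: iterable of (token_text, lemma), e.g. from the spaCy loop above.
    Returns {child: parent}, lowercased, skipping terms that are their own lemma."""
    mapping = {}
    for text, lemma in pairs:
        child, parent = text.lower(), lemma.lower()
        if child != parent:
            mapping.setdefault(child, parent)  # first lemma seen wins
    return mapping


def write_mapping(mapping, out):
    """Write one 'parent<TAB>child' line per mapping to a text stream."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for child, parent in sorted(mapping.items()):
        writer.writerow([parent, child])
```

With the sample output above, the repeated "la el" pair collapses to a single line, and the file can be written with `write_mapping(mapping, open("mapping.txt", "w", encoding="utf-8"))`.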

ref code links for spacy

Future iterations

Obviously, having Lute manage this would be great, but it implies a full installation of some form of Python and spaCy or similar. This could be done with Docker containers too, managed by compose.

I don't think this would need a constantly running server for the lemma process, it could just run a "docker command" style microcontainer that just processes some input (list of terms) and returns the mapping.

However, possibly in the future it would be nice to do the lemmatization on-the-fly, which would need some kind of REST API server running. This might require a bunch of config though, to get the corpus(es) necessary for users with their specific languages.

Error when adding text to Lute

Description

Brief description of bug. Include copy-paste of error message details, or table of error data, if available.

To Reproduce

Steps to reproduce the behavior, e.g.:

Go to '...'
Click on '...'
Scroll down to '...'
See error

Screenshots

If it will be helpful, add screenshots.

Extra software info, if not already included in the Description:

OS (e.g., iOS, windows):
Browser (e.g., chrome, safari):
Web server (e.g., regular apache/php, MAMP, XAMP...):
Version (git commit):

Add basic Term bulk import

For people coming to Lute from other systems, or for some learning materials, it would be nice to have a bulk CSV import of terms. This could also be used to help bulk translations for imported materials (even though I don't use Lute that way myself).

Support multiple parents

Sometimes words have more than one parent, depending on context.

Spanish example: in "Él se siente mejor," the verb is "sentirse", to feel. But in the below extract, it's "sentarse", to sit down:

Yo hice ademán de tomar asiento. —¿Quién le ha dicho que se siente? —murmuró don Basilio

Czech example (from Mycheze in Discord):

hoře is a declension of hora, the regular form of hoře, and a conjugation of hořet. Two nouns and a verb all use the same "word," and their meanings aren't even close. The first one is mountain, the second is sorrow or grief, and the verb is to feel love.

App design

Things to consider

would a word with multiple parents have the same status for each parent? e.g. would "se siente" have the same status, even though it's really two different words? I think that should be good enough -- doesn't occur often enough to make a big difference, and gets very messy/impossible otherwise.
when presenting the mouseover pop-up, presumably just list both parents, one after the other, with all of their extra data.
when showing term references, will need to include multiple parents and all the children - perhaps separated by parent, so that the meanings aren't interleaved
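To make the "separated by parent" idea a bit more concrete, here is a small sketch using plain dictionaries. The data layout and the function name are invented for illustration and say nothing about how Lute's model would actually store this; the second sample sentence is also made up.

```python
def references_by_parent(term, parents_of, sentences_of):
    """Group reference sentences for `term` under each of its parents.

    parents_of:   {term: [parent, ...]}   (a term may have several parents)
    sentences_of: {term: [sentence, ...]}
    """
    grouped = {}
    for parent in parents_of.get(term, []):
        # The parent itself plus every term that lists it as a parent.
        family = [parent] + sorted(t for t, ps in parents_of.items() if parent in ps)
        grouped[parent] = [s for member in family
                           for s in sentences_of.get(member, [])]
    return grouped


parents_of = {
    "se siente": ["sentirse", "sentarse"],
    "se sentó": ["sentarse"],
}
sentences_of = {
    "se siente": ["Él se siente mejor."],
    "se sentó": ["Se sentó en la silla."],
}
refs = references_by_parent("se siente", parents_of, sentences_of)
```

Note that the "se siente" sentence shows up under both parents, which is the price of treating the two meanings as a single term with a single status.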

UX

For users, I think using "tags" as the parents would be easiest, allowing spaces in the tags, and making an ajax call to get the current list of words. I like the way that Lute currently shows the parent's definition in the dropdown:

I'm not sure how to replicate that with tags.

Consider using Sqlite instead of Mysql

Currently Lute uses MySQL, which is potentially a bit heavyweight. Other tools like VocabSieve and Anki use Sqlite and it works fine for them.

If Lute used Sqlite, users wouldn't have to install and configure MySQL. Install would be simplified to the following:

install PHP
get the Lute code

They should be able to use the built-in PHP web server from the public folder with $ php -S localhost:8000. The project could be initially config'd to run the db file from an internal folder and file (saved within the project directory). Then users could modify the .env.local file, with clear instructions on a wiki page. It should suffice, performance-wise.

(Of course, if they want, they can install the Symfony CLI (https://symfony.com/download) for a lightweight web server, and run it from the app folder with $ symfony server:start, or set up MAMP or XAMPP or whatever, but those would be overkill).

Sample branch

Branch sqlite_work_in_progress pushed to this repo has a few things done already. The unit tests all pass. The code appears to work, but I haven't tested it with any real volume of data, so I don't yet know if it's performant.

Remaining todos

~~determine how to port data from existing MySQL installations to the Sqlite file.~~ - data will be ported by CSV export and import.
DB backup would just be a copy of the sqlite file to some folder
how to handle new migrations? Still use the _migrations table idea, or simplify it further?
have to baseline the latest schema to a new .sqlite file and commit that to the repo, using that as the baseline db. And stash/delete all existing migrations so they don't get re-applied
the code has some "TODO:sqlite" comments. use dev:find to find them

creating sqlite baseline

apply the field/key changes listed below to the sql db (with a migration, probably)
fix the trigger
run process to export the mysql db to the baseline .sqlite db (committed in the repo)

Can keep creating baselines as needed

porting data - test cases

If importing, the Sqlite DB must be empty. The export from the Mysql db is just raw data, including primary keys. If the tables contain other data, all keys will get messed up.
importing: bad fields should throw an error
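The two test cases above could look something like this with Python's built-in sqlite3 module. This is only a sketch of the checks, not the actual import code; `import_rows` is a hypothetical helper, and the table and column names passed to it are assumed to come from trusted code rather than user input.

```python
import sqlite3


def import_rows(conn, table, columns, rows):
    """Insert raw exported rows (including primary keys) into an empty table."""
    existing = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
    bad = [c for c in columns if c not in existing]
    if bad:
        raise ValueError(f"unknown fields for {table}: {bad}")

    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    if count:
        # Existing rows would clash with the imported primary keys.
        raise RuntimeError(f"{table} is not empty; refusing to import")

    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', rows
        )
```

A test would then import into a fresh in-memory database, and check that a second import and an import with a misspelled column both raise.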

misc notes accumulated during hacking:

exporting the schema

**** reset
rm var/data/test.sqlite

# using https://github.com/techouse/mysql-to-sqlite3
mysql2sqlite -f var/data/test.sqlite -d test_lute -u root --mysql-password root -W

fixing col types


**** fix col types so that conversion to sqlite sets up primary key as autoincrement

alter table languages modify column LgID INTEGER NOT NULL AUTO_INCREMENT;
alter table books modify column BkID INTEGER NOT NULL AUTO_INCREMENT;
alter table booktags modify column BtBkID INTEGER NOT NULL;
alter table booktags modify column BtT2ID INTEGER NOT NULL;
alter table sentences modify column SeID INTEGER NOT NULL AUTO_INCREMENT;
alter table statuses modify column StID INTEGER NOT NULL;
alter table tags modify column TgID INTEGER NOT NULL AUTO_INCREMENT;
alter table tags2 modify column T2ID INTEGER NOT NULL AUTO_INCREMENT;
alter table texts modify column TxID INTEGER NOT NULL AUTO_INCREMENT;
alter table texttags modify column TtTxID INTEGER NOT NULL;
alter table texttags modify column TtT2ID INTEGER NOT NULL;
alter table texttokens modify column TokTxID INTEGER NOT NULL;
alter table wordimages modify column WiID INTEGER NOT NULL AUTO_INCREMENT;
alter table wordparents modify column WpWoID INTEGER NOT NULL;
alter table wordparents modify column WpParentWoID INTEGER NOT NULL;
alter table words modify column WoID INTEGER NOT NULL AUTO_INCREMENT;
alter table words modify column WoLgID INTEGER NOT NULL;
alter table wordtags modify column WtWoID INTEGER NOT NULL;
alter table wordtags modify column WtTgID INTEGER NOT NULL;

**** fix trigger -- committed in rep. migrations

misc other notes

**** weird 'like' vs '=' issue in sqlite
ref https://stackoverflow.com/questions/26719948/sqlite-why-select-like-works-and-equals-does-not
in SpaceDelimitedParser_IntTest.php

**** TODO db filename in settings
**** fix migration thing
***** commit a baseline _empty_ db to the repo
***** migration helper thing should copy the baseline

Combine "parent term mapping" and "term import"

Currently, "parent term mapping" is handled at "Import Parent Term mapping", and "term import" at "Import Terms". From a user's perspective, there's no real need for a difference, they're both just file imports. The parent term mapping takes a "parent [tab] child" mapping file, and the term import takes a CSV with several columns.

The Term Import could take a CSV with variable number of fields. Minimum required fields: language, term. Optional fields would be the rest. So, a parent mapping file could contain something like

language,term,parent
Spanish,gatos,gato

and the full import could have the whole thing.
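A quick sketch of the header check for such a combined import follows. The required fields come from the description above; the optional field names other than `parent` are guesses for illustration only, and `read_terms` is a hypothetical name.

```python
import csv

REQUIRED = {"language", "term"}
# Hypothetical optional columns; only "parent" is named in this issue.
OPTIONAL = {"parent", "translation", "status", "tags"}


def read_terms(stream):
    """Read a term CSV with a variable set of columns into a list of dicts."""
    reader = csv.DictReader(stream)
    fields = set(reader.fieldnames or [])

    missing = REQUIRED - fields
    if missing:
        raise ValueError(f"missing required field(s): {sorted(missing)}")
    unknown = fields - REQUIRED - OPTIONAL
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")

    return list(reader)
```

Rows that only carry `language,term,parent` would then be handled as a parent mapping, and rows with more columns as a full term import, through the same entry point.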

Japanese ctrl-c not respecting paragraphs

Sample text showing issue:

１　とばされた家

ドロシーは、ヘンリーおじさんとエムおばさんと、三人でくらしていました。家は小さくて、部屋は一つだけですが、地下室がありました。三人が住むカンザスでは、よくたつまきが起こるのです。地下室ににげこめば、たつまきから身を守ることができました。
　牛の世話をしたり、畑をたがやしたり、おじさんもおばさんも、いそがしくはたらいています。二人とも、ドロシーと遊んだり話したりするひまはありません。ドロシーはいつもひとりでした。家のまわりは草原で、友だちもいません。
「ぼくがいるよ、ワンワン」というように、ドロシーにとびついて走りまわるのは、小さくて黒い犬のトトです。ドロシーはトトが大すきで、トトもドロシーが大すきでした。朝からばんまでいっしょで、ドロシーがベッドに入ると、トトももぐりこんでくるのです。
「たつまきが近づいているぞ。」
ある日、空を見たおじさんがいいました。空は、どこまでも暗いはい色です。
「牛たちのようすを見てやらなければ。」
　おじさんは牛小屋に走っていき、おばさんの声がひびきました。
「ドロシー、急いで地下室に入りなさい。」
　おばさんは地下へ下りていきます。

Hover over "ドロシー", Shift + c, and the whole page is highlighted. Cry bitter tears.

HTTP Basic Authentication for security

Is your feature request related to a problem? Please describe.
For people who want to have a simple login feature.
! important ! This only stops regular people from messing up your database, b/c the password is stored in plaintext.
Maybe someone can help me to set/hash the password. For now, it's fine for me.

Describe the solution you'd like
Just use HTTP Basic Authentication

Add the lines below to the .env file and change the username and password; both default to lute.

# For login Lute
# You cannot use log out with the HTTP basic authenticator.
# Even if you log out from Symfony, your browser "remembers" your
# credentials and will send them on every request.
# -------------------
LUTE_USER_USERNAME=lute
LUTE_USER_PASSWORD=lute

Replace the entire content of ./config/packages/security.yaml with the following:

security:
    # https://symfony.com/doc/current/security.html#registering-the-user-hashing-passwords
    password_hashers:
        # Uncomment the 1 line below to restore the original setting
        # Symfony\Component\Security\Core\User\PasswordAuthenticatedUserInterface: 'auto'
        
        # Uncomment below 1 line to use login feature (Http Basic Access)
        Symfony\Component\Security\Core\User\InMemoryUser: plaintext
    # https://symfony.com/doc/current/security.html#loading-the-user-the-user-provider
    providers:
        # Uncomment the 1 line below to restore the original setting
        # users_in_memory: { memory: null }

        # Uncomment below 4 lines to use login feature (Http Basic Access)
        users_in_memory:
            memory:
                users:
                    '%env(LUTE_USER_USERNAME)%': {password: '%env(LUTE_USER_PASSWORD)%', roles: ['ROLE_USER']}

    firewalls:
        dev:
            pattern: ^/(_(profiler|wdt)|css|images|js)/
            security: false

        # TURNING OFF SECURITY FOR PROD.
        # Yes, this looks bad, but Lute is designed to run locally only.
        # There are _no security checks_.
        
        # Uncomment the 3 lines below to restore the original setting
        # prod:
        #     pattern: ^/
        #     security: false
        
        # Uncomment below 4 lines to use login feature (Http Basic Access)
        main:
            lazy: true
            provider: users_in_memory
            http_basic:
                realm: Secured Area

            # activate different ways to authenticate
            # https://symfony.com/doc/current/security.html#the-firewall

            # https://symfony.com/doc/current/security/impersonating_user.html
            # switch_user: true

    # Easy way to control access for large sections of your site
    # Note: Only the *first* access control that matches will be used
    access_control:
        # Uncomment below 1 line to use login feature (Http Basic Access)
         - { path: ^/, roles: ROLE_USER }

        # - { path: ^/admin, roles: ROLE_ADMIN }
        # - { path: ^/profile, roles: ROLE_USER }

when@test:
    security:
        password_hashers:
            # By default, password hashers are resource intensive and take time. This is
            # important to generate secure password hashes. In tests however, secure hashes
            # are not important, waste resources and increase test times. The following
            # reduces the work factor to the lowest possible values.
            Symfony\Component\Security\Core\User\PasswordAuthenticatedUserInterface:
                algorithm: auto
                cost: 4 # Lowest possible value for bcrypt
                time_cost: 3 # Lowest possible value for argon
                memory_cost: 10 # Lowest possible value for argon

There is no log out button, so you might want to use an incognito window each time you log in.
For Docker users, if you want to change the user/password after Docker is already running:
4.1 run docker compose stop
4.2 change the user/password in .env
4.3 run docker compose up

Additional context

CSS in .env file not working?

Both YeYueMX and King-Awgwa have reported this. I thought it was working, but maybe not!

Missing る in texts

From discord:

Sentences are missing for archived texts, which messes up references.

While reading today, I had a word showing up as "known", but I couldn't find a reference for it (when I clicked "sentences"). Turns out a book had been archived, and the sentences had been removed. This is probably a relic of prior code that used to wipe sentences.

This will likely be an expensive operation (will have to re-parse any texts that don't have sentences), so maybe have a special section for one-time jobs, and only show this if there are texts that don't have sentences.

Will this blow up the size of the DB?

Fix DB foreign key integrity - cascade updates and deletes

Currently, the DB doesn't really enforce referential integrity, which could result in weird behaviour. Better get that under control.

e.g., the "words" table has PK "WoID", which is referenced in various tables. Currently, when hacking at the db with straight SQL, it appears that deletes from the "words" table aren't cascaded to child tables, which is bad -- eg. a wordparents record may refer to something that has been deleted, or, worse, was deleted and then replaced with something new.

Deletes in parent tables should cascade to child tables. I believe that the Doctrine model is correctly removing things from dependent tables when parents are removed (unit tests are covering that), but it is better to be safe than sorry.

Todo:

fix the FKs, cascade update and delete
Add tests for these things just to be complete.
tests to ensure FKs are enforced

List of FKs to fix -- there may be others, but this is a good start.

CREATE TABLE IF NOT EXISTS "books" (
	FOREIGN KEY("BkLgID") REFERENCES "languages" ("LgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "bookstats" (
	FOREIGN KEY("BkID") REFERENCES "books" ("BkID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "booktags" (
	FOREIGN KEY("BtT2ID") REFERENCES "tags2" ("T2ID") ON UPDATE NO ACTION ON DELETE NO ACTION,
	FOREIGN KEY("BtBkID") REFERENCES "books" ("BkID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "sentences" (
	FOREIGN KEY("SeTxID") REFERENCES "texts" ("TxID") ON UPDATE NO ACTION ON DELETE NO ACTION,
	FOREIGN KEY("SeLgID") REFERENCES "languages" ("LgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "texts" (
	FOREIGN KEY("TxBkID") REFERENCES "books" ("BkID") ON UPDATE NO ACTION ON DELETE NO ACTION,
	FOREIGN KEY("TxLgID") REFERENCES "languages" ("LgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "texttags" (
	FOREIGN KEY("TtTxID") REFERENCES "texts" ("TxID") ON UPDATE NO ACTION ON DELETE NO ACTION,
	FOREIGN KEY("TtT2ID") REFERENCES "tags2" ("T2ID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "texttokens" (
	FOREIGN KEY("TokTxID") REFERENCES "texts" ("TxID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "wordimages" (
	FOREIGN KEY("WiWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "wordparents" (
	FOREIGN KEY("WpParentWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION,
	FOREIGN KEY("WpWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "wordtags" (
	FOREIGN KEY("WtWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION,
	FOREIGN KEY("WtTgID") REFERENCES "tags" ("TgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE wordflashmessages (
  FOREIGN KEY("WfWoID") REFERENCES "words" ("WoID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE IF NOT EXISTS "words" (
	FOREIGN KEY("WoLgID") REFERENCES "languages" ("LgID") ON UPDATE NO ACTION ON DELETE NO ACTION
);

With Sqlite have to follow a painful process: https://www.sqlite.org/lang_altertable.html#otheralter
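For reference, here is a tiny self-contained demo of what a cascading delete looks like in Sqlite, using Python's sqlite3 module. The two-table schema is a cut-down stand-in for the real one (the WoText column is an assumption), and note that Sqlite only enforces foreign keys when `PRAGMA foreign_keys = ON` has been run on the connection.

```python
import sqlite3


def cascade_demo():
    """Delete a parent word and return how many wordparents rows survive."""
    conn = sqlite3.connect(":memory:")
    # Foreign key enforcement is off by default and is set per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE words (
            WoID INTEGER PRIMARY KEY,
            WoText TEXT NOT NULL
        );
        CREATE TABLE wordparents (
            WpWoID INTEGER NOT NULL,
            WpParentWoID INTEGER NOT NULL,
            FOREIGN KEY (WpWoID) REFERENCES words (WoID)
                ON UPDATE CASCADE ON DELETE CASCADE,
            FOREIGN KEY (WpParentWoID) REFERENCES words (WoID)
                ON UPDATE CASCADE ON DELETE CASCADE
        );
        INSERT INTO words VALUES (1, 'gato'), (2, 'gatos');
        INSERT INTO wordparents VALUES (2, 1);
    """)
    conn.execute("DELETE FROM words WHERE WoID = 1")  # remove the parent
    remaining = conn.execute("SELECT COUNT(*) FROM wordparents").fetchone()[0]
    conn.close()
    return remaining
```

With the cascade in place the orphaned wordparents row goes away along with the parent; with the current `NO ACTION` definitions and enforcement turned on, the delete would instead fail with a constraint error.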

Setting term as its own (new) parent results in integrity constraint error

Description

Reading, and accidentally set the (new) parent for the term "cordillera" as "cordillera", resulting in error.

src/domain/dictionary->add should check if the parent is the term, and prevent setting the term as its own parent!

Add a unit test to that effect.
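The check itself is tiny. A sketch in Python (the real fix belongs in the PHP dictionary code; `set_parent` and the dict-based storage are purely illustrative):

```python
def set_parent(term, parent, parents):
    """Record `parent` as the parent of `term`, refusing self-parenting.

    `parents` is a plain {term: parent} dict standing in for the database.
    The comparison ignores case and surrounding whitespace, on the
    assumption that terms are stored downcased.
    """
    if term.strip().lower() == parent.strip().lower():
        raise ValueError(f"term '{term}' cannot be its own parent")
    parents[term] = parent
    return parents
```

The matching unit test would assert that `set_parent("cordillera", "cordillera", {})` raises, and that a normal parent is still saved.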

Clearer/simpler installation docs

I think the docs are a bit screwy, more than one person has gotten lost in a few places. Can simplify (at least for Docker).

Feedback from MyCheze in discord: "Many instructions say "Go to this page and do what it says" and then the page was like 1 instruction. It could have all been on the first page to streamline the process. ... I needed to edit the .env file to say BACKUP_HOST_DIR since it originally says BACKUP_DIR and Docker would fail. That was the hardest thing to find."

So, fix the wiki, and then the main page README.

Multi-words display error

Description

Multi-words display error.

To Reproduce

Steps to reproduce the behavior, e.g.:

Create new text as below

The first was a new U.S. Department of Energy (DOE) report, which has not been made public.

Click word DOE, input Department of Energy as parent then save it
See error, multi-words will not display correctly

Solution

Go to Terms
Search for the multi-word term (here, Department of Energy)
Click the multi-word term (here, Department of Energy)
Click Update
Go back to text, you can see multi-words display correctly

Extra software info, if not already included in the Description:

OS (e.g., iOS, windows): macOS
Browser (e.g., chrome, safari): brave
Web server (e.g., regular apache/php, MAMP, XAMP...): MAMP
Version (git commit): 1.1.3

Preserve term case for German nouns

Currently Lute downcases all clicked words. e.g., clicking on "Futter" opens a form with the term "futter". German uses caps for nouns, so it would be good if the initial caps could be preserved for German nouns, e.g.:

The above is from branch spike_preserve_caps pushed to this repo.

This feature of "preserving case" is really only needed for German, because that's the only lang that uses capitalization to indicate something special in the sentence ... but then again, in other languages, caps could be used for proper names etc like "Jesus Christ" or "Mexico".

To-do:

ensure setting parent dropdown works (e.g. parent case is correct in autocomplete, and on popup)
parsing works with same term in upper and lowercase
see how term listing table is impacted
bulk setting parent works with proper case

No other language needs case preservation -- and preserving case could be annoying (???? no idea). So, we might want a "preserve case" flag on the language -- annoying. Optionally, we could let users downcase the word somehow on term creation.
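One way the per-language flag could work, sketched in Python: keep the displayed term text as clicked when the flag is on, but always match on a lowercased key so the same word in upper and lower case still resolves to one term. The `Language` class, `preserve_case` flag, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Language:
    name: str
    preserve_case: bool = False  # hypothetical per-language setting


def term_text(clicked, language):
    """Text shown in the term form: as clicked, or downcased (current behaviour)."""
    return clicked if language.preserve_case else clicked.lower()


def term_key(clicked):
    """Key used for matching, always lowercase regardless of the flag."""
    return clicked.lower()
```

So for German with the flag on, clicking "Futter" would open the form with "Futter", while "Futter" and "futter" would still share the same lookup key.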

Import fails, "missing field language" even though it's there.

File attached, language = Chinese.

HSK_wordlist_take_2.csv

Let me edit Book data after the book is created

Currently, that's not possible.

Allow changes:

title
tags
language, sure why not
content -- if not changed, keep existing, else update.

Bust javascript cache on release

Currently, user browsers may cache public/js/lute.js, so when people update they don't necessarily get the latest code. Massively annoying for all.

The release process could easily do something like the following, as a very hacky but workable workaround:

rename the file lute.js to lute_<somedatetime>.js
change the reference to lute.js to match the screwy filename

This is all terribly hacky, of course, and the right thing to do would be something that I know nothing about, like WebPack. That belongs in a separate ticket; in the meantime, hacky hack above will do.
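The hacky rename could be scripted along these lines as part of the release. This is only a sketch: `bust_js_cache` is a made-up name, and which template holds the reference to lute.js is left to the caller, since it isn't stated here.

```python
from datetime import datetime
from pathlib import Path


def bust_js_cache(js_dir, template, now):
    """Rename js_dir/lute.js to lute_<datetime>.js and update the reference
    to it inside `template` (a file containing 'js/lute.js')."""
    stamp = now.strftime("%Y%m%d%H%M%S")
    old = Path(js_dir) / "lute.js"
    new = old.with_name(f"lute_{stamp}.js")
    old.rename(new)

    template = Path(template)
    html = template.read_text(encoding="utf-8")
    template.write_text(html.replace("js/lute.js", f"js/{new.name}"),
                        encoding="utf-8")
    return new
```

This assumes it runs on a release copy of the code where the file is still called lute.js; running it a second time on the same tree would fail because the original file is gone.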

Stats wrong for Chinese text

Stats are currently calculated in a way that doesn't work for character-based languages, such as Chinese. For example, take this single-page text, with completely garbage terms created:

Even though the terms are trash, they cover 100% of the text, so you'd expect the % to be pretty high ... but the index page shows 0% known:

Obviously not right.

Consider reducing number of term ratings

Currently, Lute uses the ratings from LWT (unknown, 1 to 5, then Well Known or Ignore). That's 7 choices, which I think is way too many.

Personally, when I'm reading, I'm really only thinking like this:

unknown
NEW: "I've never seen this before, or I really don't know it well."
LEARNING: I've seen this before, but I still don't know it.
LEARNED: I know this word
IGNORE: This word doesn't really count, it's a name or whatever.

The current statuses are as follows:

(StID, StAbbreviation, StText)
(0, '?', 'Unknown'),
(1, '1', 'New (1)'),
(2, '2', 'New (2)'),
(3, '3', 'Learning (3)'),
(4, '4', 'Learning (4)'),
(5, '5', 'Learned'),
(99, 'WKn', 'Well Known'),
(98, 'Ign', 'Ignored');

I think these should be mapped as follows:

New(1) and New(2) => New
Learning(3) and Learning(4) => Learning
Learned and Well Known => Learned

Changes needed:

db migration to change the content of the status table
change term form radio buttons
change the keyboard shortcuts for rating things

There may be other places too. I think the numbers are hardcoded in a few places, but there are some predefined constants in src/Entity/Status.php.
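The proposed mapping, written out against the current status IDs. The reduced labels are from the list above; leaving Unknown and Ignored untouched is an assumption, and what IDs the merged statuses would get in the migration is left open.

```python
# Current StID -> proposed reduced rating.
OLD_TO_NEW = {
    0: "Unknown",    # assumed unchanged
    1: "New",        # New (1)
    2: "New",        # New (2)
    3: "Learning",   # Learning (3)
    4: "Learning",   # Learning (4)
    5: "Learned",    # Learned
    99: "Learned",   # Well Known
    98: "Ignored",   # assumed unchanged
}


def collapse_status(st_id):
    """Map a current status ID to its reduced rating."""
    return OLD_TO_NEW[st_id]


def reduced_ratings():
    """Distinct ratings left after the merge, in first-seen order."""
    return list(dict.fromkeys(OLD_TO_NEW.values()))
```

That leaves five ratings (unknown, new, learning, learned, ignored), matching the list I actually think in while reading.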

NOTE: I don't know why I was so hung up on this at the time ... I could just choose to ignore the values I don't use :-)

Add hotkey "Click to copy sentence or paragraph"

maybe helps people using other tools.

Option: somehow toggle term selection so that people can click-drag-highlight to select terms. Currently click-drag creates multiword terms, could do something differently ??

Fatal error: Invalid default value for 'UpdatedDate' - On Lute installation

Invalid timestamp

Hi Jeff! I'm trying to install Lute with a LAMP server, but I ran into a fatal error. Basically, the app crashes while displaying File 20221221_233742_add_textstatscache_timestamps.sql exception: Invalid default value for 'UpdatedDate' Quitting.

The file is located at ./db/migrations/20221221_233742_add_textstatscache_timestamps.sql. I have tried several different timestamps, such as 0000-00-00 00:00:00 or 0, without any change. The annoying thing is that it also makes the tests crash.

Ubuntu
LAMP
MySQL: 8.0.31
Apache/2.4.52

EDIT: skipping this instruction was enough; as it is a fresh DB install, I don't need the migration.

Same word shown twice on render

As at commit:

$ git log -n 1
commit 4b587946918bd8ee71e363c0e6b79e02a98be4ca (HEAD -> develop, origin/develop)
Author: Jeff Zohrab
Date:   Thu Mar 2 13:17:14 2023 -0600
    Remove unused code.

Weird, only one record in the db:

mysql> select * from textitems2 where ti2txid = 359 and ti2text like '%resq%';
+---------+---------+---------+---------+----------+--------------+---------------+---------------+
| Ti2WoID | Ti2LgID | Ti2TxID | Ti2SeID | Ti2Order | Ti2WordCount | Ti2Text       | Ti2TextLC     |
+---------+---------+---------+---------+----------+--------------+---------------+---------------+
|  111581 |       1 |     359 |   94730 |      834 |            1 | resquebrajado | resquebrajado |
+---------+---------+---------+---------+----------+--------------+---------------+---------------+
1 row in set (0.14 sec)

Bare metal install throws unless MeCab is installed?

It appears that it bombs if you install Lute on a system without MeCab. From Sir Digby in Discord:

See what happens with the install if MeCab isn't there.

Ability to remove "Mark rest as known" button

This button is confusing and easy to press by accident. How can I remove it?

Change how Lute displays overlapping Terms, so that words/characters aren't written twice

Terms that partially overlap are both displayed. For example, suppose you defined terms "apple ball" and "ball cat". Given the imported text "apple ball cat dog", Lute will show this as "[apple ball][ball cat][dog]". The word "ball" is shown twice, because Lute cannot decide which term should really be shown ... only you know that.

Now, I know that this looks off, but it was the best solution I could come up with! For me studying Spanish, this has only occurred a few times while reading ... e.g. I have the terms "llegar a", and "a ver", which are both common constructs, and very occasionally while reading this has been rendered as "[llegar a][a ver]". It has not been enough of a bother for me to come up with an alternate solution -- after a cursory think, I believe that a good solution to this could be quite complicated, but I'd have to spend time investigating to be sure.

Issue reported by user @alguien in Discord, for the sentence "开始新生活吧，好吗？" (Note that "生" is rendered twice):

Possible solutions, sketches only:

1. "Mouseover reveal overlap" ... maybe something like ... "show the first term completely, and show the second one in such a way that the user knows that it's partially overlapped by the first; and on mouseover of the second term show it fully, and hide part of the first term." Tricky!

2. "Mouseover popup shows full" - "show the un-overlapped portion of the second term, but on mouseover the pop-up shows the full term".

Neither of these solutions changes the fact that Lute only shows the first term fully, and that perhaps the second term is really the right one in the context, but at least you wouldn't see weird repeats. Solution 2 is easier, with fewer moving parts.

Comment from user @alguien in Discord:

I see. The system as it is sounds like a decent compromise for Spanish, but in Chinese I don't think it's as good a solution, because sometimes the overall meaning will get through (不知知道), but other times the combination means something else entirely and the meaning of the text gets distorted (新生生活), so it's far from ideal. It's possible that this happens less with material not aimed at beginners, but it's still going to happen from time to time regardless. If it's not so much a coding error as an unintended consequence of the algorithm, I understand it will be problematic to fix just for one language, so I won't expect a fix anytime soon. I think that having this give priority to the first term would be okay here. I'm not sure about other languages with similar issues, but I think that would parse well with Chinese.

3. "Unhighlighted text mode" - Another possible solution, but a big change for the UI/user experience: when reading, add a "render white page" mode or something similar, in which Lute doesn't show the terms as color-coded "chunks" on a page and just shows the text pretty much as-is (white page), but for each character/word, on hover, pops up every possible phrase that it's a part of. Then, on un-check of "render white page", all of the color-coded terms show up.
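
For reference, the "give priority to the first term" idea from the Discord comment can be sketched in a few lines. This is a minimal Python sketch, not Lute's rendering code: it assumes space-separated tokens (so it would need a real tokenizer for Chinese), it matches case-sensitively, and `render_first_term_wins` is a hypothetical name.

```python
def render_first_term_wins(tokens, terms):
    """Group tokens into chunks; the earliest-starting term wins any overlap."""
    term_set = {tuple(t.split()) for t in terms}
    max_len = max((len(t) for t in term_set), default=1)
    chunks, i = [], 0
    while i < len(tokens):
        # Try the longest candidate starting at token i, falling back to
        # a single token when no defined term starts here.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if n == 1 or tuple(tokens[i:i + n]) in term_set:
                chunks.append(" ".join(tokens[i:i + n]))
                i += n  # tokens are consumed, so nothing is rendered twice
                break
    return chunks


chunks = render_first_term_wins(
    "apple ball cat dog".split(), ["apple ball", "ball cat"]
)
print("".join(f"[{c}]" for c in chunks))
```

With the example terms, "ball" is consumed by "apple ball", so "ball cat" is never shown. That is exactly the trade-off discussed above: no repeats, but the second term is hidden even when it is the right one.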

Consider docker pre-built images

With docker pre-built images pushed to docker hub, users with docker would be able to start using Lute with just a few clicks:

Get the .env file and the docker-compose.yml, then:

$ mkdir lute
$ cd lute
$ # cp the .env file and compose file to this dir, editing them if needed
$ docker compose up

I don't feel any code changes are needed. V2 (soon to be merged into develop) already has the various code changes in place needed for the image to work well. ... But there may be other requirements as well for this to work on all client machines that I'm not aware of!!

Ref different builds for different architectures (https://docs.docker.com/build/building/multi-platform)

Handle dups, and multiple parents, in the import mapping file

I'm not sure what happens if the parent mapping file contains dups, it doesn't look like it will handle it well.

cat     cats
cat     cats

Also, the code may have problems if the same term is mapped to different parents -- eg a fake example:

parent     somechild
child        somechild

should only import the mapping once. Add a check and test.
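
The check could be sketched roughly as follows. This is an illustrative Python sketch, not the existing import code: it assumes each line holds a parent and a child separated by whitespace (so single-word terms only), it keeps the first parent seen for a child and reports later ones, and `load_parent_mappings` is a hypothetical name.

```python
def load_parent_mappings(lines):
    """Return (mappings, conflicts) from 'parent child' lines.

    Exact duplicates collapse to one mapping. A child that appears with a
    second, different parent is reported as a conflict, not imported.
    """
    parent_of = {}
    conflicts = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        parent, child = parts
        known = parent_of.setdefault(child, parent)
        if known != parent:
            conflicts.append((child, known, parent))
    mappings = [(parent, child) for child, parent in parent_of.items()]
    return mappings, conflicts


mappings, conflicts = load_parent_mappings(
    ["cat     cats", "cat     cats", "parent     somechild", "child        somechild"]
)
print(mappings)
print(conflicts)
```

Whether a conflicting child should be rejected, or allowed to have several parents, is an open design question; the sketch only makes the conflict visible.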

Front page redesign - show my books on the first screen, if I have any

I find the current front page not very useful, I'm always immediately leaving it to either read, or to go to the book list. When I open Lute, the only thing I'm interested in doing is reading, not seeing the current list of links, and when I'm done reading something, the only thing I'm interested in is creating or starting the next thing to read.

For new users, the existing list of links is good!

Perhaps the page could look like this, once some books have been defined:

The links at the bottom of the page could be rearranged to save real estate ... actually, for users that have already defined books, some of those links wouldn't even be needed, as the list of books is already there.

The book listing sort order should initially be something like:

books opened more recently should be sorted to the top of the list, those are what I'm currently interested in
newer created books should appear next
the rest appear last
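
One possible reading of that ordering, as a Python sketch. The field names `last_opened` and `created` are made up for illustration, and treating "the rest" as the older never-opened books is an assumption.

```python
from datetime import datetime


def sort_books(books):
    """Opened books first (most recently opened on top), then never-opened
    books from newest to oldest creation date."""
    opened = [b for b in books if b.get("last_opened")]
    unopened = [b for b in books if not b.get("last_opened")]
    opened.sort(key=lambda b: b["last_opened"], reverse=True)
    unopened.sort(key=lambda b: b["created"], reverse=True)
    return opened + unopened


books = [
    {"title": "old", "created": datetime(2023, 1, 1), "last_opened": None},
    {"title": "new", "created": datetime(2023, 3, 1), "last_opened": None},
    {"title": "reading", "created": datetime(2022, 6, 1),
     "last_opened": datetime(2023, 3, 5)},
]
print([b["title"] for b in sort_books(books)])
```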

Error on import: "...Integrity constraint violation: 19 UNIQUE constraint failed: words.WoTextLC, words.WoLgID"

With attached file (language = Mandarin)

new_1.csv

Go back to previous page after edit/update terms

Is your feature request related to a problem? Please describe.

Go back to previous page after edit/update terms.

Describe the solution you'd like

eg. You have 20 pages of terms, you go to page 11, edit and save one of those terms, and it should stay on page 11.

Describe alternatives you've considered

Use command+click to pop up a new window on macOS.

Additional context

I use phpMyAdmin instead for the moment.

Using real URL for dictionary URL

Currently, the dictionary URL contains arbitrary placeholders, which has a few consequences:

The dictionary input field cannot be set to type "url", even though that gives nice features to virtual keyboard users.
The URLs become difficult to manipulate on the dev side. They need to be tweaked in several ways to detect the pop-up marker and insert the term.
It's a very rigid system. With the official LWT, a third feature, "word encoding", was present and was incredibly hard to implement.

I formatted everything as a proper URL in my fork of LWT. As it works better than the previous system, I would like to apply a similar system here. Let me detail it.

General approach

As far as I have thought it through, the best system is to replace the arbitrary codes with instructions preceded by a clear prefix like "lute_". For instance:

Replace term placeholder ### by lute_term.
Replace the pop-up indicator * by an argument lute_popup=1
It will easily be extended on the fly.
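
To make the proposal concrete, here is a rough Python sketch of the two directions: converting a legacy "placeholder" URL to the prefixed form, and resolving a prefixed URL for a given term. The names `convert_legacy` and `resolve` are hypothetical, and the exact handling (for example, where `lute_popup=1` goes when the URL has a fragment) is an assumption, not something decided in this issue.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def convert_legacy(url):
    """Turn '*https://.../###' into a real URL using lute_term / lute_popup."""
    popup = url.startswith("*")
    # Replace ### before parsing: a raw '#' would be read as a fragment marker.
    url = url.lstrip("*").replace("###", "lute_term")
    if not popup:
        return url
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + [("lute_popup", "1")]
    return urlunsplit(parts._replace(query=urlencode(query)))


def resolve(url, term):
    """Return (final_url, open_in_popup) for a prefixed dictionary URL."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    popup = ("lute_popup", "1") in pairs
    # lute_popup is an instruction for Lute, so it is not sent to the dictionary.
    pairs = [(k, v) for k, v in pairs if k != "lute_popup"]
    clean = urlunsplit(parts._replace(query=urlencode(pairs)))
    return clean.replace("lute_term", quote(term)), popup


new_url = convert_legacy("*https://www.deepl.com/translator#el/en/###")
print(new_url)
print(resolve(new_url, "hola"))
```

Because the converted value is a valid URL, it can be parsed with standard tooling, which is the main point of the proposal.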

A bit more about Pop-Up marker

I don't think this part should go into the dictionary URL, as it makes the URL longer and can be confusing for users. A "display in pop-up" field, with a database counterpart, might be better, but it is slightly more difficult to implement.

Backward compatibility

Backward compatibility with the "placeholder" system was not too hard to achieve in my fork, and it shouldn't be harder to achieve here. I would also like to add compatibility with my own parameters (prefixed by lwt_), if you don't have any objection.

If it makes sense to you, I can work on it and make a nice PR.

Terms with "." give "resource not found"

e.g., I have my language exceptions for Spanish set up so that "A." is highlighted as an "unknown term", but clicking on that gives this:

The reason for this is that Symfony doesn't like URLs with dots in them. ... similar to the JPEG issue noted in src/entity/Term.php (I think).

Minor issue, but still needs fixing.

add greek sample

Regex chars:

a-zA-ZÀ-ÖØ-öø-ȳͰ-Ͽἀ-ῼ
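
A quick way to sanity-check these ranges outside Lute is to drop them into a character class and run a sample sentence through it. This is a small Python sketch: the sample sentence and the NFC normalization step are my own additions, and Lute's parser may treat the ranges differently.

```python
import re
import unicodedata

# The character ranges from above: Latin, Greek and Coptic, Greek Extended.
WORD_CHARS = "a-zA-ZÀ-ÖØ-öø-ȳͰ-Ͽἀ-ῼ"
WORD_RE = re.compile(f"[{WORD_CHARS}]+")


def words(text):
    """Return the runs of characters that the ranges treat as word characters."""
    # Normalize first so accented letters are single precomposed code points.
    return WORD_RE.findall(unicodedata.normalize("NFC", text))


print(words("Καλημέρα, κόσμε!"))
```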

Sample story: https://www.greek-language.gr/certification/dbs/teachers/show.html?id=5

Figure out the dictionary

from Cynthios in slack:

Modern Greek
https://en.wiktionary.org/wiki/###
https://www.wordreference.com/gren/###
*https://www.deepl.com/translator#el/en/###

Only show term sentences for texts I've actually read

I have a few books loaded that I haven't read yet, or have pages remaining I haven't read yet. Sometimes when I ask for a term to show its sentences, I'm shown sentences that are well in advance in the book I'm reading, so I really don't know what the context is (I can sometimes guess for texts I've read in the past).

Outline of code changes:

add "texts.TxIsRead boolean" field, default false
update code and tests to only include sentences where TxIsRead
change query to return latest sentenceIDs first -- these are freshest in the mind, presumably.
set TxIsRead on moving to the next page (note: do this on the ">" link at the top, as well as on the check-> and > at the bottom of the page)
assume all archived books have been fully read, set them to read
for active books, set all pages before the current page to read
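
The "set all pages before the current page to read" step could look something like this. It is a Python/sqlite3 sketch against a throwaway in-memory table: only `texts.TxIsRead` comes from the outline above, while the `TxBkID` and `TxOrder` columns and the `mark_pages_read` helper are assumptions made for the example.

```python
import sqlite3


def mark_pages_read(conn, book_id, current_order):
    """Mark every page of the book before the current page as read."""
    conn.execute(
        "UPDATE texts SET TxIsRead = 1 WHERE TxBkID = ? AND TxOrder < ?",
        (book_id, current_order),
    )


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema, with the new field defaulting to false.
conn.execute(
    "CREATE TABLE texts ("
    " TxID INTEGER PRIMARY KEY,"
    " TxBkID INTEGER NOT NULL,"
    " TxOrder INTEGER NOT NULL,"
    " TxIsRead BOOLEAN NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO texts (TxBkID, TxOrder) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1)],
)

# The user is on page 3 of book 1.
mark_pages_read(conn, 1, 3)
rows = conn.execute(
    "SELECT TxBkID, TxOrder FROM texts WHERE TxIsRead = 1 ORDER BY TxOrder"
).fetchall()
print(rows)
```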

Multi-word terms not highlighting correctly for new texts.

Description

See title :-)

To Reproduce

Create a new English text AP1 with text "Abc wrote to the Associated Press about it."
Create another new Eng text AP2 with "Def wrote to the Associated Press about it."
Open AP1. Highlight "associated press" and make a new term, save it. Then mark all words as known. Text looks like the following:
Open AP2. Text looks like the following:
Create new Eng text AP3 with "Ghi wrote to the Associated Press about it." Note that Associated Press is not highlighted, but it should be.

Lute v1.1.4

Broken RTL languages support.

Description

RTL display is broken, though it's not completely LTR either. I'll provide Arabic text.
Without diacritics (which represent short vowels), Lute shows the words correctly but in the wrong order (LTR instead of RTL). When I add the diacritics, it mostly loses its ability to show connected letters.

Another point: titles display completely fine, with the diacritics and in the correct order.

Reproduce

صَبَاحُ الخَير.md

This is the file if you wanna try it on your system.

Edit: text from the file ... SORT OF ... even copying and pasting it here makes it behave strangely. The exclamation marks belong on the left of the first two lines. :-)

خالِد: صَبَاحُ الخَيرِ!
خُلُود: صَبَاحُ النُّور!
خالِد: كَيفَ حَالُكِ؟
خُلُود: بِخَيرٍ، وَأَنْتَ؟
خَالِد: فِي أَحْسَنِ حالٍ. ما اسمُكِ؟
خُلُود: اسمي خُلُودُ، وأَنتَ، ما اسمُكَ؟
خَالِد: أَنا خَالِد، تَشَرَّفْتُ بِلِقَائِكِ.
خُلُود: الشَّرَفُ لِي.

Screenshot of the text file:

Screenshots

Extra software info, if not already included in the Description:

OS: Fedora 38 workstation edition (Gnome)
Browser: Firefox.
Web server: Idk, I'm running it locally, with a docker installation.
Version: 2.0.14

Spanish texts don't tokenize "EE.UU." correctly

"Set all to known" still shows it as unknown. Probably getting parsed into multiple parts with zero-width space, instead of being treated as a single unit.

The display of Japanese numerical words is incorrect

Description

The display of Japanese numerical words is incorrect.

To Reproduce

Steps to reproduce the behavior, e.g.:

Go to Create New Text
Copy and paste below text.

言葉一つで傷つくような
ヤワな私を捧げたい今
二度と訪れない季節が

Click Save
See error

Screenshots

I checked with MeCab in terminal, it can recognize numerical words very well.

Extra software info, if not already included in the Description:

OS (e.g., iOS, windows): macOS
Browser (e.g., chrome, safari): brave
Web server (e.g., regular apache/php, MAMP, XAMP...): Symfony Server
Version (git commit): 1.1.5

Hover mouse + hotkey for all actions (instead of having to select first)

This feels like a nice idea, less clicky-clicky.

In addition, at least one user (quopquaoi in Discord) reported an issue: when clicking a term, the language dictionary (Jisho, Japanese dict) was "pulling focus", so the hotkey was actually getting sent to Jisho, instead of being handled by Lute. If hotkeys worked on hover, that wouldn't be a problem, because Lute would still be running the show (would have focus).

Possible implementation:

on hover sets a blue underline for terms
hotkeys work on the blue-underlined term
if user clicks the term, it switches to red underline, the form shows, dictionary loaded, etc etc as currently happens
bulk updates (shift-click) I think would still behave as they do currently - red underline, status updates work

UX issues to sort out:

if user multi-clicks some words, and then hits a status key, are these words still selected (red underline)? If yes, how does the "hover" thing take over from the red underlined? or, does holding the shift while clicking mean that it's just a blue underline, because the words aren't really active?

Remove double space from imported text

Sometimes texts have two spaces, which then causes problems with multi-word terms not matching. Eg, "llevar [ ] [ ] a" will not match with "llevar [ ] a". Just regex replace space-space on import text.
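
The replacement itself is tiny. This Python sketch shows the idea (the real change would live in Lute's import code, and `collapse_spaces` is a hypothetical name). It collapses runs of two or more spaces, not just exact pairs, so three spaces in a row also become one.

```python
import re


def collapse_spaces(text):
    """Replace each run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


print(collapse_spaces("llevar  a cabo"))
```

Newlines and tabs are left alone here, on the assumption that paragraph breaks in the imported text should survive.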

Error when starting docker

Just wondering if anyone could help me run this project with Docker.

Steps that I took:

clone the repo
cd into the repo using CMD (Windows 11)
ran the command docker compose build (as per instructions)
I get this:

c:\code\lute>docker compose build
time="2023-06-25T10:54:29+12:00" level=warning msg="The \"BACKUP_HOST_DIR\" variable is not set. Defaulting to a blank string."
1 error(s) decoding:

error decoding 'volumes[1]': invalid spec: :/lute/backup: empty section between colons

Retain identical pop up on hover behavior for words set to "well-known" as learning statuses 1-5

Is your feature request related to a problem? Please describe.

I believe that the current behavior discourages ever setting a word to well-known. My stated reasoning is as follows: A user may have intimate familiarity with the more common forms of a word and consequently wish to remove the colored highlight of learning statuses 1-5 but still desire to have quick reference to the parent word. The current behavior requires the user to completely open the entry for a word in order to access its definition, parent word, and tags when a mere hover should suffice.

Take this use case as an example: in Latin, in order to be able to conjugate a given verb into every tense, voice, and mood, the learner is required to have mastered the verb's so-called "principal parts". The more frequent a learner's exposure to the principal parts, the easier their ultimate acquisition will be. The parent word box is an excellent place to put not only the principal parts but also a short definition of the headword for quick reference.

Describe the solution you'd like

Ideally, Lute would retain the identical pop up on hover behavior for words set to "well-known" as it does for learning statuses 1-5.

Raspberry Pi installation fails

I believe that @99MengXin already has Lute running with Docker on the Rasp Pi - in the Discord "install" channel ecurp_forp is having massive trouble getting started ... and I'm not sure what's wrong.

I have no idea what the issue is, and can't debug it or suggest changes, so maybe try setting up a Vagrant Box with a Rasp Pi OS, put Docker in there, install Lute in there, run it ... and watch the world explode. (Try using Vagrant box w/ Docker inside, and run Lute within Docker within Vagrant)

and the rest should be ... pretty straightforward. :-/ Not.

127 error code on using backup within Lute

Notes from discord chat:

root@63dce09ff1c7:/# mysqldump --version
mysqldump  Ver 10.19 Distrib 10.7.8-MariaDB, for debian-linux-gnu (aarch64)

Lute v1.1.7

Dump in terminal works:

# mysqldump -u root --password=root lute > /lute/backup.sql
root@63dce09ff1c7:/# cd lute
root@63dce09ff1c7:/lute# ls
backup.sql
