lingtrain-aligner-editor's Introduction

Hi there, I'm Sergei 👋

  • 🚀 Working in the field of ML and MLOps.
  • 🌱 My main interest in this area is NLP.
  • 😄 Outside of work, I like learning languages (Chinese, Russian, English, German, Czech, Hungarian, Japanese).
  • 💬 Ask me how to pronounce "Köszönöm" in Hungarian and what 侍 means.
  • 🖋️ I also write articles from time to time.

Channels

Habr

Medium

lingtrain-aligner-editor's People

Contributors

averkij

lingtrain-aligner-editor's Issues

Show a list of lines that were used more than once

In addition to #7, it would be useful to see a list of lines, for both languages, that were used more than once, i.e. duplicated.

Duplicated lines are not shown as conflicts; at least, I wasn't able to trigger that by manually adding duplicates.
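
A minimal sketch of how such a list could be computed, assuming each aligned row is available as a pair of line-id lists (one per language). The function name `find_reused_lines` and the shape of `pairs` are hypothetical and not taken from the project:

```python
from collections import Counter


def find_reused_lines(pairs):
    """Return the line ids that occur in more than one aligned row, per side.

    `pairs` is a hypothetical list of (source_ids, target_ids) tuples,
    one tuple per aligned row.
    """
    from_counts = Counter(i for src, _ in pairs for i in src)
    to_counts = Counter(i for _, tgt in pairs for i in tgt)
    reused_from = sorted(i for i, n in from_counts.items() if n > 1)
    reused_to = sorted(i for i, n in to_counts.items() if n > 1)
    return reused_from, reused_to


pairs = [([1], [1]), ([2], [2, 3]), ([3], [3]), ([3, 4], [4])]
print(find_reused_lines(pairs))  # ([3], [3])
```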

Parallel data processing

Because of model inference, the alignment process can take a long time for large documents (1000+ lines). Processing batches in parallel should reduce the calculation time substantially.

The following points should be considered:

  • batches should be independent
  • multiprocessing instead of multithreading (it's a CPU-bound task in Python code, so threads would be limited by the GIL)
  • tasks should update some state which will be fetched by UI during the processing
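
The three points above can be sketched roughly as follows, using the standard `concurrent.futures.ProcessPoolExecutor`. Everything here is an assumption for illustration: `align_batch` is a placeholder for the real model inference, and `on_progress` stands in for whatever state the UI would poll (for example a DB row).

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def split_batches(lines, batch_size):
    """Cut the document into independent, fixed-size batches."""
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


def align_batch(batch_id, batch):
    """Placeholder for the CPU-bound inference on one batch (hypothetical)."""
    return batch_id, [line.upper() for line in batch]  # stand-in work


def align_parallel(lines, batch_size=100, workers=4, on_progress=None):
    """Process batches in separate processes and report progress per batch."""
    batches = split_batches(lines, batch_size)
    results = [None] * len(batches)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(align_batch, i, b) for i, b in enumerate(batches)]
        for done, future in enumerate(as_completed(futures), start=1):
            batch_id, aligned = future.result()
            results[batch_id] = aligned  # keep the original batch order
            if on_progress:
                on_progress(done, len(batches))  # state the UI can fetch
    return [line for batch in results for line in batch]


if __name__ == "__main__":
    state = {}

    def save_progress(done, total):
        state["progress"] = done / total

    out = align_parallel([f"line {i}" for i in range(250)], on_progress=save_progress)
    print(len(out), state)
```

The progress state is updated in the parent process as each batch finishes, which avoids sharing mutable state between worker processes.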

Alignment completion flag

Requested by Alexandra Khramtsova.

For a successfully aligned pair of sentences, the user can add a completion mark (via a button or a checkbox).
This information can be used to calculate the overall progress and to track unaligned sentence pairs.
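
As a rough illustration, assuming the completion marks are available as one boolean per sentence pair (the name `alignment_progress` and this data shape are hypothetical), the progress and the list of pending pairs could be derived like this:

```python
def alignment_progress(done_flags):
    """Return (share of completed pairs, indices of pairs still pending)."""
    if not done_flags:
        return 0.0, []
    pending = [i for i, done in enumerate(done_flags) if not done]
    return 1 - len(pending) / len(done_flags), pending


progress, pending = alignment_progress([True, False, True, True])
print(progress, pending)  # 0.75 [1]
```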

Change file order IDs to GUIDs

Currently the API identifies files by their order (1, 2, etc.) in queries. This is inconvenient because it requires excessive checks and makes it impossible to determine whether alignment has already started for the selected documents.

We need to:

  • create a database and store the uploaded file names along with their IDs
  • use the IDs in API queries
  • use the IDs on the frontend as well
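
A minimal sketch of the storage side, assuming SQLite and `uuid.uuid4().hex` as the GUID format. The table name `documents` and the helper names are hypothetical, not the project's actual schema:

```python
import sqlite3
import uuid


def init_db(conn):
    """Create the (hypothetical) table that maps GUIDs to uploaded file names."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "guid TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )


def register_document(conn, name):
    """Store an uploaded file name and return its newly generated GUID."""
    guid = uuid.uuid4().hex  # 32 hex characters
    conn.execute("INSERT INTO documents (guid, name) VALUES (?, ?)", (guid, name))
    conn.commit()
    return guid


def find_document(conn, guid):
    """Return the file name for a GUID, or None if the GUID is unknown."""
    row = conn.execute(
        "SELECT name FROM documents WHERE guid = ?", (guid,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
init_db(conn)
guid = register_document(conn, "book_fr.txt")
print(guid, find_document(conn, guid))
```

The same GUID can then be passed in API routes and by the frontend instead of the file order.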

Undo button

Requested by Alexandra Khramtsova.

It would be useful to be able to undo the last action.
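
One possible approach, sketched under the assumption that the alignment state is small enough to snapshot before every edit. The class `UndoableAlignment` and its methods are hypothetical:

```python
import copy


class UndoableAlignment:
    """Keeps a snapshot of the aligned pairs before each edit."""

    def __init__(self, pairs):
        self.pairs = pairs
        self._history = []

    def edit(self, index, new_pair):
        # Save the state before changing it so the change can be reverted.
        self._history.append(copy.deepcopy(self.pairs))
        self.pairs[index] = new_pair

    def undo(self):
        """Revert the last edit; return False if there is nothing to undo."""
        if not self._history:
            return False
        self.pairs = self._history.pop()
        return True


alignment = UndoableAlignment([("Bonjour", "Hello"), ("Merci", "Thanks")])
alignment.edit(1, ("Merci", "Thank you"))
alignment.undo()
print(alignment.pairs)  # [('Bonjour', 'Hello'), ('Merci', 'Thanks')]
```

Because the history is a stack, repeated calls to `undo` step back through earlier edits as well.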

Filter already used candidates

Requested by Alexandra Khramtsova.

In the alignment candidates list, it would be useful to be able to show only the unused ones.
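
A small sketch of such a filter, assuming each candidate is a dict with an `id` key and that the ids already used in the alignment are known. Both the data shape and the function name are hypothetical:

```python
def unused_candidates(candidates, used_ids):
    """Keep only the candidates whose id is not yet used in the alignment."""
    used = set(used_ids)
    return [c for c in candidates if c["id"] not in used]


candidates = [
    {"id": 10, "text": "first"},
    {"id": 11, "text": "second"},
    {"id": 12, "text": "third"},
]
print(unused_candidates(candidates, used_ids=[10, 12]))
# [{'id': 11, 'text': 'second'}]
```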

Alignment conflicts autodetection

Detect the conflicts and highlight the corresponding lines.

A conflict is any kind of break in the alignment flow. A more detailed classification of conflicts is needed before working on this issue.

Fix the download function

Downloading broke after the move to the new DB-based storage mechanism.

Take the index ordering into account when implementing the new download mechanism.

"Generate preview" and "Download" buttons are not working

I'm not sure whether this is an edge case I triggered or a problem in the current version. With the alignment done and no pending conflicts, when I go to the Create tab and press "Generate preview" or "Download book", nothing happens and an exception appears in the console:

03-Jul-21 15:06:36 [ERROR] - 24: Exception on /items/Alex/create/fr/en/89f8ad0e6f154cd1896d34e971153600/preview [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./main.py", line 773, in get_book_preview
    paragraphs, delimeters, metas = reader.get_paragraphs_polybook(
  File "/usr/local/lib/python3.8/site-packages/lingtrain_aligner/reader.py", line 180, in get_paragraphs_polybook
    par_info = [(par_id, par_sent_ids_from, par_sent_ids_to)
  File "/usr/local/lib/python3.8/site-packages/lingtrain_aligner/reader.py", line 180, in <listcomp>
    par_info = [(par_id, par_sent_ids_from, par_sent_ids_to)
  File "/usr/local/lib/python3.8/site-packages/lingtrain_aligner/reader.py", line 446, in get_next_paragraph
    tid_min, tid_max = min(json.loads(item[0][3])), max(
ValueError: min() arg is an empty sequence
[pid: 24|app: 0|req: 1591/3777] 172.17.0.1 () {48 vars in 1012 bytes} [Sat Jul  3 15:06:36 2021] POST /items/Alex/create/fr/en/89f8ad0e6f154cd1896d34e971153600/preview => generated 290 bytes in 328 msecs (HTTP/1.1 500) 4 headers in 160 bytes (1 switches on core 0)
172.17.0.1 - - [03/Jul/2021:15:06:36 +0000] "POST /items/Alex/create/fr/en/89f8ad0e6f154cd1896d34e971153600/preview HTTP/1.1" 500 290 "http://localhost/user/Alex/create/fr/en" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0" "-"
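
The last frame fails because `min()` receives an empty sequence. The sketch below reproduces only that error class and shows one defensive pattern. It assumes the stored value was an empty JSON list (`"[]"`), which the log does not confirm. `id_range` is a hypothetical helper, not part of `lingtrain_aligner`, and whether returning `None` is the right behaviour for the reader is an open question.

```python
import json


def id_range(raw_ids):
    """Return (min_id, max_id) for a JSON list of ids, or None if it is empty."""
    ids = json.loads(raw_ids)
    if not ids:
        return None  # nothing to compare, let the caller decide what to do
    return min(ids), max(ids)


# Reproduce the error class from the traceback.
try:
    min(json.loads("[]"))
except ValueError as exc:
    print("ValueError:", exc)

print(id_range("[]"))         # None
print(id_range("[4, 5, 6]"))  # (4, 6)
```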
