label-studio-ml-backend's Introduction

What is the Label Studio ML backend?

The Label Studio ML backend is an SDK that lets you wrap your machine learning code and turn it into a web server. The web server can be connected to a running Label Studio instance to automate labeling tasks.

If you just need to load static pre-annotated data into Label Studio, running an ML backend might be overkill for you. Instead, you can import pre-annotated data.

Quickstart

To start using the models, use docker-compose to run the ML backend server.

Use the following command to start serving the ML backend at http://localhost:9090:

git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/label_studio_ml/examples/{MODEL_NAME}
docker-compose up

Replace {MODEL_NAME} with the name of the model you want to use (see below).

Allow the ML backend to access Label Studio data

In most cases, you will need to set the LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY environment variables so that the ML backend can access the media data in Label Studio. Read more in the documentation.
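For example, with Docker Compose you can export both variables before starting the server, assuming the example's docker-compose.yml forwards them to the container (the URL and key below are placeholders):

export LABEL_STUDIO_URL=http://localhost:8080
export LABEL_STUDIO_API_KEY=<your-api-key>
docker-compose up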

Models

The following models are supported in the repository. Some of them work without any additional setup, and some of them require additional parameters to be set.

Check the Required parameters column to see if you need to set any additional parameters.

  • Pre-annotation column indicates if the model can be used for pre-annotation in Label Studio:
    you can see pre-annotated data when opening the labeling page or after running predictions for a batch of data.
  • Interactive mode column indicates if the model can be used for interactive labeling in Label Studio: you see interactive predictions while performing actions on the labeling page.
  • Training column indicates if the model can be used for training in Label Studio: the model state is updated based on the submitted annotations.
MODEL_NAME                     | Description                                   | Pre-annotation | Interactive mode | Training | Required parameters
-------------------------------|-----------------------------------------------|----------------|------------------|----------|--------------------
segment_anything_model         | Image segmentation by Meta                    | ❌             | ✅               | ❌       | None
llm_interactive                | Prompt engineering with OpenAI and Azure LLMs | ✅             | ✅               | ✅       | OPENAI_API_KEY
grounding_dino                 | Object detection with prompts                 | ❌             | ✅               | ❌       | None
tesseract                      | Interactive OCR                               | ❌             | ✅               | ❌       | None
easyocr                        | Automated OCR with EasyOCR                    | ✅             | ❌               | ❌       | None
spacy                          | NER by SpaCy                                  | ✅             | ❌               | ❌       | None
flair                          | NER by flair                                  | ✅             | ❌               | ❌       | None
bert_classifier                | Text classification with Hugging Face         | ✅             | ❌               | ✅       | None
huggingface_llm                | LLM inference with Hugging Face               | ✅             | ❌               | ❌       | None
huggingface_ner                | NER by Hugging Face                           | ✅             | ❌               | ✅       | None
nemo_asr                       | Speech ASR by NVIDIA NeMo                     | ✅             | ❌               | ❌       | None
mmdetection                    | Object detection with OpenMMLab               | ✅             | ❌               | ❌       | None
sklearn_text_classifier        | Text classification with scikit-learn         | ✅             | ❌               | ✅       | None
interactive_substring_matching | Simple keyword search                         | ❌             | ✅               | ❌       | None
langchain_search_agent         | RAG pipeline with Google Search and Langchain | ✅             | ✅               | ✅       | OPENAI_API_KEY, GOOGLE_CSE_ID, GOOGLE_API_KEY

(Advanced usage) Develop your model

To start developing your own ML backend, follow the instructions below.

1. Installation

Download and install label-studio-ml from the repository:

git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/
pip install -e .

2. Create an empty ML backend:

label-studio-ml create my_ml_backend

You can go to the my_ml_backend directory and modify the code to implement your own inference logic.

The directory structure should look like this:

my_ml_backend/
├── Dockerfile
├── docker-compose.yml
├── model.py
├── _wsgi.py
├── README.md
└── requirements.txt

  • Dockerfile and docker-compose.yml are used to run the ML backend with Docker.
  • model.py is the main file where you implement your own training and inference logic.
  • _wsgi.py is a helper file used to run the ML backend with Docker (you don't need to modify it).
  • README.md contains instructions on how to run the ML backend.
  • requirements.txt lists the Python dependencies.

3. Implement prediction logic

In your model directory, locate the model.py file (for example, my_ml_backend/model.py).

The model.py file contains a class declaration inherited from LabelStudioMLBase. This class provides wrappers for the API methods that are used by Label Studio to communicate with the ML backend. You can override the methods to implement your own logic:

def predict(self, tasks, context, **kwargs):
    """Make predictions for the tasks."""
    return predictions

The predict method is used to make predictions for the tasks. It uses the following:

  • tasks: the Label Studio tasks in JSON format
  • context: the Label Studio context in JSON format, used in interactive labeling scenarios
  • the return value: a list of predictions in Label Studio JSON format

Once you implement the predict method, you can see predictions from the connected ML backend in Label Studio.
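For illustration, here is a minimal sketch of a predict implementation for a text classification project; the sentiment/text tag names and the hard-coded label are assumptions that match a labeling config with a <Choices name="sentiment" toName="text"> control:

def predict(self, tasks, context=None, **kwargs):
    predictions = []
    for task in tasks:
        # task['data'] holds the task payload, e.g. {'text': '...'}
        predictions.append({
            'model_version': self.model_version,
            'score': 0.5,  # overall prediction score, used for task ordering
            'result': [{
                'from_name': 'sentiment',  # control tag name from the labeling config
                'to_name': 'text',         # object tag name from the labeling config
                'type': 'choices',
                'value': {'choices': ['Positive']},
            }],
        })
    return predictions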

4. Implement training logic (optional)

You can also implement the fit method to train your model. The fit method is typically used to train the model on the labeled data, although it can be used for any arbitrary operation that requires data persistence (for example, storing labeled data in a database, saving model weights, keeping an LLM prompt history, etc.).

By default, the fit method is called on any data action in Label Studio, such as creating a new task or updating annotations. You can modify this behavior in the project settings under Webhooks.

To implement the fit method, you need to override the fit method in your model.py file:

def fit(self, event, data, **kwargs):
    """Train the model on the labeled data."""
    old_model = self.get('old_model')
    # write your logic to update the model and produce new_model
    self.set('new_model', new_model)

where:

  • event: the event type, such as 'ANNOTATION_CREATED' or 'ANNOTATION_UPDATED'
  • data: the payload received with the event (see the Webhook event reference for more)

Additionally, there are two helper methods that you can use to store and retrieve data from the ML backend:

  • self.set(key, value) - store data in the ML backend
  • self.get(key) - retrieve data from the ML backend

Both methods can be used elsewhere in the ML backend code, for example, in the predict method to get the new model weights.
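For example, a minimal fit sketch that persists state with these helpers might count submitted annotations (the 'annotation_count' key and the counting logic are purely illustrative):

def fit(self, event, data, **kwargs):
    """Keep simple state across fit calls using self.set()/self.get()."""
    if event in ('ANNOTATION_CREATED', 'ANNOTATION_UPDATED'):
        # the annotation payload typically arrives under data['annotation']
        count = int(self.get('annotation_count') or 0) + 1
        self.set('annotation_count', str(count))

The stored value can then be read back in predict() with self.get('annotation_count'), for example to decide when enough annotations have accumulated to retrain.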

Other methods and parameters

Other methods and parameters are available within the LabelStudioMLBase class:

  • self.label_config - returns the Label Studio labeling config as an XML string.
  • self.parsed_label_config - returns the Label Studio labeling config as a parsed JSON structure.
  • self.model_version - returns the current model version.
  • self.get_local_path(url, task_id) - a helper that downloads and caches a URL that is typically stored in task['data'], and returns the local path to it. The URL can be a file uploaded to Label Studio, a Label Studio Local Storage or Cloud Storage link, or any other http(s) URL.
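For example, in a predict implementation for an image project, you might resolve the task media to a local file first (the 'image' data key is an assumption that depends on your labeling config):

def predict(self, tasks, context=None, **kwargs):
    predictions = []
    for task in tasks:
        # download (and cache) the media behind the task URL, whatever storage backs it
        image_path = self.get_local_path(task['data']['image'], task_id=task['id'])
        # ... run inference on image_path and append a prediction dict ...
    return predictions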

Run without Docker

To run without Docker (for example, for debugging purposes), you can use the following command:

label-studio-ml start my_ml_backend

Test your ML backend

Modify my_ml_backend/test_api.py to ensure that your ML backend works as expected.
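As a starting point, a minimal test can build the Flask app with init_app and POST a task to /predict. This is a sketch: the NewModel class name and the task payload are assumptions to adapt to your generated model.py and labeling config.

import json

import pytest
from label_studio_ml.api import init_app

from model import NewModel  # the class generated into my_ml_backend/model.py


@pytest.fixture
def client():
    app = init_app(model_class=NewModel)
    app.config['TESTING'] = True
    with app.test_client() as client:
        yield client


def test_predict_returns_200(client):
    payload = {'tasks': [{'data': {'text': 'hello world'}}]}
    response = client.post('/predict', data=json.dumps(payload),
                           content_type='application/json')
    assert response.status_code == 200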

Modify the port

To modify the port, use the -p parameter:

label-studio-ml start my_ml_backend -p 9091

Deploy your ML backend to GCP

Before you start:

  1. Install gcloud.
  2. Initialize billing for your account if it's not activated.
  3. Initialize gcloud, enter the following command, and log in with your browser:
gcloud auth login
  4. Activate your Cloud Build API.
  5. Find your GCP project ID.
  6. (Optional) Add GCP_REGION with your default region to your ENV variables.

To start deployment:

  1. Create your own ML backend.
  2. Start deployment to GCP:
label-studio-ml deploy gcp {ml-backend-local-dir} \
--from={model-python-script} \
--gcp-project-id {gcp-project-id} \
--label-studio-host {https://app.heartex.com} \
--label-studio-api-key {YOUR-LABEL-STUDIO-API-KEY}
  3. After Label Studio deploys the model, you can find the model endpoint in the console.

Troubleshooting

Troubleshooting Docker Build on Windows

If you encounter an error similar to the following when running docker-compose up --build on Windows:

exec /app/start.sh: No such file or directory
exited with code 1

This issue is likely caused by Windows' handling of line endings in text files, which can affect scripts like start.sh. To resolve this issue, follow the steps below:

Step 1: Adjust Git Configuration

Before cloning the repository, ensure your Git is configured to not automatically convert line endings to Windows-style (CRLF) when checking out files. This can be achieved by setting core.autocrlf to false. Open Git Bash or your preferred terminal and execute the following command:

git config --global core.autocrlf false

Step 2: Clone the Repository Again

If you have already cloned the repository before adjusting your Git configuration, you'll need to clone it again to ensure that the line endings are preserved correctly:

  1. Delete the existing local repository. Ensure you have backed up any changes or work in progress.
  2. Clone the repository again. Use the standard Git clone command to clone the repository to your local machine.

Step 3: Build and Run the Docker Containers

Navigate to the appropriate directory within your cloned repository that contains the Dockerfile and docker-compose.yml. Then, proceed with the Docker commands:

  1. Build the Docker containers: Run docker-compose build to build the Docker containers based on the configuration specified in docker-compose.yml.

  2. Start the Docker containers: Once the build process is complete, start the containers using docker-compose up.

Additional Notes

  • This solution specifically addresses issues encountered on Windows due to the automatic conversion of line endings. If you're using another operating system, this solution may not apply.
  • Remember to check your project's .gitattributes file, if it exists, as it can also influence how Git handles line endings in your files (see the example below).

By following these steps, you should be able to resolve issues related to Docker not recognizing the start.sh script on Windows due to line ending conversions.
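As mentioned in the notes above, a .gitattributes rule can also enforce the correct endings regardless of each developer's Git configuration; a minimal example:

# Force LF endings for shell scripts so they run inside Linux containers
*.sh text eol=lf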

Troubleshooting Pip Cache Reset in Docker Images

Sometimes, you want to reset the pip cache to ensure that the latest versions of the dependencies are installed. For example, the Label Studio ML Backend library is referenced as label-studio-ml @ git+https://github.com/HumanSignal/label-studio-ml-backend.git in requirements.txt. Suppose it has been updated and you want to pick up the latest version in your Docker image with the ML model.

You can rebuild a docker image from scratch with the following command:

docker compose build --no-cache

Troubleshooting Bad Gateway and Service Unavailable errors

You might see these errors if you send multiple concurrent requests.

Note that the provided ML backend examples are offered in development mode, and do not support production-level inference serving.


label-studio-ml-backend's Issues

Request not defined in exceptions file

Line no 17 in exceptions.py: a.update({'request': request.args})
This uses the request object, which is neither defined as a local variable nor imported from a library. It throws an exception because of that.

Flask frozen version does not work anymore

The frozen version of Flask no longer works. When the backend is invoked from the command line, I get an error message: Flask version 1.1.4 installs a version of Jinja2 that is incompatible with Flask 1.1.4.

Traceback (most recent call last):
  File "{path}/_wsgi.py", line 30, in <module>
    from label_studio_ml.api import init_app
  File "/{path}/.venv/lib/python3.8/site-packages/label_studio_ml/api.py", line 4, in <module>
    from flask import Flask, request, jsonify
  File "{path}/.venv/lib/python3.8/site-packages/flask/__init__.py", line 14, in <module>
    from jinja2 import escape
ImportError: cannot import name 'escape' from 'jinja2'

Two possible solutions:

  • freeze all library versions
  • use less strict requirements

Currently, I have rewritten requirements.txt like this, and it seems to work:

attr>=0.3.1
attrs>=19.2.0
appdirs>=1.4.3
colorama>=0.4.4
Flask>=1.1.4
lxml>=4.2.5
Pillow
requests>=2.22.0,<3
scikit-learn>=0.24.1
label-studio-tools>=0.0.0.dev11

Thanks a lot for your amazing work ;)

requirement version conflict

The root requirements.txt has scikit-learn==0.24.1. The one under label_studio_ml/examples has scikit-learn==0.22.2.post1.

Newer version of itsdangerous incompatible with Flask:1.1.2

With the 2.1.0 release of itsdangerous on Feb 18, 2022, itsdangerous is now only compatible with Flask>=1.1.4, while label-studio-ml pins Flask to 1.1.2.

Error when starting the simple text classifier example:

server    | [2022-02-28 08:49:14 +0000] [1] [INFO] Starting gunicorn 20.1.0
server    | [2022-02-28 08:49:14 +0000] [1] [INFO] Listening at: http://0.0.0.0:9090 (1)
server    | [2022-02-28 08:49:14 +0000] [1] [INFO] Using worker: gthread
server    | [2022-02-28 08:49:14 +0000] [8] [INFO] Booting worker with pid: 8
server    | [2022-02-28 08:49:14 +0000] [1] [INFO] Shutting down: Master
server    | [2022-02-28 08:49:14 +0000] [1] [INFO] Reason: Worker failed to boot.
server exited with code 3

Running gunicorn with --preload in the Dockerfile gives a clearer error:

ImportError: cannot import name 'json' from 'itsdangerous' (/usr/local/lib/python3.8/site-packages/itsdangerous/__init__.py)

A temporary fix for me was to downgrade itsdangerous to 2.0.1 in the requirements of the text classifier example backend.

The best solution might be to upgrade Flask to 1.1.4 in the label-studio-ml dependencies.
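Concretely, the workaround above amounts to adding a pin to the example's requirements.txt:

# keep itsdangerous compatible with the pinned Flask 1.1.x
itsdangerous==2.0.1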

Related issue and proposed solution on Microsoft Azure CLI:
Azure/azure-cli#21363 (comment)

confusion caused by versions > 1.0.4

Hi,
I have questions regarding the versions of this library after 1.0.4, because I'm confused about how they differ from 1.0.4 and how they should be used.

The Dockerfile and relevant files confuse me. After 1.0.4, the docker setup changed, and the following happened:

  • requirements.txt is empty; why? Is this OK? If it's intended, why keep the file at all? And why is there no dependency on gunicorn when it is essential?
  • supervisord isn't used anymore
    • If that's the case, why do we still have supervisord.conf in the container?
    • How does training happen? Is it simply using one (or more) of the default eight gunicorn threads? What if I run two simultaneous training jobs? Is it a "race"?
  • What happened to Redis support?
    • docker-compose still deploys it
    • If the above about training is true, then we don't need Redis, right? rq is no longer started via supervisord, because there is no supervisord

Could someone familiar with the matter clarify some of the above doubts?
Thanks

Conflicting dependencies?

I'm on Ubuntu 18. I created a fresh conda environment with pip installed, and tried to follow the instructions from https://github.com/heartexlabs/label-studio-ml-backend#quickstart. I got this:

INFO: pip is looking at multiple versions of label-studio-ml to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install label-studio-ml because these package versions have conflicting dependencies.

The conflict is caused by:
    label-studio 1.0.2.post0 depends on psycopg2-binary==2.8.4
    label-studio 1.0.2 depends on psycopg2-binary==2.8.4
    label-studio 1.0.1 depends on psycopg2-binary==2.8.4
    label-studio 1.0.0.post3 depends on psycopg2-binary==2.8.4
    label-studio 1.0.0.post2 depends on psycopg2-binary==2.8.4
    label-studio 1.0.0.post1 depends on psycopg2-binary==2.8.4
    label-studio 1.0.0.post0 depends on psycopg2-binary==2.8.4
    label-studio 1.0.0 depends on psycopg2-binary==2.8.4

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

docker-compose up -d: unable to copy *.py to /app, internal server error

Hi, I ran docker-compose up -d on default_configs,
but it failed with "unable to copy *.py /app, internal server error" in the Dockerfile, which seems correct (as there are no *.py files there).
I copied the *.py files there from the directory above; the build then reported no error, but curl still returns an internal server error.

ContextualVersionConflict: google-cloud-core 2.0.0 on mmdetection.py

Hi
I have followed the instructions here and installed mmdetection in the venv with pip install -r requirements/build.txt, but on running the script with label-studio-ml init coco-detector --from label_studio_ml/examples/mmdetection/mmdetection.py I get the following:

Traceback (most recent call last):
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 584, in _build_master
    ws.require(__requires__)
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 901, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 792, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (google-cloud-core 2.0.0 (/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages), Requirement.parse('google-cloud-core<2.0dev,>=1.2.0'), {'google-cloud-storage'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/bin/label-studio-ml", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3262, in <module>
    def _initialize_master_working_set():
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3245, in _call_aside
    f(*args, **kwargs)
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3274, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 586, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 599, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages/pkg_resources/__init__.py", line 792, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (google-cloud-core 2.0.0 (/Users/robin/label-studio-ml-backend/label-studio-ml-backend/venv/lib/python3.8/site-packages), Requirement.parse('google-cloud-core<2.0dev,>=1.2.0'), {'google-cloud-storage'})

CUDA with multiprocessing

Greetings,

I'm getting this error when using the ML backend with PyTorch.

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I see there is a solution.
I tried to fix it here, and it works fine on my machine.
Do you think my solution is feasible?
If so, I can make a PR ;-)
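For reference, the standard fix the error message points at is to force the 'spawn' start method before any CUDA context is created; a minimal sketch (where exactly to call it depends on how the backend forks its workers):

import torch.multiprocessing as mp

# must run before any CUDA work happens in the parent process
mp.set_start_method('spawn', force=True)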

Error in installing label-studio-ml-backend

Steps:

  1. Clone this project git clone https://github.com/heartexlabs/label-studio-ml-backend
  2. Run cd label-studio-ml-backend
  3. Run pip install -e .

The error:

    ERROR: Command errored out with exit status 1:
     command: /home/users/miniconda3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/users/Projects/tutorials/label_studio/label-studio-ml-backend/setup.py'"'"'; __file__='"'"'/home/users/Projects/tutorials/label_studio/label-studio-ml-backend/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-vpaczmrj
         cwd: /home/users/Projects/tutorials/label_studio/label-studio-ml-backend/
    Complete output (9 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/users/Projects/tutorials/label_studio/label-studio-ml-backend/setup.py", line 2, in <module>
        import label_studio_ml
      File "/home/users/Projects/tutorials/label_studio/label-studio-ml-backend/label_studio_ml/__init__.py", line 1, in <module>
        from .model import LabelStudioMLBase
      File "/home/users/Projects/tutorials/label_studio/label-studio-ml-backend/label_studio_ml/model.py", line 5, in <module>
        import redis
    ModuleNotFoundError: No module named 'redis'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

In setup.py, we import label_studio_ml on the second line, which runs label-studio-ml-backend/label_studio_ml/__init__.py, which in turn ends up importing redis.

Isn't this a circular dependency, where you need to install the dependencies of label_studio_ml, yet installing it executes label_studio_ml source code?

Example simple_text_classifier.py Error

Hey, the new updates might have broken the initialization of the example ML backends. I followed the official steps to install:

git clone https://github.com/heartexlabs/label-studio-ml-backend
cd label-studio-ml-backend
pip install -U -e .
pip install -r label_studio_ml/examples/requirements.txt

label-studio-ml init --script label_studio_ml/examples/simple_text_classifier.py

Traceback (most recent call last):
  File "/data/workspace/tltenv/bin/label-studio-ml", line 33, in <module>
    sys.exit(load_entry_point('label-studio-ml', 'console_scripts', 'label-studio-ml')())
  File "/data/workspace/deep_cv/label-studio-ml-backend/label_studio_ml/server.py", line 119, in main
    create_dir(args)
  File "/data/workspace/deep_cv/label-studio-ml-backend/label_studio_ml/server.py", line 79, in create_dir
    model_class = model_classes[0]
IndexError: list index out of range

The commit add00c8 works fine.

Docker Compose: No module named '_wsgi'

When cloning the repo and running docker-compose up I receive the following:

Starting redis ... done
Starting server ... done
Attaching to redis, server
redis     | 1:C 22 Apr 2022 05:32:50.317 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis     | 1:C 22 Apr 2022 05:32:50.317 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis     | 1:C 22 Apr 2022 05:32:50.317 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis     | 1:M 22 Apr 2022 05:32:50.317 * monotonic clock: POSIX clock_gettime
redis     | 1:M 22 Apr 2022 05:32:50.318 * Running mode=standalone, port=6379.
redis     | 1:M 22 Apr 2022 05:32:50.318 # Server initialized
redis     | 1:M 22 Apr 2022 05:32:50.318 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis     | 1:M 22 Apr 2022 05:32:50.318 * Loading RDB produced by version 6.2.6
redis     | 1:M 22 Apr 2022 05:32:50.318 * RDB age 2 seconds
redis     | 1:M 22 Apr 2022 05:32:50.318 * RDB memory usage when created 0.77 Mb
redis     | 1:M 22 Apr 2022 05:32:50.318 # Done loading RDB, keys loaded: 0, keys expired: 0.
redis     | 1:M 22 Apr 2022 05:32:50.318 * DB loaded from disk: 0.000 seconds
redis     | 1:M 22 Apr 2022 05:32:50.318 * Ready to accept connections
server    | Traceback (most recent call last):
server    |   File "/usr/local/bin/gunicorn", line 8, in <module>
server    |     sys.exit(run())
server    |   File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
server    |     WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
server    |   File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 231, in run
server    |     super().run()
server    |   File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 72, in run
server    |     Arbiter(self).run()
server    |   File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 58, in __init__
server    |     self.setup(app)
server    |   File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 118, in setup
server    |     self.app.wsgi()
server    |   File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
server    |     self.callable = self.load()
server    |   File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
server    |     return self.load_wsgiapp()
server    |   File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
server    |     return util.import_app(self.app_uri)
server    |   File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
server    |     mod = importlib.import_module(module)
server    |   File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
server    |     return _bootstrap._gcd_import(name[level:], package, level)
server    |   File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
server    |   File "<frozen importlib._bootstrap>", line 991, in _find_and_load
server    |   File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
server    | ModuleNotFoundError: No module named '_wsgi'
server exited with code 1

`ValueError: empty vocabulary; perhaps the documents only contain stop words` on text classification tutorial

I am trying to go through this tutorial:
https://labelstud.io/tutorials/sklearn-text-classifier.html

I'm taking the model code practically verbatim, and am using a fairly straightforward interface:

<View>
  <Text name="text" value="$text"/>
  <View style="box-shadow: 2px 2px 5px #999; padding: 20px; margin-top: 2em; border-radius: 5px;">
    <Header value="Choose text sentiment"/>
    <Choices name="sentiment" toName="text" choice="single" showInLine="true">
    <Choice value="no_pets"/><Choice value="pets"/></Choices>
  </View>
</View>

I've been able to launch and connect the backend in dev mode successfully, but upon each request it gives me an error as follows:

.../labelstudio/tfidf_backend/model3.py", line 148, in fit
    self.model.fit(input_texts, output_labels_idx)
...site-packages/sklearn/feature_extraction/text.py", line 1134, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words

My completions in the fit function are apparently empty on every call, no matter what I do. This happens both for "Train" via the interface and for each annotation.

What am I doing wrong?

How to return correctly formatted JSON pre-annotations

Hi,
My backend sends this JSON to Label Studio:
{'annotations': [{'result': [{'from_name': 'label', 'to_name': 'image', 'type': 'rectanglelabels', 'value': {'rectanglelabels': ['car'], 'x': 0.07965517044067383, 'y': 0.08535835891962051, 'width': 0.0035918901364008584, 'height': 0.027162909507751465}, 'score': 0.98451}, {'from_name': 'label', 'to_name': 'image', 'type': 'rectanglelabels', 'value': {'rectanglelabels': ['car'], 'x': 0.06727302074432373, 'y': 0.11262299865484238, 'width': 0.0160603125890096, 'height': 0.011844754219055176}, 'score': 0.98362213}, {'from_name': 'label', 'to_name': 'image', 'type': 'rectanglelabels', 'value': {'rectanglelabels': ['car'], 'x': 0.06980470816294353, 'y': 0.07412908226251602, 'width': 0.013376221060752867, 'height': 0.025176003575325016}, 'score': 0.9786032}, {'from_name': 'label', 'to_name': 'image', 'type': 'rectanglelabels', 'value': {'rectanglelabels': ['car'], 'x': 0.044702822963396706, 'y': 0.0858452245593071, 'width': 0.015623619159062704, 'height': 0.026315413415431973}, 'score': 0.9712106}, {'from_name': 'label', 'to_name': 'image', 'type': 'rectanglelabels', 'value': {'rectanglelabels': ['car'], 'x': 0.015116376181443533, 'y': 0.07554244995117188, 'width': 0.013097770512104034, 'height': 0.021875575184822083}, 'score': 0.9607427}, {'from_name': 'label', 'to_name': 'image', 'type': 'rectanglelabels', 'value': {'rectanglelabels': ['car'], 'x': 0.06571339070796967, 'y': 0.010496671311557293, 'width': 0.005934014916419983, 'height': 0.00786984246224165}, 'score': 0.9527463}]}]}

But it fails with the message TypeError: Object of type float32 is not JSON serializable,
so I return json.dumps(str(msg)) instead.
Then, however, Label Studio does not show any bounding boxes.

Is that JSON format correct?
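For what it's worth, the inner result entries look right; the TypeError comes from NumPy scalar types in the values. A sketch of a cleanup helper, assuming the scores and coordinates come from a NumPy-based model:

import numpy as np

def clean_region(region):
    # numpy.float32 values are not JSON serializable; cast them to plain Python floats
    region['score'] = float(region['score'])
    region['value'] = {k: (float(v) if isinstance(v, np.floating) else v)
                       for k, v in region['value'].items()}
    return region

Applying this to every item in each prediction's result list before returning it avoids the json.dumps(str(...)) workaround.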

requests.exceptions.MissingSchema: Invalid URL '/data/upload/frame_0176.jpg': No schema supplied. Perhaps you meant http:///data/upload/frame_0176.jpg?

I am trying out the mmdetection example but failed to reproduce it. Here is the error from the label-studio-ml backend log. These are my steps:

  • label-studio-ml init coco-detector --from /label-studio-ml-backend/label_studio_ml/examples/mmdetection/mmdetection.py

  • nohup label-studio-ml start coco-detector --with config_file=/mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py checkpoint_file=/installer/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth score_threshold=0.5 device=cuda:0 > ml.log &

  • nohup label-studio start my_project --username test@localhost --password test --init --ml-backends http://localhost:9090 > fornt.log &

To add, my data was imported through the web UI; a detailed screenshot is located at https://github.com/heartexlabs/label-studio-ml-backend/issues/8#issuecomment-818542737

env:

=> Database and media directory: /root/.local/share/label-studio
=> Static URL is set to: /static/


Label Studio version: 1.0.1

{
    "package": {
        "version": "1.0.1",
        "short_version": "1.0",
        "latest_version_from_pypi": "1.0.1",
        "latest_version_upload_time": "2021-04-05T19:13:41",
        "current_version_is_outdated": false
    },
    "backend": {
        "message": "Updates for relative URLs (#752)",
        "commit": "71278b0b727bc103f408fe3c279c5487f44fee56",
        "date": "2021-04-05 11:30:54 -0700",
        "branch": "master",
        "version": "1.0.0+87.g71278b0b"
    },
    "label-studio-frontend": {
        "message": "Fix Text spans if label's missing for any reason",
        "commit": "5164462ced2fe8a0bbdd7cd9c4a5bec3772577ab",
        "branch": "master",
        "date": "2021-03-31T10:57:00Z"
    },
    "dm2": {
        "message": "Annotation generation logic fix",
        "commit": "7751a996682f145d651123af27286f4f392c293c",
        "branch": "master",
        "date": "2021-04-02T13:15:22Z"
    }
}

Any suggestion would be appreciated!

Example tensorflow

Hi,
can I apply the TensorFlow example to pre-annotate multiple object detection rectangles?

ImportError: No module named label_studio_ml.api

I ran:

cd label-studio-ml-backend
pip install -e .
cd label_studio_ml/examples
pip install -r requirements.txt
label-studio-ml init my_ml_backend --script label_studio_ml/examples/simple_text_classifier.py

following the instructions, but received:

label-studio-ml start my_ml_backend/
Traceback (most recent call last):
  File "./my_ml_backend/_wsgi.py", line 30, in <module>
    from label_studio_ml.api import init_app
ImportError: No module named label_studio_ml.api

Predict request is getting called twice

For the custom model integration with Label Studio, prediction works fine.
The issue is that the predict() method is called twice, so the model runs inference twice for each prediction.
This happens when I load a file for annotation. Kindly suggest some ideas to reduce the two predictions to one.

In the logs below, you can see that the POST method for predict is called twice for the same file.

TypeError: argument of type 'ModelWrapper' is not iterable

Hello, when I run the backend API I receive the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/label_studio_ml/exceptions.py", line 39, in exception_f
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/label_studio_ml/api.py", line 66, in _train
    job = _manager.train(annotations, project, label_config, **params)
  File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 646, in train
    cls.get_or_create(project, label_config, force_reload=True, train_output=train_output)
  File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 450, in get_or_create
    if not cls.has_active_model(project) or force_reload or (cls.get(project).model_version != version and version is not None):  # noqa
  File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 426, in has_active_model
    return cls._key(project) in cls._current_model
TypeError: argument of type 'ModelWrapper' is not iterable

I have checked the code, and it looks like the _current_model field is sometimes set to an instance of ModelWrapper rather than a dictionary:

https://github.com/heartexlabs/label-studio-ml-backend/blob/054854e0bae5b5d4c1d99c68fcf1830db13bb747/label_studio_ml/model.py#L479-L486

Environment

Project version: https://github.com/heartexlabs/label-studio-ml-backend/tree/054854e0bae5b5d4c1d99c68fcf1830db13bb747

ContextualVersionConflict MarkupSafe 2.0.0 versus moto

There seems to be a requirements version conflict related to the dependencies MarkupSafe and moto.
These are not direct dependencies, but they make the system fail.

After following the readme instructions to install it

cd label-studio-ml-backend

# Install label-studio-ml and its dependencies
pip install -U -e .

# Install example dependencies
pip install -r label_studio_ml/examples/requirements.txt

it fails to run label-studio-ml init my_ml_backend --script label_studio_ml/examples/simple_text_classifier.py

label-studio-ml init my_ml_backend   --script label_studio_ml/examples/simple_text_classifier.py
Traceback (most recent call last):
  File "/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (MarkupSafe 2.0.0 (/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages), Requirement.parse('MarkupSafe<2.0'), {'moto'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/workspace/label-studio-ml-backend/dev/bin/label-studio-ml", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3252, in <module>
    def _initialize_master_working_set():
  File "/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
    f(*args, **kwargs)
  File "/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages/pkg_resources/__init__.py", line 585, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages/pkg_resources/__init__.py", line 598, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/root/workspace/label-studio-ml-backend/dev/lib/python3.8/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'MarkupSafe<2.0' distribution was not found and is required by moto

404 error after label-studio-ml start my_ml_backend

label-studio-ml start my_ml_backend runs successfully, but requests get a 404 error:

 * Serving Flask app "label_studio_ml.api" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
[2021-04-06 12:16:04,800] [INFO] [werkzeug::_log::113]  * Running on http://0.0.0.0:9090/ (Press CTRL+C to quit)
[2021-04-06 12:16:11,130] [INFO] [werkzeug::_log::113] 127.0.0.1 - - [06/Apr/2021 12:16:11] "GET / HTTP/1.1" 404 -
[2021-04-06 12:16:12,330] [INFO] [werkzeug::_log::113] 127.0.0.1 - - [06/Apr/2021 12:16:12] "GET / HTTP/1.1" 404 -
[2021-04-06 12:16:43,845] [INFO] [werkzeug::_log::113] 127.0.0.1 - - [06/Apr/2021 12:16:43] "GET / HTTP/1.1" 404 -

This repo is too complex

Hi,

I try to get started with ML Backend in Label Studio.
My personal assessment is that the examples provided by this repo are too complex:

  • The Docker and docker-compose configurations should be at the root of the repo (Docker is a standard and should probably be the default for Label Studio)
  • The repo should provide 1 or 2 simple examples (text and image classification, for example)
  • There are several requirements.txt files; there should be only one (you could add comments in the requirements.txt for each example's dependencies)

Currently it's very time-consuming to get into the ML backend, even though it's a critical feature for labelers.

I hope Heartex can make it simpler in the future (I guess you are very busy).

Thanks,
A really-liking-label-studio user

get_local_path provides path without the project_id [FileNotFoundError]

For the custom ML model, we get the uploaded file path by passing the file URL to get_local_path: https://github.com/heartexlabs/label-studio-ml-backend/blob/94750ca7233b6f07ff7cb8986160ba9ea1b4ed91/label_studio_ml/model.py#L297

Current behaviour:

It provides an incorrect file path, without the project ID.

The file is uploaded to C:\\Users\\uname1\\AppData\\Local\\label-studio\\label-studio\\media\\upload\\1\\3f3daefd-sampledata.wav.
After the upload folder there is a subfolder 1 that represents the project ID.

When getting the file path using get_local_path, the path comes back without the project ID: C:\\Users\\uname1\\AppData\\Local\\label-studio\\label-studio\\media\\upload\\3f3daefd-sampledata.wav

Tested Environment:

Windows 10


Tutorial with pytorch is missing substantial parts

I want to use the label studio ml backend together with an image classification dataset and a pretrained pytorch model. There seems to be a blog / tutorial for it: https://labelstud.io/tutorials/pytorch-image-transfer-learning.html

However, this tutorial is missing substantial parts:

  • ImageClassifierDataset._get_image_from_url(self, url) is not implemented.
  • ImageClassifierAPI.__init__() uses the variable resources, which is never defined.
  • ImageClassifierDataset is never used.
  • ImageClassifierAPI.predict(self, tasks, **kwargs) and ImageClassifierAPI.fit(self, completions, **kwargs) are not implemented.
    In general, I found it hard to find out anything about the types/contents of the input arguments: e.g., what is the content of tasks[0], and what is the required output format? This prevented me from implementing it myself.

Ideally, the tutorial would also come with an example dataset to run through it, e.g. the clothing dataset small, which is very easy to download:
git clone https://github.com/alexeygrigorev/clothing-dataset-small.git

The 'ruamel.yaml>=0.15.34' distribution was not found and is required by drf-yasg

Hi, I was trying to set up the ML backend following the Quickstart. When initializing using this code:

label-studio-ml init my_ml_backend --script label_studio_ml/examples/simple_text_classifier.py

I get the problem shown in the issue title (traceback screenshot omitted).

When I check the site-packages I have both 'ruamel.yaml.clib-0.2.2.dist-info' and 'ruamel_yaml' installed. May I know how to solve this? Thanks.

Link points to 404

Hi, in Line 82 of the README there is no link provided for the details reference. Also, I would like to know when we should use parsed configs; my task is integrating a YOLOv5 model, should I use that?

Thanks for making this awesome repo!

Problem connecting my ml model to label-studio running in a container

Hello Everyone,

I am facing a problem trying to connect a custom ML model to my label-studio container running in Docker.

I made sure to connect the ml container to the label-studio one by putting them in one single network.

docker network inspect labelstudio

[
    {
        "Name": "labelstudio",
        "Id": "2a0117a6cb8a41b5cf9970ddafa8d07cf388368881460bbc6de08510bd4a3bb4",
        "Created": "2021-12-22T07:01:21.16654434Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": ,
                    "Gateway": 
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "4fbc7aa90f60dbb7c98b45d5b110ae0dbc01f501460cebfde3b9ee936eca8d9e": {
                "Name": "label_studio",
                "EndpointID": "81f01b1046bbfdfe68316397bd9af753082469fff0da4bc12b171dc7cd63dbb4",
                "MacAddress": ,
                "IPv4Address": ,
                "IPv6Address": ""
            },
            "ad7579dee4bbe6a63eb119200e2f2be24e9cc9ced7ea9140cedf8695bc357e53": {
                "Name": "server",
                "EndpointID": "21994a2a70e463deaea48968551e94f3d24d10a0ffe9b31a0196515c7a6533a6",
                "MacAddress": ,
                "IPv4Address": ,
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

When I exec into the two containers with bash, they are able to communicate without issue. The problem happens when I try to enter the URL reference into Label Studio.
From Label Studio to the ML model:

curl --noproxy "*" http://server:8080
{"model_dir":"/data/models","status":"UP"}


How to run stateless custom backend api?

Hello, I am trying to run a custom backend API in a serverless way, so it is stateless by default. Moreover, containers are scaled up on demand and scaled down to 0 when there is no workload.

All works fine except for model versions. There are multiple problems:

  1. Over time, a lot of newly created model versions accumulate in the Label Studio UI, as in the screenshot (omitted). Ideally, I would like to have a single version, INITIAL.

  2. From time to time there are exceptions like OSError: Result file /data/models/1642448814/job_result.json specified by model_version doesn't exist

Is it possible to run a stateless custom backend API?

Multiple copies of the OpenMP runtime have been linked into the program

Device: MacBook
Action: run label-studio-ml start my_ml_backend
Error Detail:
Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']

  • This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    Epoch: 0%| | 0/100 [00:00<?, ?it/s]
    OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
    OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

Add a License

Awesome project. Can you add a License for this project to the repository?

Re-run predictions

Hi, is there a way to re-run predictions with the ML backend for already-predicted tasks?

Thanks

Displaying predictions scores for task if there is a prediction

If the current Model Version is not aligned with the source of the prediction for a task, no prediction score is displayed in the prediction score column.

From the first screenshot, you can see that there is already a prediction for the task, with Model Name 1646776505.

However, because another model version is selected (second screenshot), the prediction scores are not showing.

Not sure if this is as intended. Just a suggestion so that active learning can be supported.

Bug / Potential error in the documentation - GET/POST /predict request

In the docs:
https://labelstud.io/guide/ml.html#Get-predictions-from-a-model

There is the paragraph:

If you want to retrieve predictions manually for a list of tasks using only an ML backend, make a GET request to the /predict URL of your ML backend with a payload of the tasks that you want to see predictions for, formatted like the following example: [...]

With a GET request I get the following Error:
requests.exceptions.HTTPError: 405 Client Error: METHOD NOT ALLOWED

But with a POST request it seams to work just fine.
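For reference, a request of roughly this shape works against a locally running backend (the task payload is a made-up text example):

curl -X POST http://localhost:9090/predict \
     -H 'Content-Type: application/json' \
     -d '{"tasks": [{"data": {"text": "example input"}}]}'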

Unable to start ml-backend docker

Hi, I am trying to run the ml-backend on Docker, but I am facing an error. After following the instructions, I ran the following commands:

label-studio-ml init my_ml_backend3 --script label_studio_ml/examples/tensorflow/mobilenet_finetune.py:TFMobileNet

and then tried docker-compose up

But I got the following error:

Starting redis ... done
Starting server ... done
Attaching to redis, server
redis | 1:C 02 Sep 2021 13:36:41.376 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis | 1:C 02 Sep 2021 13:36:41.376 # Redis version=6.2.5, bits=64, commit=00000000, modified=0, pid=1, just started
redis | 1:C 02 Sep 2021 13:36:41.376 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis | 1:M 02 Sep 2021 13:36:41.376 * monotonic clock: POSIX clock_gettime
redis | 1:M 02 Sep 2021 13:36:41.377 * Running mode=standalone, port=6379.
redis | 1:M 02 Sep 2021 13:36:41.377 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
redis | 1:M 02 Sep 2021 13:36:41.377 # Server initialized
redis | 1:M 02 Sep 2021 13:36:41.377 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis | 1:M 02 Sep 2021 13:36:41.377 * Loading RDB produced by version 6.2.5
redis | 1:M 02 Sep 2021 13:36:41.377 * RDB age 4 seconds
redis | 1:M 02 Sep 2021 13:36:41.377 * RDB memory usage when created 0.77 Mb
redis | 1:M 02 Sep 2021 13:36:41.377 * DB loaded from disk: 0.000 seconds
redis | 1:M 02 Sep 2021 13:36:41.377 * Ready to accept connections
server | 2021-09-02 13:36:42,704 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
server | 2021-09-02 13:36:42,708 INFO RPC interface 'supervisor' initialized
server | 2021-09-02 13:36:42,708 CRIT Server 'inet_http_server' running without any HTTP authentication checking
server | 2021-09-02 13:36:42,708 INFO supervisord started with pid 1
server | 2021-09-02 13:36:43,711 INFO spawned: 'rq_00' with pid 9
server | 2021-09-02 13:36:43,712 INFO spawned: 'wsgi' with pid 10
server | 2021-09-02 13:36:43,788 INFO exited: rq_00 (exit status 1; not expected)
server | 2021-09-02 13:36:44,791 INFO spawned: 'rq_00' with pid 25
server | 2021-09-02 13:36:44,791 INFO success: wsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
server | 2021-09-02 13:36:44,864 INFO exited: rq_00 (exit status 1; not expected)
server | 2021-09-02 13:36:46,867 INFO spawned: 'rq_00' with pid 26
server | 2021-09-02 13:36:46,974 INFO exited: rq_00 (exit status 1; not expected)
server | 2021-09-02 13:36:49,979 INFO spawned: 'rq_00' with pid 27
server | 2021-09-02 13:36:50,059 INFO exited: rq_00 (exit status 1; not expected)
server | 2021-09-02 13:36:51,060 INFO gave up: rq_00 entered FATAL state, too many start retries too quickly

and

Traceback (most recent call last):
  File "/usr/local/bin/rq", line 5, in <module>
    from rq.cli import main
  File "/usr/local/lib/python3.7/site-packages/rq/cli/__init__.py", line 2, in <module>
    from .cli import main
  File "/usr/local/lib/python3.7/site-packages/rq/cli/cli.py", line 93, in <module>
    @pass_cli_config
  File "/usr/local/lib/python3.7/site-packages/rq/cli/cli.py", line 72, in pass_cli_config
    func = option(func)
  File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 247, in decorator
    _param_memo(f, OptionClass(param_decls, **option_attrs))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 2482, in __init__
    super().__init__(param_decls, type=type, multiple=multiple, **attrs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 2110, in __init__
    ) from None
ValueError: 'default' must be a list when 'multiple' is true.

Thank you in advance.

Error getting Simple Text Classifier model up

on running label-studio-ml init my_ml_backend --script label_studio_ml/examples/simple_text_classifier.py

I get

Can't import module "simple_text_classifier", reason: No module named 'core'. If you are looking for examples, you can find a dummy model.py here: https://labelstud.io/tutorials/dummy_model.html

and on running label-studio-ml start my_ml_backend

I get

python: can't open file './my_ml_backend/_wsgi.py': [Errno 2] No such file or directory

Is this expected behaviour?

Multi language ml backend

Hello,

First of all, thanks for the product.
I'd like to host a text ml backend that will be registered by multiple labelstudio projects that use different languages (english, german, french ...)
As of today, it does not seem to be feasible because the instance of LabelStudioMLBase is done here:
https://github.com/heartexlabs/label-studio-ml-backend/blob/8c2ebaf9543ce569027de0bc1574d2cbe0cd20eb/label_studio_ml/model.py#L67
So no extra arg is fetched from the labelstudio server (which makes sense)
And the train/fit method is called right after
https://github.com/heartexlabs/label-studio-ml-backend/blob/8c2ebaf9543ce569027de0bc1574d2cbe0cd20eb/label_studio_ml/model.py#L69
additional_params sounded like a good candidate, but get_additional_params returns an empty dict
https://github.com/heartexlabs/label-studio-ml-backend/blob/8c2ebaf9543ce569027de0bc1574d2cbe0cd20eb/label_studio_ml/model.py#L75
Would there be a workaround I did not see, or is it something planned in the future?

Kind regards

fit() call with no tasks

Hi Everybody,

It seems like my ML backend receives the fit signals, but no tasks are associated with them (the first argument is an empty tuple).
The culprit seems to be this line, which I can't make sense of.
When a train event is received, the object calls the fit method with an empty tuple as its argument, and that's why I receive an empty tuple.

Where are the tasks?
What am I missing?

Thank you in advance

raise KeyError(key) from None KeyError: 'config_file'

Hello~
I tried to run the object-detector example, following the instructions at
https://github.com/heartexlabs/label-studio/blob/master/docs/source/tutorials/object-detector.md
and downloaded the checkpoint file from the MMDetection model zoo.
I used the following commands to start the service:
label-studio-ml init coco-detector --from label_studio_ml/examples/mmdetection/mmdetection.py
label-studio-ml start coco-detector --with config_file=./mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py checkpoint_file=/home/tcexeexe/dataDisk2/abel-studio-ml-backend/mmdetection/checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth

My label setting is shown in the screenshot, but an error still occurs when I try to connect the label-studio-ml-backend.

the error log from the backend is as follows:

[2022-04-17 00:29:16,909] [ERROR] [label_studio_ml.exceptions::exception_f::53] Traceback (most recent call last):
  File "/home/tcexeexe/dataDisk2/label-studio-ml-backend/label_studio_ml/exceptions.py", line 39, in exception_f
    return f(*args, **kwargs)
  File "/home/tcexeexe/dataDisk2/label-studio-ml-backend/label_studio_ml/api.py", line 50, in _setup
    model = _manager.fetch(project, schema, force_reload, hostname=hostname, access_token=access_token)
  File "/home/tcexeexe/dataDisk2/label-studio-ml-backend/label_studio_ml/model.py", line 502, in fetch
    model = cls.model_class(label_config=label_config, **kwargs)
  File "/home/tcexeexe/dataDisk2/label-studio-ml-backend/coco-detector/mmdetection.py", line 41, in __init__
    config_file = config_file or os.environ['config_file']
  File "/home/tcexeexe/anaconda3/envs/label_studio/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'config_file'


[2022-04-17 00:29:16,909] [INFO] [werkzeug::_log::225] 10.168.1.217 - - [17/Apr/2022 00:29:16] "POST /setup HTTP/1.1" 500 -

Where might the problem be?

Run directory undefined specified by model_version doesn't exist

I am trying to run a custom model without using Docker Compose.
This is the code I am using to launch the server, but I get a "model_version doesn't exist" error.

app = init_app(
    model_class=MyModel,
    model_dir=os.environ.get("MODEL_DIR", os.path.dirname(__file__)),
    redis_queue=os.environ.get("RQ_QUEUE_NAME", "default"),
    redis_host=os.environ.get("REDIS_HOST", "localhost"),
    redis_port=os.environ.get("REDIS_PORT", 6379),
)
if __name__ == "__main__":
    app.run(host="localhost", port=9090)
Run directory undefined specified by model_version doesn't exist
Traceback (most recent call last):
  File "/media/idk/idk1/label-studio-ml-backend/label_studio_ml/model.py", line 54, in get_result
    job_result = self.get_result_from_job_id(model_version)
  File "/media/idk/idk1/label-studio-ml-backend/label_studio_ml/model.py", line 107, in get_result_from_job_id
    result = self._get_result_from_job_id(job_id)
  File "/media/idk/idk1/label-studio-ml-backend/label_studio_ml/model.py", line 183, in _get_result_from_job_id
    raise IOError(f'Run directory {job_dir} specified by model_version doesn\'t exist')
OSError: Run directory undefined specified by model_version doesn't exist
