Splinter

Run dockerized environment

docker-compose up -d

To rebuild the Docker images after updating the sources, run:

docker-compose up -d --build

Access to web UI

After starting the system, the web application is available at http://localhost:8888/splinter

test user:

email: [email protected]
password: 1

Testing

Run the integration tests with:

pytest -v -s tests/
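A test under tests/ follows the usual pytest shape; a minimal sketch of a smoke test, assuming a running stack (docker-compose up -d). The SPLINTER_URL environment variable and the test name are illustrative, not part of the project:

```python
import os

# Base URL of the web UI; SPLINTER_URL is a hypothetical override variable,
# the default matches the address documented above.
BASE_URL = os.environ.get("SPLINTER_URL", "http://localhost:8888/splinter")

def test_web_ui_is_reachable():
    # Requires the dockerized environment to be up (docker-compose up -d).
    import requests  # assumed to be available in the test environment
    response = requests.get(BASE_URL, timeout=5)
    assert response.status_code == 200
```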

Deploy

To prepare the project for deploying run the deploy_setup.sh script:

./deploy_setup.sh

The script will:

  • generate random passwords for the Docker services and replace them in the .env file
  • generate a docker-compose file compatible with Docker Swarm, e.g. inline the content of the .env file into each service that depends on it

NOTE: Remember to save generated passwords!
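The password-generation step can be approximated in Python; a minimal sketch, assuming .env holds KEY=VALUE lines and that values for a chosen set of keys should be replaced (the key names below are made up, the real ones live in the project's .env file):

```python
import secrets

def randomize_env(text, keys):
    """Replace the values of the given keys in .env-style text with freshly
    generated random passwords; returns (new_text, passwords)."""
    passwords = {}
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0]
        if "=" in line and key in keys:
            passwords[key] = secrets.token_urlsafe(16)
            lines.append(f"{key}={passwords[key]}")
        else:
            lines.append(line)
    return "\n".join(lines), passwords

# Hypothetical .env content for illustration.
env_text = "POSTGRES_PASSWORD=changeme\nDEBUG=1"
new_env, generated = randomize_env(env_text, {"POSTGRES_PASSWORD"})
```

The `generated` mapping is what the NOTE above tells you to save.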

Known Issues

  • Issue: exec /docker-entrypoint.sh: no such file or directory after running the Docker containers

    For detailed troubleshooting steps and solution, refer to this Stack Overflow post.

  • Unresolved References in IDE:

    If you encounter unresolved references in your IDE (like PyCharm), follow these steps to resolve the issue:

    1. Open PyCharm and navigate to File -> Settings -> Project:Splinter -> Project Structure.
    2. Locate the inference_engine module within your project structure.
    3. Mark inference_engine as Sources root

    This action informs PyCharm that the inference_engine module should be recognized as a source directory, resolving unresolved references and enabling proper module imports within your project.

splinter's People

Contributors

debskij, krystian030, leckijakub, marekhering, mateuszkepczynski

Forkers

leckijakub

splinter's Issues

Create ML engine container

Create a base container for a service running ML model inference on requested exam files.
The service should:

  • Publish an API allowing new exams to be checked
  • Have access to exam storage
  • Accept new exams

Data Generator - kickoff

Brainstorm about:

  • Generator architecture
  • Base generation algorithm
  • Initial implementation for further incremental development

Preparation of the model learning pipeline diagram

The mark_generator, where the model is created, lives inside the research module. Get familiar with the 00_training.ipynb notebook and prepare a diagram of the learning pipeline for the existing training path.

A README for the research module can be created, and the above diagram can be put there.

WebDAV Server

Main idea: introduce WebDAV technology so that scanned sheets can be uploaded automatically from the printer to the cloud.

  • #95
  • Preparation of PoC application

Enhancing Performance of the Current Model

This task involves improving the performance of our existing model, which may be struggling due to data quality issues. We need to update the mark_generator.py in the mark_generator module and implement a new data generation template. Additionally, we will integrate a new image into the data generation process to enhance the dataset and potentially boost model performance.

exam-template

Design base system architecture

As we are starting to develop the main services that will certainly be part of the final system, we need to agree on the overall system architecture.
Propose the architecture containing:

  • System services
    • Name
    • Short description
    • Planned software stack
  • Visibility of services (if visible from the external network)
  • Communication flow between them
  • Use-Case diagrams for most important actions.

DoD:

  • architecture proposed in the form of a diagram and documentation
  • architecture approved by all team members

Create an easier-to-analyse exam template

Provide an SVG graphic with an exam template similar to the one provided by the supervisor.
Goals:

  • Keep field structure, order and position
  • Avoid "soft" edges for bounding boxes and exam checkboxes
  • Be readable and clear for students
  • Contain "removable" letters (constant grayscale)

WEBAPP: Exam results export to csv

A checked exam should have an option to export its results to CSV.
One row should correspond to one student.
The format should allow importing the CSV into the eNauczanie platform.
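The one-row-per-student shape can be sketched with the csv module; the column names and record fields below are illustrative assumptions, not the final eNauczanie layout:

```python
import csv
import io

def export_results(students):
    """Write one CSV row per student: index number, score, answers."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["index_nr", "score", "answers"])  # hypothetical header
    for s in students:
        writer.writerow([s["index_nr"], s["score"], " ".join(s["answers"])])
    return buf.getvalue()

# Example with one made-up student record.
rows = export_results([
    {"index_nr": "123456", "score": 40, "answers": ["A", "C", "B"]},
])
```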

Create exam storage container

Imported exams should be stored in a dedicated docker container which would be accessible only inside the internal docker network.

WEBAPP: Exam structures

  • Each user should be able to create an exam. An exam is a container that groups answer sheets from one exam term.
  • Each owner of an exam should be able to edit or delete it.

Web app: resolve TODO comments

This issue is created from #32 (review)
The objective is to track all mocks and unimplemented features in the web application.
This issue should be split into separate feature issues.

INF_ENGINE: Save results in form usable for backend service

Proposed exam_storage structure

/splinter/<exam_id>                  # exam_path
├── metadata.json                    # pdfs done
├── pdfs/
├── students/
│   └── <index_nr>/
│       ├── answers.json
│       ├── answers.jpg
│       └── recognized_fields.jpg*
├── results.csv
├── answers_key_<group*>.jpg
└── answers_key_<group*>.json

/uploads
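The layout above can be scaffolded with pathlib; a sketch under a temporary root, where the function name is illustrative and <exam_id> / <index_nr> are runtime values:

```python
import tempfile
from pathlib import Path

def create_exam_layout(root, exam_id, index_nrs):
    """Create the per-exam skeleton: pdfs/, students/<index_nr>/, metadata.json."""
    exam_path = Path(root) / "splinter" / exam_id
    (exam_path / "pdfs").mkdir(parents=True, exist_ok=True)
    for index_nr in index_nrs:
        (exam_path / "students" / index_nr).mkdir(parents=True, exist_ok=True)
    (exam_path / "metadata.json").touch()
    (Path(root) / "uploads").mkdir(exist_ok=True)  # scanner upload area
    return exam_path

# Demo under a throwaway directory.
root = tempfile.mkdtemp()
exam_path = create_exam_layout(root, "exam-1", ["123456"])
```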

Data collection for number detection

This issue is about collecting data for number detection.

Hypotheses:

  • How often do they fill in under the US system versus the European system?
  • How often are mistakes made when transferring answers to the exam?

Future works:

WEBAPP: user authentication

  • User registration
    • does not include sending verification mails
    • System should permit registration using eti.pg.edu.pl or pg.edu.pl domains
  • User can log in
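The domain restriction on registration can be expressed as a simple check; a minimal sketch (the function name is illustrative, and subdomain handling is deliberately left out):

```python
# Domains permitted for registration, per the requirement above.
ALLOWED_DOMAINS = ("eti.pg.edu.pl", "pg.edu.pl")

def registration_allowed(email):
    """Permit registration only for the university domains listed above."""
    if "@" not in email:
        return False
    domain = email.rsplit("@", 1)[-1].lower()
    return domain in ALLOWED_DOMAINS
```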

Create inference engine app

Create a Flask app responsible for handling model inference. The application API will include the following endpoints:

  • check-exams (create files with detected answers, based on the provided path to the exams)
  • check-exam (create a file with detected answers, based on the provided path to the exams and a file name)
  • generate-exam-key (create a file with detected answers serving as the exam key, based on the provided path to the exams and an answers template file)
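A minimal Flask skeleton with those three endpoints; the request parameters and response shapes are assumptions, and the inference itself is stubbed out:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/check-exams", methods=["POST"])
def check_exams():
    # Assumed parameter: path to a directory of exams.
    exams_path = request.json.get("exams_path")
    return jsonify({"status": "queued", "exams_path": exams_path})

@app.route("/check-exam", methods=["POST"])
def check_exam():
    # Assumed parameters: exams directory plus a single file name.
    file_name = request.json.get("file_name")
    return jsonify({"status": "queued", "file": file_name})

@app.route("/generate-exam-key", methods=["POST"])
def generate_exam_key():
    # Assumed parameters: exams directory plus an answers template file.
    template = request.json.get("answers_template")
    return jsonify({"status": "queued", "template": template})
```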

EXAM_STORAGE: Create exam storages.

Create two exam directories:

  • /splinter - for storing processed exams
  • /uploads - for uploading pdfs from scanner using webdav - the access to it should be restricted by credentials

Store credentials to database in a secure way

For now, sample passwords are stored in configuration files, which is not a secure approach. Before the system is ready for release, password handling must be reorganized in a more secure way.

Design and deploy training pipeline

To increase the accuracy of our model to the expected threshold (99.9%) we need to design a training pipeline:

  • Configure data generation using our synthetic data generator
  • Use real data as a validation resource

Rewriting the exam generator from LaTeX to Markdown

The task involves rewriting the exam generator from LaTeX to Markdown.

The generator is currently written in LaTeX, which causes it to take a long time to execute. The proposal is to reimplement it in Markdown by adding custom code to a ready-made library, so that it automatically generates the index and QR code fields and randomizes the order of questions and answers.

Create data for annotators

  • Generate images and required annotations (answers)
  • Create google drive
  • Write annotation instructions
  • Publish and collect the data
  • Deadline: 18.04.2022

Create tool that splits PDFs based on UUID

  • Read PDFs
  • Detect UUID
  • Recognize UUID
  • Match partial UUID to full UUID
  • Work with the A5 format
  • Save split exams to separate files
  • Prepare dataset
  • Match missing page ID
  • Sort exams if they are not in the right order
  • Fix detection inconsistencies caused by different image crop scales
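The "match partial UUID to full UUID" step above can be done by substring matching against the set of issued UUIDs; a minimal sketch in pure Python, ignoring OCR character confusions (the sample UUIDs are made up):

```python
def match_partial_uuid(partial, known_uuids):
    """Return the unique known UUID containing the recognized fragment,
    or None if the fragment is ambiguous or matches nothing."""
    candidates = [u for u in known_uuids if partial in u]
    return candidates[0] if len(candidates) == 1 else None

# Hypothetical UUIDs issued for one exam batch.
known = [
    "0d9f3a1c-5b2e-4c77-9a10-3f6d8e21b4aa",
    "7c41e9d2-88ab-4f03-b6c5-0e92d17a54f3",
]
```

Ambiguous fragments are rejected rather than guessed, so a too-short scan falls through to the "match missing page ID" step.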
