Splinter

Run dockerized environment

docker-compose up -d

To rebuild the Docker images after updating the sources, run:

docker-compose up -d --build

Access to web UI

After starting the system, the web application is available at http://localhost:8888/splinter

test user:

email: [email protected]
password: 1

Testing

Run the integration tests with:

pytest -v -s tests/
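A test under tests/ follows the usual pytest shape; a minimal sketch of a smoke test, assuming a running stack (docker-compose up -d). The SPLINTER_URL environment variable and the test name are illustrative, not part of the project:

```python
import os

# Base URL of the web UI; SPLINTER_URL is a hypothetical override variable,
# the default matches the address documented above.
BASE_URL = os.environ.get("SPLINTER_URL", "http://localhost:8888/splinter")

def test_web_ui_is_reachable():
    # Requires the dockerized environment to be up (docker-compose up -d).
    import requests  # assumed to be available in the test environment
    response = requests.get(BASE_URL, timeout=5)
    assert response.status_code == 200
```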

Deploy

To prepare the project for deploying run the deploy_setup.sh script:

./deploy_setup.sh

The script will:

  • generate random passwords for the Docker services and replace them in the .env file
  • generate a docker-compose file compatible with Docker Swarm, e.g. inline the content of the .env file into each service that depends on it

NOTE: Remember to save generated passwords!
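The password-generation step can be approximated in Python; a minimal sketch, assuming .env holds KEY=VALUE lines and that values for a chosen set of keys should be replaced (the key names below are made up, the real ones live in the project's .env file):

```python
import secrets

def randomize_env(text, keys):
    """Replace the values of the given keys in .env-style text with freshly
    generated random passwords; returns (new_text, passwords)."""
    passwords = {}
    lines = []
    for line in text.splitlines():
        key = line.split("=", 1)[0]
        if "=" in line and key in keys:
            passwords[key] = secrets.token_urlsafe(16)
            lines.append(f"{key}={passwords[key]}")
        else:
            lines.append(line)
    return "\n".join(lines), passwords

# Hypothetical .env content for illustration.
env_text = "POSTGRES_PASSWORD=changeme\nDEBUG=1"
new_env, generated = randomize_env(env_text, {"POSTGRES_PASSWORD"})
```

The `generated` mapping is what the NOTE above tells you to save.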

Known Issues

  • Issue: exec /docker-entrypoint.sh: no such file or directory after running the Docker containers

    For detailed troubleshooting steps and solution, refer to this Stack Overflow post.

  • Unresolved References in IDE:

    If you encounter unresolved references in your IDE (like PyCharm), follow these steps to resolve the issue:

    1. Open PyCharm and navigate to File -> Settings -> Project:Splinter -> Project Structure.
    2. Locate the inference_engine module within your project structure.
    3. Mark inference_engine as Sources root

    This action informs PyCharm that the inference_engine module should be recognized as a source directory, resolving unresolved references and enabling proper module imports within your project.

splinter's People

Contributors

debskij, krystian030, leckijakub, marekhering, mateuszkepczynski

Forkers

leckijakub

splinter's Issues

Create ML engine container

Create a base container for a service running ML model inference on requested exam files.
The service should:

  • Publish an API allowing new exams to be checked
  • Have access to exam storage
  • Accept new exams

Data Generator - kickoff

Brainstorm about:

  • Generator architecture
  • Base generation algorithm
  • Initial implementation for further incremental development

Preparation of the model learning pipeline diagram

The mark_generator, where the model is created, lives inside the research module. Get familiar with the 00_training.ipynb notebook and prepare a diagram of the learning pipeline for the existing training path.

A README for the research module can be created, and the above diagram can be put there.

WebDAV Server

Main idea: introduce WebDAV technology so that scanned sheets can be uploaded automatically from the printer to the cloud.

  • #95
  • Preparation of PoC application

Enhancing Performance of the Current Model

This task involves improving the performance of our existing model, which may be struggling due to data quality issues. We need to update the mark_generator.py in the mark_generator module and implement a new data generation template. Additionally, we will integrate a new image into the data generation process to enhance the dataset and potentially boost model performance.

exam-template

Design base system architecture

As we are starting to develop the main services that will certainly be part of the final system, we need to agree on the overall system architecture.
Propose the architecture containing:

  • System services
    • Name
    • Short description
    • Planned software stack
  • Visibility of services (if visible from the external network)
  • Communication flow between them
  • Use-Case diagrams for most important actions.

DoD:

  • architecture proposed in the form of a diagram and documentation
  • architecture approved by all team members

Create an easier-to-analyse exam template

Provide an SVG graphic with an exam template similar to the one provided by the supervisor.
Goals:

  • Keep field structure, order and position
  • Avoid "soft" edges for bounding boxes and exam checkboxes
  • Be readable and clear for students
  • Contain "removable" letters (constant grayscale)

WEBAPP: Exam results export to csv

A checked exam should have an option to export its results to CSV.
One row should correspond to one student.
The format should allow importing the CSV into the eNauczanie platform.
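The one-row-per-student shape can be sketched with the csv module; the column names and record fields below are illustrative assumptions, not the final eNauczanie layout:

```python
import csv
import io

def export_results(students):
    """Write one CSV row per student: index number, score, answers."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["index_nr", "score", "answers"])  # hypothetical header
    for s in students:
        writer.writerow([s["index_nr"], s["score"], " ".join(s["answers"])])
    return buf.getvalue()

# Example with one made-up student record.
rows = export_results([
    {"index_nr": "123456", "score": 40, "answers": ["A", "C", "B"]},
])
```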

Create exam storage container

Imported exams should be stored in a dedicated docker container which would be accessible only inside the internal docker network.

WEBAPP: Exam structures

  • Each user should be able to create an exam. An exam is a container that groups answer sheets from one exam term.
  • Each owner of an exam should be able to edit or delete it.

Web app: resolve TODO comments

This issue is created from #32 (review)
The objective is to track all mocks and unimplemented features in the web application.
This issue should be split into separate feature issues.

INF_ENGINE: Save results in form usable for backend service

Proposed exam_storage structure

/splinter/<exam_id>                  # exam_path
├── metadata.json                    # pdfs done
├── pdfs/
├── students/
│   └── <index_nr>/
│       ├── answers.json
│       ├── answers.jpg
│       └── recognized_fields.jpg*
├── results.csv
├── answers_key_<group*>.jpg
└── answers_key_<group*>.json

/uploads
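The layout above can be scaffolded with pathlib; a sketch under a temporary root, where the function name is illustrative and <exam_id> / <index_nr> are runtime values:

```python
import tempfile
from pathlib import Path

def create_exam_layout(root, exam_id, index_nrs):
    """Create the per-exam skeleton: pdfs/, students/<index_nr>/, metadata.json."""
    exam_path = Path(root) / "splinter" / exam_id
    (exam_path / "pdfs").mkdir(parents=True, exist_ok=True)
    for index_nr in index_nrs:
        (exam_path / "students" / index_nr).mkdir(parents=True, exist_ok=True)
    (exam_path / "metadata.json").touch()
    (Path(root) / "uploads").mkdir(exist_ok=True)  # scanner upload area
    return exam_path

# Demo under a throwaway directory.
root = tempfile.mkdtemp()
exam_path = create_exam_layout(root, "exam-1", ["123456"])
```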

Data collection for number detection

This issue is about collecting data for number detection.

Hypotheses:

  • How often do they fill in under the US system versus the European system?
  • How often are mistakes made when transferring answers to the exam?

Future works:

WEBAPP: user authentication

  • User registration
    • does not include sending verification mails
    • System should permit registration using eti.pg.edu.pl or pg.edu.pl domains
  • User can log in
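The domain restriction on registration can be expressed as a simple check; a minimal sketch (the function name is illustrative, and subdomain handling is deliberately left out):

```python
# Domains permitted for registration, per the requirement above.
ALLOWED_DOMAINS = ("eti.pg.edu.pl", "pg.edu.pl")

def registration_allowed(email):
    """Permit registration only for the university domains listed above."""
    if "@" not in email:
        return False
    domain = email.rsplit("@", 1)[-1].lower()
    return domain in ALLOWED_DOMAINS
```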

Create inference engine app

Create a Flask app responsible for handling model inference. The application API will include the following endpoints:

  • check-exams (create files with detected answers, based on the provided path to the exams)
  • check-exam (create a file with detected answers, based on the provided path to the exams and a file name)
  • generate-exam-key (create a file with detected answers serving as the exam key, based on the provided path to the exams and an answers template file)
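A minimal Flask skeleton with those three endpoints; the request parameters and response shapes are assumptions, and the inference itself is stubbed out:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/check-exams", methods=["POST"])
def check_exams():
    # Assumed parameter: path to a directory of exams.
    exams_path = request.json.get("exams_path")
    return jsonify({"status": "queued", "exams_path": exams_path})

@app.route("/check-exam", methods=["POST"])
def check_exam():
    # Assumed parameters: exams directory plus a single file name.
    file_name = request.json.get("file_name")
    return jsonify({"status": "queued", "file": file_name})

@app.route("/generate-exam-key", methods=["POST"])
def generate_exam_key():
    # Assumed parameters: exams directory plus an answers template file.
    template = request.json.get("answers_template")
    return jsonify({"status": "queued", "template": template})
```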

EXAM_STORAGE: Create exam storages.

Create two exam directories:

  • /splinter - for storing processed exams
  • /uploads - for uploading pdfs from scanner using webdav - the access to it should be restricted by credentials

Store credentials to database in a secure way

For now, sample passwords are stored in configuration files, which is not a secure approach. Before the system is ready for release, password handling must be reorganized in a more secure way.

Design and deploy training pipeline

To increase the accuracy of our model to the expected threshold (99.9%) we need to design a training pipeline:

  • Configure data generation using our synthetic data generator
  • Use real data as a validation resource

Rewriting the exam generator from LaTeX to Markdown

The task involves rewriting the exam generator from LaTeX to Markdown.

The generator is currently written in LaTeX, which causes it to take a long time to execute. The proposal is to reimplement it in Markdown by adding custom code to a ready-made library, so that it automatically generates the index and QR code fields and randomizes the order of questions and answers.

Create data for annotators

  • Generate images and required annotations (answers)
  • Create google drive
  • Write annotation instructions
  • Publish and collect the data
  • Deadline: 18.04.2022

Create tool that splits PDFs based on UUID

  • Read PDFs
  • Detect UUID
  • Recognize UUID
  • Match partial UUID to full UUID
  • Work with the A5 format
  • Save split exams to separate files
  • Prepare dataset
  • Match missing page ID
  • Sort exams if they are not in the right order
  • Fix detection inconsistencies caused by different image crop scales
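The "match partial UUID to full UUID" step above can be done by substring matching against the set of issued UUIDs; a minimal sketch in pure Python, ignoring OCR character confusions (the sample UUIDs are made up):

```python
def match_partial_uuid(partial, known_uuids):
    """Return the unique known UUID containing the recognized fragment,
    or None if the fragment is ambiguous or matches nothing."""
    candidates = [u for u in known_uuids if partial in u]
    return candidates[0] if len(candidates) == 1 else None

# Hypothetical UUIDs issued for one exam batch.
known = [
    "0d9f3a1c-5b2e-4c77-9a10-3f6d8e21b4aa",
    "7c41e9d2-88ab-4f03-b6c5-0e92d17a54f3",
]
```

Ambiguous fragments are rejected rather than guessed, so a too-short scan falls through to the "match missing page ID" step.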
