These instructions assume you have (and know how to work with):

- Docker containers
- Python 3.6
- Django
- Postgres
- S3 buckets/objects
- RabbitMQ
- Celery workers
- `Client` `POST`s a `GeneParser` object to the `API Server`.
- `API Server` contacts `S3` and retrieves a presigned URL for upload.
- `API Server` stores it in the `GeneParser` object and returns it to the client.
- `Client` `PUT`s a file to the `S3` presigned URL.
- `Client` `POST`s to the `API Server` that the file was uploaded.
- `API Server` creates an `AsyncJob` and puts a task in `RabbitMQ`.
- `Worker` pulls the job from `RabbitMQ` and begins processing.
- `Worker` downloads the file in chunks from `S3` and processes each chunk separately.
- Once the `Worker` finds a DNA sequence it stores it in `Postgres`.
- When the file is complete, the worker builds a JSON from the stored DNA sequences and sends it to `S3`.
- `Client` polls `AsyncJob` progress until completed.
- `Client` `GET`s the `GeneParser`; the `API Server` contacts `S3` and retrieves a presigned URL for download.
- `Client` downloads the file using the presigned URL.
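The chunked-processing step above has one subtlety: a DNA sequence can straddle a chunk boundary. A minimal sketch of how a worker might handle that — assuming, purely for illustration, that a sequence is a maximal run of A/C/G/T characters (the real parsing rules live in `./gene_parser/parsers.py` and may differ):

```python
import re

# Illustrative assumption: a "DNA sequence" is a maximal run of A/C/G/T.
DNA_RUN = re.compile(r"[ACGT]+")

def parse_chunks(chunks):
    """Yield DNA sequences from an iterable of text chunks.

    A sequence may straddle a chunk boundary, so the unfinished tail of
    each chunk is carried over and prepended to the next one.
    """
    carry = ""
    for chunk in chunks:
        text = carry + chunk
        carry = ""
        for m in DNA_RUN.finditer(text):
            if m.end() == len(text):
                # The run touches the end of this chunk -- it may continue
                # in the next chunk, so hold it back for now.
                carry = m.group()
            else:
                yield m.group()
    if carry:
        yield carry  # flush the final unfinished run
```

With this carry-over, `parse_chunks(["xxACG", "Tyy"])` yields the single sequence `"ACGT"` rather than splitting it in two; each completed sequence can then be stored in Postgres as described above.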
- Create a terminal and navigate to the project folder.
- Run `docker network create genobase_default`.
- Run `docker-compose -f data.yml up`.
  - This will bring up `Postgres`, `Minio` and `RabbitMQ`.
- Wait for the database to initialize.
- Create another terminal and navigate to the project folder.
- Run `docker-compose -f processors.yml up`.
  - This will bring up the `API server`, `Celery workers` and a `schema migrations server`.
- Create another terminal and navigate to the project folder.
- Run `pip install requests` (needed for the test script).
- Run the test script: `python tests/test_full_flow.py`.
- Navigate to `./downloads/` and see the results.
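For reference, here is a rough sketch of what a client of this API (such as the test script) does. The endpoint paths and JSON field names below are illustrative assumptions, not the project's actual API — see `tests/test_full_flow.py` for the real flow:

```python
"""Sketch of the client flow: create a GeneParser, upload via presigned
URL, signal completion, poll the AsyncJob, then download the result.
All endpoint paths and field names here are hypothetical."""
import time

import requests

BASE = "http://localhost:8000"  # assumed dev-server address

def job_is_done(job):
    # Assumed AsyncJob payload shape: {"id": ..., "status": "..."}.
    return job.get("status") in ("completed", "failed")

def run_flow(path):
    # 1. Create a GeneParser; the server returns a presigned upload URL.
    gp = requests.post(f"{BASE}/gene_parsers/", json={"name": path}).json()
    # 2. Upload the file straight to S3 via the presigned URL.
    with open(path, "rb") as f:
        requests.put(gp["upload_url"], data=f)
    # 3. Tell the API the upload finished; this enqueues an AsyncJob.
    job = requests.post(f"{BASE}/gene_parsers/{gp['id']}/file_uploaded/").json()
    # 4. Poll the AsyncJob until the workers are done.
    while not job_is_done(job):
        time.sleep(1)
        job = requests.get(f"{BASE}/async_jobs/{job['id']}/").json()
    # 5. Fetch a presigned download URL for the resulting JSON.
    gp = requests.get(f"{BASE}/gene_parsers/{gp['id']}/").json()
    return requests.get(gp["download_url"]).content

if __name__ == "__main__":
    print(run_flow("sample.txt"))
```

Note that the file bytes go directly between the client and S3 in steps 2 and 5; the API server only hands out presigned URLs.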
- `./gene_parser/tasks.py`: The processor background job.
- `./gene_parser/parsers.py`: The file text parsers.
- `./downloads/`: Where the parser results end up.
- `./tests/test_full_flow.py`: The test script.
- Since objects are stored in the DB, it's simple to create an API to retrieve them in different ways.
- The code assumes genes are not repeated within the files.
- Improvement idea: shard the file to multiple workers.
- Improvement idea: use S3 bucket notifications instead of the `file_uploaded` endpoint.
- Improvement idea: store the file in S3 in parts instead of as a whole.
- I'm using `python manage.py runserver` as the server for demo simplicity.
- There is no authentication layer or any other security, for simplicity.
- If you want to reset everything, just delete the `./data` folder.