noorts / dlsa
🧬 Distributing Local Sequence Alignment using Volunteer Computing
License: Apache License 2.0
Are we assuming that a worker can receive multiple queries, or only one query and multiple targets?
The communication between the worker nodes and the tracker is the most critical, I think. Here is a thread to discuss the development and design decisions for the interface.
This component serves a REST API and is what clients (e.g., a user on a laptop) interact with from the outside. It allows a client to 1) submit a sequence alignment job request and 2) poll for the results.
The provided job request will be parsed and converted into our internal format.
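The submit-then-poll flow described above can be sketched with an in-memory store standing in for whatever sits behind the REST endpoints. This is only an illustration: `Job`, `JobStore`, the request keys (`queries`, `targets`), and the status strings are all hypothetical names, not the project's actual internal format.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    # Hypothetical internal representation of a submitted request.
    id: str
    queries: Dict[str, str]
    targets: Dict[str, str]
    status: str = "pending"  # "pending" or "done"
    result: Optional[dict] = None


class JobStore:
    """In-memory stand-in for the state behind the submit and poll endpoints."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def submit(self, request: dict) -> str:
        # Parse the external request and convert it into the internal Job format.
        job = Job(
            id=str(uuid.uuid4()),
            queries=dict(request["queries"]),
            targets=dict(request["targets"]),
        )
        self._jobs[job.id] = job
        return job.id

    def complete(self, job_id: str, result: dict) -> None:
        # Called once the alignment results for this job are available.
        job = self._jobs[job_id]
        job.status, job.result = "done", result

    def poll(self, job_id: str) -> dict:
        # What a client would get back when polling for the results.
        job = self._jobs[job_id]
        return {"id": job.id, "status": job.status, "result": job.result}
```

A client would call `submit` once, keep the returned id, and call `poll` until the status changes from `"pending"` to `"done"`.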
Metric granularity can be down to the following 4 metrics inside the worker node:
So computation consists of:
| BUILD MATRIX      | BACKTRACE |
| 1 GCUPS, 2 TIME   | 3 TIME    |
| 4 COMBINED TIME               |
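As a rough sketch of how a worker could collect these four metrics: GCUPS is taken here as matrix cells updated per second, divided by 10^9, with the cell count assumed to be query length times target length. `WorkerMetrics` and `measure` are hypothetical names, and `build` and `backtrace` are placeholders for the worker's real functions.

```python
import time
from dataclasses import dataclass


@dataclass
class WorkerMetrics:
    build_seconds: float      # metric 2: time to build the matrix
    backtrace_seconds: float  # metric 3: time for the backtrace
    cells: int                # assumed to be query_len * target_len

    @property
    def gcups(self) -> float:
        # Metric 1: giga cell updates per second during the matrix build.
        return self.cells / (self.build_seconds * 1e9)

    @property
    def combined_seconds(self) -> float:
        # Metric 4: build time plus backtrace time.
        return self.build_seconds + self.backtrace_seconds


def measure(build, backtrace, query: str, target: str) -> WorkerMetrics:
    """Time the two phases separately around placeholder callables."""
    t0 = time.perf_counter()
    matrix = build(query, target)
    t1 = time.perf_counter()
    backtrace(matrix)
    t2 = time.perf_counter()
    return WorkerMetrics(t1 - t0, t2 - t1, len(query) * len(target))
```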
Benchmarking has two purposes:
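from typing import Dict, List

from pydantic import BaseModel

# SequenceId, Sequence, and TargetQueryCombination are not shown here.
class WorkPackage(BaseModel):
    # work package id
    id: str
    targets: Dict[SequenceId, Sequence]
    queries: Dict[SequenceId, Sequence]
    sequences: List[TargetQueryCombination]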
Is the point of the sequences field only to specify which of the sequences sent to the worker it should compute?
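One possible reading of the question, sketched with plain dataclasses so it runs without pydantic: the worker holds all sequences in `targets` and `queries`, and `sequences` selects which pairs to align. The `str` aliases, the fields of `TargetQueryCombination`, and the helper `pairs_to_compute` are assumptions for illustration, not the project's definitions.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

SequenceId = str  # assumption: ids are plain strings
Sequence = str    # assumption: sequences are plain strings


@dataclass(frozen=True)
class TargetQueryCombination:
    # Hypothetical shape: one target id paired with one query id.
    target: SequenceId
    query: SequenceId


@dataclass
class WorkPackage:
    id: str
    targets: Dict[SequenceId, Sequence]
    queries: Dict[SequenceId, Sequence]
    sequences: List[TargetQueryCombination]


def pairs_to_compute(pkg: WorkPackage) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield only the (target, query) pairs listed in `sequences`."""
    for combo in pkg.sequences:
        yield pkg.targets[combo.target], pkg.queries[combo.query]
```

Under this reading, a package can carry each sequence once even when it takes part in several alignments.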
We have quite a few entities that have to communicate. Let this function as a tracking issue for general discussion of interfaces.
Overview of interface issues:
Adjust the client (TUI), master, and worker node such that the following configuration options are passed from the client all the way to the worker node.
Configuration options:
Defaults for these could be as stated in the competition document: Match = +2, Mismatch = -1, Gap = 1. It might make sense to define these defaults only in the client (the relevant functions in the master and worker will take these options as parameters). We might want to group them into a configuration object (for maintainability's sake).
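A minimal sketch of such a configuration object, with the defaults above, passed into a plain Smith-Waterman scoring function. `AlignmentConfig` and `sw_score` are hypothetical names, and treating the gap value as a penalty that is subtracted is an assumption about how Gap = 1 is meant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentConfig:
    match: int = 2
    mismatch: int = -1
    gap: int = 1  # assumption: a linear gap penalty that is subtracted


def sw_score(query: str, target: str,
             cfg: AlignmentConfig = AlignmentConfig()) -> int:
    """Best local alignment score (score only, no backtrace)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # leftmost zero column
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (cfg.match if q == t else cfg.mismatch)
            up = prev[j] - cfg.gap
            left = curr[j - 1] - cfg.gap
            curr.append(max(0, diag, up, left))
        best = max(best, max(curr))
        prev = curr
    return best
```

With this shape, the client owns the defaults and the master and worker only forward or consume an `AlignmentConfig` value.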
Requirements
To Do:
About using SIMD in the diagonals: SIMD can be used as soon as the matrix is LANES + 1 wide (the extra one accounts for the leftmost zero column). This should speed up all cases where the query length is fairly high compared to the target length.
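To illustrate why diagonals suit SIMD, here is a scalar sketch that fills the matrix one anti-diagonal at a time. Cells on the same anti-diagonal depend only on the two previous diagonals, so they could be computed together. The function name, the default scoring values, and the choice to count diagonals holding at least `lanes` cells are illustrative assumptions; this is not the project's Rust implementation and contains no actual SIMD.

```python
def sw_score_by_diagonals(query: str, target: str, lanes: int = 8,
                          match: int = 2, mismatch: int = -1, gap: int = 1):
    """Fill the Smith-Waterman matrix along anti-diagonals.

    Returns (best score, number of diagonals with at least `lanes` cells).
    """
    rows, cols = len(query) + 1, len(target) + 1
    h = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay zero
    best, wide_diagonals = 0, 0
    for d in range(2, rows + cols - 1):  # d = i + j for interior cells
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        if i_hi - i_lo + 1 >= lanes:
            wide_diagonals += 1  # long enough to fill one vector of `lanes`
        # Every cell in this loop is independent of the others on diagonal d.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            s = match if query[i - 1] == target[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s,
                          h[i - 1][j] - gap, h[i][j - 1] - gap)
            best = max(best, h[i][j])
    return best, wide_diagonals
```

The longest diagonal has min(len(query), len(target)) cells, which is why the matrix has to reach a certain size before a full vector can be used.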
This is just a rough draft (hopefully enough to show the design to Tiziano). Feel free to comment; obviously more specification is needed:
The master node is responsible for receiving jobs from the job scheduler and assigning them to workers. It keeps a list of workers, recording whether each is available, which jobs it holds, and how long those jobs have taken. The master node also tracks the health of the workers and provides recovery in case of failures. Once the master has received confirmation that no more jobs will be submitted and all jobs have been processed, it sends the results to an AWS bucket.
Jobs: Queue like data structure
Results: Data structure to store the results
Workers: List of available workers
Methods:
ReceiveTask()
Receives a task from the scheduler and adds it to the task queue
RegisterWorker(workerId: String or int, metadata: Custom class)
Allows a worker to register with the master under the given id
DelegateTask(workerId: String)
Delegates a task to the worker with the given workerId
ReportStatus(workerId, status: Custom class)
Receives and processes status updates from a worker node (completion, error, etc.)
SubmitResult()
Submits a result to the bucket or database
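The draft above could be sketched as a small in-memory class. Several details are assumptions made for the example: tasks are dicts with an `"id"` key, status is a plain string rather than a custom class, a failed task is put back at the front of the queue, and `submit_result` only returns the collected results instead of uploading them anywhere.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, Optional


@dataclass
class WorkerInfo:
    metadata: Any
    available: bool = True
    current_task: Optional[dict] = None


class Master:
    def __init__(self) -> None:
        self.jobs: Deque[dict] = deque()           # Jobs: queue-like structure
        self.results: Dict[str, Any] = {}          # Results, keyed by task id
        self.workers: Dict[str, WorkerInfo] = {}   # Workers

    def receive_task(self, task: dict) -> None:
        self.jobs.append(task)

    def register_worker(self, worker_id: str, metadata: Any) -> None:
        self.workers[worker_id] = WorkerInfo(metadata)

    def delegate_task(self, worker_id: str) -> Optional[dict]:
        worker = self.workers[worker_id]
        if not worker.available or not self.jobs:
            return None
        worker.current_task = self.jobs.popleft()
        worker.available = False
        return worker.current_task

    def report_status(self, worker_id: str, status: str,
                      result: Any = None) -> None:
        worker = self.workers[worker_id]
        task = worker.current_task
        if task is not None and status == "completed":
            self.results[task["id"]] = result
        elif task is not None and status == "error":
            self.jobs.appendleft(task)  # simple recovery: retry the task
        worker.current_task, worker.available = None, True

    def submit_result(self) -> Optional[Dict[str, Any]]:
        # Only hand over results once nothing is queued or in flight.
        busy = any(not w.available for w in self.workers.values())
        return None if self.jobs or busy else dict(self.results)
```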
A tracking issue about all problems related to the Rust implementation
To Do
I created a branch with a proposed project structure, which also includes a gRPC config. Feel free to leave a comment or change things around.