dlsa's People

Contributors

danielvoogsgerd, enricozeilmaker, haraldurbjarni, niclashaderer, noorts

dlsa's Issues

Master

  • Add parameters #12
  • Add logging (to be able to see what is happening)
  • Add new scheduler
    • Simple work split (for one-to-many and for many-to-many jobs)
    • Fancy heuristics once we have the benchmarking numbers sent by the worker
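A minimal sketch of the "simple work split", assuming a job is a set of targets and a set of queries whose full cross product must be aligned, and that the work is cut into near-equal chunks of (target, query) pairs. `split_work` and its parameters are hypothetical names, not the project's API.

```python
from itertools import product
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (target_id, query_id); hypothetical representation


def split_work(target_ids: Sequence[str], query_ids: Sequence[str],
               n_workers: int) -> List[List[Pair]]:
    """Split the target x query cross product into n_workers near-equal chunks.

    A one-to-many job is simply the case of a single target (or query).
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    pairs = list(product(target_ids, query_ids))
    base, extra = divmod(len(pairs), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one extra pair, so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        chunks.append(pairs[start:start + size])
        start += size
    return chunks


# One-to-many: one target, five queries, two workers.
print([len(c) for c in split_work(["t1"], ["q1", "q2", "q3", "q4", "q5"], 2)])  # [3, 2]
```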

Worker - Tracker interface

The communication between the worker nodes and the tracker is the most critical, I think. Here is a thread to discuss the development and design decisions for the interface.

External REST API

The component that serves a REST API. This is the component that clients (e.g., a user on a laptop) interact with from the outside. It allows a client to 1) submit a sequence alignment job request, and 2) poll for the results.

The provided job request will be parsed and converted into our internal format.
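The submit/poll flow can be sketched without the HTTP layer. This assumes, purely for illustration, that jobs arrive as FASTA text and that the internal format is a `{sequence_id: sequence}` mapping; `parse_fasta`, `Job`, and `JobRegistry` are hypothetical names.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence} (assumed internal format)."""
    sequences: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            current_id = header[0]  # id is the first word of the header
            sequences[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            sequences[current_id] += line.upper()
    return sequences


@dataclass
class Job:
    targets: Dict[str, str]
    queries: Dict[str, str]
    status: str = "pending"  # pending -> done
    result: Optional[dict] = None


class JobRegistry:
    """In-memory stand-in for the API's job store."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def submit(self, targets_fasta: str, queries_fasta: str) -> str:
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = Job(parse_fasta(targets_fasta), parse_fasta(queries_fasta))
        return job_id

    def poll(self, job_id: str) -> dict:
        job = self._jobs.get(job_id)
        if job is None:
            return {"status": "unknown"}
        return {"status": job.status, "result": job.result}

    def complete(self, job_id: str, result: dict) -> None:
        # Called internally once all results for the job have been collected.
        job = self._jobs[job_id]
        job.status, job.result = "done", result
```

Each HTTP endpoint (one for submitting, one for polling) would then be a thin wrapper around `submit` and `poll`.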

Worker - Metric computation and Benchmarking

  • Metric computation
    • Time
    • (Opt) CUPS
  • Benchmarking (for performance insight and experiments)

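Metrics can be collected at the following granularity inside the worker node. The computation consists of two phases, building the matrix and backtracing, which gives four metrics:

  1. GCUPS of the matrix build
  2. Time of the matrix build
  3. Time of the backtrace
  4. Combined time (matrix build + backtrace)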
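GCUPS (giga cell updates per second) is the number of matrix cells filled divided by the build time in seconds times 10^9. The sketch below times the two phases separately; `build_matrix` and `backtrace` are hypothetical callables standing in for the worker's real phases, and counting only the len(target) × len(query) cells (not the zero row and column) is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class AlignmentMetrics:
    build_seconds: float      # metric 2
    backtrace_seconds: float  # metric 3
    gcups: float              # metric 1, from the build phase only

    @property
    def combined_seconds(self) -> float:  # metric 4
        return self.build_seconds + self.backtrace_seconds


def gcups(cells: int, seconds: float) -> float:
    """Giga cell updates per second: cells / (seconds * 1e9)."""
    return cells / (seconds * 1e9) if seconds > 0 else float("inf")


def measure(target: str, query: str,
            build_matrix: Callable[[str, str], object],
            backtrace: Callable[[object], object]) -> Tuple[object, AlignmentMetrics]:
    start = time.perf_counter()
    matrix = build_matrix(target, query)
    built = time.perf_counter()
    alignment = backtrace(matrix)
    done = time.perf_counter()
    cells = len(target) * len(query)  # zero row/column not counted
    metrics = AlignmentMetrics(built - start, done - built, gcups(cells, built - start))
    return alignment, metrics
```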

Benchmarking has two purposes:

  1. performance benchmarking for the experiments and testing in general (allows us to see which code improvements deliver practical results)
  2. compute capacity estimation, for the “intelligent” scheduler.
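For the capacity-estimation use, one possible heuristic (an assumption, not the project's scheduler) is a greedy split: take alignment pairs from largest to smallest and give each to the worker whose estimated finish time would be earliest, using each worker's benchmarked GCUPS. `capacity_split` and its inputs are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (target_id, query_id)


def capacity_split(pair_cells: Dict[Pair, int],
                   worker_gcups: Dict[str, float]) -> Dict[str, List[Pair]]:
    """Assign pairs to workers according to their measured speed.

    pair_cells maps each pair to its matrix size (len(target) * len(query)).
    """
    if not worker_gcups or any(g <= 0 for g in worker_gcups.values()):
        raise ValueError("need workers with positive GCUPS estimates")
    busy = {w: 0.0 for w in worker_gcups}  # estimated seconds of queued work
    plan: Dict[str, List[Pair]] = {w: [] for w in worker_gcups}
    for pair, cells in sorted(pair_cells.items(), key=lambda kv: -kv[1]):
        # Pick the worker that would finish this pair soonest.
        best = min(sorted(worker_gcups),
                   key=lambda w: busy[w] + cells / (worker_gcups[w] * 1e9))
        busy[best] += cells / (worker_gcups[best] * 1e9)
        plan[best].append(pair)
    return plan
```

A worker that is twice as fast ends up with roughly twice the cells, which is the point of feeding benchmark numbers back to the scheduler.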

Roadmap

  • Architecture.md (e.g., in the style of rust-analyzer's) that can be converted into material for the report
  • #2
  • Worker node
    • Algo implementation (Daniel)
    • Node registrar interface (Daniel)
  • Registrar + Register Interface for Node (Halli & Enrico)
  • Job Scheduler (Niclas & Paul)
    • Job Data Format
    • Job interface for internal format
    • Node interface for Job assignment?
  • #4
    • Convert supported file types to the internal format
  • Persistence Layer setup

Sequences to compute

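from typing import Dict, List

from pydantic import BaseModel

# SequenceId, Sequence and TargetQueryCombination are project types defined elsewhere.
class WorkPackage(BaseModel):
    # work package id
    id: str
    targets: Dict[SequenceId, Sequence]
    queries: Dict[SequenceId, Sequence]

    sequences: List[TargetQueryCombination]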

Is the point of the sequences field only to specify which of the sequences sent to the worker it should compute?
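One possible reading, stated as an assumption: `targets` and `queries` carry each sequence body once, and `sequences` lists which (target, query) combinations the worker should align, so a sequence used in many combinations is not sent repeatedly. The sketch uses plain dataclasses instead of pydantic so it runs with the standard library only; the shape of `TargetQueryCombination` and the `validate`/`pairs_to_align` methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class TargetQueryCombination:  # hypothetical shape: a pair of ids
    target_id: str
    query_id: str


@dataclass
class WorkPackage:
    id: str
    targets: Dict[str, str]  # SequenceId -> sequence
    queries: Dict[str, str]
    sequences: List[TargetQueryCombination]

    def validate(self) -> None:
        """Every combination must reference sequences shipped in this package."""
        for combo in self.sequences:
            if combo.target_id not in self.targets or combo.query_id not in self.queries:
                raise ValueError(f"unknown sequence in {combo}")

    def pairs_to_align(self) -> Iterator[Tuple[str, str]]:
        """Yield the (target, query) sequence strings the worker should align."""
        self.validate()
        for combo in self.sequences:
            yield self.targets[combo.target_id], self.queries[combo.query_id]
```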

Tracking Issue: Interfaces

We have quite a few entities that have to communicate. Let this function as a tracking issue for general discussion about interfaces.

Overview of interface issues:

Configuration option passing

Adjust the client (TUI), master, and worker node such that the following configuration options are passed from the client all the way to the worker node.

Configuration options:

  • match score
  • mismatch score
  • gap (usually split into extension penalty δ and gap open penalty Δ, but fine to keep them the same in this PR)

Defaults for these could be as stated in the competition document: Match = +2, Mismatch = -1, Gap = 1. It might make sense to define these defaults only in the client (the relevant functions in the master and worker would take these options as parameters). We might want to group them into a configuration object for maintainability's sake.
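A minimal sketch of such a configuration object, with the competition defaults and the gap value treated as a penalty (one value for both opening and extension). `ScoringConfig` and its field names are hypothetical; JSON is used only to keep the sketch self-contained, whereas the project's real transport (the proposed structure mentions gRPC) would use its own message type.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScoringConfig:
    """Scoring options passed client -> master -> worker (hypothetical)."""
    match_score: int = 2      # competition default
    mismatch_score: int = -1  # competition default
    gap_penalty: int = 1      # subtracted per gap position; same for open/extend

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "ScoringConfig":
        return cls(**json.loads(payload))
```

Keeping the object frozen means the master can forward the same instance to every work package without any risk of one split mutating it.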

Requirements

  • The client (TUI)
    • allows the configuration options to be passed as arguments
    • falls back to defaults in the case that some options are not specified
  • The master
    • passes the configuration options to the worker, taking into account that a job may be split into several work packages
  • The worker node
    • parses and uses the configuration options inside the Smith-Waterman algorithm
    • its tests have been updated to use a default set of configuration options
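The client requirement (options as arguments, defaults as fallback) can be illustrated with Python's `argparse`. The client's actual implementation language is not stated here, so this is only an illustration of the behaviour; the flags `--match`, `--mismatch`, `--gap` and the function name are hypothetical.

```python
import argparse
from typing import Dict, List, Optional

# Defaults from the competition document, defined only on the client side.
DEFAULTS = {"match": 2, "mismatch": -1, "gap": 1}


def parse_scoring_args(argv: Optional[List[str]] = None) -> Dict[str, int]:
    """Parse --match/--mismatch/--gap, falling back to DEFAULTS when omitted."""
    parser = argparse.ArgumentParser(description="Submit an alignment job")
    parser.add_argument("--match", type=int, default=DEFAULTS["match"])
    parser.add_argument("--mismatch", type=int, default=DEFAULTS["mismatch"])
    parser.add_argument("--gap", type=int, default=DEFAULTS["gap"],
                        help="gap penalty (same value for open and extend)")
    args = parser.parse_args(argv)
    return {"match": args.match, "mismatch": args.mismatch, "gap": args.gap}
```

Negative values are most safely passed as `--mismatch=-3`.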

Write SIMD algorithm

To Do:

  • Don't overcompute the diagonal parts of the matrix
  • Use SIMD in the diagonal parts

About using SIMD in the diagonals: SIMD can be used as soon as the matrix is LANES + 1 columns wide (the extra column is the leftmost zero column). This should speed up all cases in which the query is fairly long compared to the target.
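The issue does not spell out the scheme, so this assumes the common anti-diagonal (wavefront) reading: every cell with i + j = d depends only on anti-diagonals d - 1 and d - 2, so the inner loop over one anti-diagonal has no loop-carried dependency and could map onto SIMD lanes. The sketch is scalar Python with a linear gap penalty and the competition defaults; it only checks that the wavefront order gives the same Smith-Waterman score as row-by-row filling.

```python
def sw_score_rowwise(t: str, q: str, match: int = 2, mismatch: int = -1, gap: int = 1) -> int:
    """Reference Smith-Waterman score (linear gap), filled row by row."""
    rows, cols = len(q) + 1, len(t) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if q[i - 1] == t[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] - gap, h[i][j - 1] - gap)
            best = max(best, h[i][j])
    return best


def sw_score_antidiagonal(t: str, q: str, match: int = 2, mismatch: int = -1, gap: int = 1) -> int:
    """Same recurrence, filled one anti-diagonal (i + j = d) at a time."""
    rows, cols = len(q) + 1, len(t) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for d in range(2, rows + cols - 1):
        # Keep 1 <= i <= rows - 1 and 1 <= j = d - i <= cols - 1.
        i_lo, i_hi = max(1, d - (cols - 1)), min(rows - 1, d - 1)
        for i in range(i_lo, i_hi + 1):  # iterations are independent of each other
            j = d - i
            s = match if q[i - 1] == t[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] - gap, h[i][j - 1] - gap)
            best = max(best, h[i][j])
    return best
```

A SIMD version would process the inner `for i` loop LANES cells at a time. An anti-diagonal holds at most min(len(t), len(q)) cells, so the short diagonals near the matrix corners cannot fill all lanes.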

Interface - master

This is just a rough draft (hopefully enough to show the design to Tiziano); feel free to comment. Obviously, more specification is still needed:

The master node is responsible for receiving jobs from the job scheduler and assigning them to workers. The master keeps a list of workers, recording whether each is available, which jobs it holds, and how long those jobs have taken. The master node also tracks the health of the workers and provides recovery in case of failures. Once the master has received confirmation that no more jobs will be submitted and all jobs have been processed, it sends the results to an AWS bucket.
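Health tracking and recovery could, for example, be heartbeat-based: a worker that has not reported within a timeout is treated as failed and its tasks go back to the front of the queue. This is an assumed mechanism, not one the draft specifies; `HealthMonitor`, `timeout`, and the method names are hypothetical, and `now` can be injected to make the logic testable.

```python
import time
from collections import deque
from typing import Deque, Dict, List, Optional


class HealthMonitor:
    """Heartbeat-based failure detection with task re-queueing (a sketch)."""

    def __init__(self, timeout: float = 10.0) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}
        self.assigned: Dict[str, List[str]] = {}
        self.queue: Deque[str] = deque()

    def heartbeat(self, worker_id: str, now: Optional[float] = None) -> None:
        self.last_seen[worker_id] = time.monotonic() if now is None else now
        self.assigned.setdefault(worker_id, [])

    def assign(self, worker_id: str) -> Optional[str]:
        """Give the next queued task to a live worker, if any."""
        if worker_id not in self.last_seen or not self.queue:
            return None
        task = self.queue.popleft()
        self.assigned[worker_id].append(task)
        return task

    def check(self, now: Optional[float] = None) -> List[str]:
        """Drop timed-out workers, re-queue their tasks, return their ids."""
        now = time.monotonic() if now is None else now
        failed = [w for w, t in self.last_seen.items() if now - t > self.timeout]
        for w in failed:
            # Put the failed worker's tasks back at the front, in original order.
            self.queue.extendleft(reversed(self.assigned.pop(w, [])))
            del self.last_seen[w]
        return failed
```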

Jobs: Queue-like data structure
Results: Data structure to store the results
Workers: List of available workers

Methods:

ReceiveTask()
Receives a task from the scheduler and adds it to the task queue

RegisterWorker(workerId: String or int, metadata: Custom class)
Allows a worker to register with the master under the given id

DelegateTask(workerId: String)
Delegates a task to the worker with the given workerId

ReportStatus(workerId, status: Custom class)
Receives and processes status updates from a worker node (completion, error, etc.)

SubmitResult()
Submits a result to the bucket or database
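A minimal Python sketch of the interface above. It assumes string task and worker ids, a string status ("completed", with anything else treated as an error), and an in-memory list standing in for the AWS bucket or database. Method names are snake_case versions of the draft's; the extra `result` argument on `report_status` and the retry-on-error policy are assumptions, and the "no more jobs will be submitted" confirmation is not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict, List, Optional


@dataclass
class WorkerInfo:
    metadata: Dict[str, Any]  # stands in for the draft's "custom class"
    current_task: Optional[str] = None
    completed: List[str] = field(default_factory=list)


class Master:
    def __init__(self) -> None:
        self.jobs: Deque[str] = deque()             # Jobs: queue-like structure
        self.results: Dict[str, Any] = {}           # Results
        self.workers: Dict[str, WorkerInfo] = {}    # Workers
        self.submitted: List[Dict[str, Any]] = []   # stand-in for the bucket

    def receive_task(self, task_id: str) -> None:
        self.jobs.append(task_id)

    def register_worker(self, worker_id: str, metadata: Dict[str, Any]) -> None:
        self.workers[worker_id] = WorkerInfo(metadata)

    def delegate_task(self, worker_id: str) -> Optional[str]:
        worker = self.workers[worker_id]
        if worker.current_task is not None or not self.jobs:
            return None  # worker busy, or nothing queued
        worker.current_task = self.jobs.popleft()
        return worker.current_task

    def report_status(self, worker_id: str, status: str, result: Any = None) -> None:
        worker = self.workers[worker_id]
        task, worker.current_task = worker.current_task, None
        if task is None:
            return
        if status == "completed":
            self.results[task] = result
            worker.completed.append(task)
        else:
            self.jobs.append(task)  # e.g. "error": retry later

    def submit_result(self) -> bool:
        """Submit once no jobs are queued or running (upload itself omitted)."""
        busy = any(w.current_task is not None for w in self.workers.values())
        if self.jobs or busy:
            return False
        self.submitted.append(dict(self.results))
        return True
```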

Tracking: Rust

A tracking issue about all problems related to the Rust implementation

To Do

  • Convert unit tests to Rust
  • Set up benchmarking using Criterion
  • Create complete FFI bindings for Go
  • #27
  • Write a low-memory variant of the algorithm
  • Use the Rust version in the benchmarker @haraldurbjarni
  • Make Vec allocation fallible
  • Catch all panics at the FFI boundary @haraldurbjarni

Project structure

I created a branch with a proposed project structure, which also includes a gRPC config. Feel free to leave a comment or change things around.
