olive-distributed

Dependencies

Software: bind-tools, python-pyro

A patched olive-editor that supports partial export (needed until the patch is mainlined):

git clone https://github.com/morrolinux/olive.git morrolinux-olive && cd morrolinux-olive && git checkout 0.1.x

cmake . && make -j8 && sudo make install

Setup

On your master node, change to the SSL certificates folder: cd olive-distributed/ssl/certs

  1. Generate the SSL and CA certificates for the master node: ./generate_keys.sh setup
  2. For each worker node you want to add, run: ./generate_keys.sh add <hostname>

During this setup, the SSH service must be running on each worker node so that the SSL keys can be copied to it.

Usage

Change to the olive-distributed folder: cd olive-distributed

On your worker nodes

  1. Start the NFS mounter service: sudo python nfs_mounter.py
  2. Start the worker service: ./run_worker_node.py

On the master node

  1. Start the NFS exporter service: sudo python nfs_exporter.py
  2. Submit a job at any time: main.py --project /path/to/project.ove exports a single project in a distributed way, while main.py --folder /path/to/folder/containing/projects/ enqueues multiple projects to be exported in parallel on multiple workers.
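The two submission modes can be pictured as producing a list of jobs for the dispatcher. The sketch below is a minimal illustration, not the actual `main.py`: it assumes that `--folder` treats every `.ove` file directly inside the folder as one job and that `--project` submits exactly one. The helper name `collect_jobs` is hypothetical.

```python
import argparse
from pathlib import Path


def collect_jobs(argv=None):
    """Return the .ove project paths to enqueue (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Submit olive export jobs")
    # Exactly one of the two submission modes must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--project", type=Path, help="single .ove project")
    group.add_argument("--folder", type=Path, help="folder of .ove projects")
    args = parser.parse_args(argv)

    if args.project is not None:
        # One project, exported in a distributed way by the dispatcher.
        return [args.project]
    # Assumption: every .ove file directly in the folder becomes its own job.
    return sorted(p for p in args.folder.iterdir() if p.suffix == ".ove")
```

For example, `collect_jobs(["--folder", "/path/to/folder/containing/projects/"])` would return one path per project file, each of which can then be handed to a different worker.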

Note 1: Your master node can also act as a worker node; just follow the worker-node steps on it as well.

Note 2: Once set up, worker nodes can join and leave during a workload; fault tolerance should cope with these changes at runtime.

Logic overview

A master node dispatches work among the worker nodes. The master node runs an "NFS Exporter" service as root and a "Job dispatcher" process as a regular user. Each worker node runs an "NFS Mounter" service as root and a "worker" process as a regular user. In both cases, the user process communicates with the root process via Pyro (RMI) over a two-way SSL connection. The same approach is used for communication between the workers and the master.

[Figure: Architecture]
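"Two-way" (mutual) SSL means that both ends present a certificate signed by the shared CA and both ends verify the other. The sketch below illustrates that idea with Python's standard `ssl` module only; it is not how olive-distributed configures Pyro, and the function names and certificate paths are hypothetical (they are not the files written by `generate_keys.sh`). The paths are optional solely so the sketch runs without certificate files; a real deployment would always pass them.

```python
import ssl


def server_context(certfile=None, keyfile=None, cafile=None):
    """TLS context for the listening side (e.g. a root service)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Two-way SSL: clients without a CA-signed certificate are rejected.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # this node's own certificate
    if cafile:
        ctx.load_verify_locations(cafile)       # CA used to check client certs
    return ctx


def client_context(certfile=None, keyfile=None, cafile=None):
    """TLS context for the connecting side (e.g. a user process)."""
    # PROTOCOL_TLS_CLIENT verifies the server certificate and hostname by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # presented to the server
    if cafile:
        ctx.load_verify_locations(cafile)       # CA used to check the server cert
    return ctx
```

Because the server demands a client certificate and the client verifies the server, a process that does not hold keys issued by the master's CA cannot talk to either the root services or the dispatcher.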

When a job is assigned to a worker, it is moved to the "ongoing" queue until a worker reports back the exit status of the export:

  • If it failed, it's moved to the "failed" queue
  • If it succeeded, it's moved to the "completed" queue
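These transitions can be modelled as a small state machine. The sketch below is a hypothetical illustration, not olive-distributed's dispatcher code: the class name `JobQueues` is invented, and it assumes that an exit status of 0 means the export succeeded.

```python
from collections import deque


class JobQueues:
    """Hypothetical model of the dispatcher's job queues."""

    def __init__(self, jobs):
        self.pending = deque(jobs)  # main job queue
        self.ongoing = set()        # assigned, waiting for an exit status
        self.failed = deque()
        self.completed = []

    def assign(self):
        """Hand the next pending job to a worker and mark it as ongoing."""
        job = self.pending.popleft()
        self.ongoing.add(job)
        return job

    def report(self, job, exit_status):
        """Move a job out of 'ongoing' once a worker reports its exit status."""
        self.ongoing.discard(job)
        # Assumption: exit status 0 means the export succeeded.
        if exit_status == 0:
            self.completed.append(job)
        else:
            self.failed.append(job)
```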

When the main job queue becomes empty, "failed" jobs are assigned to free workers (if any).

When the "failed" queue becomes empty as well, free workers are assigned "ongoing" jobs, since these might belong to crashed or unreachable workers that could not report back. The first worker to finish an ongoing job gets to move it to the "completed" queue; the other runs of the same job are discarded.

[Figure: States]
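The assignment order and the first-finisher-wins rule can be sketched with plain lists used as FIFO queues. The function names `next_job` and `finish` are hypothetical, and choosing ongoing jobs round-robin is an assumption: the description only says that free workers receive "ongoing" jobs, not which one.

```python
def next_job(pending, failed, ongoing):
    """Return a job for a free worker, or None if there is nothing to do."""
    # Main queue first, then previously failed jobs.
    for queue in (pending, failed):
        if queue:
            job = queue.pop(0)
            ongoing.append(job)  # now waiting for an exit status
            return job
    if ongoing:
        # Re-run an ongoing job whose worker may have crashed.
        # Assumption: rotate the list so successive free workers get different jobs.
        job = ongoing.pop(0)
        ongoing.append(job)
        return job
    return None


def finish(job, ongoing, completed):
    """Record a successful export; only the first finisher counts."""
    if job in completed:
        return False  # a duplicate run finished later: discard it
    if job in ongoing:
        ongoing.remove(job)
    completed.append(job)
    return True
```

Running duplicates trades some redundant work for never waiting on a worker that has silently disappeared, which is what lets workers come and go during a workload.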
