woken's Introduction

Woken: Workflow for Analytics

An orchestration platform for Docker containers running data mining algorithms.

This project exposes a web interface to execute data mining algorithms on demand. The algorithms are packaged as Docker containers and can be implemented using any tool or language (R, Python, Java and more are supported).

It relies on a runtime environment containing Mesos and Chronos to control and execute the Docker containers over a cluster.

Usage

 docker run --rm --env [list of environment variables] --link woken hbpmip/woken:3.0.2

where the environment variables are:

  • CLUSTER_IP: Name of this server advertised in the Akka cluster
  • CLUSTER_PORT: Port of this server advertised in the Akka cluster
  • CLUSTER_NAME: Name of Woken cluster, default to 'woken'
  • WOKEN_PORT_8088_TCP_ADDR: Address of Woken master server
  • WOKEN_PORT_8088_TCP_PORT: Port of Woken master server, default to 8088
  • DOCKER_BRIDGE_NETWORK: Name of the Docker bridge network. Default to 'bridge'
  • NETWORK_INTERFACE: IP address for listening to incoming HTTP connections. Default to '0.0.0.0'
  • WEB_SERVICES_PORT: Port for the HTTP server in Docker container. Default to 8087
  • WEB_SERVICES_SECURE: If yes, HTTPS with a custom certificate will be used. Default to no.
  • WEB_SERVICES_USER: User name for the web services protected with HTTP basic authentication. Default to 'admin'
  • WEB_SERVICES_PASSWORD: Password for the web services protected with HTTP basic authentication.
  • LOG_LEVEL: Level for logs on standard output, default to WARNING
  • LOG_CONFIG: on/off - log configuration on start, default to off
  • VALIDATION_MIN_SERVERS: minimum number of servers with the 'validation' functionality in the cluster, default to 0
  • SCORING_MIN_SERVERS: minimum number of servers with the 'scoring' functionality in the cluster, default to 0
  • KAMON_ENABLED: enable monitoring with Kamon, default to no
  • ZIPKIN_ENABLED: enable reporting traces to Zipkin, default to no. Requires Kamon enabled.
  • ZIPKIN_IP: IP address to Zipkin server. Requires Kamon and Zipkin enabled.
  • ZIPKIN_PORT: Port to Zipkin server. Requires Kamon and Zipkin enabled.
  • PROMETHEUS_ENABLED: enable reporting metrics to Prometheus, default to no. Requires Kamon enabled.
  • PROMETHEUS_IP: IP address to Prometheus server. Requires Kamon and Prometheus enabled.
  • PROMETHEUS_PORT: Port to Prometheus server. Requires Kamon and Prometheus enabled.
  • SIGAR_SYSTEM_METRICS: Enable collection of metrics of the system using Sigar native library, default to no. Requires Kamon enabled.
  • JVM_SYSTEM_METRICS: Enable collection of metrics of the JVM using JMX, default to no. Requires Kamon enabled.
  • MINING_LIMIT: Maximum number of concurrent mining operations. Default to 100
  • EXPERIMENT_LIMIT: Maximum number of concurrent experiments. Default to 100
  • RELEASE_STAGE: Release stage used when reporting errors to Bugsnag. Values are dev, staging, production
  • DATA_CENTER_LOCATION: Location of the datacenter, used when reporting errors to Bugsnag
  • CONTAINER_ORCHESTRATION: Container orchestration system used to execute the Docker containers. Values are mesos, docker-compose, kubernetes
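
As an illustration of how the variables above fit into the docker run command, here is a minimal Python sketch that assembles the argument list from a dictionary. The helper name docker_run_command and the chosen example values are assumptions made for this sketch; only the image name, the --rm and --link options and the variable names come from the usage line and list above.

```python
import shlex


def docker_run_command(env, image="hbpmip/woken:3.0.2"):
    """Build the `docker run` argument list for the given environment variables.

    `env` maps variable names (for example LOG_LEVEL) to their values.
    This helper is hypothetical and only mirrors the usage line shown above.
    """
    args = ["docker", "run", "--rm"]
    # One --env NAME=value pair per variable, sorted for a stable command line.
    for name, value in sorted(env.items()):
        args += ["--env", f"{name}={value}"]
    args += ["--link", "woken", image]
    return args


# Example values taken from the documented defaults.
example_env = {
    "CLUSTER_NAME": "woken",
    "WEB_SERVICES_PORT": 8087,
    "LOG_LEVEL": "WARNING",
}
command = docker_run_command(example_env)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(command))
```

Passing the list to subprocess.run(command) would start the container; printing it first makes it easy to review the settings before running anything.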

Getting started

Follow these steps to get started:

  1. Clone this repository:
  git clone https://github.com/LREN-CHUV/woken.git
  2. Change directory into your clone:
  cd woken
  3. Build the application

You need the following software installed:

  • Docker 18.09 or better with docker-compose

  ./build.sh
  4. Run the application

You need the following software installed to execute some tests:

  cd tests
  ./run.sh

tests/run.sh uses docker-compose to start a full environment with Mesos, Zookeeper and Chronos, all of which are required for the proper execution of Woken.

  5. Create a DNS alias in /etc/hosts
  127.0.0.1       localhost frontend

  6. Browse to http://frontend:8087 or run one of the query* scripts located in the 'tests' folder.

Available Docker containers

The Docker containers that can be executed on this platform require a few specific features.

TODO: define those features - parameters passed as environment variables, in and out directories, entrypoint with a 'compute command', ...

The project algorithm-repository contains the Docker images that can be used with woken.

Available commands

Mining query

Performs a data mining task.

Path: /mining/job
Verb: POST

Takes a JSON document in the body and returns a JSON document.

The JSON input should be of the form:

  {
    "user": {"code": "user1"},
    "variables": [{"code": "var1"}],
    "covariables": [{"code": "var2"},{"code": "var3"}],
    "grouping": [{"code": "var4"}],
    "filters": [],
    "algorithm": "",
    "datasets": [{"code": "dataset1"},{"code": "dataset2"}]
  }

where:

  • variables is the list of variables
  • covariables is the list of covariables
  • grouping is the list of variables to group together
  • filters is the list of filters. The format follows jQuery QueryBuilder filters, for example {"condition":"AND","rules":[{"id":"FULLNAME", "field":"FULLNAME","type":"string","input":"text","operator":"equal","value":"Isaac Fulmer"}],"valid":true}
  • datasets is an optional list of datasets. It can be used in distributed mode to select the nodes to query, and in all cases it adds a filter rule of the form {"condition":"OR","rules":[{"field":"dataset","operator":"equals","value":"dataset1"},{"field":"dataset","operator":"equals","value":"dataset2"}]}
  • algorithm is the algorithm to use.
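
The filter rule described for datasets can be sketched in Python as follows. This is only an illustration of the rule shape quoted above; the function name dataset_filter is hypothetical, and how Woken combines this rule with the user-supplied filters is not specified here.

```python
import json


def dataset_filter(datasets):
    """Return an OR group with one 'dataset equals <code>' rule per dataset.

    The field, operator and condition values mirror the example rule in the
    list above; the function itself is a hypothetical helper.
    """
    return {
        "condition": "OR",
        "rules": [
            {"field": "dataset", "operator": "equals", "value": code}
            for code in datasets
        ],
    }


rule = dataset_filter(["dataset1", "dataset2"])
print(json.dumps(rule, indent=2))
```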

Currently, the following algorithms are supported:

  • data: returns the raw data matching the query
  • linearRegression: performs a linear regression
  • summaryStatistics: computes summary statistics that can be used to draw box plots.
  • knn
  • naiveBayes
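
To make the mining query concrete, the following Python sketch prepares a POST request to /mining/job using only the standard library. Several details are assumptions: the base URL http://frontend:8087 is borrowed from the Getting started section, the password is a placeholder, the algorithm is given as a plain string as in the JSON form above (the real API may expect a richer structure), and the helper name mining_request is hypothetical. The request is built but not sent.

```python
import base64
import json
import urllib.request


def mining_request(base_url, query, user, password):
    """Prepare (without sending) a POST request for a mining query."""
    body = json.dumps(query).encode("utf-8")
    # HTTP basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mining/job",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Query following the JSON form documented above.
query = {
    "user": {"code": "user1"},
    "variables": [{"code": "var1"}],
    "covariables": [{"code": "var2"}, {"code": "var3"}],
    "grouping": [{"code": "var4"}],
    "filters": [],
    "algorithm": "linearRegression",
    "datasets": [{"code": "dataset1"}, {"code": "dataset2"}],
}

# "secret" is a placeholder password, not a documented default.
request = mining_request("http://frontend:8087", query, "admin", "secret")
print(request.get_method(), request.full_url)

# Sending the request requires a running Woken instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```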

Experiment query

Performs an experiment comprised of several data mining tasks and an optional cross-validation step used to compute the fitness of each algorithm and select the best result.

TODO: document API

Release

You need the following software installed:

Execute the following commands to distribute Woken as a Docker container:

  ./publish.sh

Installation

For production, woken requires Mesos and Chronos. To install them, you can use either:

  • mip-microservices-infrastructure, a collection of Ansible scripts deploying a full Mesos stack on Ubuntu servers.
  • mantl.io, a microservice infrastructure by Cisco, based on Mesos.
  • Mesosphere DCOS: DC/OS (the datacenter operating system) is an open-source, distributed operating system based on the Apache Mesos distributed systems kernel.

What's in a name?

Woken :

  • the Woken river in China - we were looking for rivers in China
  • passive form of awake - it launches Docker containers and computations
  • workflow - the previous name, not too different

Acknowledgements

Funding

This work has been funded by the European Union Seventh Framework Program (FP7/2007-2013) under grant agreement no. 604102 (HBP).

This work is part of SP8 of the Human Brain Project (SGA1).

Sponsors

Thanks to Bugsnag for their generous support: they offered us a Standard plan, which allows us to inspect and report errors in our software efficiently.

Tools

We use the following tools for development:

  • IntelliJ IDEA
  • Bugsnag to report errors in real time to our development team
  • CircleCI for continuous integration

woken's People

Contributors

ajutzeler, carlisgg, dabelenda, devsprint, kubukoz, ludovicc, marigold, mirco-nasuti, nicedexter, scala-steward

woken's Issues

Dump configurations

Add an admin web service that shows current configurations for

  • logs
  • application configuration (akka...)

add database and schema to the where clause

// TODO: add database and schema to the where clause
// TODO: collapse database,schema,table into a TableId in the Doobie mappings
v.fold(
sql"SELECT id, source, hierarchy, target_table, histogram_groupings FROM meta_variables WHERE target_table=$table"
.query[VariablesMeta]
.option


This issue was generated by todo based on a TODO comment in 70240dc. It's been assigned to @ludovicc because they committed the code.

Akka cannot bind to ip address while running in a container in bridge mode managed by Mesos

Exception in thread "main" java.lang.ExceptionInInitializerError
at eu.hbp.mip.woken.web.Web.main(Web.scala)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: hbpxx.intranet.chuv/155.xx.xx.xx:31088

Reference for Akka in Docker:

https://github.com/mhamrah/akka-docker-cluster-example/blob/master/src/main/resources/reference.conf
https://lostintimedev.com/2017/05/26/running-akka-cluster-on-docker-swarm.html
https://github.com/lostintime/hello-akka/blob/master/docker-compose.yml
https://technologyconversations.com/2015/11/25/deploying-containers-with-docker-swarm-and-docker-networking/

Running sbt fails with error "ensimeScalaVersion not found"

Running sbt fails with the following error:

woken/build.sbt:190: error: not found: value ensimeScalaVersion
    ensimeScalaVersion in ThisBuild := "2.11.12",
    ^
[error] sbt.compiler.EvalException: Type error in expression
[error] sbt.compiler.EvalException: Type error in expression
[error] Use 'last' for the full log.
Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore?

It seems that a plugin is missing in project/plugins.sbt.

This is the reason why @scala-steward can't propose updates any more.

Acolyte should support pgObject and pgJsonb types

// TODO: Acolyte should support pgObject and pgJsonb types
"put and get variables" ignore withVariablesMetaRepository { dao =>
val churnHierarchy = loadJson("/metadata/churn_variables.json").convertTo[GroupMetaData]
val churnVariablesMeta =
VariablesMeta(1, "churn", churnHierarchy, "CHURN", List("state", "custserv_calls", "churn"))


This issue was generated by todo based on a TODO comment in 51a6650. It's been assigned to @ludovicc because they committed the code.

collapse database,schema,table into a TableId in the Doobie mappings

// TODO: collapse database,schema,table into a TableId in the Doobie mappings
v.fold(
sql"SELECT id, source, hierarchy, target_table, histogram_groupings FROM meta_variables WHERE target_table=$table"
.query[VariablesMeta]
.option
.transact(xa)


This issue was generated by todo based on a TODO comment in 70240dc. It's been assigned to @ludovicc because they committed the code.

Bug WaitForWorkers

After running an experiment from the Web portal, Woken seems to fall into an incorrect state and refuses to handle further requests.

[WARN] [06/26/2017 13:38:37.090] [woken-akka.actor.default-dispatcher-19] [akka.tcp://[email protected]:4090/user/$g] unhandled event Start(Job(89d6a676-db22-4c2e-a568-a8f4fea0a034,Some(ldsm),List(Algorithm(linearRegression,Linear Regression,Map()), Algorithm(knn,K-nearest neighbors with k=5,Map(k -> 5))),List(Validation(kfold,Untitled validation,Map(k -> 2))),Map(PARAM_variables -> subjectageyears, PARAM_query -> select subjectageyears,leftventraldc,rightventraldc from merged_data where subjectageyears is not null and leftventraldc is not null and rightventraldc is not null, PARAM_grouping -> , PARAM_meta -> {"subjectageyears":{"description":"Subject age in years.","methodology":"mip-cde","label":"Age Years","minValue":0,"code":"subjectageyears","units":"years","length":3,"maxValue":130,"type":"integer"},"leftventraldc":{"description":"","methodology":"lren-nmm-volumes","label":"Left Ventral DC","code":"leftventraldc","units":"cm3","length":20,"type":"real"},"rightventraldc":{"description":"","methodology":"lren-nmm-volumes","label":"Right Ventral DC","code":"rightventraldc","units":"cm3","length":20,"type":"real"}}, PARAM_covariables -> leftventraldc,rightventraldc))) in state WaitForWorkers

Healthcheck for mining cache

Add a health check for mining cache, which can also slowly fill the mining cache...
It should help a lot to identify Fass / Chronos misconfiguration issues in production, as live container algorithms should be successfully executed here.
