mistry's Introduction

mistry is a general-purpose build server that enables fast workflows by employing artifact caching and incremental building techniques.

mistry executes user-defined build steps inside isolated environments and saves build artifacts for later consumption.

Refer to the introductory blog post Speeding Up Our Build Pipelines for more information.

At Skroutz we use mistry to speed up our development and deployment processes:

  • Rails asset compilation (rails assets:precompile)
  • Bundler dependency resolution and download (bundle install)
  • Yarn dependency resolution and download (yarn install)

In the above use cases, mistry executes these commands the first time they are needed and caches the results. Then, when anyone else executes the same commands (e.g. application servers, developer workstations, CI servers), they instantly get the results back.

Features

  • execute user-defined build steps in pre-defined environments, provided as Docker images
  • build artifact caching
  • incremental building (see "Build cache")
  • CLI client for interacting with the server (scheduling jobs etc.) via a JSON API
  • a web view for inspecting the progress of builds (see "Web view")
  • efficient use of disk space due to copy-on-write semantics (using Btrfs snapshotting)

For more information visit the wiki.

Getting started

You can get the binaries from the latest releases.

Alternatively, if you have Go 1.10 or later you can get the latest development version.

NOTE: statik is a build-time dependency, so it should be installed in your system and present in your PATH.

$ go get github.com/rakyll/statik

# server
$ go get -u github.com/skroutz/mistry/cmd/mistryd

# client
$ go get -u github.com/skroutz/mistry/cmd/mistry

Usage

To boot the server a configuration file is needed:

$ mistryd --config config.json

You can use the sample config as a starting point.

Use mistryd --help for more info.

Adding projects

Projects are essentially directories with at minimum a Dockerfile at their root. Each project directory should be placed in the path denoted by projects_path (see Configuration).

Refer to File system layout - Projects directory for more info.

API

Interacting with mistry (scheduling builds etc.) can be done in two ways: (1) using the client and (2) using the HTTP API directly (see below).

We recommend using the client whenever possible.

Client

Schedule a build for project foo and download the artifacts:

$ mistry build --project foo --target /tmp/foo

The above command will block until the build is complete and then download the resulting artifacts to /tmp/foo/.

Schedule a build without fetching the artifacts:

$ mistry build --project foo --no-wait

The above will just schedule the build and return immediately - it will not wait for it to complete and will not fetch the artifacts.

For more info refer to the client's README.

HTTP Endpoints

Schedule a new build without fetching artifacts (this is equivalent to passing --no-wait when using the client):

$ curl -X POST /jobs \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"project": "foo"}'
{
    "Params": {"foo": "xzv"},
    "Path": "<artifact path>",
    "Cached": true,
    "Coalesced": false,
    "ExitCode": 0,
    "Err": null,
    "TransportMethod": "rsync"
}
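
The same request can be made from Go. The following is a minimal sketch, not mistry's own client code: the BuildInfo struct only mirrors the fields of the sample response above, scheduleBuild and fakeServer are hypothetical helpers, and the fake server (including its artifact path) exists only so the example runs without a real mistryd.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// BuildInfo mirrors the fields of the sample response shown above.
// It is an assumption for illustration, not necessarily mistry's own type.
type BuildInfo struct {
	Params          map[string]string
	Path            string
	Cached          bool
	Coalesced       bool
	ExitCode        int
	Err             interface{} // null in the sample; real shape not assumed
	TransportMethod string
}

// scheduleBuild (hypothetical helper) POSTs {"project": ...} to
// <baseURL>/jobs and decodes the JSON response.
func scheduleBuild(baseURL, project string) (*BuildInfo, error) {
	body, err := json.Marshal(map[string]string{"project": project})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/jobs", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var bi BuildInfo
	if err := json.NewDecoder(resp.Body).Decode(&bi); err != nil {
		return nil, err
	}
	return &bi, nil
}

// fakeServer stands in for mistryd so the sketch runs on its own.
// The response body is made up and modelled on the sample above.
func fakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"Params":{"foo":"xzv"},"Path":"/tmp/artifacts","Cached":true,`+
			`"Coalesced":false,"ExitCode":0,"Err":null,"TransportMethod":"rsync"}`)
	}))
}

func main() {
	srv := fakeServer()
	defer srv.Close()

	bi, err := scheduleBuild(srv.URL, "foo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bi.Cached, bi.ExitCode, bi.TransportMethod)
}
```

Against a real server, pass the address mistryd listens on instead of srv.URL.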

Web view

mistry comes with a web view where progress and logs of each build can be inspected.

Browse to http://0.0.0.0:8462 (or whatever address the server listens on).

Configuration

Configuration is provided in JSON format. The following settings are currently supported:

| Setting | Description | Default |
|---|---|---|
| projects_path (string) | The path where project folders are located | "" |
| build_path (string) | The root path where artifacts will be placed | "" |
| mounts (object{string:string}) | The paths from the host machine that should be mounted inside the execution containers | {} |
| job_concurrency (int) | Maximum number of builds that may run in parallel | (logical-cpu-count) |
| job_backlog (int) | Used for back-pressure - maximum number of outstanding build requests. If exceeded, subsequent build requests will fail | (job_concurrency * 2) |

The paths denoted by projects_path and build_path should be present and writable by the user running the server.

For an example refer to the sample config.

Development

Before anything, make sure you install the dependencies:

make deps

The tests will attempt to SSH to localhost. You will need to add your public key to your authorized keys, as you would when setting up access to a remote host.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

To run the tests, the Docker daemon should be running and SSH access to localhost should be configured.

$ make test

Note: the command above may take longer the first time it runs, since some Docker images will have to be fetched from the internet.

License

mistry is released under the GNU General Public License version 3. See COPYING.

mistry logo contributed by @cyfugr

mistry's People

Contributors

agis, apostergiou, ctrochalakis, dtheodor, fragoulis, linosgian, nikosgkotsis

mistry's Issues

Configurable build timeout limits

Right now, there are no limits enforced on how much time a job can take. This might lead to cases where clients wait on a never-ending job, which also consumes resources (i.e. the concurrency/backlog configuration options).

For example, this happens on some of our projects when a network issue occurs between our internal network and rubygems (i.e. Fastly CDNs), resulting in CI builds getting stalled indefinitely.

There should be a hard limit as to how much time a job may take so that we safeguard against such situations. Ideally, it should be both a global configuration that applies across all projects AND a per-project configuration (not sure how this should be specified though).

Add option to completely replace the target directory

In some scenarios, you want the artifacts fetched from mistry to completely override the previous contents of the --target-path. To support such use cases, we could add a --replace (or replace-target) boolean option that, if enabled, will completely replace the given path.

In the case of rsync, it would just pass the --delete option to rsync.
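
A small sketch of the idea, assuming the client assembles the rsync argument list itself. rsyncArgs is a hypothetical helper, and the -a flag is chosen for the example only; the flags mistry actually passes are not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// rsyncArgs (hypothetical) builds the argument list for an rsync call.
func rsyncArgs(src, dst string, replace bool) []string {
	// -a (archive) enables recursion, without which --delete has no effect.
	args := []string{"-a"}
	if replace {
		// Remove files in dst that do not exist in src.
		args = append(args, "--delete")
	}
	// A trailing slash on the source makes rsync copy the directory's
	// contents rather than the directory itself.
	if !strings.HasSuffix(src, "/") {
		src += "/"
	}
	return append(args, src, dst)
}

func main() {
	// "src" and "dst" are placeholders for the artifact source and target.
	fmt.Println(rsyncArgs("src", "dst", true))
	fmt.Println(rsyncArgs("src", "dst", false))
}
```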

Support non-blocking job schedule

Sometimes, it's necessary to just trigger a job and not wait for it to finish.

A common scenario is when someone wants to eagerly prepare a build that they know is going to be needed in the future (e.g. triggering a build from a GitHub webhook to precompile the assets of a Rails application).

The server should support this option first, and then it has to be implemented in the client.

Add complete, runnable examples in README/wiki

These would demonstrate the complete flow and would contain real flags. For example:

$ mistry-cli --project rails --group staging --target /tmp

# ... server output

# ... synced artifacts

Handle symlinks on build/projects path

Currently, if data_path is a symlink, docker on Mac complains with:

$ ./mistry-cli build --host localhost --project simple
2018/04/12 13:21:47 Error creating job: Error building {project=simple params=map[] group= id=3dc923b}: work: could not start docker container; Error response from daemon: invalid mount config for type "bind": bind source path does not exist
, http code: 500

This is due to docker/for-mac#1298.

We should resolve symlinks after reading the configuration, and update it accordingly before booting the server.

Incremental builds

Document how to use incremental building (i.e. groups). This should be mentioned briefly in the README but explained thoroughly in a new wiki page.

Rename binaries to mistryd / mistry

The following naming scheme seems more natural:

  • server: mistryd
  • client: mistry

This need not affect the repository/project name, only the generated binaries.

Standardize the test helpers

There are already some test helpers like assertEq, assert, failIfError, assertNotEq.

This issue is about standardizing their usage such as removing duplicate behavior or extracting them to a proper package.

Investigate error message when using scp and the folder exists

Error message:

$ ./mistry build --host localhost --port 8462 --project simple_with_directory \
 --group foo --target /tmp/foo --transport scp -- --x=$RANDOM

/tmp/mistry: Is a directory

The project used should produce artifacts with a directory structure depth of 3 or more.

e.g.

files
└──more_files
    └── foo.js

Populate BuildInfo.Err when possible

BuildInfo.Err is supposed to contain build error messages, if any. At the moment, we do not populate it. We should, when possible (e.g. when a docker build or docker run fails).

BootstrapProject is racy

If jobs for a new project (that hasn't been bootstrapped yet) happen to arrive at the same time, one of the two calls to BootstrapProject() might fail due to a "directory exists" error.

We could introduce some locking when bootstrapping a project.

(btrfs) Change mod time of jobs

When using the btrfs filesystem, the mod time of the subvolumes that have been snapshotted is the mod time of the first subvolume. This creates an ambiguity when running stat or ls -al.

We should update the mod time to reflect the actual mod time of the directory.

https://golang.org/pkg/os/#Chtimes may help mitigate this.

Make rsync module configurable

As it stands the rsync module is hardcoded to be "mistry":

module := "mistry"

The client should be able to set this option.

Implement a strategy for stale containers

Right now, we remove containers in StartContainer() after they run successfully. However, it might happen, for example if mistry fails while a container is running (e.g. on SIGKILL), that containers are left around with nobody to remove them.

We should decide on how to treat such cases.

Web view revamp

Backend

  • investigate if we can determine when a job has started without looking at the path's mod time. Maybe through build_result.json.
  • properly display docker build logs (#37)
  • properly format "started at" in job/show
  • view: dynamically iterate on build result and print it
  • make broker package SSE-specific (maybe this is just a rename), also try to make it simpler
  • introduce package webtailer (provides the HTTP handlers and uses package tailer & broker)
  • extract presentation logic to a separate struct, if necessary
  • make job show handler return a consistent JSON (ie. JobInfo type)
  • show all job fields when marshalling to JSON
  • check for current jobs (job index) in a goroutine, once for all clients

Create a mapping layer for querying the storage layer

This layer (let's suppose it is a Storage package) will expose functions like:

func (s *Storage) GetJobs(project string) ([]Job, error)

and

func (s *Storage) GetJob(project, id string) (Job, error)

This will abstract the storage solution away and will enable more intuitive interactions when fetching saved jobs.

Support setting timeouts

Right now, the client will block indefinitely if mistry is down (connect timeout) or a job takes forever to complete (read timeout).

However, there are cases when this is not acceptable. We should have the option to set a read timeout and a connect timeout on the client. Perhaps we should only expose this as a single timeout (how long the whole command may run for).

Drop scp transport

It is hard to set up, not going to be used in production environments and overall rsync is a better solution. The only reason we decided to add scp was for use in tests and as a more easy-to-set-up default than rsync. However, it seems that it's not so easy, since it still requires setting up SSH keys and also requires more configuration to work as we want it (as opposed to rsync).

If scp doesn't provide the benefits we chose it for, we should drop it completely. For this, #52 should be done first so that we have an alternative (i.e. running tests with rsync without having to set up an rsync daemon).

README: add use cases

We should add 3-5 use cases for mistry. Some ideas: rake assets:precompile, bundle install, yarn install. We could be creative here 😄

There's existing work in the use-cases branch.

Gracefully handle non-userland errors

We should handle errors that the user cannot do something about and somehow segregate them from user-land errors.

For example, a container failing to build because another container with the same ID already exists is not something the user can fix.

When target path doesn't exist, it is created as a file

agis:~/dev/go/src/github.com/skroutz/mistry [master] $ ./mistry-cli build --project simple --target nonexistingpath
agis:~/dev/go/src/github.com/skroutz/mistry [master] $ ls nonexistingpath
nonexistingpath
agis:~/dev/go/src/github.com/skroutz/mistry [master] $ file nonexistingpath
nonexistingpath: empty

Expected: nonexistingpath should be created as a directory, and the build artifacts should be inside it.

Consider not caching failed builds

I think that it makes more sense not to cache build failures. Most of the time those would be transient errors that should be retried the next time someone asks for that build.

In that case we might want to keep the build directory (in a failed subdirectory?) for debugging purposes.

Strategy for pruning old builds

There should be a strategy for somehow pruning old builds. This could be in the form of a "retention period" configuration.

This is necessary, otherwise disk usage will grow indefinitely.
