docker-systems / baleen
Filtering out bad Docker builds since 2014
build:
  # build this image, using the current dir as the docker context
  docker.example.com/app: .
  # example of building multiple images from dockerfiles in different dirs:
  #- docker.example.com/rabbitmq: containers/rabbitmq/.
  #- docker.example.com/couchdb: containers/couchdb/.

# All credentials that are used are defined here. The UI will allow these
# to be filled in. They will be passed to any containers that need them.
# By default these are assumed to be unique and part of a set within a
# multi-container setup, e.g. if one container requires RABBIT_USER, then any
# other container with credential RABBIT_USER will get the same value.
#
# All are provided to the container at run time as environment variables.
# FILE means a file will be uploaded and the environment variable will
# be the path to it.
#
# Some day these may alternatively be placed in a config system like etcd,
# or written in a config file format at a configurable location.
credentials:
  RABBIT_CERTIFICATE: FILE
  RABBIT_USER: VALUE
  RABBIT_PASSWORD: VALUE

# By default, all dependent containers will be built, be available,
# and be linked in as their label name.
depends:
  rabbitmq:
    src: "git@github.com:docker-systems/rabbitmq.git"
    minhash: "deedbeef" # must have this commit
    # image should be inferred from the baleen.yml of the src repo
    # image: "docker.example.com/rabbitmq"
    #tag: v0.1.1
  db:
    # use the official image from dockerhub
    image: "postgres"
  couchdb:
    image: "docker.example.com/couchdb"

# Need to define this better: what will be set up, etc.
pre_test:
  - "echo 1"
  - "load up any data"

tests:
  # Tests should run with their minimal dependencies to avoid complexity.
  links:
    # remap the link, otherwise uses --link db. --link rabbitmq is implicit
    - "db:db1"
  ports:
    - "8000:8000"
  env:
    TEST: 1
    # will also have the credential variables defined above
  volumes:
    # preserved between runs so that multiple commands can have persistent effects
    - "/data"
  cmd:
    # each command is a separate container run; they will share the same volume
    - "run_tests.sh"
    - "collect_coverage.sh"

# How to run integration tests?
# Integration tests may require prompting from outside the system, whether
# from a browser, a Selenium-controlled browser like BrowserStack, or other.
# To run them using baleen requires making another container that links all
# the dependencies together and then connects by running something within
# that integration container (e.g. a headless browser).

# All artifacts will be preserved and available for each build.
# Some artifacts, like coverage % and test counts, will be graphed.
# htmldir and files will be downloadable and served from baleen too.
artifacts:
  xunit:
    python_xml: /data/xunit.xml
  coverage:
    python_xml: /data/coverage.xml
    htmldir: /data/htmlcov/
  documentation:
    htmldir: /data/mydocs
  pdf:
    file: /data/my.pdf

# custom, or should we be more declarative?
deploy:
  - "docker push docker.example.com/app:latest"
  - "docker stop app"
  # question: should we try to define the parts like this?
  - "docker run --rm --name app -t docker.example.com/app:latest"
PR #4 retrieves these files from a container, but doesn't currently read their stats into the db.
Fig is deprecated, so we should replace fig.yml and remove the dependency on Fig for the test stage of builds.
If the test container is incorrectly configured (e.g. it starts a web server waiting for connections) and never exits, then the worker will hang until the test container is killed manually or the baleen job is terminated.
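One way to guard against this, sketched below on the assumption that the worker launches test containers via subprocess (the helper name and timeout value are hypothetical, not baleen's actual code), is to give each test run a hard deadline:

```python
import subprocess

def run_with_deadline(cmd, timeout_seconds):
    """Run a test command, but kill it after timeout_seconds.

    Returns the CompletedProcess on success, or None if the deadline
    was hit (in baleen's case we would also `docker kill` the test
    container at that point).
    """
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None

# A container that never exits is reduced to a bounded wait:
result = run_with_deadline(["sleep", "30"], timeout_seconds=1)
```

With this in place a mis-configured container costs at most one timeout period instead of hanging the worker indefinitely.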
We currently use subprocess's communicate() call to coordinate handling stdout/stderr/stdin, but this is clunky and doesn't let us read output incrementally (making it hard to check how far along the current action is).
Sarge wraps subprocess with nice features for incremental and asynchronous interaction with processes:
http://sarge.readthedocs.org/en/latest/tutorial.html
There are various approaches for our own implementation, but protecting against deadlocks when IO buffers drain or fill up can be time-consuming.
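For comparison, a minimal standard-library sketch (a hypothetical helper, not baleen code): a reader thread drains stdout continuously, so the OS pipe buffer can never fill up and deadlock the child, while the caller still sees output line by line:

```python
import queue
import subprocess
import sys
import threading

def stream_process(cmd):
    """Yield a process's stdout lines as they are produced."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    q = queue.Queue()

    def drain():
        # Reading in a dedicated thread keeps the pipe empty, avoiding
        # the classic deadlock that communicate() exists to prevent.
        for line in proc.stdout:
            q.put(line)
        q.put(None)  # sentinel: stream closed

    threading.Thread(target=drain, daemon=True).start()
    while True:
        line = q.get()
        if line is None:
            break
        yield line.rstrip("\n")
    proc.wait()

# Observe build output incrementally instead of all at once:
lines = list(stream_process([sys.executable, "-c", "print('step 1'); print('step 2')"]))
```

Sarge packages up essentially this plumbing, which is why it may be the easier route.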
In order to build a container with collectstatic copying static files, we need to set SECRET_KEY to a non-blank value, otherwise Django complains. But this is bad for real deployment: the application should still complain at run time if SECRET_KEY is still "REPLACEME".
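A possible shape for that check (a hypothetical helper, not in the codebase): tolerate the placeholder only for build-time management commands such as collectstatic, and refuse anything else:

```python
def check_secret_key(secret_key, command):
    """Raise if the build-time placeholder key reaches a real command.

    collectstatic only needs *a* value to run during `docker build`;
    anything that actually serves traffic must have a real secret.
    """
    build_time_commands = {"collectstatic"}
    if secret_key == "REPLACEME" and command not in build_time_commands:
        raise RuntimeError("SECRET_KEY is still the build-time placeholder")
    return True
```

This could be called from settings.py or a manage.py wrapper before the management command dispatches.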
Some CI tools use the build process in a Dockerfile to run tests.
To make it simpler for people to use both, we should allow people to skip specifying the test section in a baleen.yml, e.g.

build:
  my.registry123124.com/PROJECTNAME: .
test:
  subject:
    image: "my.registry123124.com/PROJECTNAME"
    ## No need to run anything, the test results are already present from
    ## the build, but we need a container in order to `docker cp` the results.
    command: "echo 1"
artifacts:
  xunit:
    path: /path/to/xunit

becomes:

build:
  my.registry123124.com/PROJECTNAME: .
artifacts:
  xunit:
    path: /path/to/xunit
The current URL displayed looks like http://baleen/hub/github/... when it should be http://baleen/github/...
I am using a repo with a hyphen in the name. Initially, I had named the container to match (via baleen.yml), but we found that this would cause trouble in Fig. So I renamed the container, but the dir name is used as a surrogate for the project name when issuing a --build, so it's still mucking with Fig's naming.
Until this is fixed, it might be worth using some sort of slugify method to strip any characters that do not match [0-9a-z_].
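A minimal sketch of such a method (the function name is hypothetical):

```python
import re

def slugify_project_name(name):
    """Strip any characters Fig can't handle, i.e. anything outside [0-9a-z_]."""
    return re.sub(r"[^0-9a-z_]", "", name.lower())

slug = slugify_project_name("My-Hyphenated-Repo")
```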
Attempting to get a URL such as:
https://baleen/project/3/job/15/get-build-artifact-coverage_html/htmlcov/index.html
Results in a 404. Might be an issue with the baleen container or uwsgi rather than baleen itself, but needs investigation.
Fig is usually used as a way to define an operational group of containers, but having a fig_test.yml and baleen passing that to Fig might be a simple way to get something going in the short term.
Various tools exist to listen to web hooks from dockerhub. If we make baleen support the same format we can easily use these tools, e.g. https://github.com/ehazlett/conduit
The format of the POST request is like:
{
  "callback_url": "https://registry.hub.docker.com/u/svendowideit/testhook/hook/2141b5bi5i5b02bec211i4eeih0242eg11000a/",
  "push_data": {
    "images": [
      "27d47432a69bca5f2700e4dff7de0388ed65f9d3fb1ec645e2bc24c223dc1cc3",
      "51a9c7c1f8bb2fa19bcd09789a34e63f35abb80044bc10196e304f6634cc582c",
      ...
    ],
    "pushed_at": 1.417566161e+09,
    "pusher": "trustedbuilder"
  },
  "repository": {
    "comment_count": 0,
    "date_created": 1.417494799e+09,
    "description": "",
    "dockerfile": "#\n# BUILD\u0009\u0009docker build -t svendowideit/apt-cacher .\n# RUN\u0009\u0009docker run -d -p 3142:3142 -name apt-cacher-run apt-cacher\n#\n# and then you can run containers with:\n# \u0009\u0009docker run -t -i -rm -e http_proxy http://192.168.1.2:3142/ debian bash\n#\nFROM\u0009\u0009ubuntu\nMAINTAINER\[email protected]\n\n\nVOLUME\u0009\u0009[\"/var/cache/apt-cacher-ng\"]\nRUN\u0009\u0009apt-get update ; apt-get install -yq apt-cacher-ng\n\nEXPOSE \u0009\u00093142\nCMD\u0009\u0009chmod 777 /var/cache/apt-cacher-ng ; /etc/init.d/apt-cacher-ng start ; tail -f /var/log/apt-cacher-ng/*\n",
    "full_description": "Docker Hub based automated build from a GitHub repo",
    "is_official": false,
    "is_private": true,
    "is_trusted": true,
    "name": "testhook",
    "namespace": "svendowideit",
    "owner": "svendowideit",
    "repo_name": "svendowideit/testhook",
    "repo_url": "https://registry.hub.docker.com/u/svendowideit/testhook/",
    "star_count": 0,
    "status": "Active"
  }
}
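If baleen accepted this format, its handler would only need a few of those fields. A minimal parsing sketch (the function name and field selection are assumptions based on the example payload above):

```python
import json

def parse_hub_webhook(body):
    """Extract the fields baleen would care about from a Docker Hub webhook."""
    payload = json.loads(body)
    return {
        "repo": payload["repository"]["repo_name"],
        "pusher": payload["push_data"]["pusher"],
        "callback_url": payload["callback_url"],
    }

# Exercise it against a trimmed-down payload of the same shape:
sample = json.dumps({
    "callback_url": "https://registry.hub.docker.com/u/example/app/hook/abc/",
    "push_data": {"images": [], "pushed_at": 1417566161, "pusher": "trustedbuilder"},
    "repository": {"repo_name": "example/app"},
})
info = parse_hub_webhook(sample)
```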
There are a number of validation tools, and these could potentially be made part of a default Selenium-based container. #36
If a project remains broken, pester the owner every now and then by HipChat/email to fix it (best to use a back-off period though, to avoid annoying people too much).
Currently Jobs, and the associated output/results, are tightly coupled to Django models.
It'd be nice to separate the execution layer from storage. This would make the baleen builder able to run without a db connection (e.g. for testing if a project builds from the command line).
Such an abstracted Job would need to provide details and hooks for:
- record_start, record_done, record_action_start, record_action_response: for doing something with these events and their output.
- init_repo_plan, action_plan: for generating the planned steps required for the build.
Some of the naming of these things could probably be improved too.
I've defined a new hook model in #4.
I'm unlikely to implement it for a while but wanted to record my thoughts about them as I have them.
class Hook(models.Model):
    """
    Hooks for success/failure/finished during a build, as
    well as arbitrary events.
    """
    project = models.ForeignKey('project.Project')

    # Actions can choose to implement any "event", and you can make
    # the hook wait for that.
    watch_for = models.CharField(max_length=255,
        help_text="The event to watch for, can be anything.")

    # Email a user, a particular email address, or the commit author
    email_user = models.ForeignKey('auth.User')
    email_address = models.EmailField(max_length=255)
    email_author = models.BooleanField(default=False)

    # Post details about the event
    post_url = models.URLField()
Instead of running hooks after each action in the worker, the hooks will be sent as a new gearman job.
Instead of confusing predefined events, we just use a simple watch_for value which is matched against the event label.
Hooks will support emailing a user, an address, or the commit author, or posting details about the event to a URL.
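The matching itself could then be trivial. A sketch (the plain-dict data shape is an assumption, not the actual model layer):

```python
def hooks_for_event(event_label, hooks):
    """Select the hooks whose watch_for value matches the event label."""
    return [hook for hook in hooks if hook["watch_for"] == event_label]

hooks = [
    {"watch_for": "build-finished", "post_url": "https://example.com/hook"},
    {"watch_for": "tests-failed", "email_author": True},
]
matched = hooks_for_event("tests-failed", hooks)
```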
From @holic
An alternative for secure config vars in baleen.yml:
http://docs.travis-ci.com/user/environment-variables/#Secure-Variables
Would also allow deployment configuration to be stored in the repo. Baleen wouldn't need to manage a table of credentials, which would otherwise be a security target.
The console reports: Uncaught TypeError: Cannot read property 'local' of null
This probably means one of the expected template dates is missing from the current job include, and the error prevents rendering the rest of the moment.js dates.
Currently each Project has an ordered series of Actions. Manually curating these via the web UI is annoying, and we are moving towards declarative configs for these. Since the actions/build steps will be in the file, there is no need to persist them. Instead, we should have a model that stores the declarative config and knows how to parse it into functional action objects.
class BuildDefinition(Model):
    project = ForeignKey('Project')
    filename = CharField()  # filename will probably help determine what the format is
    raw_plan = TextField()
    plan_type = CharField()  # null allowed, but filled in once it's been matched to a plan generator

    def create_plan(self):
        return ActionPlan(self.raw_plan, self.plan_type)

class BuildStep(object):
    pass

class ActionPlan(object):
    """An iterator that returns BuildSteps."""
    def __iter__(self):
        pass

    def next(self):
        pass
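As an illustration of the iterator contract, a toy plan generator (the names and the newline-separated plan format are assumptions, not a real baleen plan type):

```python
class ShellStep(object):
    """A build step that just wraps a shell command."""
    def __init__(self, command):
        self.command = command

class SimpleActionPlan(object):
    """Parses a newline-separated raw_plan into ShellSteps."""
    def __init__(self, raw_plan):
        self.commands = [
            line.strip() for line in raw_plan.splitlines() if line.strip()
        ]

    def __iter__(self):
        # Each iteration yields one functional step object.
        return iter(ShellStep(c) for c in self.commands)

steps = list(SimpleActionPlan("run_tests.sh\ncollect_coverage.sh"))
```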
Currently we only support building and testing the master branch. We should:
Requires #27
Would like to be able to fetch failing test images (when doing an image compare) and display them on the project summary.
Ideally, Baleen would just be responsible for pushing a successfully built container to a registry and then notifying whatever orchestration tool is being used that there is a new image.
It'd make it a lot easier to test one's baleen.yml if we could skip going through the web UI and just run something from the command line.
Allow users to upload their public key, make baleen only deploy commits that are correctly signed by a valid user.
While this could be done by creating a special selenium container, it'd be nice to make this use case as easy as possible. Either provide a public selenium container that can be used to run tests, or create an alternative baleen.yml for UI testing.
After building documentation, a report or an html static site, within a container, we should be able to export a directory of html and upload it to a server/s3/somewhere else.
It's a bit rude to give all the credit/blame to the person who reviewed/merged a pull request!
Then use this to predict remaining time in current builds.
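Given recorded durations, the prediction itself can be simple. A sketch (assuming a plain mean over past build durations; a weighted or per-branch estimate would refine it):

```python
def estimate_remaining(past_durations, elapsed):
    """Predict seconds remaining in the current build from past runs."""
    if not past_durations:
        return None  # no history yet, nothing to predict
    expected = sum(past_durations) / len(past_durations)
    return max(expected - elapsed, 0)

# Three past builds averaging 120s, 60s already elapsed:
remaining = estimate_remaining([120, 100, 140], elapsed=60)
```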
Should be able to build any branch that gets updated, as well as actually filter on branch name.
Unfortunately, to do this right would require evaluating the UI too:
It'd be unclear which branch is failing if one just shows all builds in a chronological list. I'd prefer to have a separate history for each branch.
Also need to consider:
Add a user profile and allow users to specify their repo usernames/emails. Create a template tag to automatically render a user's canonical name.
Needed for baleen.yml dependencies to actually do anything.
There doesn't seem to be a Perl Cobertura XML report for cover, but there is Devel::Cover::Report::Clover.
Long term, the build artifacts supported by Baleen should be made pluggable and specifiable in settings.py, but for now we can just add a new output type.