docker-systems / baleen
Filtering out bad Docker builds since 2014
build:
  # build this image, using the current dir as the docker context
  docker.example.com/app: .
  # example of building multiple images from dockerfiles in different dirs:
  #- docker.example.com/rabbitmq: containers/rabbitmq/.
  #- docker.example.com/couchdb: containers/couchdb/.

# All credentials that are used are defined here. The UI will allow these
# to be filled in. They will be passed to any containers that need them.
# By default these are assumed to be unique and part of a set within a
# multi-container setup, e.g. if one container requires RABBIT_USER, then any
# other container with credential RABBIT_USER will get the same value.
#
# All are provided to the container at run time as environment variables.
# FILE means a file will be uploaded and the environment variable will
# be the path to it.
#
# Some day these may alternatively be placed in a config system like etcd,
# or written in a config file format at a configurable location.
credentials:
  RABBIT_CERTIFICATE: FILE
  RABBIT_USER: VALUE
  RABBIT_PASSWORD: VALUE

# By default, all dependent containers will be built, be available,
# and be linked in as their label name.
depends:
  rabbitmq:
    src: "git@github.com:docker-systems/rabbitmq.git"
    minhash: "deedbeef" # must have this commit
    # image should be inferred from the baleen.yml of the src repo
    # image: "docker.example.com/rabbitmq"
    #tag: v0.1.1
  db:
    # use the official image from dockerhub
    image: "postgres"
  couchdb:
    image: "docker.example.com/couchdb"

# Need to define this better: what will be set up, etc.
pre_test:
  - "echo 1"
  - "load up any data"

tests:
  # Tests should run with their minimal dependencies to avoid complexity.
  links:
    # remap the link, otherwise uses --link db. --link rabbitmq is implicit
    - "db:db1"
  ports:
    - "8000:8000"
  env:
    TEST: 1
    # will also have the credential variables defined above
  volumes:
    # preserved between runs so that multiple commands can have persistent effects
    - "/data"
  cmd:
    # each command is a separate container run; they will share the same volume
    - "run_tests.sh"
    - "collect_coverage.sh"

# How to run integration tests?
# Integration tests may require prompting from outside the system, whether
# from a browser, a Selenium-controlled browser like BrowserStack, or other.
# To run them using baleen requires making another container that links all
# the dependencies together and then connects by running something within
# that integration container (e.g. a headless browser).

# All artifacts will be preserved and available for each build.
# Some artifacts, like coverage % and test counts, will be graphed.
# htmldir and files will be downloadable and served from baleen too.
artifacts:
  xunit:
    python_xml: /data/xunit.xml
  coverage:
    python_xml: /data/coverage.xml
    htmldir: /data/htmlcov/
  documentation:
    htmldir: /data/mydocs
  pdf:
    file: /data/my.pdf

# custom, or should we be more declarative?
deploy:
  - "docker push docker.example.com/app:latest"
  - "docker stop app"
  # question: should we try to define the parts like this?
  - "docker run --rm --name app -t docker.example.com/app:latest"
PR #4 retrieves these files from a container, but doesn't currently read their stats into the db.
Fig is deprecated, so we should replace fig.yml and remove the dependency on Fig for the test stage of builds.
If the test container is incorrectly configured (e.g. it starts a web server waiting for connections) and never exits, then the worker will hang until the test container is killed manually or the baleen job is terminated.
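One way to guard against this, sketched below on the assumption that the worker launches test containers via subprocess (the helper name and timeout value are hypothetical, not baleen's actual code), is to give each test run a hard deadline:

```python
import subprocess

def run_with_deadline(cmd, timeout_seconds):
    """Run a test command, but kill it after timeout_seconds.

    Returns the CompletedProcess on success, or None if the deadline
    was hit (in baleen's case we would also `docker kill` the test
    container at that point).
    """
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None

# A container that never exits is reduced to a bounded wait:
result = run_with_deadline(["sleep", "30"], timeout_seconds=1)
```

With this in place a mis-configured container costs at most one timeout period instead of hanging the worker indefinitely.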
We currently use subprocess's communicate() call to coordinate handling stdout/stderr/stdin, but this is clunky and doesn't let us read output incrementally (making it hard to check how far along the current action is).
Sarge wraps subprocess with nice features for incremental and asynchronous interaction with processes:
http://sarge.readthedocs.org/en/latest/tutorial.html
There are various approaches for our own implementation, but protecting against deadlocks when IO buffers drain or fill up can be time-consuming.
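For comparison, a minimal standard-library sketch (a hypothetical helper, not baleen code): a reader thread drains stdout continuously, so the OS pipe buffer can never fill up and deadlock the child, while the caller still sees output line by line:

```python
import queue
import subprocess
import sys
import threading

def stream_process(cmd):
    """Yield a process's stdout lines as they are produced."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    q = queue.Queue()

    def drain():
        # Reading in a dedicated thread keeps the pipe empty, avoiding
        # the classic deadlock that communicate() exists to prevent.
        for line in proc.stdout:
            q.put(line)
        q.put(None)  # sentinel: stream closed

    threading.Thread(target=drain, daemon=True).start()
    while True:
        line = q.get()
        if line is None:
            break
        yield line.rstrip("\n")
    proc.wait()

# Observe build output incrementally instead of all at once:
lines = list(stream_process([sys.executable, "-c", "print('step 1'); print('step 2')"]))
```

Sarge packages up essentially this plumbing, which is why it may be the easier route.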
In order to build a container with collectstatic copying static files, we need to set SECRET_KEY to a non-blank value, otherwise Django complains. But this is bad for real deployment: the application should still complain at run time if SECRET_KEY is still "REPLACEME".
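A possible shape for that check (a hypothetical helper, not in the codebase): tolerate the placeholder only for build-time management commands such as collectstatic, and refuse anything else:

```python
def check_secret_key(secret_key, command):
    """Raise if the build-time placeholder key reaches a real command.

    collectstatic only needs *a* value to run during `docker build`;
    anything that actually serves traffic must have a real secret.
    """
    build_time_commands = {"collectstatic"}
    if secret_key == "REPLACEME" and command not in build_time_commands:
        raise RuntimeError("SECRET_KEY is still the build-time placeholder")
    return True
```

This could be called from settings.py or a manage.py wrapper before the management command dispatches.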
Some CI tools use the build process in a Dockerfile to run tests.
To make it simpler for people to use both, we should allow people to skip specifying the test section in a baleen.yml, e.g.

build:
  my.registry123124.com/PROJECTNAME: .
test:
  subject:
    image: "my.registry123124.com/PROJECTNAME"
    ## No need to run anything, the test results are already present from
    ## the build, but we need a container in order to `docker cp` the results.
    command: "echo 1"
artifacts:
  xunit:
    path: /path/to/xunit

becomes:

build:
  my.registry123124.com/PROJECTNAME: .
artifacts:
  xunit:
    path: /path/to/xunit
The current URL displayed looks like http://baleen/hub/github/... when it should be http://baleen/github/...
I am using a repo with a hyphen in the name. Initially, I had named the container to match (via baleen.yml), but we found that this would cause trouble in Fig. So I renamed the container, but the dir name is used as a surrogate for the project name when issuing a --build, so it's still mucking with Fig's naming.
Until this is fixed, it might be worth using some sort of slugify method to strip any characters that do not match [0-9a-z_].
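A minimal sketch of such a method (the function name is hypothetical):

```python
import re

def slugify_project_name(name):
    """Strip any characters Fig can't handle, i.e. anything outside [0-9a-z_]."""
    return re.sub(r"[^0-9a-z_]", "", name.lower())

slug = slugify_project_name("My-Hyphenated-Repo")
```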
Attempting to get a URL such as:
https://baleen/project/3/job/15/get-build-artifact-coverage_html/htmlcov/index.html
Results in a 404. Might be an issue with the baleen container or uwsgi rather than baleen itself, but needs investigation.
Fig is usually used as a way to define an operational group of containers, but having a fig_test.yml and baleen passing that to Fig might be a simple way to get something going in the short term.
Various tools exist to listen to web hooks from dockerhub. If we make baleen support the same format we can easily use these tools, e.g. https://github.com/ehazlett/conduit
The format of the POST request is like:
{
  "callback_url": "https://registry.hub.docker.com/u/svendowideit/testhook/hook/2141b5bi5i5b02bec211i4eeih0242eg11000a/",
  "push_data": {
    "images": [
      "27d47432a69bca5f2700e4dff7de0388ed65f9d3fb1ec645e2bc24c223dc1cc3",
      "51a9c7c1f8bb2fa19bcd09789a34e63f35abb80044bc10196e304f6634cc582c",
      ...
    ],
    "pushed_at": 1.417566161e+09,
    "pusher": "trustedbuilder"
  },
  "repository": {
    "comment_count": 0,
    "date_created": 1.417494799e+09,
    "description": "",
    "dockerfile": "#\n# BUILD\u0009\u0009docker build -t svendowideit/apt-cacher .\n# RUN\u0009\u0009docker run -d -p 3142:3142 -name apt-cacher-run apt-cacher\n#\n# and then you can run containers with:\n# \u0009\u0009docker run -t -i -rm -e http_proxy http://192.168.1.2:3142/ debian bash\n#\nFROM\u0009\u0009ubuntu\nMAINTAINER\[email protected]\n\n\nVOLUME\u0009\u0009[\"/var/cache/apt-cacher-ng\"]\nRUN\u0009\u0009apt-get update ; apt-get install -yq apt-cacher-ng\n\nEXPOSE \u0009\u00093142\nCMD\u0009\u0009chmod 777 /var/cache/apt-cacher-ng ; /etc/init.d/apt-cacher-ng start ; tail -f /var/log/apt-cacher-ng/*\n",
    "full_description": "Docker Hub based automated build from a GitHub repo",
    "is_official": false,
    "is_private": true,
    "is_trusted": true,
    "name": "testhook",
    "namespace": "svendowideit",
    "owner": "svendowideit",
    "repo_name": "svendowideit/testhook",
    "repo_url": "https://registry.hub.docker.com/u/svendowideit/testhook/",
    "star_count": 0,
    "status": "Active"
  }
}
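If baleen accepted this format, its handler would only need a few of those fields. A minimal parsing sketch (the function name and field selection are assumptions based on the example payload above):

```python
import json

def parse_hub_webhook(body):
    """Extract the fields baleen would care about from a Docker Hub webhook."""
    payload = json.loads(body)
    return {
        "repo": payload["repository"]["repo_name"],
        "pusher": payload["push_data"]["pusher"],
        "callback_url": payload["callback_url"],
    }

# Exercise it against a trimmed-down payload of the same shape:
sample = json.dumps({
    "callback_url": "https://registry.hub.docker.com/u/example/app/hook/abc/",
    "push_data": {"images": [], "pushed_at": 1417566161, "pusher": "trustedbuilder"},
    "repository": {"repo_name": "example/app"},
})
info = parse_hub_webhook(sample)
```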
There are a number of validation tools, and these could potentially be made part of a default Selenium-based container. #36
If a project remains broken, pester the owner every now and then by HipChat/email to fix it (best to use a back-off period though, to avoid annoying people too much).
Currently Jobs, and the associated output/results, are tightly coupled to Django models.
It'd be nice to separate the execution layer from storage. This would make the baleen builder able to run without a db connection (e.g. for testing if a project builds from the command line).
Such an abstracted Job would need to provide details and hooks for:
- record_start, record_done, record_action_start, record_action_response: for doing something with these events and their output.
- init_repo_plan, action_plan: for generating the planned steps required for the build.
Some of the naming of these things could probably be improved too.
I've defined a new hook model in #4.
I'm unlikely to implement it for a while but wanted to record my thoughts about them as I have them.
class Hook(models.Model):
    """
    Hooks for success/failure/finished during a build, as
    well as arbitrary events.
    """
    project = models.ForeignKey('project.Project')

    # Actions can choose to implement any "event", and you can make
    # the hook wait for that.
    watch_for = models.CharField(max_length=255,
        help_text="The event to watch for, can be anything.")

    # Email a user, a particular email address, or the commit author
    email_user = models.ForeignKey('auth.User')
    email_address = models.EmailField(max_length=255)
    email_author = models.BooleanField(default=False)

    # Post details about the event
    post_url = models.URLField()
Instead of running hooks after each action in the worker, the hooks will be sent as a new gearman job.
Instead of confusing predefined events, we just use a simple watch_for value which is matched against the event label.
Hooks will support emailing a user, an address, or the commit author, or posting details about the event to a URL.
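The matching itself could then be trivial. A sketch (the plain-dict data shape is an assumption, not the actual model layer):

```python
def hooks_for_event(event_label, hooks):
    """Select the hooks whose watch_for value matches the event label."""
    return [hook for hook in hooks if hook["watch_for"] == event_label]

hooks = [
    {"watch_for": "build-finished", "post_url": "https://example.com/hook"},
    {"watch_for": "tests-failed", "email_author": True},
]
matched = hooks_for_event("tests-failed", hooks)
```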
From @holic
An alternative for secure config vars in baleen.yml:
http://docs.travis-ci.com/user/environment-variables/#Secure-Variables
Would also allow deployment configuration to be stored in the repo. Baleen wouldn't need to manage a table of credentials, which would otherwise be a security target.
The console reports: Uncaught TypeError: Cannot read property 'local' of null
This probably means one of the expected template dates is missing from the current job include, and the error prevents rendering the rest of the moment.js dates.
Currently each Project has an ordered series of Actions. Manually curating these via the web UI is annoying, and we are moving towards declarative configs for these. Since the actions/build steps will be in the file, there is no need to persist them. Instead, we should have a model that stores the declarative config and knows how to parse it into functional action objects.
class BuildDefinition(Model):
    project = ForeignKey('Project')
    filename = CharField()  # filename will probably help determine what the format is
    raw_plan = TextField()
    plan_type = CharField()  # null allowed, but filled in once it's been matched to a plan generator

    def create_plan(self):
        return ActionPlan(self.raw_plan, self.plan_type)

class BuildStep(object):
    pass

class ActionPlan(object):
    """An iterator that returns BuildSteps."""
    def __iter__(self):
        pass

    def next(self):
        pass
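As an illustration of the iterator contract, a toy plan generator (the names and the newline-separated plan format are assumptions, not a real baleen plan type):

```python
class ShellStep(object):
    """A build step that just wraps a shell command."""
    def __init__(self, command):
        self.command = command

class SimpleActionPlan(object):
    """Parses a newline-separated raw_plan into ShellSteps."""
    def __init__(self, raw_plan):
        self.commands = [
            line.strip() for line in raw_plan.splitlines() if line.strip()
        ]

    def __iter__(self):
        # Each iteration yields one functional step object.
        return iter(ShellStep(c) for c in self.commands)

steps = list(SimpleActionPlan("run_tests.sh\ncollect_coverage.sh"))
```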
Currently we only support building and testing the master branch. We should:
Requires #27
Would like to be able to fetch failing test images (when doing an image compare) and display them on the project summary.
Ideally, Baleen would just be responsible for pushing a successfully built container to a registry and then notifying whatever orchestration tool is being used that there is a new image.
It'd make it a lot easier to test one's baleen.yml if we could skip going through the web UI and just run something from the command line.
Allow users to upload their public key, make baleen only deploy commits that are correctly signed by a valid user.
While this could be done by creating a special selenium container, it'd be nice to make this use case as easy as possible. Either provide a public selenium container that can be used to run tests, or create an alternative baleen.yml for UI testing.
After building documentation, a report or an html static site, within a container, we should be able to export a directory of html and upload it to a server/s3/somewhere else.
It's a bit rude to give all the credit/blame to the person who reviewed/merged a pull request!
Then use this to predict remaining time in current builds.
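Given recorded durations, the prediction itself can be simple. A sketch (assuming a plain mean over past build durations; a weighted or per-branch estimate would refine it):

```python
def estimate_remaining(past_durations, elapsed):
    """Predict seconds remaining in the current build from past runs."""
    if not past_durations:
        return None  # no history yet, nothing to predict
    expected = sum(past_durations) / len(past_durations)
    return max(expected - elapsed, 0)

# Three past builds averaging 120s, 60s already elapsed:
remaining = estimate_remaining([120, 100, 140], elapsed=60)
```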
Should be able to build any branch that gets updated, as well as actually filter on branch name.
Unfortunately, to do this right would require evaluating the UI too:
It'd be unclear which branch is failing if one just shows all builds in a chronological list. I'd prefer to have a separate history for each branch.
Also need to consider:
Add a user profile and allow users to specify their repo usernames/emails. Create a template tag to automatically render a user's canonical name.
Needed for baleen.yml dependencies to actually do anything.
There doesn't seem to be a Perl Cobertura XML report for cover, but there is Devel::Cover::Report::Clover.
Long term, the build artifacts supported by Baleen should be made pluggable and specifiable in settings.py, but for now we can just add a new output type.