wowu / docker-rollout
🚀 Zero Downtime Deployment for Docker Compose
Home Page: https://github.com/Wowu/docker-rollout
License: MIT License
This is no longer an issue for me, but I'm writing it up to make things easier for anyone who runs into the same problem.
Even after following the guide, I ran into errors like:
docker: 'rollout' is not a docker command.
See 'docker --help'
and
grep: unrecognized option: invert-match
BusyBox v1.36.1 (2023-07-27 17:12:24 UTC) multi-call binary.
Usage: grep [-HhnlLoqvsrRiwFE] [-m N] [-A|B|C N] { PATTERN | -e PATTERN... | -f FILE... } [FILE]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-R Recurse and dereference symlinks
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
For the first issue (`rollout` is not a docker command): if you run `docker info`, you'll see that Docker fails to load the plugin and prints a warning as part of the output:
WARNING: Plugin "/usr/local/libexec/docker/cli-plugins/docker-rollout" is not valid: failed to fetch metadata: fork/exec /usr/local/libexec/docker/cli-plugins/docker-rollout: no such file or directory
The problem here is that the `docker-rollout` plugin currently only runs under `bash`. This might be a very trivial issue to resolve; I'd try to create a PR to fix it, but in the meantime it's easy to document. The simple solution is to install `bash` in your Alpine image:
apk add --update bash && rm -rf /var/cache/apk/*
As for the `grep` issue: your Alpine image comes with a stripped-down BusyBox `grep`. You can fix this by installing GNU grep:
apk add --no-cache --upgrade grep
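For reference, a minimal Dockerfile sketch combining both fixes (the base image tag is illustrative):

```dockerfile
FROM alpine:3.19

# docker-rollout is a bash script, and BusyBox grep lacks GNU long
# options such as --invert-match, so install bash and GNU grep.
RUN apk add --no-cache bash grep
```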
These took me a couple of hours to resolve; I hope this writeup saves someone else the developer hours. I'll still put together an MR in my spare time to address this properly.
Feel free to leave this open if this support is worth having, or close the issue if it's outside the scope of the tool.
The same way as docker compose does. Is it possible? :)
Hello,
Hard issue to explain, but easy to reproduce.
I have two containers. One frontend and one backend.
The backend is already configured to be scaled at 2 containers.
The frontend is configured to be scaled at 1 container.
When I do a rollout on the backend service, everything is peachy and it works as expected (4 containers, then back to 2).
When I then do a rollout on the frontend service, one of the backend containers gets deleted.
I wonder why this is happening...
Hello, I use an nginx reverse proxy alongside my web app. During the deployment process everything is okay and I have no downtime, but after the deployment finishes I get this error; nginx can't find the upstream:
20#20: *23 connect() failed (111: Connection refused) while connecting to upstream
My deploy script (sh file) build three images and rollout:
git pull
docker compose build dashboard
docker compose build api
docker compose build website
docker rollout dashboard
docker rollout api
docker rollout website
Sometimes (and I don't know why) there are "remaining" containers after a deployment:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
123026510d16 myapp-website "docker-entrypoint.s…" 6 minutes ago Up 6 minutes 8080/tcp myapp-website-50
123b6fdc1486 myapp-api "docker-entrypoint.s…" 6 minutes ago Up 6 minutes 3000/tcp myapp-api-55
12332ab4c312 myapp-api "docker-entrypoint.s…" 6 minutes ago Up 6 minutes 3000/tcp myapp-api-54
123b3f057175 myapp-api "docker-entrypoint.s…" 6 minutes ago Up 6 minutes 3000/tcp myapp-api-56
123e6157eca8 myapp-api "docker-entrypoint.s…" 7 minutes ago Up 7 minutes 3000/tcp myapp-api-53
1234a67df563 myapp-dashboard "docker-entrypoint.s…" 7 minutes ago Up 7 minutes 8083/tcp myapp-dashboard-48
12315fea426f nginxproxy/nginx-proxy "/app/docker-entrypo…" 2 weeks ago Up 2 weeks 0.0.0.0:80->80/tcp myapp-nginx-proxy-1
I have no idea why it's happening.
version: "3.9"
services:
web:
image: activeliang/rtr_wms-web
environment:
VITE_RUBY_HOST: 0.0.0.0
RAILS_ENV: production
volumes:
- /tmp/sockets/rtr_wms:/rtr_wms/tmp/sockets
Hello, the code above is part of my docker-compose.yml file; my Nginx server runs on the host machine and the Rails application runs inside a container. When I use docker rollout to scale the Rails application to two instances, they both point to /rtr_wms/tmp/sockets/puma.sock, which causes a conflict, and one of the apps restarts as a result.
My issue is that I'm not sure how to balance the relationship between Nginx and Rails. Therefore, I'd like to request a minimal example demonstrating how to perform zero-downtime deployment using docker rollout with Nginx and Rails.
I am having issues with some images that report 'ready' slightly before they are actually ready to do real business. One in particular needs just another few seconds to finish its Infinispan synchronisation (i.e., it is ready to sync, but has not finished syncing).
My proposal would be to add a switch like `-g | --graceperiod SECONDS`, defaulting to `0`, and insert a suitable wait as the `else` branch of this check: `if [ "$SUCCESS" != "$SCALE" ]; then`.
I can create a PR if that is desired.
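A minimal sketch of the proposed flag and wait, with demo values standing in for the script's real state (`$SUCCESS`/`$SCALE` exist in the script; the flag name and default are only the proposal):

```shell
#!/bin/sh
# Demo values simulating the real script's state:
set -- --graceperiod 1   # simulated command line
SUCCESS=2                # containers that became healthy
SCALE=2                  # desired scale

GRACE_PERIOD=0           # proposed default: no extra wait

# Proposed option parsing, alongside the script's existing flags
while [ $# -gt 0 ]; do
  case "$1" in
    -g|--graceperiod) GRACE_PERIOD="$2"; shift 2 ;;
    *) shift ;;
  esac
done

if [ "$SUCCESS" != "$SCALE" ]; then
  echo "==> Not all containers became healthy"
  exit 1
else
  # Proposed addition: give services that report 'ready' early
  # some extra time before old containers are stopped.
  sleep "$GRACE_PERIOD"
  echo "waited ${GRACE_PERIOD}s before stopping old containers"
fi
```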
Best regards
Hi,
When I bring my API services up with rollout, a random call occasionally takes longer than usual.
If I bring them up with plain docker compose, there is no response-time problem.
Has anyone seen the same behaviour?
Could you support the docker --context flag?
It's used like this, to take the example from the Docker docs:
docker --context production container ls
I suppose you'd need to change something around here:
-*)
echo "Unknown option: $1"
exit_with_usage
;;
*)
and here:
# check if compose v2 is available
if docker compose >/dev/null 2>&1; then
COMPOSE_COMMAND="docker compose"
elif docker-compose >/dev/null 2>&1; then
COMPOSE_COMMAND="docker-compose"
else
echo "docker compose or docker-compose is required"
exit 1
fi
Shell scripting is not at all my strength so I'm hesitant to dive in myself.
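A sketch of how the flag might be collected and forwarded (the variable names and the `rollout-target` service are illustrative, not the script's actual code):

```shell
#!/bin/sh
DOCKER_CONTEXT_ARGS=""

set -- --context production rollout-target   # demo arguments
while [ $# -gt 0 ]; do
  case "$1" in
    --context)
      # Remember the context so it can be passed to every docker call
      DOCKER_CONTEXT_ARGS="--context $2"
      shift 2 ;;
    *) SERVICE="$1"; shift ;;
  esac
done

# Each invocation in the script would then become, for example:
#   docker $DOCKER_CONTEXT_ARGS compose up -d --scale ...
echo "docker $DOCKER_CONTEXT_ARGS compose up -d $SERVICE"
```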
Your solution incurs downtime because it ignores the downstream proxy, which needs to reload to register the rolled-out container's new IP address, and it needs to reload right after the rollout moment. Unfortunately, the script takes too long, and in the meantime the proxy still forwards traffic to the old IP.
So my suggestion is to add a hook that lets us trigger a reload of the proxy.
When there are services with the same name in different compose projects, the script doesn't filter down to the ones in the current project.
One use case is running a project with multiple branches under different compose project names. Having the same service name currently raises an error during scaling, for example: `Error: No such object: 794aa7906db2`.
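A sketch of project-scoped lookup, using the standard labels that Docker Compose sets on every container it creates (the project and service names below are illustrative):

```shell
#!/bin/sh
# Build a project-scoped filter so containers are matched by project
# *and* service, not by service name alone.
PROJECT="myproject"   # illustrative compose project name
SERVICE="web"         # illustrative service name

FILTERS="--filter label=com.docker.compose.project=$PROJECT \
--filter label=com.docker.compose.service=$SERVICE"

# The script could then resolve container IDs unambiguously with:
echo "docker ps $FILTERS --format '{{.ID}}'"
```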
When the old instance has status `Exited`, the rollout command does not start a new instance.
It tries to start the old instance instead and breaks, because the old instance contains the error.
I expected it to ignore the error in the old instance, start the new instance, and then re-route traffic once the new instance is ready.
==> Service 'django' is not running. Starting the service.
Container xxx-django-13 Created
Container xxx-django-13 Starting
Container xxx-django-13 Started
After that line, no scaling happens for that service.
I have a docker-compose file like below.
version: "3.8"
services:
traefik:
image: traefik:v2.11
container_name: traefik
command:
- "--api.insecure=true"
- "--api.dashboard=false"
- "--providers.docker=true"
ports:
- "$TRAEFIK_DOCKER_PORT:$TRAEFIK_LOCAL_PORT"
volumes:
- "/var/run/docker.sock:/var/run/docker.sock:ro"
node:
build: ./src
restart: unless-stopped
depends_on:
- traefik
labels:
- "traefik.enable=true"
- "traefik.http.routers.node.rule=Host(`localhost`) || Host(`0.0.0.0`)"
- "traefik.http.services.node.loadbalancer.server.port=$NODE_DOCKER_PORT"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:$NODE_DOCKER_PORT/health_checks/health"]
interval: 30s
timeout: 10s
retries: 3
stdin_open: true
tty: true
I have installed docker-rollout on my local machine, and it is working fine.
When I run `docker rollout node` after bringing up my compose file above, it works as expected.
My local docker version is as follows.
src git:(main) docker --version
Docker version 20.10.14, build a224086
I installed it on my Ubuntu server with the same steps. However, when I run the docker rollout node
command on my server, I get the following output and it does not work.
github@vmi1556310:~/Projects/e-ihracat-yolu-backend$ docker rollout node
unknown flag: --quiet
See 'docker --help'.
Usage: docker [OPTIONS] COMMAND
docker-compose is a Docker CLI plugin
Management Commands:
compose Docker Compose
Global Options:
--config string Location of client config files (default "/home/github/.docker")
-c, --context string Name of the context to use to connect to the daemon (overrides DOCKER_HOST env
var and default context set with "docker context use")
-D, --debug Enable debug mode
-H, --host list Daemon socket to connect to
-l, --log-level string Set the logging level ("debug", "info", "warn", "error", "fatal") (default "info")
--tls Use TLS; implied by --tlsverify
--tlscacert string Trust certs signed only by this CA (default "/home/github/.docker/ca.pem")
--tlscert string Path to TLS certificate file (default "/home/github/.docker/cert.pem")
--tlskey string Path to TLS key file (default "/home/github/.docker/key.pem")
--tlsverify Use TLS and verify the remote
Run 'docker COMMAND --help' for more information on a command.
For more help on how to use Docker, head to https://docs.docker.com/go/guides/
==> Service 'node' is not running. Starting the service.
unknown flag: --detach
See 'docker --help'.
[...same usage and options output as above...]
Docker version on my ubuntu server is as follows.
github@vmi1556310:~$ docker --version
Docker version 24.0.5, build 24.0.5-0ubuntu1~20.04.1
Is this a problem with Docker versions? What am I doing wrong, and how can I solve it?
Thank you very much in advance.
Just my 2 cents
Currently there's no way of telling Traefik that the old container is going to be stopped, so it might route requests to a container that is shutting down. I'm creating this issue to track progress on figuring out the best way to implement this.
The problem was mentioned in this StackOverflow question: https://stackoverflow.com/questions/75918681/how-to-avoid-downtime-when-using-docker-rollout-with-traefik
The easiest way seems to be failing the healthcheck before the container is stopped, so that Traefik stops routing new requests to the unhealthy container(s). This can be achieved by adding `! test -f /drain` to the container healthcheck (that is, "fail if a file named `drain` exists in `/`"), and `docker-rollout` can create this file before stopping the old container.
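As a compose-file sketch of the drain-file pattern (the service name, port, and health probe command are illustrative, not a required setup):

```yaml
services:
  web:
    image: myapp-web          # illustrative image
    healthcheck:
      # Normal health probe, plus: fail as soon as /drain exists,
      # so the proxy stops sending new requests to this container.
      test: ["CMD-SHELL", "curl -f http://localhost:8080/health && ! test -f /drain"]
      interval: 5s
      timeout: 3s
      retries: 1
```

Before stopping the old container, the hook would run `docker exec <container> touch /drain` and wait long enough for the proxy to observe the failing healthcheck.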
I'm not sure if this behavior should be hardcoded in the tool, as there might be better ways of implementing request draining for proxies other than Traefik / nginx. Implementing hook support would allow `docker-rollout` users to implement true zero-downtime deployment in two steps:
1. Add `&& ! test -f /drain` to the current container healthcheck in the compose file.
2. Use `--before-stop "docker exec $1 touch /drain && sleep 10"` to create the file manually.

My deploy command:
docker-compose -H "ssh://owadmin@${DEPLOY_HOST}" --profile app -f docker-compose.yml -f docker-compose.deploy.yml -f docker-compose.${BUILD_ENV}.yml up -d --remove-orphans
Is it possible to do a rollout here (with the `-H` option)?