
mysql's People

Contributors

daghack, deitch, dfredell, jwreagor, lostapathy, misterbisson, mterron, sodre, tgross


mysql's Issues

Metadata about snapshots destroyed when Consul is destroyed

When I bring an entire environment up (and down and up), should the database snapshot from the previous time (which is stored in Manta) be used?

It seems that the Consul state isn't maintained anywhere persistently, so the next time the environment comes up, it starts fresh.

If I scale the mysql service after the environment is up, it does get the snapshot from Manta...

First MySQL service startup (after many other "docker-compose up -d")

2016-05-25 16:15:57,893 DEBUG manage.py [pre_start]  pre_start
2016-05-25 16:15:57,895 DEBUG manage.py [pre_start]    has_snapshot
2016-05-25 16:15:58,114 DEBUG manage.py [pre_start]    has_snapshot: None
2016-05-25 16:15:58,114 DEBUG manage.py [pre_start]    initialize_db

prompt> docker-compose scale mysql=2
(slave sees snapshot)

2016-05-25 16:31:44,959 DEBUG manage.py [pre_start]  pre_start
2016-05-25 16:31:44,960 DEBUG manage.py [pre_start]    has_snapshot
2016-05-25 16:31:45,073 DEBUG manage.py [pre_start]    has_snapshot: mysql-backup-2016-05-25T16-16-11Z
2016-05-25 16:31:45,074 DEBUG manage.py [pre_start]    get_snapshot
2016-05-25 16:31:45,960 DEBUG manage.py [pre_start]    get_snapshot: None

Failover lock in Consul isn't being cleared after failover

After failover, I'm seeing the following in the logs:

failover session lock ({session ID}) not removed because primary has not reported as healthy

This means the FAILOVER_IN_PROGRESS key isn't getting removed. However, the primary is marked as healthy, so this error may be the result of improper exception handling that's masking some other issue.

Re-configure after a consul system failure.

I was playing around with this and the Consul autopilot pattern. I removed Consul from this docker-compose file, thinking it would use the Consul cluster I had set up. I then removed that Consul cluster and re-created it. After the recreation, MySQL never reported back to Consul (which was available at the same CNS address) to set its primary/slaves/keys etc. back up. Maybe this is unreasonable; I'm honestly not sure how it all works together yet.

Snapshot TTL expiry causes Primary to be marked unhealthy

@misterbisson has reported that the on_change handler on replicas appears to be firing spuriously after a long period of operation, and this causes a failover even when the primary appears as though it should be healthy.

This appears to be a bug in the way we're marking the time for the snapshot in Consul. We can reproduce a minimal test case as follows.

We'll stand up a Consul server and a Consul agent container; the agent container runs under ContainerPilot and sends a trivial health check for a service named "mysql" to Consul. The onChange handler asks Consul for the JSON blob associated with the current status of the service, so that we can see it in the logs. Run the two containers as follows:

docker run -d -p 8500:8500 --name consul_server \
       progrium/consul:latest \
       -server -bootstrap -ui-dir /ui

docker run -it -p 8501:8500 --name consul_agent \
       -v $(pwd)/containerpilot.json:/etc/containerpilot.json \
       -v $(pwd)/on_change.sh:/on_change.sh \
       --link consul_server:consul \
       autopilotpattern/mysql \
       /usr/local/bin/containerpilot \
       /usr/local/bin/consul agent -data-dir=/data -config-dir=/config \
                                   -rejoin -retry-join consul -retry-max 10 \
                                   -retry-interval 10s

The minimal containerpilot config we're binding into the agent is:

{
  "consul": "localhost:8500",
  "logging": {
    "level": "DEBUG"
  },
  "services": [
    {
      "name": "mysql",
      "port": 3306,
      "health": "echo health",
      "poll": 2,
      "ttl": 5
    }
  ],
  "backends": [
    {
      "name": "mysql",
      "poll": 7,
      "onChange": "curl -s http://localhost:8500/v1/health/service/mysql"
    }
  ]
}

We'll then register a new health check with the agent for the backup, and mark it passing once:

# push new check to agent
docker exec -it consul_agent \
       curl -s -d '{"Name": "backup", "TTL": "10s"}' \
       http://localhost:8500/v1/agent/check/register

# push pass TTL to agent
docker exec -it consul_agent \
       curl -v -s http://localhost:8500/v1/agent/check/pass/backup

After 10 seconds, the TTL for "backup" will expire and the on_change handler will fire!

2016/09/23 18:59:31 health
    2016/09/23 18:59:31 [WARN] agent: Check 'backup' missed TTL, is now critical
    2016/09/23 18:59:31 [INFO] agent: Synced check 'backup'
2016/09/23 18:59:33 mysql.health.RunWithTimeout start
2016/09/23 18:59:33 mysql.health.Cmd.Start
2016/09/23 18:59:33 mysql.health.run waiting for PID 62:
2016/09/23 18:59:33 health
2016/09/23 18:59:33 mysql.health.run complete
2016/09/23 18:59:33 mysql.health.RunWithTimeout end
2016/09/23 18:59:33 mysql.health.RunWithTimeout start
2016/09/23 18:59:33 mysql.health.Cmd.Start
2016/09/23 18:59:33 mysql.health.run waiting for PID 63:
2016/09/23 18:59:33 [{"Node":{"Node":"1a9e1eb44133","Address":"172.17.0.3","TaggedAddresses":null,"CreateIndex":0,"ModifyIndex":0},"Service":{"ID":"mysql-1a9e1eb44133","Service":"mysql","Tags":null,"Address":"172.17.0.3","Port":3306,"EnableTagOverride":false,"CreateIndex":0,"ModifyIndex":0},"Checks":[{"Node":"1a9e1eb44133","CheckID":"mysql-1a9e1eb44133","Name":"mysql-1a9e1eb44133","Status":"passing","Notes":"TTL for mysql set by containerpilot","Output":"ok","ServiceID":"mysql-1a9e1eb44133","ServiceName":"mysql","CreateIndex":0,"ModifyIndex":0},{"Node":"1a9e1eb44133","CheckID":"serfHealth","Name":"Serf Health Status","Status":"passing","Notes":"","Output":"Agent alive and reachable","ServiceID":"","ServiceName":"","CreateIndex":0,"ModifyIndex":0},{"Node":"1a9e1eb44133","CheckID":"backup","Name":"backup","Status":"critical","Notes":"","Output":"TTL expired","ServiceID":"","ServiceName":"","CreateIndex":0,"ModifyIndex":0}]}]

When this happens we can check the status of the mysql service with curl -s http://localhost:8500/v1/health/service/mysql | jq . and see the following:

[
  {
    "Node": {
      "Node": "1a9e1eb44133",
      "Address": "172.17.0.3"
    },
    "Service": {
      "ID": "mysql-1a9e1eb44133",
      "Service": "mysql",
      "Tags": null,
      "Address": "172.17.0.3",
      "Port": 3306
    },
    "Checks": [
      {
        "Node": "1a9e1eb44133",
        "CheckID": "mysql-1a9e1eb44133",
        "Name": "mysql-1a9e1eb44133",
        "Status": "passing",
        "Notes": "TTL for mysql set by containerpilot",
        "Output": "ok",
        "ServiceID": "mysql-1a9e1eb44133",
        "ServiceName": "mysql"
      },
      {
        "Node": "1a9e1eb44133",
        "CheckID": "serfHealth",
        "Name": "Serf Health Status",
        "Status": "passing",
        "Notes": "",
        "Output": "Agent alive and reachable",
        "ServiceID": "",
        "ServiceName": ""
      },
      {
        "Node": "1a9e1eb44133",
        "CheckID": "backup",
        "Name": "backup",
        "Status": "critical",
        "Notes": "",
        "Output": "TTL expired",
        "ServiceID": "",
        "ServiceName": ""
      }
    ]
  }
]

Unfortunately this isn't a new bug, but splitting the snapshot from the health check seems to have revealed it, particularly as we've started running this blueprint in situations that were a bit closer to real-world use like autopilotpattern/wordpress.

The root problem is that when we register a check it isn't bound to a particular service, so when that check fails the entire node is marked as unhealthy. We could bind the check to a particular "ServiceID", but that means we'd need some kind of "dummy service" for backups. Given the low frequency of this check, I'm just going to swap the check out for reading the last snapshot time directly from the KV store rather than adding this kind of complexity.
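Given how infrequently this check needs to run, the replacement can be very small. Here's a rough sketch of the idea, assuming python-consul is available in the container (the key name, timestamp format, and staleness window below are placeholders of mine, not the repo's actual constants):

from datetime import datetime, timedelta

import consul as pyconsul

LAST_BACKUP_KEY = 'mysql-last-backup'        # hypothetical key name
MAX_SNAPSHOT_AGE = timedelta(hours=24)       # hypothetical staleness window
TIMESTAMP_FMT = '%Y-%m-%dT%H-%M-%SZ'         # assumed storage format

def snapshot_is_stale(consul_host='localhost'):
    """Return True when a new snapshot should be taken."""
    client = pyconsul.Consul(host=consul_host)
    _, result = client.kv.get(LAST_BACKUP_KEY)
    if not result or not result.get('Value'):
        return True  # nothing has been recorded yet
    last = datetime.strptime(result['Value'].decode('utf-8'), TIMESTAMP_FMT)
    return datetime.utcnow() - last > MAX_SNAPSHOT_AGE

The health handler would call snapshot_is_stale() and only kick off a backup when it returns True, so there's no separate Consul check to expire and no need for a dummy service.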


In my reproduction above I've also run into what appears to be a ContainerPilot bug that I didn't think was causing missed health checks, but it turns out it is. This is TritonDataCenter/containerpilot#178 (comment), so I'm going to hop on that ASAP.

MySQL Master isn't starting

When I run docker-compose up I get the following:

mysql_1   | 2016-05-09 09:24:40,909 INFO manage.py Initializing database...
mysql_1   | Installing MySQL system tables...2016-05-09 09:24:41 0 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
mysql_1   | 2016-05-09 09:24:41 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
mysql_1   | 2016-05-09 09:24:41 0 [Note] /usr/sbin/mysqld (mysqld 5.6.29-76.2) starting as process 20 ...
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB.
mysql_1   | 
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Using atomics to ref count buffer pool pages
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: The InnoDB memory heap is disabled
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Memory barrier is not used
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Compressed tables use zlib 1.2.8
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Using Linux native AIO
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Using CPU crc32 instructions
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Initializing buffer pool, size = 5.0M
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Completed initialization of buffer pool
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: The first specified data file ./ibdata1 did not exist: a new database to be created!
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Setting file ./ibdata1 size to 12 MB
mysql_1   | 2016-05-09 09:24:41 20 [Note] InnoDB: Database physically writes the file full: wait...
mysql_1   | 2016-05-09 09:24:41 7f1fb89da740 InnoDB: Error: Write to file ./ibdata1 failed at offset 0.
mysql_1   | InnoDB: 1048576 bytes should have been written, only 0 were written.
mysql_1   | InnoDB: Operating system error number 28.
mysql_1   | InnoDB: Check that your OS and file system support files of this size.
mysql_1   | InnoDB: Check also that the disk is not full or a disk quota exceeded.
mysql_1   | InnoDB: Error number 28 means 'No space left on device'.
mysql_1   | InnoDB: Some operating system error numbers are described at
mysql_1   | InnoDB: http://dev.mysql.com/doc/refman/5.6/en/operating-system-error-codes.html
mysql_1   | 2016-05-09 09:24:41 20 [ERROR] InnoDB: Error in creating ./ibdata1: probably out of disk space
mysql_1   | 2016-05-09 09:24:41 20 [ERROR] InnoDB: Could not open or create the system tablespace. If you tried to add new data files to the system tablespace, and it failed here, you should now edit innodb_data_file_path in my.cnf back to what it was, and remove the new ibdata files InnoDB created in this failed attempt. InnoDB only wrote those files full of zeros, but did not yet use them in any way. But be careful: do not remove old data files which contain your precious data!
mysql_1   | 2016-05-09 09:24:41 20 [ERROR] Plugin 'InnoDB' init function returned error.
mysql_1   | 2016-05-09 09:24:41 20 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
mysql_1   | 2016-05-09 09:24:41 20 [ERROR] Unknown/unsupported storage engine: InnoDB
mysql_1   | 2016-05-09 09:24:41 20 [ERROR] Aborting
mysql_1   | 
mysql_1   | 2016-05-09 09:24:41 20 [Note] Binlog end
mysql_1   | 2016-05-09 09:24:41 20 [Note] /usr/sbin/mysqld: Shutdown complete
mysql_1   | 
mysql_1   | FATAL ERROR: Error closing mysqld pipe: Broken pipe

I tried to add this to the manage.py:

self.innodb_data_file_path = get_environ('INNODB_FILEPATH', None)
but with or without this the build exits with:

Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-FiSa7Z/cffi/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-LlOePZ-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-FiSa7Z/cffi/
The command '/bin/sh -c curl -Ls -o get-pip.py https://bootstrap.pypa.io/get-pip.py &&     python get-pip.py &&     pip install         PyMySQL==0.6.7         python-Consul==0.4.7         manta==2.5.0         mock==2.0.0' returned a non-zero code: 1

Add setup.sh and demo.sh scripts

#10 raised a question about getting the Consul DNS name.

Ideally we'd like to eliminate use of the --link to Consul in the main docker-compose.yml and use DNS discovery instead. To make this easier, a setup.sh script like the one in https://github.com/autopilotpattern/couchbase would be a convenient addition that could write out the _env file with the DNS name for the Consul service.

The demo.sh script is not strictly required, but it automatically opens the web browser to the correct locations, making for a convenient demo.

Manta client doesn't respect MANTA_TLS_INSECURE

It seems that there's a problem uploading xtrabackup files to a local Manta in my lab.
As I don't yet have an official SSL cert for Manta, I added
MANTA_TLS_INSECURE=1
to my Manta environment, and the variable is transferred into the container as well. But "mantash" does not work inside the container until I create ~/.ssh/key and ~/.ssh/key.pub.

This is the error from "docker logs":

2016/09/21 18:24:19 160921 18:24:19 completed OK!
2016/09/21 18:24:19 INFO manage snapshot completed, uploading to object store
[...]
2016/09/21 18:24:20 httplib2.SSLHandshakeError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)
2016/09/21 18:24:20     2016/09/21 18:24:20 [ERR] http: Request PUT /v1/agent/check/pass/mysql-1452d7b4bb3a?note=ok, error: CheckID does not have associated TTL from=127.0.0.1:56359
2016/09/21 18:24:20 Unexpected response code: 500 (CheckID does not have associated TTL)
2016/09/21 18:24:20     2016/09/21 18:24:20 [INFO] agent: Synced service 'mysql-1452d7b4bb3a'
2016/09/21 18:24:20     2016/09/21 18:24:20 [INFO] agent: Synced check 'mysql-1452d7b4bb3a'
2016/09/21 18:24:20 ERROR manage Replica is not replicating.
2016/09/21 18:24:20     2016/09/21 18:24:20 [INFO] agent: Synced check 'mysql-1452d7b4bb3a'
2016/09/21 18:24:24 ERROR manage Replica is not replicating.
2016/09/21 18:24:29 ERROR manage Replica is not replicating.
2016/09/21 18:24:34 ERROR manage Replica is not replicating.

The backup.tar is created but it cannot be uploaded to Manta.

Is there a chance to get it to work with a self-signed cert (if that's the reason for the error)? Or do I simply need an official one?

Thx.

Not able to run cluster.

When doing docker-compose up, this error shows up in the logs

mysql_1   | 2017/12/03 22:47:20 check.mysql.Run start
mysql_1   | 2017/12/03 22:47:21 Traceback (most recent call last):
mysql_1   | 2017/12/03 22:47:21   File "/usr/local/bin/manage.py", line 407, in <module>
mysql_1   | 2017/12/03 22:47:21     main()
mysql_1   | 2017/12/03 22:47:21   File "/usr/local/bin/manage.py", line 399, in main
mysql_1   | 2017/12/03 22:47:21     manta = Manta()
mysql_1   | 2017/12/03 22:47:21   File "/usr/local/bin/manager/libmanta.py", line 36, in __init__
mysql_1   | 2017/12/03 22:47:21     signer=self.signer)
mysql_1   | 2017/12/03 22:47:21   File "/usr/local/lib/python2.7/dist-packages/manta/client.py", line 140, in __init__
mysql_1   | 2017/12/03 22:47:21     assert account, 'account'
mysql_1   | 2017/12/03 22:47:21 AssertionError: account
mysql_1   | 2017/12/03 22:47:21 check.mysql exited with error: check.mysql: exit status 1
mysql_1   | 2017/12/03 22:47:21 event: {ExitFailed check.mysql}
mysql_1   | 2017/12/03 22:47:21 event: {Error check.mysql: exit status 1}
mysql_1   | 2017/12/03 22:47:21 check.mysql.Run end
mysql_1   | 2017/12/03 22:47:21 check.mysql.kill
mysql_1   | 2017/12/03 22:47:21 killing command 'check.mysql' at pid: 378
mysql_1   | 2017/12/03 22:47:21 event: {StatusUnhealthy mysql}

When I tried running the tests (./tests/compose.sh), they too failed. Here are the logs:

Failover via onChange handler Issues

The Scenario

The failover via the onChange handler in case of master failure is problematic in some scenarios. I will describe a scenario in which failover via the onChange handler will cause data inconsistencies between the master and slaves with the current design. Let's assume the setup is 3 containers: C1, C2, and C3. C1 is the master, with both C2 and C3 as its slaves. As per the documentation of the Autopilot Pattern for MySQL, we are using GTID-based replication.

Assume that C1 crashes for some reason (I don't care why), and is not flapping nor easily recoverable. At the time of the crash C1 has written transactions 1-1000 (GTID based). Due to some issue with IO latency or whatever, both of the slaves C2 and C3 are lagging in replication. As compared to the more common scenario of slave lag due to SQL_THREAD slowness, this slave lag is caused by IO_THREAD slowness. The slave C2 has only transactions 1-995 from the master, C1, in its relay logs and applied. The slave C3, which may have had more lag, has only transactions 1-990 from the master, C1, in its relay logs and applied.

So at this point we have to promote C2 or C3 to the new master and make the other one the slave. No matter which slave we promote to the new master (C2 or C3), it is obvious that transactions 996-1000 from the old master are already lost. This is unavoidable in the current design and I will skip this part.

So according to the talk at Velocity, my reading of the documentation, and the code, what happens now is that both slaves (C2 and C3) will try to become the new master by writing the lock in Consul. The other slave (the one that didn't become the new master) will change its replication to point to the new master. The problem comes if C3 somehow wins and becomes the new master. Below I'll describe the two cases and the repercussions.

C2 Becomes The Master

Since the slave C2 (with transactions 1-995 from the old master C1) has more transactions from the old master C1 than the slave C3 (with transactions 1-990 from the old master C1), it should logically be promoted to the new master. When C3 is made a slave of C2, C3 checks to see if it is missing any transactions relative to the new master C2, and in fact it is missing transactions 991-995 from the old master C1. C2 finds the relevant transactions, 991-995 from the old master C1, in its binlogs and sends them to C3 to be applied. At this point in time the data in C2 and C3 is in sync. Everything is good and the application is happy.

C3 Becomes The Master

When C2 is made a slave of C3, C2 checks to see if it is missing any transactions relative to the new master C3, and the answer is no. C2 has all the transactions that C3 has, so replication says everything is fine. The fact that C2 has more transactions (transactions 991-995 from the old master C1) than the new master C3 does not matter to replication. At this point the data between the new master, C3, and the slave, C2, is inconsistent. The changes from transactions 991-995 from the old master, C1, are on C2 but not on C3. I have successfully prototyped this scenario in Vagrant.

How To Avoid This

Off the top of my head, I can come up with two ways to avoid this.

Global Coordination To Promote Master

When the master dies, we require global coordination to promote the slave. The global coordinator, whatever form it takes, has to examine all the slaves and choose the one that is furthest ahead (e.g. has the most transactions from the dead master) and promote it as the master. The remaining servers become slaves of the new master. This is pretty much the 'C2 Becomes The Master' scenario above; a rough sketch of the selection step is shown below.
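As an illustration only (not code from this repo; connection details are assumed), a coordinator could pick the most-advanced candidate by comparing executed GTID sets with MySQL's GTID_SUBSET() function:

import pymysql

def most_advanced(candidates, user, password):
    """Return the candidate host whose executed GTID set contains all others'."""
    executed = {}
    for host in candidates:
        conn = pymysql.connect(host=host, user=user, password=password)
        cur = conn.cursor()
        cur.execute('SELECT @@GLOBAL.gtid_executed')
        executed[host] = cur.fetchone()[0]
        conn.close()

    for host in candidates:
        conn = pymysql.connect(host=host, user=user, password=password)
        cur = conn.cursor()
        is_superset = True
        for other_set in executed.values():
            # GTID_SUBSET(a, b) is 1 when every GTID in a is also in b (MySQL 5.6+)
            cur.execute('SELECT GTID_SUBSET(%s, %s)', (other_set, executed[host]))
            if cur.fetchone()[0] != 1:
                is_superset = False
                break
        conn.close()
        if is_superset:
            return host
    return None

Whichever candidate's gtid_executed contains every other candidate's set has all the surviving transactions and is the safe one to promote.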

Rebuild All Slaves Upon Master Promotion

Once a master fails, all slaves try to become the new master using the process described. Once a slave becomes the new master, we immediately rebuild all the other slaves using a backup of the new master. This ensures that we start everything in a consistent state. However, the issue is that it can cause more data loss. If we use this logic in the 'C3 Becomes The Master' scenario above, then transactions 991-995 from the old master (C1) become unnecessarily lost as well.

Consul 2 private interfaces

My on-premises Triton has a private IP range for both the external and the fabric network.

Consul errors out when this situation occurs. Is there a better way to bootstrap this, or to handle the dynamic IP when binding?

Some people have used things like the following:

"I don't know if it's the best way, but in my Vagrant cluster I have init files and something like BIND=ifconfig eth2 | grep "inet addr" | awk '{ print substr($2,6) }'. Then starting consul as a service."

How can we handle this in Autopilot pattern?
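One possible way to handle it in a preStart script, sketched here as an example only (the interface name is illustrative and this assumes a Linux container): resolve the address of a known interface with the SIOCGIFADDR ioctl and pass it to Consul's -bind/-advertise flags rather than parsing ifconfig output.

import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl that returns an interface's IPv4 address

def ip_for_interface(ifname):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    packed = fcntl.ioctl(s.fileno(), SIOCGIFADDR,
                         struct.pack('256s', ifname[:15]))
    return socket.inet_ntoa(packed[20:24])

# e.g. build the agent command line as:
#   consul agent -bind <addr> -advertise <addr> ...
print(ip_for_interface('eth2'))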

This container has its own installation of consul?

This container has its own installation of Consul? Can someone help me understand why it would have its own, as opposed to linking to an existing Consul container in my docker-compose.yml?

I have a few services listed in my local docker-compose file, one of them being Consul, so they can use the autopilot pattern to communicate. Each of our services has its own MySQL DB. I'm trying to understand how best to use a MySQL container for each. I figured I would spin up a mysql image, point it to a data volume on the host, and point it to the existing Consul container, but this mysql container doesn't seem to be for that use case, unless I'm thinking about it wrong?

Shippable fails to run unit tests

https://app.shippable.com/runs/57ebded26fb4bc0e008f22d3/1/console

docker run --rm -w /usr/local/bin \
        -e LOG_LEVEL=DEBUG \
        -v /root/src/github.com/autopilotpattern/mysql/bin/manager:/usr/local/bin/manager \
        -v /root/src/github.com/autopilotpattern/mysql/bin/manage.py:/usr/local/bin/manage.py \
        -v /root/src/github.com/autopilotpattern/mysql/bin/test.py:/usr/local/bin/test.py \
        autopilotpattern/mysql:master-fdd92a1c2096ffbaf746e102c026cd42c64b8aa2 \
        python test.py
Timestamp: 2016-09-28 15:25:09.780142711 +0000 UTC
Code: System error

Message: not a directory

Frames:

---
0: setupRootfs
Package: github.com/opencontainers/runc/libcontainer
File: rootfs_linux.go@40

---
1: Init
Package: github.com/opencontainers/runc/libcontainer.(*linuxStandardInit)
File: standard_init_linux.go@57

---
2: StartInitialization
Package: github.com/opencontainers/runc/libcontainer.(*LinuxFactory)
File: factory_linux.go@242

---
3: initializer
Package: github.com/docker/docker/daemon/execdriver/native
File: init.go@35

---
4: Init
Package: github.com/docker/docker/pkg/reexec
File: reexec.go@26

---
5: main
Package: main
File: docker.go@18

---
6: main
Package: runtime
File: proc.go@63

---
7: goexit
Package: runtime
File: asm_amd64.s@2232
Error response from daemon: Cannot start container 68f0db235ad6fe4bd6215971a36e7701bf9e9be675dcdd2f9085e7a2896c0d54: [8] System error: not a directory
make: *** [unit-test] Error 1

This error message usually means there's something wrong in the -v arguments. This runs locally just fine so I'll have to investigate what's up with that. Relevant section of the makefile is:

    docker run --rm -w /usr/local/bin \
        -e LOG_LEVEL=DEBUG \
        -v $(shell pwd)/bin/manager:/usr/local/bin/manager \
        -v $(shell pwd)/bin/manage.py:/usr/local/bin/manage.py \
        -v $(shell pwd)/bin/test.py:/usr/local/bin/test.py \
        autopilotpattern/mysql:$(TAG) \
        python test.py

Dirty Consul state

After running for a long time, the remaining "mysql-backup-run" checks caused the Consul service to start being perceived as unhealthy. That had an unexpected effect. I had Consul defined as the DNS resolver, so when the application tried to contact Consul at http://consul.service.consul, Consul complained it could not resolve the name consul.service.consul.

Traceback (most recent call last):
  File "/bin/triton-mysql.py", line 1041, in <module>
    on_start()
  File "/bin/triton-mysql.py", line 284, in on_start
    primary_result = get_primary_node()
  File "/bin/triton-mysql.py", line 957, in get_primary_node
    raise ex
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='consul.service.consul', port=8500): Max retries exceeded with url: /v1/kv/mysql-primary?token=aecf93b0-1b6a-4794-8c0a-8fc6624c80b9 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fb50ba990d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

The recursion was quite funny, but note that other names were also affected.

Support for semisynchronous replication

https://dev.mysql.com/doc/refman/5.7/en/replication-semisync.html

In addition to the built-in asynchronous replication, MySQL 5.7 supports an interface to semisynchronous replication[...]. With asynchronous replication, if the master crashes, transactions that it has committed might not have been transmitted to any slave. Compared to asynchronous replication, semisynchronous replication provides improved data integrity because when a commit returns successfully, it is known that the data exists in at least two places.

Semisynchronous replication falls between asynchronous and fully synchronous replication. The master waits only until at least one slave has received and logged the events. It does not wait for all slaves to acknowledge receipt, and it requires only receipt, not that the events have been fully executed and committed on the slave side.

If semisynchronous replication is enabled on the master side and there is at least one semisynchronous slave, a thread that performs a transaction commit on the master blocks and waits until at least one semisynchronous slave acknowledges that it has received all events for the transaction, or until a timeout occurs.

Spitballing:

If semisynchronous replication is implemented, it may improve the consistency we can expect of Autopilot Pattern MySQL in case of a failure of the primary.

It does not appear that all replicas need to be semisynchronous. Indeed, it may not be desirable for them to be. That suggests it might be ideal to designate one replica as semisynchronous for this purpose. That one semisynchronous replica might report itself to Consul as a different service name, just as the mysql-primary is a different service name from mysql. The semisynchronous replica would have dibs on promotion to primary in case of any primary failure.
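For reference, enabling semisynchronous replication on Percona Server 5.6 amounts to loading the semisync plugins and flipping a couple of globals. A sketch using pymysql (conn is an assumed open connection with SUPER privileges; this is not part of the current blueprint):

import pymysql

def enable_semisync_primary(conn, timeout_ms=10000):
    cur = conn.cursor()
    cur.execute("INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'")
    cur.execute("SET GLOBAL rpl_semi_sync_master_enabled = 1")
    # fall back to asynchronous replication if no replica acks within the timeout
    cur.execute("SET GLOBAL rpl_semi_sync_master_timeout = %s", (timeout_ms,))

def enable_semisync_replica(conn):
    cur = conn.cursor()
    cur.execute("INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'")
    cur.execute("SET GLOBAL rpl_semi_sync_slave_enabled = 1")
    # restart the IO thread so the setting takes effect on the running slave
    cur.execute("STOP SLAVE IO_THREAD")
    cur.execute("START SLAVE IO_THREAD")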

Force preStart to poll if primary hasn't written snapshot

In #22 (comment) @tgross wrote:

The leader election process happens in the health check, and we can't restore the snapshot once mysqld is running so it has to happen in the preStart. We might be able to change this by forcing a replica to poll/wait during preStart if a primary has been elected but there's no backup yet.

In #28 (comment) @tgross wrote:

Maybe the replicas came up before the snapshot was done being written but once we check Manta/Consul everything is in place? That might be an area to improve

So clearly it looks like we're sometimes going to jump the gun on launching replicas before the primary has fully "settled" by writing its snapshot. Fixing this would make automated deployment more feasible.
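A minimal sketch of what that could look like in preStart, assuming a python-consul client and an illustrative key name (not the repo's actual constant): the replica blocks until the primary has published a snapshot or a timeout expires.

import time

def wait_for_snapshot(consul_client, key='snapshot_backup', timeout=600, interval=10):
    """Block until the snapshot key appears in Consul or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        _, result = consul_client.kv.get(key)
        if result and result.get('Value'):
            return result['Value']  # snapshot name to fetch from Manta
        time.sleep(interval)
    raise RuntimeError('timed out waiting for the primary to write a snapshot')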

cc @misterbisson

manage.py is writing invalid value for the innodb_buffer_pool_size

manage.py is writing innodb_buffer_pool_size out in MB. According to the MySQL documentation, this numeric value is in bytes. On a 4GB server, manage.py writes out 2867, which is below the minimum value (see the doc below). Since the value written is below the minimum, the minimum value (5MB) is used instead. This can be verified by looking at the docker log for the instance as it is created.

Two fixes from my point of view:

  1. Change manage.py to calculate the value in bytes instead of MB, or
  2. Change the my.cnf.tmpl $buffer substitution to include the units, like this: innodb_buffer_pool_size = ${buffer}M

https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_buffer_pool_size

Add Percona Monitoring and Management to the cluster

To enhance the setup, it might be a good idea to add Percona monitoring and management:
https://www.percona.com/doc/percona-monitoring-and-management/index.html

It consists basically of two Docker containers and the pmm-client package, which needs to be installed and activated on the MySQL servers. The pmm-server IP/name could be passed in via its CNS name (similar to the Consul name).

It delivers query analysis and a grafana based metrics monitor. The backend is prometheus.

Access denied for replica user repluser in replicas after scaling up

After a docker-compose scale mysql=3 I keep seeing the following error in the replicas:

mysql_3   | 2016-06-14 20:46:25,924 ERROR manage.py (1045, u"Access denied for user 'repluser'@'localhost' (using password: YES)")
mysql_3   | Traceback (most recent call last):
mysql_3   |   File "/usr/local/bin/manage.py", line 341, in health
mysql_3   |     node.conn = wait_for_connection(**ctx)
mysql_3   |   File "/usr/local/bin/manage.py", line 614, in wait_for_connection
mysql_3   |     connect_timeout=timeout)
mysql_3   |   File "/usr/local/lib/python2.7/dist-packages/pymysql/__init__.py", line 88, in Connect
mysql_3   |     return Connection(*args, **kwargs)
mysql_3   |   File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 657, in __init__
mysql_3   |     self.connect()
mysql_3   |   File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 851, in connect
mysql_3   |     self._request_authentication()
mysql_3   |   File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 1034, in _request_authentication
mysql_3   |     auth_packet = self._read_packet()
mysql_3   |   File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 906, in _read_packet
mysql_3   |     packet.check_error()
mysql_3   |   File "/usr/local/lib/python2.7/dist-packages/pymysql/connections.py", line 367, in check_error
mysql_3   |     err.raise_mysql_exception(self._data)
mysql_3   |   File "/usr/local/lib/python2.7/dist-packages/pymysql/err.py", line 120, in raise_mysql_exception
mysql_3   |     _check_mysql_exception(errinfo)
mysql_3   |   File "/usr/local/lib/python2.7/dist-packages/pymysql/err.py", line 112, in _check_mysql_exception
mysql_3   |     raise errorclass(errno, errorvalue)
mysql_3   | OperationalError: (1045, u"Access denied for user 'repluser'@'localhost' (using password: YES)")
mysql_3   | 2016/06/14 20:46:25 exit status 1

In consul I just see mysql-primary, no replicas.

I tested this both locally and on Triton using a fresh clone of this repo.

consul rpc error: invalid session

Hi,
I have a question. The operation is as follows:

containerpilot mysqld --console --log-bin=mysql-bin --log_slave_updates=ON --gtid-mode=ON --enforce-gtid-consistency=ON

It returns this error:

consul.base.ConsulException: 500 rpc error: invalid session "5bbc14e3-4e7a-dc55-cc6d-76a7452a8de2"                                                                                
consul: RPC failed to server X.X.X.X:8300: rpc error: invalid session "5bbc14e3-4e7a-dc55-cc6d-76a7452a8de2"                                                                      
http: Request PUT /v1/kv/mysql-primary?acquire=5bbc14e3-4e7a-dc55-cc6d-76a7452a8de2, error: rpc error: invalid session "5bbc14e3-4e7a-dc55-cc6d-76a7452a8de2" from=127.0.0.1:47319

Code position:
Traceback (most recent call last):                                             
  File "/usr/local/bin/manage.py", line 412, in <module>                       
    main()                                                                     
  File "/usr/local/bin/manage.py", line 409, in main                           
    cmd(node)                                                                  
  File "/usr/local/bin/manager/utils.py", line 64, in wrapper                  
    out = apply(fn, args, kwargs)                                              
  File "/usr/local/bin/manage.py", line 128, in health                         
    assert_initialized_for_state(node)                                         
  File "/usr/local/bin/manager/utils.py", line 64, in wrapper                  
    out = apply(fn, args, kwargs)                                              
  File "/usr/local/bin/manage.py", line 298, in assert_initialized_for_state   
    run_as_primary(node)                                                       
  File "/usr/local/bin/manager/utils.py", line 64, in wrapper                  
    out = apply(fn, args, kwargs)                                              
  File "/usr/local/bin/manage.py", line 340, in run_as_primary                 
    if not node.consul.mark_as_primary(node.name):                             
  File "/usr/local/bin/manager/utils.py", line 64, in wrapper                  
    out = apply(fn, args, kwargs)                                              
  File "/usr/local/bin/manager/libconsul.py", line 154, in mark_as_primary     
    if not self.lock(PRIMARY_KEY, name, session_id):                           
  File "/usr/local/bin/manager/utils.py", line 64, in wrapper                  
    out = apply(fn, args, kwargs)                                              
  File "/usr/local/bin/manager/libconsul.py", line 93, in lock                 
    return self.client.kv.put(key, value, acquire=session_id)                  
  File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 459, in pu
    '/v1/kv/%s' % key, params=params, data=value)                              
  File "/usr/local/lib/python2.7/dist-packages/consul/std.py", line 39, in put 
    self.session.put(uri, data=data, verify=self.verify)))                     
  File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 111, in cb

Replicas fail to initialise

When many containers start at the same time, there is no backup yet (the freshly elected master hasn't created one), so replicas fail to initialise their databases and never recover from that state.

Replication needs to be shut down on failover

PR #21 introduced a bug in the failover behavior. When we fail over and run the CHANGE MASTER statement, we don't first shut down replication, and this causes an error, as shown below (LOG_LEVEL=DEBUG in my env file):

2016-05-05 15:08:15,438 DEBUG manage.py   health
2016-05-05 15:08:15,439 DEBUG manage.py     update
2016-05-05 15:08:15,440 DEBUG manage.py        get_primary_node
2016-05-05 15:08:15,571 DEBUG manage.py        get_primary_node: 92d7278030bc
2016-05-05 15:08:15,571 DEBUG manage.py     update: None
2016-05-05 15:08:15,571 DEBUG manage.py     assert_initialized_for_state
2016-05-05 15:08:15,572 DEBUG manage.py        get_primary_node
2016-05-05 15:08:15,575 DEBUG manage.py        get_primary_node: 92d7278030bc
2016-05-05 15:08:15,575 DEBUG manage.py     assert_initialized_for_state: True
2016-05-05 15:08:15,578 DEBUG manage.py     is_backup_running
2016-05-05 15:08:15,578 DEBUG manage.py     is_backup_running: False
2016-05-05 15:08:15,578 DEBUG manage.py     is_binlog_stale
2016-05-05 15:08:15,579 DEBUG manage.py SHOW MASTER STATUS
2016-05-05 15:08:15,579 DEBUG manage.py ()
2016-05-05 15:08:15,582 DEBUG manage.py     is_binlog_stale: False
2016-05-05 15:08:15,583 DEBUG manage.py     is_time_for_snapshot
2016-05-05 15:08:15,585 DEBUG manage.py     is_time_for_snapshot: False
2016-05-05 15:08:15,585 DEBUG manage.py SELECT 1
2016-05-05 15:08:15,585 DEBUG manage.py ()
2016-05-05 15:08:15,586 DEBUG manage.py   health: None
2016-05-05 15:08:15,964 DEBUG manage.py     get_primary_node
2016-05-05 15:08:15,968 DEBUG manage.py     get_primary_node: 92d7278030bc
2016-05-05 15:08:15,968 DEBUG manage.py     get_primary_host
2016-05-05 15:08:15,968 DEBUG manage.py checking if primary (92d7278030bc) is healthy...
2016-05-05 15:08:15,971 DEBUG manage.py     get_primary_host: 192.168.130.102
2016-05-05 15:08:15,972 DEBUG manage.py     set_primary_for_replica
2016-05-05 15:08:15,972 DEBUG manage.py       get_primary_host
2016-05-05 15:08:15,973 DEBUG manage.py         get_primary_node
2016-05-05 15:08:15,976 DEBUG manage.py         get_primary_node: 92d7278030bc
2016-05-05 15:08:15,976 DEBUG manage.py checking if primary (92d7278030bc) is healthy...
2016-05-05 15:08:15,979 DEBUG manage.py       get_primary_host: 192.168.130.102
2016-05-05 15:08:15,979 DEBUG manage.py CHANGE MASTER TO MASTER_HOST           = %s, MASTER_USER           = %s, MASTER_PASSWORD       = %s, MASTER_PORT           = 3306, MASTER_CONNECT_RETRY  = 60, MASTER_AUTO_POSITION  = 1, MASTER_SSL            = 0; START SLAVE;
2016-05-05 15:08:15,979 DEBUG manage.py (u'192.168.130.102', 'repl', 'password2')
2016-05-05 15:08:15,979 DEBUG manage.py (1198, u'This operation cannot be performed with a running slave; run STOP SLAVE first')
2016-05-05 15:08:16,980 DEBUG manage.py     get_primary_node
2016-05-05 15:08:16,984 DEBUG manage.py     get_primary_node: 92d7278030bc
2016-05-05 15:08:16,984 DEBUG manage.py     get_primary_host
2016-05-05 15:08:16,984 DEBUG manage.py checking if primary (92d7278030bc) is healthy...
2016-05-05 15:08:16,988 DEBUG manage.py     get_primary_host: 192.168.130.102
2016-05-05 15:08:16,988 DEBUG manage.py     set_primary_for_replica
2016-05-05 15:08:16,989 DEBUG manage.py       get_primary_host
2016-05-05 15:08:16,989 DEBUG manage.py         get_primary_node
2016-05-05 15:08:16,992 DEBUG manage.py         get_primary_node: 92d7278030bc
2016-05-05 15:08:16,992 DEBUG manage.py checking if primary (92d7278030bc) is healthy...
2016-05-05 15:08:16,995 DEBUG manage.py       get_primary_host: 192.168.130.102
2016-05-05 15:08:16,995 DEBUG manage.py CHANGE MASTER TO MASTER_HOST           = %s, MASTER_USER           = %s, MASTER_PASSWORD       = %s, MASTER_PORT           = 3306, MASTER_CONNECT_RETRY  = 60, MASTER_AUTO_POSITION  = 1, MASTER_SSL            = 0; START SLAVE;
2016-05-05 15:08:16,995 DEBUG manage.py (u'192.168.130.102', 'repl', 'password2')
2016-05-05 15:08:16,996 DEBUG manage.py (1198, u'This operation cannot be performed with a running slave; run STOP SLAVE first')
...
(will loop infinitely at this point)

The fix for this should be easy and I'm looking into it now.
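For context, the likely shape of the fix (a sketch only, assuming a pymysql connection to the local replica; the actual statements in manage.py may differ) is simply to issue STOP SLAVE before re-pointing replication:

def set_primary_for_replica(conn, primary_ip, repl_user, repl_password):
    cur = conn.cursor()
    cur.execute('STOP SLAVE')  # required before CHANGE MASTER TO on a running slave
    cur.execute(
        'CHANGE MASTER TO '
        'MASTER_HOST = %s, MASTER_USER = %s, MASTER_PASSWORD = %s, '
        'MASTER_PORT = 3306, MASTER_CONNECT_RETRY = 60, '
        'MASTER_AUTO_POSITION = 1, MASTER_SSL = 0',
        (primary_ip, repl_user, repl_password))
    cur.execute('START SLAVE')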

Inject git tag into Compose file during builds

When we run on Shippable/Jenkins we're running the integration tests against :latest, which generally should already be what's on the master branch at the time we merge. But what we really want to do is run the tests and only tag :latest afterwards. And in the future we may want to be able to run other branches. So we need a way to inject the git ref that we generate via make tag into the docker-compose.yml file.
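One possible approach, sketched with assumptions (the script, file name, and image name here are illustrative, not the repo's): have the build step rewrite the image tag in docker-compose.yml from the ref that make tag produces.

import re
import sys

def inject_tag(tag, path='docker-compose.yml'):
    with open(path) as f:
        contents = f.read()
    # swap e.g. "image: autopilotpattern/mysql:latest" for the build's tag
    contents = re.sub(r'(image:\s*autopilotpattern/mysql):\S+',
                      r'\1:' + tag, contents)
    with open(path, 'w') as f:
        f.write(contents)

if __name__ == '__main__':
    inject_tag(sys.argv[1])

A hypothetical makefile target could then call this script with $(TAG) before running the integration tests.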

Create snapshot when mysql-primary is restarted/recreated

Would it be possible to have the primary database (attempt to) take a snapshot of the DB when it is restarted or shut down?

I had rebuilt a custom docker container (not the mysql one) and wanted to update it in the runtime. What I normally do is do a docker-compose build foo, then docker-compose up -d foo to have it replaced in the project environment. Usually, only that one instance (foo) is recreated.

This time, it decided to restart all of the consul instances, the mysql instance, and the foo instance. I think this would have been OK, but for some reason the consul raft sank in the process, triggering an unfortunate chain reaction that resulted in the loss of the MySQL data written after the last backup.

If mysql could take one last snapshot before going away, I could fix the situation manually. Without a last backup, all of the data that was in the container when it went down is gone.

How to AutoScale

So I've gone through all the demos and understand how this all works. My question now is: how do I docker-compose scale customers=+1 or -1 on the fly? I would like to use some type of metric to autoscale. I don't know if this is the correct place to ask; is there a mailing list?

Snapshots not backing up

I have started up a 2-node MySQL system connected via 3 autopilotpattern/consul instances, running in a private Triton environment and backing up to Minio. I would have expected when I came in on Monday to see 2 snapshots stored in Minio from the weekend. I logged into the Docker container of the primary mysql and ran a few commands (below) to see if I could get the snapshot to run. Note the code running is that of https://github.com/certusoft/mysql

root@1a5d9a1a4ff6:/usr/local/bin# python manage.py snapshot_task
Traceback (most recent call last):
  File "manage.py", line 437, in <module>
    main()
  File "manage.py", line 434, in main
    cmd(node)
  File "/usr/local/bin/manager/utils.py", line 65, in wrapper
    out = apply(fn, args, kwargs)
  File "manage.py", line 249, in snapshot_task
    if not node.is_snapshot_node() or not node.consul.lock_snapshot(node.name):
  File "/usr/local/bin/manager/utils.py", line 65, in wrapper
    out = apply(fn, args, kwargs)
  File "/usr/local/bin/manager/discovery.py", line 239, in lock_snapshot
    return self.lock(BACKUP_LOCK_KEY, hostname, session_id)
  File "/usr/local/bin/manager/utils.py", line 65, in wrapper
    out = apply(fn, args, kwargs)
  File "/usr/local/bin/manager/discovery.py", line 94, in lock
    return self.client.kv.put(key, value, acquire=session_id)
  File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 534, in put
    CB.json(), '/v1/kv/%s' % key, params=params, data=value)
  File "/usr/local/lib/python2.7/dist-packages/consul/std.py", line 40, in put
    self.session.put(uri, data=data, verify=self.verify)))
  File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 186, in cb
    CB.__status(response, allow_404=allow_404)
  File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 143, in __status
    raise ConsulException("%d %s" % (response.code, response.body))
consul.base.ConsulException: 500 rpc error: invalid session "9e196856-5447-f942-a606-d9cf9445713f"
root@1a5d9a1a4ff6:/usr/local/bin# ls /tmp/
backup/ backup.tar mysql-backup-running mysql-session percona-version-check
root@1a5d9a1a4ff6:/usr/local/bin# rm /tmp/mysql-backup-running
root@1a5d9a1a4ff6:/usr/local/bin# python manage.py snapshot_task
171009 13:22:46 innobackupex: Starting the backup operation
....

Failover not reporting to consul

If I bring up the project fresh, then bring up a second MySQL slave, then destroy the primary, the failover process appears to work if connecting to MySQL via CNS, but based on the error logs it looks like the onChange ContainerPilot handler isn't doing what is expected to inform Consul of the primary change.

It's definitely getting past where it was before. I got the MySQL primary to come up fine and saw replication working. However, when I killed the master to test failover, on the slave server I get:

Version: '5.6.34-79.1-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Percona Server (GPL), Release 79.1, Revision 1c589f9
2017/03/12 22:58:54 INFO manage Setting up replication.
2017-03-12 22:58:54 58858 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem.
2017-03-12 22:58:54 58858 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='192.168.129.171', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2017-03-12 22:58:54 58858 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2017-03-12 22:58:54 58858 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2017-03-12 22:58:54 58858 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './mysqld-relay-bin.000001' position: 4
2017-03-12 22:58:54 58858 [Note] Slave I/O thread: connected to master '[email protected]:3306',replication started in log 'FIRST' at position 4
2017/03/12 22:58:54     2017/03/12 22:58:54 [ERR] http: Request PUT /v1/agent/check/pass/mysql-3709f21efad3?note=ok, error: CheckID "mysql-3709f21efad3" does not have associated TTL from=127.0.0.1:45860
2017/03/12 22:58:54 Unexpected response code: 500 (CheckID "mysql-3709f21efad3" does not have associated TTL)
Service not registered, registering...
2017/03/12 22:58:54     2017/03/12 22:58:54 [INFO] agent: Synced service 'mysql-3709f21efad3'
2017/03/12 22:58:54     2017/03/12 22:58:54 [INFO] agent: Synced check 'mysql-3709f21efad3'
2017/03/12 22:58:54     2017/03/12 22:58:54 [INFO] agent: Synced check 'mysql-3709f21efad3'
2017/03/12 23:04:39 INFO manage [on_change] Executing failover with candidates: [u'192.168.129.173']
2017-03-12 23:04:39 58858 [Note] Error reading relay log event: slave SQL thread was killed
2017-03-12 23:04:39 58858 [Note] Slave I/O thread killed while reading event
2017-03-12 23:04:39 58858 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000001', position 1209
2017/03/12 23:04:39 WARNING: Using a password on the command line interface can be insecure.
2017/03/12 23:04:39 # Checking privileges.
2017/03/12 23:04:39 # Checking privileges on candidates.
2017/03/12 23:04:39 # Performing failover.
2017/03/12 23:04:39 # Candidate slave 192.168.129.173:3306 will become the new master.
2017/03/12 23:04:39 # Checking slaves status (before failover).
2017/03/12 23:04:39 # Preparing candidate for failover.
2017/03/12 23:04:39 # Creating replication user if it does not exist.
2017/03/12 23:04:39 # Stopping slaves.
2017/03/12 23:04:39 # Performing STOP on all slaves.
2017/03/12 23:04:39 # Switching slaves to new master.
2017/03/12 23:04:39 # Disconnecting new master as slave.
2017/03/12 23:04:39 # Starting slaves.
2017/03/12 23:04:39 # Performing START on all slaves.
2017/03/12 23:04:39 # Checking slaves for errors.
2017/03/12 23:04:39 # Failover complete.
2017/03/12 23:04:39 #
2017/03/12 23:04:39 # Replication Topology Health:
2017/03/12 23:04:39 +------------------+-------+---------+--------+------------+---------+
2017/03/12 23:04:39 | host             | port  | role    | state  | gtid_mode  | health  |
2017/03/12 23:04:39 +------------------+-------+---------+--------+------------+---------+
2017/03/12 23:04:39 | 192.168.129.173  | 3306  | MASTER  | UP     | ON         | OK      |
2017/03/12 23:04:39 +------------------+-------+---------+--------+------------+---------+
2017/03/12 23:04:39 # ...done.
2017/03/12 23:04:40 ERROR manage [on_change] this node is neither primary or replica after failover; check replication status on cluster.
2017/03/12 23:04:44 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:04:45     2017/03/12 23:04:45 [INFO] memberlist: Suspect a5b06f708612 has failed, no acks received
2017/03/12 23:04:48     2017/03/12 23:04:48 [INFO] memberlist: Suspect a5b06f708612 has failed, no acks received
2017/03/12 23:04:49 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:04:50     2017/03/12 23:04:50 [INFO] serf: EventMemberFailed: a5b06f708612 192.168.129.171
2017/03/12 23:04:50     2017/03/12 23:04:50 [INFO] memberlist: Suspect a5b06f708612 has failed, no acks received
2017/03/12 23:04:54 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:04:59 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:04 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:04     2017/03/12 23:05:04 [WARN] agent: Check 'mysql-3709f21efad3' missed TTL, is now critical
2017/03/12 23:05:04     2017/03/12 23:05:04 [INFO] agent: Synced check 'mysql-3709f21efad3'
2017/03/12 23:05:09 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:14 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:19 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:24 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:29 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:34 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:39 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:44 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:48     2017/03/12 23:05:48 [INFO] serf: attempting reconnect to a5b06f708612 192.168.129.171:8301
2017/03/12 23:05:49 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:54 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:59 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:04 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:09 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:14 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:19 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:24 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:29 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:34 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:39 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:44 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:49 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:54 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:06:58     2017/03/12 23:06:58 [INFO] serf: attempting reconnect to a5b06f708612 192.168.129.171:8301
2017/03/12 23:06:59 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:04 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:09 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:14 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:19 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:24 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:29 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:34 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:38     2017/03/12 23:07:38 [INFO] serf: attempting reconnect to a5b06f708612 192.168.129.171:8301
2017/03/12 23:07:39 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:44 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:49 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:54 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:07:59 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:04 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:09 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:14 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:19 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:24 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:29 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:34 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:39 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:44 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:49 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:08:54 ERROR manage Cannot determine MySQL state; failing health check.

Optimize Dockerfile/image size

I believe the following portion of the Dockerfile can be combined into a single RUN statement that removes gcc and python-dev after doing the pip install, similarly to https://github.com/autopilotpattern/nfsserver/blob/wip/Dockerfile#L8-L19 .

RUN apt-get update \
    && apt-get install -y \
        python \
        python-dev \
        gcc \
        curl \
        percona-xtrabackup \
    && rm -rf /var/lib/apt/lists/*

# get Python drivers MySQL, Consul, and Manta
RUN curl -Ls -o get-pip.py https://bootstrap.pypa.io/get-pip.py && \
    python get-pip.py && \
    pip install \
        PyMySQL==0.6.7 \
        python-Consul==0.4.7 \
        manta==2.5.0

Improve logging coverage and flexibility

In #28, debugging the problem could have been improved by having better logging in the preStart handler. Also, in #28 (comment) @tgross wrote:

It looks like there's a couple spots we could hook in more logging too. The balance of debug vs info has been hard to get right; I'm wondering if we should keep the last N messages of a DEBUG level in a ring log with some mechanism to dump them to stdout even if the instances are logging at INFO.

Because we don't have a persistent daemon for the manage.py this might be easier said than done. We could always just have a file that we truncate if it gets above size N. Or perhaps just move all the logging around replication setup to INFO, although I'm not wild about that solution either.
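A sketch of the ring-log idea using only the standard library (not what manage.py does today; the file path and sizes are arbitrary): keep INFO and above on stdout, and keep the most recent DEBUG output in a small set of rotating files that can be dumped when something goes wrong.

import logging
import sys
from logging.handlers import RotatingFileHandler

def setup_logging(debug_path='/var/log/manage-debug.log'):
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)

    # normal output: INFO and above to stdout, as today
    stdout = logging.StreamHandler(sys.stdout)
    stdout.setLevel(logging.INFO)
    root.addHandler(stdout)

    # ring log: roughly the last 2MB of DEBUG output, dumpable after a failure
    ring = RotatingFileHandler(debug_path, maxBytes=1024 * 1024, backupCount=1)
    ring.setLevel(logging.DEBUG)
    root.addHandler(ring)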

manage.py is writing invalid value for the innodb_buffer_pool_size

The manage.py script is writing an invalid value to the config file when it starts up. According to the documentation for innodb_buffer_pool_size, the number is in bytes. The script is writing MB (it divides the memory size in KB by 1024 to get MB).

On a 4GB memory server, it is writing 2867 which is below the minimum of 5242880. The default value for this field is 128MB (134217728).

Examining the docker log shows that the size is the minimum value

2016-05-23 15:59:05 39336 [Note] InnoDB: Initializing buffer pool, size = 5M

An alternative would be to change the my.cnf.tmpl file to specify the substitution like so to include the units:
innodb_buffer_pool_size = ${buffer}M

https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_buffer_pool_size

from manage.py:

        # replace innodb_buffer_pool_size value from environment
        # or use a sensible default (70% of available physical memory)
        innodb_buffer_pool_size = int(get_environ('INNODB_BUFFER_POOL_SIZE', 0))
        if not innodb_buffer_pool_size:
            with open('/proc/meminfo', 'r') as memInfoFile:
                memInfo = memInfoFile.read()
                base = re.search(r'^MemTotal: *(\d+)', memInfo).group(1)
                innodb_buffer_pool_size = int((int(base) / 1024) * 0.7)

        # replace server-id with ID derived from hostname
        # ref https://dev.mysql.com/doc/refman/5.7/en/replication-configuration.html
        hostname = socket.gethostname()
        server_id = int(str(hostname)[:4], 16)

        with open('/etc/my.cnf.tmpl', 'r') as f:
            template = string.Template(f.read())
            rendered = template.substitute(buffer=innodb_buffer_pool_size,
                                           server_id=server_id,
                                           hostname=hostname)
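For reference, a sketch of the calculation done in bytes rather than MB; keeping the 70% default and clamping to MySQL's 128MB default is my suggestion, not what the script currently does:

import re

MIN_BUFFER_POOL = 134217728  # MySQL's 128MB default; the hard minimum is 5242880

def default_innodb_buffer_pool_size():
    """Return ~70% of physical memory in bytes, never below 128MB."""
    with open('/proc/meminfo', 'r') as f:
        mem_total_kb = int(re.search(r'^MemTotal: *(\d+)', f.read()).group(1))
    size = int(mem_total_kb * 1024 * 0.7)  # KB -> bytes, then take 70%
    return max(size, MIN_BUFFER_POOL)

The ${buffer}M template change suggested above is the smaller fix; either way the units need to line up with what the template renders.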

Is data center awareness in primary election important?

I've been working on how to operate Autopilot Pattern apps across multiple data centers (geographically distinct data centers connected over a WAN). In Consul, that led to a data center naming question autopilotpattern/consul#23, and others.

As I explore how to do this in MySQL (using autopilotpattern/wordpress#27 as the scenario), I'm trying to determine the importance of data center awareness. On the one hand, it's important to have a solid strategy for recovering from complete data center failures. On the other, the risk of split brain scenarios grows dramatically over a WAN.

For the purpose of this question and the scenario in autopilotpattern/wordpress#27, let's assume a standard master-replica replication topology (not multi-master, not sharded).

From a data center that's remote from the primary, how can we distinguish between a failure of the primary, a failure of the entire data center the primary is in, and a network partition between the two data centers?

Replication failure

If I create a master database by itself

docker-compose up -d

Then I connect to this database and load in some initial content. I have two databases: one is about 0.5GB, the other about 1GB.

Then when I try to create a slave

docker-compose scale mysql=2

Replication fails with an error about not being able to create a database because it already exists. I'm guessing this is a race condition between the initial backup and replication starting. I've reproduced this scenario twice in a row.

No mutual exclusion enforced for MySQL backups

The create_backup command does not enforce mutual exclusion on itself. If the backup is triggered in the onhealth hook and that hook fires again before the backup has finished, multiple backup jobs will run concurrently.

Adding a Unix file lock should prevent this from happening.
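A minimal sketch of that file lock, assuming a scratch path like /tmp/backup.lock and wrapping the existing create_backup routine (the names here are illustrative):

import fcntl

LOCKFILE = '/tmp/backup.lock'

def create_backup_locked():
    """Run create_backup only if no other backup currently holds the lock."""
    with open(LOCKFILE, 'w') as lockfile:
        try:
            # non-blocking exclusive lock; raises IOError if already held
            fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except IOError:
            return  # another backup is in flight; skip this run
        try:
            create_backup()  # the existing backup routine
        finally:
            fcntl.flock(lockfile, fcntl.LOCK_UN)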

Update Compose files to 2.0 format

#9 demonstrated Docker Compose format 2.0 for compatibility with Docker Swarm, but #14 made changes that conflict with #9.

New plan:

  • Update docker-compose.yml and local-compose.yml to the 2.0 format
  • local-compose.yml extends docker-compose.yml
  • Use network_mode: bridge in docker-compose.yml

docker-compose.yml will work on Triton, local-compose.yml will work in local Docker and Swarm environments. After the completion of https://smartos.org/bugview/DOCKER-723 and related tickets, we can drop network_mode: bridge from docker-compose.yml.

Improve docs around Manta keys

MANTA_KEY_ID

MANTA_KEY_ID must be the MD5-formatted key fingerprint. A SHA256 fingerprint will result in errors.

This one-liner generates the correct fingerprint:

ssh-keygen -E md5 -lf <ssh key path> | awk '{print substr($2,5)}'
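If you'd rather not rely on parsing ssh-keygen output, the same MD5 fingerprint can be computed directly from the public key file (a sketch; the ~/.ssh/id_rsa.pub path is an assumption):

import base64
import hashlib
import os

def md5_fingerprint(pubkey_path):
    """Colon-separated MD5 fingerprint of an OpenSSH public key."""
    with open(pubkey_path) as f:
        # format: "<key-type> <base64-blob> [comment]"
        blob = f.read().split()[1]
    digest = hashlib.md5(base64.b64decode(blob)).hexdigest()
    return ':'.join(digest[i:i + 2] for i in range(0, len(digest), 2))

print(md5_fingerprint(os.path.expanduser('~/.ssh/id_rsa.pub')))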

MANTA_PRIVATE_KEY

MANTA_PRIVATE_KEY must be the whole private key, not a path. I do not know of a way to put the whole, multi-line key into the _env file, but this will put it into the local environment:

export MANTA_PRIVATE_KEY=`cat ~/.ssh/id_rsa`

Failover lost master

I found a scenario where the cluster loses its master.

It occurred when:

  1. I had 3 nodes running healthily, remote consul, static root password
  2. I killed the master
  3. Failover started on 37
  4. mysqlrpladmin on 37 decided that 36 should be the master
  5. 36 detected that it is the new master
  6. 36 created a new containerpilot.json with the mysql-primary service
  7. Then 36 ran containerpilot -reload
  8. This caused mysql to stop and start
  9. When mysql came back up it had no record of the primary
  10. Also, reading from /v1/kv/mysql-primary returned no result

failover.log

Servers:

  1. docker-compose name: mysql_4, hostname: mysql-37f99a0a7a84, IP: 192.168.128.236
  2. docker-compose name: mysql_5, hostname: mysql-363deb257281, IP: 192.168.128.235

Failover works great if the node that gets the failover lock also wins the mysqlrpladmin poll.
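For anyone trying to reproduce this, steps 9-10 can be confirmed from outside the containers with python-consul (the host here is whatever the CONSUL env var points at; the key and service names are the ones from the report above):

import consul

c = consul.Consul(host='consul')

# step 10: the mysql-primary key should name the elected primary
index, data = c.kv.get('mysql-primary')
print('mysql-primary key:', data['Value'] if data else None)

# there should also be a passing mysql-primary service instance
index, nodes = c.health.service('mysql-primary', passing=True)
print('healthy primaries:', len(nodes))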

Usage without Manta possible?

When starting up the compose file I get the following error because I didn't configure Manta.

config_consul_1 is up-to-date
Starting config_mysql_1
Attaching to config_mysql_1
mysql_1 | Traceback (most recent call last):
mysql_1 |   File "/bin/triton-mysql.py", line 953, in <module>
mysql_1 |     manta_config = Manta()
mysql_1 |   File "/bin/triton-mysql.py", line 189, in __init__
mysql_1 |     signer=self.signer)
mysql_1 |   File "/usr/local/lib/python2.7/dist-packages/manta/client.py", line 140, in __init__
mysql_1 |     assert account, 'account'
mysql_1 | AssertionError: account

Is Manta a hard dependency? I read that the Manta environment variables are optional, but I didn't find a way to disable it.
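It does look like the Manta client is constructed unconditionally at startup. A sketch of what a guard might look like, assuming we only build the client when its credentials are present (the helper name and the exact set of required variables are illustrative):

import os

def get_snapshot_backend():
    """Only build the Manta wrapper when its credentials are set;
    otherwise return None and skip snapshot upload/download."""
    required = ('MANTA_USER', 'MANTA_KEY_ID', 'MANTA_PRIVATE_KEY')
    if all(os.environ.get(var) for var in required):
        return Manta()  # the existing wrapper from triton-mysql.py
    return None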

Abstract object storage interaction

Given our goal to enable local development and portability, we might consider abstracting away the object storage interaction from manage.py inside the MySQL container.

In an earlier offline conversation, I proposed doing the MySQL backups to a container serving WebDAV, accessed via https://github.com/amnong/easywebdav (or some other non-filesystem client library). The WebDAV container could then own the responsibility of interacting with the object store.

This would work on a laptop without any internet connection, and in private clouds where there's no intention of sending the backups off-site nor of setting up a local object store.
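A rough sketch of that abstraction using easywebdav, just to make the shape concrete; the host, credentials, directory layout, and the get_backup counterpart are all placeholders rather than anything that exists in the blueprint today:

import easywebdav

class WebDAVSnapshots(object):
    """Same put/get surface as the Manta wrapper, backed by WebDAV."""

    def __init__(self, host, username, password, basedir='backups'):
        self.client = easywebdav.connect(host, username=username,
                                         password=password)
        self.basedir = basedir
        self.client.mkdirs(basedir)  # idempotent; create the target dir

    def put_backup(self, backup_id, infile):
        self.client.upload(infile, '{}/{}'.format(self.basedir, backup_id))

    def get_backup(self, backup_id, outfile):
        self.client.download('{}/{}'.format(self.basedir, backup_id), outfile)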

Replicas did not reconnect

This could be a Consul issue, but I noticed that after leader re-election the "mysql-primary" service shows up in the Consul web UI, yet the following code (bin/triton-mysql.py)

nodes = consul.health.service(PRIMARY_KEY, passing=True)[1]

kept returning an empty set, so replicas could not connect to the master.
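If this is just Consul taking a while to converge after re-election, a bounded retry around that call might at least narrow down the problem (a sketch; the timeout and interval are guesses):

import time
import consul

PRIMARY_KEY = 'mysql-primary'
c = consul.Consul(host='consul')

def wait_for_primary(timeout=60, interval=3):
    """Poll Consul until a passing mysql-primary instance shows up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        nodes = c.health.service(PRIMARY_KEY, passing=True)[1]
        if nodes:
            return nodes
        time.sleep(interval)
    return []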

Should the blueprint create the Manta dir if it doesn't exist?

To avoid this error on paths/directories that don't yet exist:

Traceback (most recent call last):
  File "/bin/triton-mysql.py", line 957, in <module>
    locals()[sys.argv[1]]()
  File "/bin/triton-mysql.py", line 387, in create_snapshot
    manta_config.put_backup(backup_id, '/tmp/backup.tar')
  File "/bin/triton-mysql.py", line 200, in put_backup
    self.client.put_object(mpath, file=f)
  File "/usr/local/lib/python2.7/dist-packages/manta/client.py", line 353, in put_object
    raise errors.MantaAPIError(res, content)
manta.errors.MantaAPIError: (DirectoryDoesNotExist) /<username>/stor/triton-mysql does not exist
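Creating the parent directory before the upload would avoid this, since PutDirectory is idempotent. A hypothetical reworking of the wrapper's put_backup (the self.account attribute and the path layout are guesses at how mpath is built):

def put_backup(self, backup_id, infile):
    mdir = '/{}/stor/triton-mysql'.format(self.account)
    # PutDirectory is a no-op if the directory already exists
    self.client.put_directory(mdir)
    mpath = '{}/{}'.format(mdir, backup_id)
    with open(infile, 'r') as f:
        self.client.put_object(mpath, file=f)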

Minio restarting causes unhealthy mysql-primary

In my environment I'm using docker-compose, and when the Minio storage servers need to restart, that causes the mysql-primary service in Consul to go unhealthy. This is because the manage.main() method creates a Minio() object, and the init of Minio() actually goes out to Minio to create a bucket. The MySQL servers stay up, and I would expect them to stay healthy despite the snapshot storage going down, as long as MySQL isn't currently trying to back up or restore.

When Manta() is used for snapshots, the MySQL health check doesn't make a connection to Manta, so if Manta goes down it doesn't bring MySQL down with it. I think the Minio snapshot storage backend should operate the same way.
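A sketch of deferring the bucket check until a backup or restore actually happens, so the health check never touches Minio (class and method names are illustrative, not the blueprint's):

from minio import Minio

class MinioSnapshots(object):
    def __init__(self, endpoint, access_key, secret_key, bucket='backups'):
        # just build the client; don't touch the network yet
        self.client = Minio(endpoint, access_key=access_key,
                            secret_key=secret_key, secure=False)
        self.bucket = bucket

    def _ensure_bucket(self):
        # only called on the backup/restore paths, so a Minio outage
        # can't fail the MySQL health check
        if not self.client.bucket_exists(self.bucket):
            self.client.make_bucket(self.bucket)

    def put_backup(self, backup_id, infile):
        self._ensure_bucket()
        self.client.fput_object(self.bucket, backup_id, infile)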

InnoDB restore in the future

When a second MySQL instance spins up, it pulls the snapshot from Minio and restores it. During the restore there are a bunch of errors in the logs about "InnoDB: is in the future!"

2017-10-19T13:24:26.436270193Z 2017-10-19 13:24:26 7ffffeda0740 InnoDB: Error: page 330 log sequence number 34726623
2017-10-19T13:24:26.43664837Z InnoDB: is in the future! Current system log sequence number 1633292.
2017-10-19T13:24:26.437019222Z InnoDB: for more information.
2017-10-19T13:24:26.437523598Z InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

I have found that when the primary started up, this was in the logs:

ERROR manage mysql_tzinfo_to_sql returned error: [Errno 2] No such file or directory

Coprocesses for Consul

We want to run Consul using the same coprocess configuration as we have in https://github.com/autopilotpattern/nginx. Complications to this include:

  • Just as with Nginx, during our preStart we need to query the infrastructure Consul (set via the CONSUL env var) rather than the local Consul agent because it won't be started yet.
  • We need to handle patching /etc/containerpilot.json with the environment variables when we load it in per this TODO.

We should also cover #29 in the process of doing this one.
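On the first bullet, the preStart side is mostly about pointing the client at the right agent; a minimal sketch, assuming the CONSUL env var as described above and the coprocess agent answering on localhost once it's running:

import os
import consul

def get_consul(in_prestart=False):
    """During preStart the local agent isn't up yet, so talk to the
    infrastructure Consul named by the CONSUL env var; afterwards use
    the local coprocess agent."""
    host = os.environ.get('CONSUL', 'consul') if in_prestart else 'localhost'
    return consul.Consul(host=host)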
