resec's Introduction

Resec - Consul-based highly available Redis replication agent

Description

Resec is a successor to Redis Sentinel and redishappy for handling high availability failover for Redis.

It avoids a Redis Sentinel problem: Sentinel remembers every sentinel and every Redis server that has ever appeared in the replication cluster.

Resec's master election is based on Consul locks, which ensures there is a single Redis master instance.

Resec continuously monitors the status of its Redis instance and, while Redis is alive, runs the following two processes:

  • Monitor the master service for changes
    • if the lock is not acquired, on every change of master it runs SLAVEOF Master.Address
  • Try to acquire the lock to become master itself
    • once the lock is acquired, it stops watching for master service changes
    • promotes Redis to master with SLAVEOF NO ONE
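The two processes above boil down to one decision per change: follow the master registered in Consul, or promote the local Redis once the lock is held. The Python sketch below models only that decision; it is not resec's actual (Go) implementation, and the function `replication_command` and its arguments are hypothetical. It assumes the master address read from Consul is a full `host:port` string.

```python
from typing import List, Optional


def replication_command(lock_held: bool, master_addr: Optional[str]) -> Optional[List[str]]:
    """Return the Redis command a resec-like agent would send, or None.

    Hypothetical helper for illustration only.
    lock_held   -- True once this agent holds the Consul lock.
    master_addr -- "host:port" of the master registered in Consul, or None.
    """
    if lock_held:
        # The lock holder promotes its own Redis to master.
        return ["SLAVEOF", "NO", "ONE"]
    if master_addr is None:
        # No master registered yet: wait for the next change in Consul.
        return None
    # Split on the last colon to separate the host from the port.
    host, _, port = master_addr.rpartition(":")
    return ["SLAVEOF", host, port]


if __name__ == "__main__":
    print(replication_command(False, "10.0.75.76:6388"))  # ['SLAVEOF', '10.0.75.76', '6388']
    print(replication_command(True, "10.0.75.76:6388"))   # ['SLAVEOF', 'NO', 'ONE']
```

Keeping the decision in a pure function like this makes the follow/promote rule easy to test without a running Consul agent or Redis server.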

Services and health checks

Resec registers its service with a TTL health check whose TTL is twice HEALTHCHECK_INTERVAL, and updates Consul every HEALTHCHECK_INTERVAL to keep the service in the passing state.
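A minimal sketch of the TTL arithmetic, assuming the check is described with Consul's agent check `TTL` field and refreshed through the agent endpoint `PUT /v1/agent/check/pass/<check_id>`. With a TTL of twice the interval, a single delayed update does not expire the check. The helper names and the injected `pass_check`/`sleep` callables are hypothetical, so the loop can run without a Consul agent.

```python
from typing import Callable, Dict


def ttl_check_definition(healthcheck_interval_s: int) -> Dict[str, str]:
    """Build a Consul TTL check definition with TTL = 2 * HEALTHCHECK_INTERVAL."""
    return {"TTL": f"{2 * healthcheck_interval_s}s"}


def heartbeat(pass_check: Callable[[], None], interval_s: float, beats: int,
              sleep: Callable[[float], None]) -> None:
    """Report the check as passing `beats` times, once per interval.

    In a real agent, pass_check would call PUT /v1/agent/check/pass/<check_id>
    and the loop would run until shutdown instead of a fixed count.
    """
    for _ in range(beats):
        pass_check()
        sleep(interval_s)


if __name__ == "__main__":
    print(ttl_check_definition(5))  # {'TTL': '10s'}
```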

There are two options for registering and discovering services:

  • Use CONSUL_SERVICE_NAME for tag-based master/slave discovery

    • MASTER_TAGS must be provided so that the master instance can be watched for changes.
  • Use CONSUL_SERVICE_PREFIX for discovery based on the service name only

    • services in Consul are named CONSUL_SERVICE_PREFIX-Replication.Role, for example redis-master and redis-slave with the default prefix

If ANNOUNCE_ADDR is set, it is used for the registration in Consul; otherwise REDIS_ADDR is used. If REDIS_ADDR is localhost, only the port is announced to Consul.
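The naming rules above can be sketched as a function that builds the body of a Consul service registration (`Name`, `Tags`, `Address`, `Port`). This is an illustration under assumptions: `service_registration` is a hypothetical name, "localhost" is taken to mean `localhost` or `127.0.0.1`, and no tags are attached in prefix mode. An empty `Address` makes Consul fall back to the agent's address, which is assumed here to be what "only the port is announced" means.

```python
from typing import Dict, List, Optional

LOCALHOST_NAMES = ("localhost", "127.0.0.1")  # assumption: what counts as localhost


def service_registration(role: str, redis_addr: str,
                         announce_addr: Optional[str] = None,
                         service_name: Optional[str] = None,
                         service_prefix: str = "redis",
                         master_tags: Optional[List[str]] = None,
                         slave_tags: Optional[List[str]] = None) -> Dict[str, object]:
    """Build a Consul service registration for a Redis node ("master" or "slave")."""
    if announce_addr:
        host, _, port = announce_addr.rpartition(":")
    else:
        host, _, port = redis_addr.rpartition(":")
        if host in LOCALHOST_NAMES:
            host = ""  # announce only the port; Consul uses the agent address
    if service_name:
        # Tag-based discovery: one service name, role expressed through tags.
        name = service_name
        tags = list((master_tags if role == "master" else slave_tags) or [])
    else:
        # Name-based discovery, e.g. "redis-master" / "redis-slave".
        name = f"{service_prefix}-{role}"
        tags = []
    return {"Name": name, "Tags": tags, "Address": host, "Port": int(port)}
```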

Redis Health

  • If Redis becomes unhealthy, resec stops the leader election. As soon as Redis is healthy again, resec starts the operation from the beginning.
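This stop-and-restart behaviour can be modelled as a small state machine that reacts only to health transitions. In this hypothetical sketch, `ElectionSupervisor.on_health` returns the action a resec-like agent would take; the class and the action names are invented for illustration and are not resec's API.

```python
from typing import Optional


class ElectionSupervisor:
    """Stops leader election when Redis turns unhealthy and restarts it on recovery."""

    def __init__(self) -> None:
        self.healthy = False
        self.electing = False

    def on_health(self, healthy: bool) -> Optional[str]:
        """Record a health-check result and return the action to take, if any."""
        if healthy == self.healthy:
            return None  # no transition: keep the current state
        self.healthy = healthy
        self.electing = healthy
        # Recovery restarts the whole operation from the beginning.
        return "start_election" if healthy else "stop_election"
```

Reacting only to transitions means repeated healthy results do not restart an election that is already running.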

Usage

Environment variables

Environment variable                  Default          Description
ANNOUNCE_ADDR                                          IP:Port of Redis to be announced in Consul; if not set, REDIS_ADDR is used
CONSUL_SERVICE_NAME                                    Consul service name for tag-based service discovery
CONSUL_SERVICE_PREFIX                 redis            Service name prefix, followed by "-(master/slave)"; ignored if CONSUL_SERVICE_NAME is used
CONSUL_LOCK_KEY                       resec/.lock      KV lock location; should be overridden if multiple instances run in the same Consul DC
CONSUL_LOCK_SESSION_NAME              resec            Lock session name, to distinguish multiple resec masters on one host
CONSUL_LOCK_MONITOR_RETRIES           3                Number of retries if the lock receives a 500 error from Consul
CONSUL_LOCK_MONITOR_RETRY_INTERVAL    1s               Retry interval if the lock receives a 500 error from Consul
CONSUL_DEREGISTER_SERVICE_AFTER       72h
CONSUL_LOCK_TTL                       15s
MASTER_TAGS                                            Comma-separated list of tags added to the master instance. The first tag (index 0) configures the role of the Redis/resec task and must differ from index 0 of SLAVE_TAGS.
SLAVE_TAGS                                             Comma-separated list of tags added to the slave instance. The first tag (index 0) configures the role of the Redis/resec task and must differ from index 0 of MASTER_TAGS.
HEALTHCHECK_INTERVAL                  5s               Interval at which resec updates its Consul TTL health check; the TTL is twice this value
HEALTHCHECK_TIMEOUT                   2s
REDIS_ADDR                            127.0.0.1:6379   Address of the Redis instance managed by resec
REDIS_PASSWORD                                         Redis password
LOG_LEVEL                             INFO             Options are "DEBUG", "INFO", "WARN", "ERROR"

Communication with Consul is configured with the same environment variables as the Consul CLI (for example CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN).
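A hedged sketch of reading this configuration, using the defaults from the table and enforcing the rules stated in this README: MASTER_TAGS is required with CONSUL_SERVICE_NAME, and the first master and slave tags must differ. `load_config`, `split_tags` and `DEFAULTS` are hypothetical names; durations are kept as strings rather than parsed, and LOG_LEVEL is checked case-sensitively as an assumption.

```python
import os
from typing import Dict, List, Mapping

# Defaults taken from the table above; durations stay as Go-style strings.
DEFAULTS = {
    "ANNOUNCE_ADDR": "",
    "CONSUL_SERVICE_NAME": "",
    "CONSUL_SERVICE_PREFIX": "redis",
    "CONSUL_LOCK_KEY": "resec/.lock",
    "CONSUL_LOCK_SESSION_NAME": "resec",
    "CONSUL_LOCK_MONITOR_RETRIES": "3",
    "CONSUL_LOCK_MONITOR_RETRY_INTERVAL": "1s",
    "CONSUL_DEREGISTER_SERVICE_AFTER": "72h",
    "CONSUL_LOCK_TTL": "15s",
    "HEALTHCHECK_INTERVAL": "5s",
    "HEALTHCHECK_TIMEOUT": "2s",
    "REDIS_ADDR": "127.0.0.1:6379",
    "REDIS_PASSWORD": "",
    "LOG_LEVEL": "INFO",
}
LOG_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR")


def split_tags(raw: str) -> List[str]:
    """Split a comma-separated tag list, dropping empty entries."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def load_config(env: Mapping[str, str]) -> Dict[str, object]:
    """Merge env over DEFAULTS and validate the tag and log-level rules."""
    cfg: Dict[str, object] = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    master_tags = split_tags(env.get("MASTER_TAGS", ""))
    slave_tags = split_tags(env.get("SLAVE_TAGS", ""))
    if cfg["CONSUL_SERVICE_NAME"] and not master_tags:
        raise ValueError("MASTER_TAGS must be set when CONSUL_SERVICE_NAME is used")
    if master_tags and slave_tags and master_tags[0] == slave_tags[0]:
        raise ValueError("the first MASTER_TAGS and SLAVE_TAGS entries must differ")
    if cfg["LOG_LEVEL"] not in LOG_LEVELS:
        raise ValueError("LOG_LEVEL must be one of " + ", ".join(LOG_LEVELS))
    cfg["MASTER_TAGS"] = master_tags
    cfg["SLAVE_TAGS"] = slave_tags
    cfg["CONSUL_LOCK_MONITOR_RETRIES"] = int(cfg["CONSUL_LOCK_MONITOR_RETRIES"])
    return cfg


if __name__ == "__main__":
    print(load_config(os.environ)["REDIS_ADDR"])
```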

Permissions

Resec requires Consul permissions to function correctly. The Consul ACL token is passed in the environment variable CONSUL_HTTP_TOKEN.

Consul ACL Token Permissions

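If the Consul cluster has ACLs enabled, the following ACL policy gives resec the access it needs to perform all functions with its default configuration. The policy uses the legacy (pre-1.4) rule syntax, in which key, session and service rules match by prefix; with the ACL system introduced in Consul 1.4, use key_prefix, session_prefix and service_prefix for the same effect.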

key "resec/" {
  policy = "write"
}
session "" {
  policy = "write"
}
service "" {
  policy = "write"
}

Run the application

  • With Nomad:
job "resec" {
  datacenters = ["dc1"]
  type        = "service"

  update {
    max_parallel = 1
    stagger      = "10s"
  }

  group "cache" {
    count = 3

    task "redis" {
      driver = "docker"
      config {
        image = "redis:alpine"
        command = "redis-server"
        args = [
          "/local/redis.conf"
        ]
        port_map {
          db = 6379
        }
      }
      // Tell Redis how much memory it may use, so that it is not OOM-killed
      template {
        data = <<EORC
maxmemory {{ env "NOMAD_MEMORY_LIMIT" | parseInt | subtract 16 }}mb
EORC
        destination   = "local/redis.conf"
      }

      resources {
        cpu    = 500
        memory = 256
        network {
          mbits = 10
          port "db" {}
        }
      }
    }

    task "resec" {
      driver = "docker"
      config {
        image = "yotpo/resec"
      }

      env {
        CONSUL_HTTP_ADDR = "http://${attr.unique.network.ip-address}:8500"
        REDIS_ADDR = "${NOMAD_ADDR_redis_db}"
      }

      resources {
        cpu    = 100
        memory = 64
        network {
          mbits = 10
        }
      }
    }
  }
}
  • With docker-compose.yml:
resec:
  image: yotpo/resec
  environment:
    - CONSUL_HTTP_ADDR=1.2.3.4:8500
    - REDIS_ADDR=redis:6379
  container_name: resec
  • With systemd:
[Unit]
Description=resec - redis ha replication daemon
Requires=network-online.target
After=network-online.target

[Service]
EnvironmentFile=-/etc/default/resec
ExecStart=/usr/local/bin/resec
KillSignal=SIGQUIT
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
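The template in the Nomad job reserves 16 MB of the task's memory limit, so Redis' maxmemory ends up 16 MB below NOMAD_MEMORY_LIMIT (240 MB for the 256 MB task) and Redis stays under the container limit. The same arithmetic as a small Python function; the function name and the error check are illustrative additions, not part of the job file.

```python
def redis_maxmemory_line(nomad_memory_limit_mb: int, headroom_mb: int = 16) -> str:
    """Render the redis.conf line produced by the Nomad template.

    Mirrors: {{ env "NOMAD_MEMORY_LIMIT" | parseInt | subtract 16 }}mb
    """
    usable_mb = nomad_memory_limit_mb - headroom_mb
    if usable_mb <= 0:
        raise ValueError("memory limit is too small for the reserved headroom")
    return f"maxmemory {usable_mb}mb"


if __name__ == "__main__":
    print(redis_maxmemory_line(256))  # maxmemory 240mb
```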

Copyright and license

Code released under the MIT license.

resec's People

Contributors

burdandrei, dannnir, jippi, tzahimizrahi

resec's Issues

Incorrect update of the redis-info to consul

It seems that the redis-info is incorrectly updated in Consul. The Redis info, section 'Replication', on the redis-master server shows two connected slaves:

# Replication
role:master
connected_slaves:2
slave0:ip=10.224.199.37,port=6381,state=online,offset=942807,lag=2
slave1:ip=10.224.199.11,port=6381,state=online,offset=942807,lag=1
master_replid:63e4319d7e22a674e2192e6d1d3fc90e548253fb
master_replid2:5b5e7340f5dd557915f84b8f8b5dfa9d877ca1f6

While the health check of the same redis-master server in Consul shows:

# Replication
role:master
connected_slaves:0
master_replid:63e4319d7e22a674e2192e6d1d3fc90e548253fb
master_replid2:5b5e7340f5dd557915f84b8f8b5dfa9d877ca1f6

Resec was started with these environment vars:

REDIS_ADDR="10.224.199.21:6381"
CONSUL_SERVICE_PREFIX="redis-cache"
CONSUL_LOCK_SESSION_NAME="resec/cache"
CONSUL_LOCK_KEY="resec/cache.lock"

Edit: It seems that pretty much all stats information is affected as well.

Is this intentional, did I miss something or is this a bug?
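When comparing the two outputs above, it helps to parse the INFO text (one `key:value` per line, with `#` section headers) into fields. A minimal Python sketch; `parse_info` and `connected_slaves` are hypothetical helpers, not part of resec.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse Redis INFO output into a flat dict of field -> raw string value."""
    fields: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Replication"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def connected_slaves(text: str) -> int:
    """Return connected_slaves from INFO output, or 0 if the field is missing."""
    return int(parse_info(text).get("connected_slaves", "0"))
```

Applied to the master's own output this yields 2, while the text stored in the Consul health check yields 0.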

Split-brain when redis master restarted without restart of its resec

Hello

I found a reproducible situation: when you stop the redis process on the master, it becomes red in Consul, but resec does not remove it from the redis-master service.
Then, after the redis process is restarted, the health check sets it to green, and we now have a double-master setup, with the slave connected to one of them.
Restarting the resec process on the former master resolves the issue.

resec's log output from this master:

2018/03/19 17:05:42 [ERROR] Can't connect to redis running on 127.0.0.1:6379
2018/03/19 17:05:42 [INFO] Redis HealthCheck changed to NOT healthy
2018/03/19 17:05:42 [INFO] Received update for master from consul
2018/03/19 17:05:42 [INFO] No redis master services in Consul
2018/03/19 17:05:42 [INFO] Received update for master from consul
2018/03/19 17:05:42 [INFO] Redis master updated in Consul
2018/03/19 17:05:42 [INFO] Received update for master from consul
2018/03/19 17:05:42 [INFO] No redis master services in Consul
2018/03/19 17:05:47 [INFO] Received update for master from consul
2018/03/19 17:05:47 [ERROR] Can't connect to redis running on 127.0.0.1:6379
2018/03/19 17:05:52 [INFO] Redis HealthCheck changed to healthy
2018/03/19 17:05:52 [INFO] Trying to acquire leader lock
2018/03/19 17:05:52 [INFO] Received update for master from consul
2018/03/19 17:05:52 [ERROR] Found more than one master registered in Consul

enslavement should be retried (or don't talk to redis until it's ready)

Saw this today on a Resec node: Redis was OOM-killed and restarted

2018/07/19 07:26:52 [INFO] Start!
2018/07/19 07:26:52 [INFO] Redis master updated in Consul
2018/07/19 07:26:57 [INFO] Redis status changed to healthy
2018/07/19 07:26:57 [ERROR] Could not enslave redis 10.0.157.245:6388 to be slave of 10.0.75.76:6388 (LOADING Redis is loading the dataset in memory)

Redis was still loading data from disk (~12 GB) when Resec tried to enslave it - this failed because redis wasn't ready to accept commands

Resec did not retry enslavement, so the node never got enslaved and never registered in consul as a slave
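One way to address this, sketched under assumptions: retry SLAVEOF with a fixed delay while Redis answers with a LOADING error. `send`, `RedisLoadingError` and `enslave_with_retry` are hypothetical stand-ins for a real Redis client; this is not resec's code.

```python
import time
from typing import Callable, List


class RedisLoadingError(Exception):
    """Hypothetical error for a Redis reply starting with "LOADING"."""


def enslave_with_retry(send: Callable[[List[str]], None], master_host: str,
                       master_port: str, attempts: int = 10, delay_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Send SLAVEOF, retrying while Redis is still loading its dataset.

    Returns the number of attempts used; re-raises after the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            send(["SLAVEOF", master_host, master_port])
            return attempt
        except RedisLoadingError:
            if attempt == attempts:
                raise  # give up: Redis is still loading after all attempts
            sleep(delay_s)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Injecting `send` and `sleep` keeps the retry policy testable; a real agent could also wait until INFO reports that loading has finished before enslaving.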

Consul invalid duration error

When using resec (v0.5.1) with consul 1.4.1 I'm getting a lot of errors like:
time="2019-02-01T15:58:01Z" level=error msg="Consul error: time: invalid duration " system=consul
message repeated 18 times: [ time="2019-02-01T15:58:01Z" level=error msg="Consul error: time: invalid duration " system=consul]
Maybe there were some breaking changes in the Consul API in version 1.4.x.
Also, I've noticed that the resec version is a hard-coded string (x.y.z). It would be a good idea to report the real version, to make bug reports more accurate.
