
skyring's Introduction

Skyring

A distributed, reliable timer service providing setTimeout functionality in a distributed fashion. Skyring servers are clustered into a hashring using consistent hashing to partition timers to specific nodes in the ring. Skyring exposes a simple HTTP API that allows you to create and cancel timers. Timer execution comes in the form of an HTTP webhook (more transports to come).

  • Pluggable transports (timer execution)
  • Pluggable Storage (crash recovery + balancing)
  • Auto Rebalancing
  • Crash Recovery

Architecture Overview

A request can be issued to any active node in the cluster. If that node is not responsible for the timer in question, it will forward the request directly to the node that is, keeping network latency to a minimum. This makes Skyring well suited to high-performance, stateless, and distributed environments. The minimum recommended cluster size is 3 nodes, 2 of which are seed or bootstrapping nodes. A cluster of this size can average between 2K and 5K requests per second.

Examples

Create a timer

POST /timer

Request

Since Skyring manages timers internally with setTimeout, there is a maximum timeout value of 2^31 - 1, or 2147483647 milliseconds, which is approximately 24.8 days. Attempting to request a timeout greater than this value will result in a 400 Bad Request response. Additionally, the timeout must be greater than 0.

curl -i -XPOST http://localhost:8080/timer -d '{
  "timeout": 6000,
  "data" : "{\"foo\":\"bar\"}",
  "callback": {
    "transport": "http",
    "method": "post",
    "uri": "http://api.someservice.com/hook/timeout"
  }
}'
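
Timers can also be cancelled. Assuming the create call above returned the new timer's URI in the Location header (for example /timer/<timer id>), a DELETE to that URI cancels it:

curl -i -XDELETE http://localhost:8080/timer/<timer id>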

Contributing

Skyring is a monorepo managed by pnpm. Clone the repo and bootstrap the project:

$ git clone https://github.com/esatterwhite/skyring.git project-skyring
$ pnpm install -r
$ docker-compose -f compose/nats.yml up -d
$ pnpm test

Packages

License

MIT Licensed, Copyright (c) 2020 Eric Satterwhite

skyring's People

Contributors

alexhauser23, benfleis, bhenhsi, btromanova, camlegleiter, charliezhang, corgiman, dansimau, davejn, dependant-bot, esatterwhite, iandialsource, jcorbin, jonjs, jwolski, jwolski2, kriskowal, lupie, markyen, mennopruijssers, motiejus, raynos, sashahilton00, severb, shannili, thanodnl, theconnman, toddsifleet, weikai77, yulunli


skyring's Issues

Document minimum/maximum allowed `timeout` value

It's not explicitly documented in either the API or README/module docs, but given that skyring uses setTimeout internally for managing timers within each node, there is a minimum/maximum timeout allowed by setTimeout. From the Node.js docs:

When delay is larger than 2147483647 or less than 1, the delay will be set to 1.

It may be worth noting this limitation in the API and README, and/or potentially updating the timer validator to check for this more explicitly to avoid unexpected behavior.
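
A minimal sketch of such a check, mirroring Node's documented limits; this is illustrative only, not skyring's actual validator, and the statusCode property is a hypothetical way to surface a 400:

const MAX_TIMEOUT = 2 ** 31 - 1 // Node's setTimeout ceiling: 2147483647 ms

function validateTimeout (timeout) {
  // Reject anything setTimeout would silently clamp to 1ms.
  if (!Number.isInteger(timeout) || timeout < 1 || timeout > MAX_TIMEOUT) {
    const err = new Error(`timeout must be an integer between 1 and ${MAX_TIMEOUT}`)
    err.statusCode = 400 // hypothetical: mapped to a 400 Bad Request response
    throw err
  }
  return timeout
}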

External Timer Backing Store

I like that each node has the ability to recover if it is restarted, but I would most likely run Skyring in a fully Dockerized environment where each container is transient and has no persistent volume backing it. This means that when a node dies it doesn't get restarted; it gets recreated somewhere else, so recovery of that "node" fails because it's not really the same node.

Do you imagine adding an optional external backing store for timers so the cluster can recover lost timers if a node shuts down suddenly and doesn't gracefully offload its current workload?
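
A sketch of how an external store could slot in, assuming skyring's storage options keep the backend/path shape visible in the leveldown log lines elsewhere in this tracker; the mongodown adapter name and the connection-string-as-path convention are hypothetical:

const Skyring = require('skyring')

// Hypothetical: a levelup-compatible adapter backed by an external database,
// so a recreated container can reclaim timers its predecessor persisted.
const server = new Skyring({
  storage: {
    backend: 'mongodown',                 // hypothetical adapter name
    path: 'mongodb://mongo:27017/skyring' // connection string in place of a disk path
  }
})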

Replace request package

The request package is EOL. The requirements of the HTTP transport are rather simple, and the replacement package can also be simple.
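
A rough sketch of how small the replacement could be using only Node's core http module; the (method, uri, payload, callback) shape is an assumption for illustration, not skyring's exact transport interface:

const http = require('http')
const { URL } = require('url')

// Fires a one-shot webhook and reports success or failure via callback.
function httpTransport (method, uri, payload, callback) {
  const url = new URL(uri)
  const req = http.request({
    method: method.toUpperCase(),
    hostname: url.hostname,
    port: url.port,
    path: url.pathname + url.search,
    headers: { 'content-type': 'application/json' }
  }, (res) => {
    res.resume() // drain the response body; only the status code matters here
    callback(res.statusCode < 300 ? null : new Error(`unexpected status ${res.statusCode}`))
  })
  req.once('error', callback)
  req.end(payload)
}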

Error while trying to execute skyring single node

The command

node ./index.js --channel:port=3455 --seeds='localhost:3455'

The error

{"level":30,"time":1631814191501,"pid":31780,"hostname":"ubuntu","name":"skyring","name":"skyringnode_modules:skyring//server:node","msg":"contacting seed nodes [\"127.0.0.1\",\"127.0.0.1:3455\",\"127.0.0.1:3456\"]"}
// Illegal instruction (core dumped)

My index.js file is:

const Skyring = require('skyring');

const server = new Skyring();

const port = process.env.PORT || 4001

function onSignal() {
  server.close(() => {
    console.log('shutting down');
  });
}

console.log(port);
server.listen(port, (err) => {
  if (err) throw err;
  console.log(`Skyring listening at http://127.0.0.1:${port}`);
});

process.once('SIGINT', onSignal);
process.once('SIGTERM', onSignal);

I have no idea what could be happening; the strange thing is that it runs normally on my local machine but not on my VPS.

I'm using:
VPS: Ubuntu Server 20.04
NodeJS Version: v14.17.6
Node Gyp: v8.2.0
CPU: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz

Class Based transports

Most transports are actually two or three functions. They should be classes; a minimal sketch of a possible base class follows the checklist below.

  • add a base transport class that provides a basic shell
  • update the transport harness to instantiate each transport
  • convert the http transport to a class
  • convert the test callback transport to a class
  • update the transport docs
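
A minimal sketch of what that base shell could look like; the class and method names here (Transport, exec, shutdown) are illustrative assumptions, not an agreed design:

// Illustrative base transport: subclasses override exec(); the harness would
// construct one instance per configured transport. Not skyring's actual API.
class Transport {
  constructor (name, options = {}) {
    this.name = name
    this.options = options
  }

  // Called by the timer harness when a timer fires.
  exec (method, uri, payload, id, callback) {
    callback(new Error(`transport ${this.name} does not implement exec()`))
  }

  // Optional hook so transports can clean up on shutdown or rebalance.
  shutdown (callback) {
    setImmediate(callback)
  }
}

class HTTPTransport extends Transport {
  constructor (options) {
    super('http', options)
  }

  exec (method, uri, payload, id, callback) {
    // delegate to an http client, e.g. the sketch under "Replace request package"
  }
}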

NatsError when connecting to Nats instances in Kubernetes

While booting up the ring node I am receiving this error.

2018-11-15T22:35:40.275Z skyring:nats creating nats client { servers: [ 'nats://10.28.0.105:4222' ] }
nats error { NatsError: Could not connect to server: Error: connect ECONNREFUSED 10.28.0.105:4222
at Socket. (/opt/skyring/node_modules/nats/lib/nats.js:513:34)
at emitOne (events.js:115:13)
at Socket.emit (events.js:210:7)
at emitErrorNT (internal/streams/destroy.js:64:8)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
name: 'NatsError',
message: 'Could not connect to server: Error: connect ECONNREFUSED 10.28.0.105:4222',
code: 'CONN_ERR',
chainedError:
{ Error: connect ECONNREFUSED 10.28.0.105:4222
at Object._errnoException (util.js:1021:11)
at _exceptionWithHostPort (util.js:1043:20)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1175:14)
code: 'ECONNREFUSED',
errno: 'ECONNREFUSED',
syscall: 'connect',
address: '10.28.0.105',
port: 4222 } }
2018-11-15T22:35:40.309Z skyring:store storage backend ready { backend: 'leveldown',
path: '/var/data/skyring',
createIfMissing: true,
errorIfExists: false }

The NATS server works properly at this IP, and I can exec into a separate pod and connect to it using CLI tools. I believe it is a problem with the nats.js wrapper, but I am having trouble finding the exact issue.

Node 10 support

Currently there are some dependencies that do not build against Node 10, primarily farmhash, which is required by ringpop. Updating the version of farmhash locally is a breaking change for ringpop.

Uber hasn't been very responsive about the Node version of ringpop. I think the only course of action is going to be to maintain a fork.

The automated release is failing 🚨

🚨 The automated release from the master branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this πŸ’ͺ.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the master branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


No npm token specified.

An npm token must be created and set in the NPM_TOKEN environment variable on your CI environment.

Please make sure to create an npm token and to set it in the NPM_TOKEN environment variable on your CI environment. The token must allow to publish to the registry https://registry.npmjs.org/.


Good luck with your project ✨

Your semantic-release bot πŸ“¦πŸš€

Timers getting reset on double recovery

I am using Mongo as a persistent backend. This issue shows up mostly in Kubernetes, where a killed node can come back online in under a second.

  1. If a node dies, sometimes the rebalance will not occur at all (rare), but when the node comes back online it will pick up the timer at the correct time if no rebalancing occurred.

  2. If a node dies, rebalancing occurs, and the original node does not come back up, the timer will still be accurate on the node that took it over.

The problem occurs when a node dies, rebalancing occurs, and then the original node comes back online: the timer is rescheduled back to the original node, and at that point it is reset to 0 and goes through the entire timeout again.

The automated release is failing 🚨

🚨 The automated release from the main branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this πŸ’ͺ.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the main branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


Docker authentication failed

Both ENV vars DOCKER_REGISTRY_USER and DOCKER_REGISTRY_PASSWORD must be set


Good luck with your project ✨

Your semantic-release bot πŸ“¦πŸš€

Run tests in parallel

Currently, the tests are run by spinning up all of the development dependencies first and running a global test runner over the local code with tap. While this is generally OK, it is slow.

With the migration to pnpm, running the tests independently and in parallel becomes possible.

Code coverage would need to be aggregated and published; there is a GitHub Action for this.

Replace debug with pino

debug, although easy to use, is hard to make sense of. Additionally, if you are using this project, you have to know how to enable logs for each module, which is less than ideal.

Errors can also be hard to catch unless you enable debug logs for everything.
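
A minimal sketch of the proposed swap, assuming a single root logger with per-module child loggers standing in for debug's namespaces; the module names and log payloads are illustrative:

const pino = require('pino')

// One root logger; level is controlled in one place instead of per-namespace.
const log = pino({ name: 'skyring', level: process.env.LOG_LEVEL || 'info' })
const timerLog = log.child({ module: 'timer' })

timerLog.debug({ id: 'abc123' }, 'timer created') // hidden unless level <= debug
timerLog.error(new Error('transport failed'))     // visible at the default level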

Can not start the cluster using docker compose

I am running the MongoDB example with docker compose. I can not get the nodes up and running; they exit with the following error.

keef:conf unable to load /opt/skyring configuration: Cannot find module '/opt/skyring/conf'

I don't see an 'opt' folder in the GitHub repo. Am I missing something?

stack trace

node-2_1  | 2020-08-07T08:49:52.352Z keef:conf project root set to /opt/skyring
node-2_1  | 2020-08-07T08:49:52.353Z keef:conf package path set to /opt/skyring/packages
node-2_1  | 2020-08-07T08:49:52.364Z keef:conf loading config file `nenv`: /opt/skyring/z-skyring.development.json
node-2_1  | 2020-08-07T08:49:52.384Z keef:conf loading config file `project`: /opt/skyring/z-skyring.json
node-2_1  | 2020-08-07T08:49:52.385Z keef:conf loading config file `home`: /root/.config/z-skyring.json
node-2_1  | 2020-08-07T08:49:52.386Z keef:conf loading config file `etc`: /etc/z-skyring.json
node-2_1  | 2020-08-07T08:49:52.387Z keef:conf setting config defaults
node-2_1  | 2020-08-07T08:49:52.387Z keef:conf configuration modules /opt/skyring
node-2_1  | 2020-08-07T08:49:52.388Z keef:conf unable to load /opt/skyring configuration: Cannot find module '/opt/skyring/conf'
node-2_1  | Require stack:
node-2_1  | - /opt/skyring/node_modules/keef/index.js
node-2_1  | - /opt/skyring/node_modules/skyring/conf/index.js
node-2_1  | - /opt/skyring/node_modules/skyring/index.js
node-2_1  | - /opt/skyring/index.js
node-2_1  | 2020-08-07T08:49:52.388Z keef:conf etcd config:  undefined
node-2_1  | internal/modules/cjs/loader.js:1188
node-2_1  |   return process.dlopen(module, path.toNamespacedPath(filename));
node-2_1  |                  ^
node-2_1  | 
node-2_1  | Error: /opt/skyring/node_modules/farmhash/build/Release/farmhash.node: invalid ELF header
node-2_1  |     at Object.Module._extensions..node (internal/modules/cjs/loader.js:1188:18)
node-2_1  |     at Module.load (internal/modules/cjs/loader.js:986:32)
node-2_1  |     at Function.Module._load (internal/modules/cjs/loader.js:879:14)
node-2_1  |     at Module.require (internal/modules/cjs/loader.js:1026:19)
node-2_1  |     at require (internal/modules/cjs/helpers.js:72:18)
node-2_1  |     at Object.<anonymous> (/opt/skyring/node_modules/farmhash/index.js:3:18)
node-2_1  |     at Module._compile (internal/modules/cjs/loader.js:1138:30)
node-2_1  |     at Object.Module._extensions..js (internal/modules/cjs/loader.js:1158:10)
node-2_1  |     at Module.load (internal/modules/cjs/loader.js:986:32)
node-2_1  |     at Function.Module._load (internal/modules/cjs/loader.js:879:14)

implement semantic release

Implement semantic release for all publishable packages in the monorepo.

Will need to publish the skyring npm package + docker image, plus npm packages for the zmq transport, tcp transport, and scylladown adapter.

Should implement (a hedged release-config sketch follows this list):

  • multi-release package
  • release config (docker)
  • release config (npm)
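
A sketch of what a per-package config could look like; the plugin list is the standard semantic-release set, and a docker release would need an additional community plugin, which is deliberately omitted rather than guessed at:

// release.config.js — illustrative only, not this repo's actual config.
module.exports = {
  branches: ['main'],
  plugins: [
    '@semantic-release/commit-analyzer',
    '@semantic-release/release-notes-generator',
    '@semantic-release/npm',
    '@semantic-release/github'
  ]
}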

Timers executed twice during rebalance

There is a race condition that can happen during a rebalance and result in a timer being executed twice.

During a rebalance, all of the in-memory timers are cancelled and purged from storage. However, because transports have no knowledge of a shutdown or rebalance, if a timer executes before it is cancelled or determined to be moved, it will be executed locally on the node and again on the remote node when it arrives.

Timers lost on Shutdown occasionally

Timers will sometimes be lost on shutdown. I am using Mongo for persistence. Timers are never lost during rebalancing, only during shutdown. To reproduce:

  1. Run 3 nodes then send ~100 timers at the cluster
  2. Check that all timers are present in Mongo
  3. Send SIGTERM to one node
  4. After a complete shutdown check Mongo for loss of timers (Sometimes in trials it would be correct but it was not consistent)
  5. Restart the node back up and check Mongo for correct rebalance

Repeat this process any number of times and with any node in the cluster; a loss of timers from shutting down will be seen.

Rebalancing from recovery NEVER had errors in a script that constantly repeated this process; only shutdown lost timers, and the more timers the node held, the more pronounced the loss. I know the node shuts down gracefully: I completely removed the SIGKILL timeout and can see logs showing that the node exited properly. I can also see the node logging that it processed the correct number of rebalances while shutting down.

Timer Rebalance Reset

I ran some tests yesterday, and it seems that when a timer is rebalanced to a new node during a graceful shutdown, it is fully reset. I set a timer for 60 seconds, shut down the node after 30 seconds, and didn't see the timer fire until 60 seconds later; I expected it to fire after the remaining 30 seconds.

I believe you have this covered in the code below, but wanted to check whether it is executing properly.

skyring/lib/timer.js

Lines 460 to 467 in ac5362d

const run = ( obj ) => {
  clearTimeout( obj.timer );
  batch.del(obj.id);
  const data = Object.assign({}, obj.payload, {
    id: obj.id
  , created: obj.created
  , count: ++sent
  });
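
For reference, the behaviour I expected would be something like rescheduling with the elapsed time subtracted; a sketch only, assuming the stored record keeps its original timeout alongside the created timestamp seen in the excerpt above:

// Reschedule with only the time remaining, clamped at zero so an overdue
// timer fires immediately. `obj.timeout` is assumed to be persisted.
const elapsed = Date.now() - obj.created
const remaining = Math.max(obj.timeout - elapsed, 0)
setTimeout(fire, remaining, obj) // `fire` stands in for the real execution path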
