
skyring's Introduction

Skyring

A distributed, reliable timer service providing setTimeout functionality in a distributed fashion. Skyring servers are clustered into a hashring using consistent hashing to partition timers to specific nodes in the ring. Skyring exposes a simple HTTP API that allows you to create and cancel timers. Timer execution comes in the form of an HTTP webhook (more transports to come).

  • Pluggable transports (timer execution)
  • Pluggable Storage (crash recovery + balancing)
  • Auto Rebalancing
  • Crash Recovery

Architecture Overview

A request can be issued to any active node in the cluster. If that node is not responsible for the timer in question, it will forward the request directly to the node that is, keeping network latency to a minimum. This makes Skyring well suited to high-performance, stateless, and distributed environments. The minimum recommended cluster size is 3 nodes, 2 of which are seed or bootstrapping nodes. A cluster of this size can average between 2K and 5K requests per second.

Examples

Create a timer

POST /timer

Request

Since Skyring manages timers internally with setTimeout, there is a maximum timeout value of 2^31 - 1, or 2147483647 milliseconds, which is approximately 24.8 days. Attempting to request a timeout greater than this value will result in a 400 Bad Request response. Additionally, the timeout must be greater than 0.

curl -i -XPOST http://localhost:8080/timer -d '{
  "timeout": 6000,
  "data" : "{\"foo\":\"bar\"}",
  "callback": {
    "transport": "http",
    "method": "post",
    "uri": "http://api.someservice.com/hook/timeout"
  }
}'
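
Timers can also be cancelled. Assuming the create call above returned the new timer's URI in the Location header (for example /timer/<timer id>), a DELETE to that URI cancels it:

curl -i -XDELETE http://localhost:8080/timer/<timer id>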

Contributing

Skyring is a monorepo managed by pnpm. Clone the repo and bootstrap the project:

$ git clone https://github.com/esatterwhite/skyring.git project-skyring
$ pnpm install -r
$ docker-compose -f compose/nats.yml up -d
$ pnpm test

Packages

License

MIT Licensed, Copyright (c) 2020 Eric Satterwhite

skyring's People

Contributors

alexhauser23, benfleis, bhenhsi, btromanova, camlegleiter, charliezhang, corgiman, dansimau, davejn, dependant-bot, esatterwhite, iandialsource, jcorbin, jonjs, jwolski, jwolski2, kriskowal, lupie, markyen, mennopruijssers, motiejus, raynos, sashahilton00, severb, shannili, thanodnl, theconnman, toddsifleet, weikai77, yulunli


skyring's Issues

Document minimum/maximum allowed `timeout` value

It's not explicitly documented in either the API or README/module docs, but given that skyring uses setTimeout internally for managing timers within each node, there is a minimum/maximum timeout allowed by setTimeout. From the Node.js docs:

When delay is larger than 2147483647 or less than 1, the delay will be set to 1.

It may be worth noting this limitation in the API and README, and/or potentially updating the timer validator to check for this more explicitly to avoid unexpected behavior.
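
A minimal sketch of such a check, mirroring Node's documented limits; this is illustrative only, not skyring's actual validator, and the statusCode property is a hypothetical way to surface a 400:

const MAX_TIMEOUT = 2 ** 31 - 1 // Node's setTimeout ceiling: 2147483647 ms

function validateTimeout (timeout) {
  // Reject anything setTimeout would silently clamp to 1ms.
  if (!Number.isInteger(timeout) || timeout < 1 || timeout > MAX_TIMEOUT) {
    const err = new Error(`timeout must be an integer between 1 and ${MAX_TIMEOUT}`)
    err.statusCode = 400 // hypothetical: mapped to a 400 Bad Request response
    throw err
  }
  return timeout
}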

External Timer Backing Store

I like that each node has the ability to recover if it is restarted, but I would most likely run Skyring in a fully Dockerized environment where each container is transient and has no persistent volume backing it. This means that when a node dies it doesn't get restarted; it gets recreated somewhere else, so recovery of that "node" fails because it's not really the same node.

Do you imagine adding an optional external backing store for timers so the cluster can recover lost timers if a node shuts down suddenly and doesn't gracefully offload its current workload?
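
A sketch of how an external store could slot in, assuming skyring's storage options keep the backend/path shape visible in the leveldown log lines elsewhere in this tracker; the mongodown adapter name and the connection-string-as-path convention are hypothetical:

const Skyring = require('skyring')

// Hypothetical: a levelup-compatible adapter backed by an external database,
// so a recreated container can reclaim timers its predecessor persisted.
const server = new Skyring({
  storage: {
    backend: 'mongodown',                 // hypothetical adapter name
    path: 'mongodb://mongo:27017/skyring' // connection string in place of a disk path
  }
})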

Replace request package

The request package is EOL. The requirements of the HTTP transport are rather simple, and the replacement package can also be simple.
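
A rough sketch of how small the replacement could be using only Node's core http module; the (method, uri, payload, callback) shape is an assumption for illustration, not skyring's exact transport interface:

const http = require('http')
const { URL } = require('url')

// Fires a one-shot webhook and reports success or failure via callback.
function httpTransport (method, uri, payload, callback) {
  const url = new URL(uri)
  const req = http.request({
    method: method.toUpperCase(),
    hostname: url.hostname,
    port: url.port,
    path: url.pathname + url.search,
    headers: { 'content-type': 'application/json' }
  }, (res) => {
    res.resume() // drain the response body; only the status code matters here
    callback(res.statusCode < 300 ? null : new Error(`unexpected status ${res.statusCode}`))
  })
  req.once('error', callback)
  req.end(payload)
}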

Error while trying to execute skyring single node

The command

node ./index.js --channel:port=3455 --seeds='localhost:3455'

The error

{"level":30,"time":1631814191501,"pid":31780,"hostname":"ubuntu","name":"skyring","name":"skyringnode_modules:skyring//server:node","msg":"contacting seed nodes [\"127.0.0.1\",\"127.0.0.1:3455\",\"127.0.0.1:3456\"]"}
// Illegal instruction (core dumped)

My index.js file is:

const Skyring = require('skyring');

const server = new Skyring();

const port = process.env.PORT || 4001

function onSignal() {
  server.close(() => {
    console.log('shutting down');
  });
}

console.log(port);
server.listen(port, (err) => {
  if (err) throw err;
  console.log(`Skyring listening at http://127.0.0.1:${port}`);
});

process.once('SIGINT', onSignal);
process.once('SIGTERM', onSignal);

I have no idea what could be happening; the strange thing is that it runs normally on my local machine but not on my VPS.

I'm using:
VPS: Ubuntu Server 20.04
NodeJS Version: v14.17.6
Node Gyp: v8.2.0
CPU: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz

Class Based transports

Most transports are actually two or three functions. They should be classes; a minimal sketch of a possible base class follows the checklist below.

  • add a base transport class that provides a basic shell
  • update the transport harness to instantiate each transport
  • convert the http transport to a class
  • convert the test callback transport to a class
  • update the transport docs
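
A minimal sketch of what that base shell could look like; the class and method names here (Transport, exec, shutdown) are illustrative assumptions, not an agreed design:

// Illustrative base transport: subclasses override exec(); the harness would
// construct one instance per configured transport. Not skyring's actual API.
class Transport {
  constructor (name, options = {}) {
    this.name = name
    this.options = options
  }

  // Called by the timer harness when a timer fires.
  exec (method, uri, payload, id, callback) {
    callback(new Error(`transport ${this.name} does not implement exec()`))
  }

  // Optional hook so transports can clean up on shutdown or rebalance.
  shutdown (callback) {
    setImmediate(callback)
  }
}

class HTTPTransport extends Transport {
  constructor (options) {
    super('http', options)
  }

  exec (method, uri, payload, id, callback) {
    // delegate to an http client, e.g. the sketch under "Replace request package"
  }
}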

NatsError when connecting to Nats instances in Kubernetes

While booting up the ring node I am receiving this error.

2018-11-15T22:35:40.275Z skyring:nats creating nats client { servers: [ 'nats://10.28.0.105:4222' ] }
nats error { NatsError: Could not connect to server: Error: connect ECONNREFUSED 10.28.0.105:4222
at Socket. (/opt/skyring/node_modules/nats/lib/nats.js:513:34)
at emitOne (events.js:115:13)
at Socket.emit (events.js:210:7)
at emitErrorNT (internal/streams/destroy.js:64:8)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
name: 'NatsError',
message: 'Could not connect to server: Error: connect ECONNREFUSED 10.28.0.105:4222',
code: 'CONN_ERR',
chainedError:
{ Error: connect ECONNREFUSED 10.28.0.105:4222
at Object._errnoException (util.js:1021:11)
at _exceptionWithHostPort (util.js:1043:20)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1175:14)
code: 'ECONNREFUSED',
errno: 'ECONNREFUSED',
syscall: 'connect',
address: '10.28.0.105',
port: 4222 } }
2018-11-15T22:35:40.309Z skyring:store storage backend ready { backend: 'leveldown',
path: '/var/data/skyring',
createIfMissing: true,
errorIfExists: false }

The NATS server works properly at this IP, and I can exec into a separate pod and connect to it using CLI tools. I believe it is a problem with the nats.js wrapper, but I am having trouble finding the exact issue.

Node 10 support

Currently there are some dependencies that do not build against Node 10, primarily farmhash, which is required by ringpop. Updating the version of farmhash locally is a breaking change for ringpop.

Uber hasn't been very responsive about the Node version of ringpop. I think the only course of action is going to be to maintain a fork.

The automated release is failing 🚨

🚨 The automated release from the master branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this πŸ’ͺ.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the master branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


No npm token specified.

An npm token must be created and set in the NPM_TOKEN environment variable on your CI environment.

Please make sure to create an npm token and to set it in the NPM_TOKEN environment variable on your CI environment. The token must allow to publish to the registry https://registry.npmjs.org/.


Good luck with your project ✨

Your semantic-release bot πŸ“¦πŸš€

Timers getting reset on double recovery

I am using Mongo as a persistent backend. This issue shows up mostly in Kubernetes, where a killed node can come back online in under a second.

  1. If a node dies, sometimes the rebalance will not occur at all (rare), but when the node comes back online it will pick up the timer at the correct time if no rebalancing occurred.

  2. If a node dies, rebalancing occurs, and the original node does not come back up, the timer will still be accurate on the node that took it over.

The problem occurs when a node dies, rebalancing occurs, and then the original node comes back online: the timer is rescheduled back to the original node, and at that point it is reset to 0 and goes through the entire timeout again.

The automated release is failing 🚨

🚨 The automated release from the main branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this πŸ’ͺ.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the main branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


Docker authentication failed

Both ENV vars DOCKER_REGISTRY_USER and DOCKER_REGISTRY_PASSWORD must be set


Good luck with your project ✨

Your semantic-release bot πŸ“¦πŸš€

Run tests in parallel

Currently, the tests are run by spinning up all of the development dependencies first and running a global test runner over the local code with tap. While this is generally OK, it is slow.

With the migration to pnpm, running the tests independently and in parallel becomes possible.

Code coverage would need to be aggregated and published; there is a GitHub Action for this.

Replace debug with pino

debug, although easy to use, is hard to make sense of. Additionally, if you are using this project, you have to know how to enable logs for each module, which is less than ideal.

Errors can also be hard to catch unless you enable debug logs for everything.
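
A minimal sketch of the proposed swap, assuming a single root logger with per-module child loggers standing in for debug's namespaces; the module names and log payloads are illustrative:

const pino = require('pino')

// One root logger; level is controlled in one place instead of per-namespace.
const log = pino({ name: 'skyring', level: process.env.LOG_LEVEL || 'info' })
const timerLog = log.child({ module: 'timer' })

timerLog.debug({ id: 'abc123' }, 'timer created') // hidden unless level <= debug
timerLog.error(new Error('transport failed'))     // visible at the default level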

Can not start the cluster using docker compose

I am running the MongoDB example with docker compose. I can not get the nodes up and running; they exit with the following error.

keef:conf unable to load /opt/skyring configuration: Cannot find module '/opt/skyring/conf'

I don't see an 'opt' folder in the GitHub repo. Am I missing something?

stack trace

node-2_1  | 2020-08-07T08:49:52.352Z keef:conf project root set to /opt/skyring
node-2_1  | 2020-08-07T08:49:52.353Z keef:conf package path set to /opt/skyring/packages
node-2_1  | 2020-08-07T08:49:52.364Z keef:conf loading config file `nenv`: /opt/skyring/z-skyring.development.json
node-2_1  | 2020-08-07T08:49:52.384Z keef:conf loading config file `project`: /opt/skyring/z-skyring.json
node-2_1  | 2020-08-07T08:49:52.385Z keef:conf loading config file `home`: /root/.config/z-skyring.json
node-2_1  | 2020-08-07T08:49:52.386Z keef:conf loading config file `etc`: /etc/z-skyring.json
node-2_1  | 2020-08-07T08:49:52.387Z keef:conf setting config defaults
node-2_1  | 2020-08-07T08:49:52.387Z keef:conf configuration modules /opt/skyring
node-2_1  | 2020-08-07T08:49:52.388Z keef:conf unable to load /opt/skyring configuration: Cannot find module '/opt/skyring/conf'
node-2_1  | Require stack:
node-2_1  | - /opt/skyring/node_modules/keef/index.js
node-2_1  | - /opt/skyring/node_modules/skyring/conf/index.js
node-2_1  | - /opt/skyring/node_modules/skyring/index.js
node-2_1  | - /opt/skyring/index.js
node-2_1  | 2020-08-07T08:49:52.388Z keef:conf etcd config:  undefined
node-2_1  | internal/modules/cjs/loader.js:1188
node-2_1  |   return process.dlopen(module, path.toNamespacedPath(filename));
node-2_1  |                  ^
node-2_1  | 
node-2_1  | Error: /opt/skyring/node_modules/farmhash/build/Release/farmhash.node: invalid ELF header
node-2_1  |     at Object.Module._extensions..node (internal/modules/cjs/loader.js:1188:18)
node-2_1  |     at Module.load (internal/modules/cjs/loader.js:986:32)
node-2_1  |     at Function.Module._load (internal/modules/cjs/loader.js:879:14)
node-2_1  |     at Module.require (internal/modules/cjs/loader.js:1026:19)
node-2_1  |     at require (internal/modules/cjs/helpers.js:72:18)
node-2_1  |     at Object.<anonymous> (/opt/skyring/node_modules/farmhash/index.js:3:18)
node-2_1  |     at Module._compile (internal/modules/cjs/loader.js:1138:30)
node-2_1  |     at Object.Module._extensions..js (internal/modules/cjs/loader.js:1158:10)
node-2_1  |     at Module.load (internal/modules/cjs/loader.js:986:32)
node-2_1  |     at Function.Module._load (internal/modules/cjs/loader.js:879:14)

implement semantic release

Implement semantic release for all publishable packages in the monorepo.

Will need to publish the skyring npm package + docker image, plus npm packages for the zmq transport, tcp transport, and scylladown adapter.

Should implement (a hedged release-config sketch follows this list):

  • multi-release package
  • release config (docker)
  • release config (npm)
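
A sketch of what a per-package config could look like; the plugin list is the standard semantic-release set, and a docker release would need an additional community plugin, which is deliberately omitted rather than guessed at:

// release.config.js — illustrative only, not this repo's actual config.
module.exports = {
  branches: ['main'],
  plugins: [
    '@semantic-release/commit-analyzer',
    '@semantic-release/release-notes-generator',
    '@semantic-release/npm',
    '@semantic-release/github'
  ]
}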

Timers executed twice during rebalance

There is a race condition that can happen during a rebalance and result in a timer being executed twice.

During a rebalance, all of the in-memory timers are cancelled and purged from storage. However, because transports have no knowledge of a shutdown or rebalance, if a timer executes before it is cancelled or determined to be moved, it will be executed locally on the node and again on the remote node when it arrives.

Timers lost on Shutdown occasionally

Timers will sometimes be lost on shutdown. I am using Mongo for persistence. Timers are never lost during rebalancing, only during shutdown. To reproduce:

  1. Run 3 nodes then send ~100 timers at the cluster
  2. Check that all timers are present in Mongo
  3. Send SIGTERM to one node
  4. After a complete shutdown check Mongo for loss of timers (Sometimes in trials it would be correct but it was not consistent)
  5. Restart the node back up and check Mongo for correct rebalance

Repeat this process any number of times and with any node in the cluster; a loss of timers from shutting down will be seen.

Rebalancing from recovery NEVER had errors in a script that constantly repeated this process; only shutdown lost timers, and the more timers the node held, the more pronounced the loss. I know the node shuts down gracefully: I completely removed the SIGKILL timeout and can see logs showing that the node exited properly. I can also see the node logging that it processed the correct number of rebalances while shutting down.

Timer Rebalance Reset

I ran some tests yesterday, and it seems that when a timer is rebalanced to a new node during a graceful shutdown, it is fully reset. I set a timer for 60 seconds, shut down the node after 30 seconds, and didn't see the timer fire until 60 seconds later; I expected it to fire after the remaining 30 seconds.

I believe you have this covered in the code below, but wanted to check whether it is executing properly.

skyring/lib/timer.js

Lines 460 to 467 in ac5362d

const run = ( obj ) => {
  clearTimeout( obj.timer );
  batch.del(obj.id);
  const data = Object.assign({}, obj.payload, {
    id: obj.id
  , created: obj.created
  , count: ++sent
  });
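
For reference, the behaviour I expected would be something like rescheduling with the elapsed time subtracted; a sketch only, assuming the stored record keeps its original timeout alongside the created timestamp seen in the excerpt above:

// Reschedule with only the time remaining, clamped at zero so an overdue
// timer fires immediately. `obj.timeout` is assumed to be persisted.
const elapsed = Date.now() - obj.created
const remaining = Math.max(obj.timeout - elapsed, 0)
setTimeout(fire, remaining, obj) // `fire` stands in for the real execution path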
