
abraxas's Introduction

Hi, you can find me in various places as iarna.

abraxas's People

Contributors

amv, gitter-badger, greenkeeperio-bot, iarna, nponeccop, streamcode9


abraxas's Issues

Latest version from npm crashes

I will be posting more when I uncover the problem, but currently our abraxas clients die like this:

_stream_readable.js:476
  dest.on('unpipe', onunpipe);
       ^
TypeError: Cannot call method 'on' of undefined
    at PassThrough.Readable.pipe (_stream_readable.js:476:8)
    at ClientTask.acceptResult (/opt/meetings/data-api/node_modules/abraxas/task-client.js:42:16)
    at exports.handleJobResult (/opt/meetings/data-api/node_modules/abraxas/client-jobs.js:168:10)
    at /opt/meetings/data-api/node_modules/abraxas/client-jobs.js:52:18
    at success (/opt/meetings/data-api/node_modules/abraxas/packet-handler.js:68:9)
    at queueEventListener (/opt/meetings/data-api/node_modules/abraxas/packet-handler.js:15:40)
    at EventEmitter.emit (events.js:95:17)
    at null.<anonymous> (/opt/meetings/data-api/node_modules/abraxas/socket.js:89:97)
    at EventEmitter.emit (events.js:95:17)
    at null.<anonymous> (_stream_readable.js:746:14)

registerWorker call feels awkward

(Unless you're using the payload as a stream.)

I find that using the payload as a string is quite awkward, as it means another layer of indirection to either promise or concat the stream. It would be nice to be able to register the worker such that you get the payload at initial call time. In that scenario the task wouldn't be readable, but it would still be writable. There are a few ways we might do this: a task.payload property or an additional argument to the callback both come to mind.

I kind of want to just trigger off of the callback's function signature, even though my "maybe this is too magical" alarm is going off:

gm.registerWorker('test', function (task) {
    // Task is exactly as today, a readable, writable stream that can
    // also be resolved as a promise
});
gm.registerWorker('test', function (task,payload) {
    // Task is a writable stream only, with no promise
});

Alternatively we could pass in an option, for example:

gm.registerWorker('test', {nostream: true}, function (task, payload) {
});

Unreachable server causes an uncaught error

I am writing a naive round-robin multi-server client using abraxas as the client implementation. I wanted to find out how it behaves in error situations, so I took the example from the documentation and ran it without a gearman server.

My multi-server code should deal with unavailable servers somehow, but at this point using the abraxas client with a non-existent gearmand causes my node process to throw an uncaught error at the net.js socket level:

var Gearman = require('abraxas');
var client = Gearman.Client.connect({ host:'127.0.0.1', port:4730, defaultEncoding:'utf8' });
client.submitJob('toUpper', 'test string', function(error, result) {
  if (error) console.error(error);
  console.log("Upper:", result);
});
console.log('sent');

Running this code results in this:

sent

/Users/amv/tmp/node_modules/abraxas/socket.js:116
  throw error;
            ^
Error: Socket error: Error: connect ECONNREFUSED
  at Socket.<anonymous> (/Users/amv/tmp/node_modules/abraxas/socket.js:52:65)
  at Socket.emit (events.js:117:20)
  at net.js:440:14
  at process._tickCallback (node.js:419:13)

This makes me worried about all other kinds of errors being thrown as well due to network sockets doing their weird things (like closing prematurely in bad network conditions, which usually leads to an ECONNRESET being thrown).

How should I deal with this? Could it be possible to try to limit the possible errors that the client can throw to a minimum?

I am not sure about this, but I have a faint recollection that these errors are suppressed if the sockets have an event listener attached for "error" events.
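
If that recollection is right, something along these lines might be enough to keep the process alive. This is only a sketch of the idea (other issues here do attach an "error" listener to the client); it is not a confirmed fix for the explicit throw in socket.js:

var Gearman = require('abraxas');
var client = Gearman.Client.connect({ host: '127.0.0.1', port: 4730, defaultEncoding: 'utf8' });

// If socket errors are re-emitted on the client as 'error' events, handling
// them here would keep Node from treating them as uncaught exceptions.
client.on('error', function (err) {
    console.error('gearman client error:', err);
});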

Typo error

When I call

sender.submitJobBg('xxx', 'zzz')

I get an error in server-job-multi.js:10

Job.call(this, id, func, priority, body);

ReferenceError: Job is not defined

Looks like a typo: 'Job' should be renamed to 'SingleJob'.

Implementing graceful shutdown

I am trying to implement a graceful shutdown for my workers, meaning that when I receive a signal (or otherwise decide so), all of my workers should continue doing their job, but no new jobs should be accepted.

My current implementation is to call client.forgetAllWorkers() when receiving the signal, and then wait for all of the jobs to do their thing before exiting. I only call client.disconnect() some seconds after the last job has finished executing (and should have delivered its payload).
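
For reference, a minimal sketch of that sequence; the SIGTERM handler and the activeJobs counter are hypothetical illustrations, not the actual code:

var abraxas = require('abraxas');
var client = abraxas.Client.connect({ host: '127.0.0.1', port: 4730 });
var activeJobs = 0;   // incremented when a task starts, decremented when it ends or errors

process.on('SIGTERM', function () {
    client.forgetAllWorkers();          // stop accepting new jobs; running ones continue
    var wait = setInterval(function () {
        if (activeJobs > 0) return;     // still draining
        clearInterval(wait);
        setTimeout(function () {        // give the last results a few seconds to flush
            client.disconnect();
            process.exit(0);
        }, 5000);
    }, 500);
});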

Unfortunately at least with the C version of the gearmand server, the results of these jobs that finish after the client.forgetAllWorkers() never reach the clients who have submitted the jobs.

So either the gearmand stops passing the results on after unregistering the work, or abraxas fails to deliver the results after client.forgetAllWorkers() has been called.

In the former case I would need a way to tell abraxas to stop grabbing any future jobs even though the functions are still registered. And in the latter case I would love to have a fix for this on the abraxas side.

Can anyone think of a simple way to test which of these theories (or some alternative one) is the case?

Abraxas dies on node v0.8.26

Our staging environment was still running node 0.8.26, which caused abraxas to die with this error:

util.js:538
  ctor.prototype = Object.create(superCtor.prototype, {
                                          ^
TypeError: Cannot read property 'prototype' of undefined
    at Object.exports.inherits (util.js:538:43)
    at Object.<anonymous> (/opt/meetings/data-api/node_modules/abraxas/stream-replay.js:22:6)
    at Module._compile (module.js:449:26)
    at Object.Module._extensions..js (module.js:467:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:362:17)
    at require (module.js:378:17)
    at Object.<anonymous> (/opt/meetings/data-api/node_modules/abraxas/server-job-single.js:4:20)
    at Module._compile (module.js:449:26)

An upgrade to the 0.10.29 branch solved this issue. Our vagrant boxes were on 0.10.26, which still worked fine too.

It might be a good idea to indicate in the package.json file that a recent version of node is required.
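
For example, something along these lines in package.json (the exact version range here is a suggestion, not the project's current declaration):

{
  "engines": {
    "node": ">=0.10"
  }
}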

askForWork() performance over WAN

Now the sequence is:

-> PRE_SLEEP
<- NOOP
-> GRAB_JOB

But to compensate for latency it is better to send GRAB_JOB as soon as possible, when job.end() is called, even before the result is sent, as the result could be large and sending it may take significant time.
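
In other words, a sketch of the proposed ordering once job.end() is called (illustrative; WORK_COMPLETE here stands for whatever result packet the worker sends):

-> GRAB_JOB        (sent as soon as job.end() is called)
-> WORK_COMPLETE   (the result, which may be large and slow to send)
<- JOB_ASSIGN      (the next job arrives without waiting for the result upload)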

Tasks that throw errors remain 'Running' on Gearman server until worker is killed.

I am trying to understand some unexpected behavior, and am not sure if it is normal behavior that I simply don't understand, or if I am encountering a bug.

Here's the scenario:
A Worker begins running a new task; the task is a promise-returning function.
The Worker throws an error during processing; it is thrown all the way up and becomes a Rejection. I see it emitted across the link to the Abraxas Client as a JobException type error. I have ensured that the client is receiving the JobException error in the submitJob() callback.

The strange thing is that even after the error has been received by the Client, the Gearman server still lists the task as 'Running' (and 'Queued', strangely?) indefinitely, and no other worker attempts to re-try the task (obviously).

If I actually kill the worker's node process manually from the shell, another worker immediately attempts to retry the task. Although my current code uses a (bluebird) promise-returning function to run the task on the Worker, I have also tried calling task.error() manually in my promise chain's callback, rather than throwing an error in the promise chain, but with the same results. The only thing that properly cleans up the running process is calling task.end() in my worker-side catch(), but this is not the behavior I want, as Gearman then thinks the task finished successfully and no other worker picks it up.
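
For reference, a minimal sketch of the task.error() variant I also tried; doWork stands in for the real promise-returning job, and the worker name is just an example:

var client = require('abraxas').Client.connect({ host: '127.0.0.1', port: 4730 });

client.registerWorker('my-task', function (task) {
    doWork(task)                      // doWork: placeholder for the real bluebird-promise work
        .then(function (result) {
            task.end(result);         // success path: report the result
        })
        .catch(function (err) {
            task.error(err);          // the call I expected to mark the job failed server-side
        });
});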

So I guess my questions are:

  1. Should a task that is running a promise returning method be recognized as no longer running by the gearman server if the promise is rejected with an error, or should I have to manually call task.end()?
  2. Likewise, should emitting an error using task.error() result in the Gearman server recognizing that the task has failed?

This seems like such a common use case that I can't help but think I am missing something basic about how to manage tasks.

Client shutdown?

When I shut down my server I call client.shutdown(true) on my abraxas client, but I get this error:

/home/playlyfe/code/gaia/node_modules/abraxas/admin.js:64
        packets.unacceptSerial('error', failure);
                ^
TypeError: Object [object Object] has no method 'unacceptSerial'
  at success (/home/playlyfe/code/gaia/node_modules/abraxas/admin.js:64:17)
  at [object Object].PacketHandler._write (/home/playlyfe/code/gaia/node_modules/abraxas/packet-handler.js:22:109)
  at doWrite (_stream_writable.js:225:10)

And then when I start my server I get a connection refused error:

Error
-----
{ [ConnectError: Error: connect ECONNREFUSED] message: 'Error: connect ECONNREFUSED', name: 'ConnectError' }

StackTrace
----------
ConnectError: Error: connect ECONNREFUSED
  at Socket.connectionError (/home/playlyfe/code/gaia/node_modules/abraxas/socket.js:53:59)
  at Socket.g (events.js:180:16)
  at Socket.emit (events.js:117:20)

So I have to restart my gearman server every time. What could the problem be?

client seems to only connect to first server

We have a herd of 3 gearman servers that our Node.js web app connects to as a client, using the array of host:port strings form. It seems that it's only connecting to the first one, though there are no errors. If I change the order of the servers, I see all the jobs get put on whichever one is first. Is that a known bug?

return connect({
    servers: this.servers,
    defaultEncoding: 'utf8',
    submitTimeout: 3000,
    connectTimeout: 5000,
    debug: this.devMode })
    .then((client) => {
        client.on('connect', (...args) => Debug('gearman-client-connected', [...args].toString().slice(0, 200)));
        client.on('disconnect', (...args) => Debug('gearman-client-disconnected', [...args].toString().slice(0, 200)));
        client.on('connection-error', (...args) => Debug('gearman-client-error', ...args));
        client.on('error', (...args) => Debug('gearman-error', ...args));
        this.client = client;
    })
    .catch((err) => {
        err.message = 'Gearman: ' + err.message;
        Debug('Gearman connect error: ', err);
        throw err;
    });
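
For context, this.servers here is the array of host:port strings form, i.e. something like this (hostnames are placeholders):

servers: ['gearman1:4730', 'gearman2:4730', 'gearman3:4730']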

Collaborators

I've added some collaborators from folks who've been active in the repo, to give things a little better momentum than I'm able to provide myself. If you didn't get added and would like to be, comment and I'll add you.

Collabs added:

@amv @streamcode9 @nponeccop

Please PR your changes, but feel free to merge them after you've gotten a thumbs up from someone (it doesn't have to be me).

Please write descriptive commit messages, and for fixes to issues include "Fixes: #issuenum" in the body of the commit message (so the issue gets auto-closed when the commit is merged).

I haven't yet gone handing out npm publish bits, but I'm probably up for that too.

getStatus() throws an exception

The following code throws:

var ab = require('abraxas');
var s = ab.Server.listen();
var worker = ab.Client.connect();
worker.registerWorker('test_bg', function (job) {
    job.end();
});
worker.submitJobBg('test_bg', 'wizzle').then(function (jobid) {
    worker.getStatus(jobid, function (err, status) {
        console.log({ jobid: jobid, status: status });
        s.shutdown();
        worker.forgetAllWorkers();
    });
});

The problem can be fixed by adding var status and replacing client with self.

registerWorker() performance over WAN

From my tests it seems that getting each task requires an RTT. So if there is 50 ms of round trip time between a worker and gearmand, it can never perform more than 1000 / 50 = 20 tasks per second.

Is there a way to send multiple GRAB_JOB_UNIQ without waiting for JOB_ASSIGN_UNIQ in between? Does gearmand support this kind of pipelining?

Server "streaming" option

Currently, the server has two modes of operation. At startup, it works the same as the C++ gearmand. When a connection is made to it and the "streaming" option is selected, it changes semantics to ones that better support streaming clients. Currently this means:

  1. Clients receive a FAIL event if the worker disconnects, instead of being silently retried by the server.
  2. Uniqueid jobs have their results queued and sent to newly connected clients, so that the series of DATA/WARNING/COMPLETE events are identical across all workers.

On reflection, I want to change the second one to be:

  1. Uniqueid foreground jobs result in an error.

This would remove substantial complexity, while maintaining the same promises. I don't believe that streaming uniqueid foreground jobs are sufficiently useful to justify the code needed to support them.

Respect client.workers in Server job assignment

var abraxas = require('abraxas');

abraxas.Server.listen();
var c1 = abraxas.Client.connect();
var c2 = abraxas.Client.connect();

c1.registerWorker('c1', job => job.payload);
c2.registerWorker('c2', job => job.payload);
c1.submitJobBg('c1', 'c11');
c1.submitJobBg('c1', 'c12');
c2.submitJob('c2', 'c21').done(result => console.log('***', result.toString()));

Why does the code crash? The message is:

Error: Assigned job for worker we no longer have
    at Error (native)
    at Worker.dispatchWorker (/home/.../node_modules/abraxas/worker.js:196:32)

The following code crashes too:

c2.registerWorker('c1', job => job.payload);
c1.registerWorker('c2', job => job.payload);
c1.submitJob('c1', 'c12').done(() => console.log("aaa"));
c1.submitJob('c2', 'c21').done(result => console.log('***', result.toString()));

Debug logging shows that JOB_ASSIGN packet for c1 job is sent on a wrong connection.

It seems that the job dispatching code in Server.prototype.grabJob is broken:

        return self.workersCount[job.function] && !job.worker;

should be

        return client.workers[job.function] && !job.worker;

Update Docs

The signature is client.submitJob(func, data, options, handler) in client-connection.js, but in the docs it is still client.submitJob(func[, options][, data][, callback]).
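
For example, a call following the client-connection.js argument order would look like this (client is a connected abraxas client; the empty options object is just illustrative):

client.submitJob('toUpper', 'test string', {}, function (err, result) {
    if (err) return console.error(err);
    console.log('Upper:', result);
});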

Autoreconnect doesn't work

If the gearman server is restarted, the abraxas API does not reconnect automatically as stated in the guide. Nothing happens even if you call submitJob, set the submitTimeout option, or send an echo. Fortunately the "connection-error" event is fired and I just restart the connection manually.

abraxas version 2.1.10

performance of workers limited by downlink bandwidth

Imagine that there is one worker that is heavy, i.e. it consumes so many resources that it's not practical to run more than one worker.

In this case it is still beneficial to grab more than one job to fully utilize the connection. E.g. if one job (JOB_ASSIGN packet) is 10 KB long, on a 10 Mbit connection with 25 ms latency there should be 10 * 1024 * 1024 * 0.025 / (8 * 10 * 1024) = 4 packets in flight (after rounding up from 3.2).

My proposal is to have another control for job count. maxJobs controls how many jobs are executed concurrently, and maxExtraJobsInFlight (or a shorter name) controls, well, the extra jobs in flight.

So in the example situation mentioned above, we will have maxJobs = 1; maxExtraJobsInFlight = 4
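
A sketch of what that configuration could look like; maxExtraJobsInFlight is the option proposed in this issue, not something the current API accepts:

var abraxas = require('abraxas');
var client = abraxas.Client.connect({
    maxJobs: 1,                  // only one job executing at a time (the heavy worker)
    maxExtraJobsInFlight: 4      // proposed: keep up to 4 additional JOB_ASSIGN packets in flight
});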

Event Emitter Memory Leak

I get this memory leak warning when running a lot of jobs in my test cases. Is there any way to reduce the number of listeners being added?
This is my test code:

      client = abraxas.Client.connect({
         maxJobs: 1000
      })
      q = []
      for i in [0..500]
        q.push client.submitJob('game job', { }, "")
      Promise.all(q)

This is the warning

(node) warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.
Trace: 
  at Socket.addListener (events.js:160:15)
  at Socket.Readable.on (_stream_readable.js:707:33)
  at Socket.once (events.js:185:8)
  at [object Object].AbraxasClient.newTask (/home/playlyfe/code/gaia/node_modules/abraxas/client.js:91:21)
  at [object Object].exports.submitJob (/home/playlyfe/code/gaia/node_modules/abraxas/client-jobs.js:44:21)
  at Object.create (/home/playlyfe/code/gaia/src/modules/service/job/main.coffee:23:16)

Immediate disconnect breaks worker's task.end

I am doing workers that do one task and then exit. In effect I want to do this:

client.registerWorker( 'success-function', {}, function( task ) {
    task.end("success!");
    client.disconnect();
} );

However when I do this, the worker never seems to send a success message to the gearman server.

I hackishly solved the problem by adding a "nextTick" timeout before disconnecting:

client.registerWorker( 'success-function', {}, function( task ) {
    task.end("success!");
    setTimeout( function() { client.disconnect(); }, 0 );
} );

Do you think this would warrant a fix? I'm personally not sure whether it is evident to everyone that calling task.end does not actually put the response into the current buffers, and that calling disconnect at the end of the handler causes the contents of the buffers to be checked before the worker function can put its output into the queue (which I would think causes this behaviour, though I haven't checked yet).

Maybe just moving the disconnect queue checking to nextTick would help?

Is there anything I could do to help with this?

worker.js setClientId error

Hello,
I've encountered an issue using Abraxas that I believe may be a simple typo in worker.js:
https://github.com/iarna/abraxas/blob/master/worker.js#L43
This line references 'id' but there is no 'id' variable defined in that scope.

I set client id in the connect callback, using a promisified (by bluebird) connect method, like so:

Abraxas.Client.connectAsync(jobServerConf)
  .then(function(client) { client.setClientId('Worker1'); });

Strangely, in my development environment where Gearman is on my local machine, this works fine 100% of the time, but in production with added network latency, the same code throws the above mentioned error 100% of the time.

I noticed there's no eslintrc or jshintrc file for the project; do you use static analysis of any kind for Abraxas?

Abraxas clients possibly leak memory

I have been running abraxas clients (submitJob / submitJobBg) in production for a couple of days now and functionally everything seems to work pretty well.

We did notice, however, that the memory consumption of the node.js processes started gradually going up recently. It might still be that we made some other changes at the same time which cause the memory leak, and I have not been able to run any tests to isolate the issue yet.

Just wanted to give you a heads up on this, though, and ask whether you have yourself found anything similar with Abraxas?

The code I am running with is still the one I posted earlier:

https://gist.github.com/amv/25d2dcaef56a287805ee

It should call client.disconnect() on all clients it creates after the request has somehow resolved. Can there be something else that I should also be cleaning up myself?

connection-error event is not fired

Expected behavior:

Abraxas Client fires an event whenever it detects that the server is dead

Current behavior:

Abraxas only fires an event if a server was alive and then dies. But if a server was never alive, it never fires an event.

Sample code:

const Client = require('abraxas').Client
const client = Client.connect({servers: ['127.0.0.1'], defaultEncoding: 'utf8'})
const check = eventName => client.on(eventName, () => console.log(eventName))
check('connect')
check('disconnect')
check('connection-error')
setTimeout(() => client.echo("aaa", console.log), 10000)

If you start it without a gearman server on 127.0.0.1, it never displays any message. I want it to emit an event when it detects that the initial connection attempt has failed.

Proposed multiserver support

[Edit: reconnectTimeout was removed]

New connection option: submitTimeout

This is the default amount of time a submitJob*/echo call waits for a server connection to exist before failing the request with an error. With errors of this kind you KNOW that the job hasn't been submitted as no communication with the server has occurred.

New connection option: responseTimeout

This is the default amount of time we wait for a response from a submitJob*/echo/getStatus/admin query before failing the request with an error. Errors of this sort MAY mean that:

  • There are no workers available.
  • A worker is just very slow.
  • The server never received anything we sent.

There's no way to distinguish these cases, so proceed accordingly. (Later, I'm all for extending the protocol to support timeouts SERVER SIDE that indicate that no workers were available. A max-time-till-work-begins, if you will. That's separate from the max-time-till-work-complete.)

Workers

Workers are easy. Their APIs just update properties of the worker object that are synchronized with ALL connected servers. If a connection goes away while a worker is running, any result from it will be quietly ignored. (The server will fail and/or retry the job.) Reconnecting just means resynchronizing the current properties. This will mean refactoring how these functions work to update an internal state table and then telling it to resynchronize itself with active connections (or something like that).

APIs included in this:

  • setClientId
  • registerWorker
  • unregisterWorker
  • forgetAllWorkers

Client

Client commands, by contrast, need to try and fail on their own account and are only executed on one server at a time. Client commands would all take submitTimeout and responseTimeout options that would override the defaults if passed. Client commands will wait until a server is available or until submitTimeout; in the latter case, the task will fail with a connect error. In the former case, they'll be sent to the server and the responseTimeout starts ticking. If the timeout occurs the task will fail with a timeout exception. If the connection goes away before a response is received then the task will fail with a connection error exception.
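
A sketch of the proposed per-call overrides (option names and behaviour as described in this proposal, not the current API; argument order follows the client-connection.js signature mentioned elsewhere in these issues):

client.submitJob('toUpper', 'test string', {
    submitTimeout: 2000,     // fail with a connect error if no server connection exists within 2s
    responseTimeout: 30000   // fail with a timeout exception if no response arrives within 30s
}, function (err, result) {
    if (err) return console.error(err);
    console.log(result);
});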

Client APIs are:

  • submitJob
  • submitJobBg
  • submitJobAt
  • submitJobSched
  • echo

getStatus

This is the odd one out: it takes a specific job id, but we don't necessarily know which server it's associated with. Perhaps it should be bundled with the admin commands below... that is, applied to the servers object if you want results from ALL connected servers, or passed to a specific server.

Admin

I'm thinking that we should expose a new kind of thing, a list of servers. Admin commands can be run either against all connected servers using that thing, OR against an individual server. In the former case, results will be merged in a command-specific manner.

That is:

gm.servers[0].shutdown(); // shutdown *just* server 0
gm.servers.shutdown(); // shutdown ALL servers
var status = gm.servers[1].status(); // get status from only server 1
var combinedStatus = gm.servers.status(); // get status from all servers

Remaining admin APIs are:

  • status
  • maxqueue
  • workers
  • shutdown
  • version

task.error() gives ServerError: Job given in work result not found

In case of an error I want to use task.error() to handle it. It happens (not always) that the following error occurs. Only one worker is registered.

events.js:160
      throw er; // Unhandled 'error' event
      ^
ServerError: Job given in work result not found
    at .../abraxas/client-connection.js:27:32
    at PassThrough.<anonymous> (.../node_modules/abraxas/stream-to-buffer.js:8:51)
    at PassThrough.g (events.js:292:16)
    at emitNone (events.js:91:20)
    at PassThrough.emit (events.js:185:7)
    at endReadableNT (.../node_modules/readable-stream/lib/_stream_readable.js:926:12)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)

Seems something tried to send an additional end or something.

Any suggestion for that? Thanks.
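
For reference, a minimal sketch of the error-or-end pattern in question, the intent being that exactly one of task.error()/task.end() is called per job (doWork and the function name are placeholders for the real handler):

var client = require('abraxas').Client.connect({ host: '127.0.0.1', port: 4730 });

client.registerWorker('my-func', function (task) {
    doWork(task.payload, function (err, result) {   // doWork: placeholder for the real work
        if (err) return task.error(err);            // report the failure exactly once
        task.end(result);                           // otherwise complete exactly once
    });
});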
