
goose's Introduction

Goose


The Next-Level background job processing library for Clojure.

Simple. Pluggable. Reliable. Extensible. Scalable.

Performance

Please refer to the Benchmarking section.

Features

Getting Started


Add Goose as a dependency

;;; Clojure CLI/deps.edn
com.nilenso/goose {:mvn/version "0.5.1"}

;;; Leiningen/Boot
[com.nilenso/goose "0.5.1"]

Client

(ns my-app
  (:require
    [goose.brokers.rmq.broker :as rmq]
    [goose.client :as c]))

(defn my-fn
  [arg1 arg2]
  (println "my-fn called with" arg1 arg2))

(let [rmq-producer (rmq/new-producer rmq/default-opts)
      ;; Along with RabbitMQ, Goose supports Redis as well.
      client-opts (assoc c/default-opts :broker rmq-producer)]
  ;; Supply a fully-qualified function symbol for enqueuing.
  ;; Args to perform-async are variadic.
  (c/perform-async client-opts `my-fn "foo" :bar)
  (c/perform-in-sec client-opts 900 `my-fn "foo" :bar)
  ;; When shutting down client...
  (rmq/close rmq-producer))

Worker

(ns my-worker
  (:require
    [goose.brokers.rmq.broker :as rmq]
    [goose.worker :as w]))

;;; 'my-app' namespace should be resolvable by worker.
(let [rmq-consumer (rmq/new-consumer rmq/default-opts)
      ;; Along with RabbitMQ, Goose supports Redis as well.
      worker-opts (assoc w/default-opts :broker rmq-consumer)
      worker (w/start worker-opts)]
  ;; When shutting down worker...
  (w/stop worker) ; Performs graceful shutdown.
  (rmq/close rmq-consumer))

Refer to the wiki for Redis, Periodic Jobs, Error Handling, Monitoring, Production Readiness, etc.

Getting Help


Please open an issue or ping us in #goose on the Clojurians Slack.

Companies using Goose in Production

Contributing

Why the name "Goose"?

Named after LT Nick 'Goose' Bradshaw, the sidekick to Captain Pete 'Maverick' Mitchell in Top Gun.

License

Licence

goose's People

Contributors

alishamohanty, chage, jysandy, kitallis, olttwa, shhivam, siripr4, tfidfwastaken


goose's Issues

Reconsider client library for Redis

Context

To begin with, Goose chose Carmine because it was popular, stable, well-maintained & most importantly, it solved basic needs of pushing to & popping from a list.

Issues

With passage of time, we've discovered certain issues:

  1. Carmine doesn't support closing connections cleanly. More details can be found in Carmine's issue #266 and issue #224.
  2. Carmine doesn't support commands introduced in Redis 6.2.0. The LMOVE command is needed to enqueue in-progress jobs to the front of the job queue. Lua scripting or atomic transactions are difficult to implement for this task.

As a workaround, Goose sends a dummy value to a utility queue. This lets the blocking call exit, because it receives a message it was waiting on.

Requirements

Ideally, the client should help Goose handle its connection pool, be well-maintained, support Redis Cluster, etc.

Options

  1. Celtuce and Obiwan have provisions for closing a connection.
    Cons: They don't seem well-maintained & stable. When spiking them locally, I observed random issues like thread not closing, unexpected serializations, etc.
  2. Write a simple wrapper around Jedis or use any stable Java library using Interop

Create a Glossary for Goose

Issue

Goose has many context-heavy keywords & the same keyword might mean multiple things depending on time & perspective of object.

Create a Glossary defining the following:

  • broker
  • threads
  • queues
    • queue
    • schedule-queue
    • retry-queue
    • dead-queue
    • in-progress-queue, preservation-queue & orphan-queue
  • ...

Create an audit trail for a job

In the beginning a Job can be:

  • enqueued
  • scheduled
  • cron-scheduled

As part of its lifecycle, the job could be:

  • enqueued to ready-queue by a scheduler
  • fail execution
  • be retried
  • be dead

Since a job could stay in the Goose ecosystem for 60+ days, an audit-trail would help with debugging.

The audit-trail can have the following details:

  • event
  • time
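As a rough illustration of the idea, an audit trail could be a vector of event/time entries kept on the job map. This is only a sketch: the `:audit-trail` key and the `record-event` helper are hypothetical names, not part of Goose.

```clojure
(defn record-event
  "Appends an audit entry (event + time in epoch millis) to a job map."
  [job event]
  (update job :audit-trail (fnil conj [])
          {:event event :time (System/currentTimeMillis)}))

;; A job that was scheduled, enqueued by the scheduler, failed, then retried.
(def audited-job
  (-> {:id "job-1"}
      (record-event :scheduled)
      (record-event :enqueued)
      (record-event :failed)
      (record-event :retried)))

(mapv :event (:audit-trail audited-job))
;; => [:scheduled :enqueued :failed :retried]
```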

Customized queues

  • While enqueuing, Clients can specify queue name
  • Workers can be initialized with a set of queues

Priority can be tweaked by concurrency of workers.

Clean Graceful Shutdown

Because of #14, Goose can either block on Redis with long timeouts or shut down cleanly, but not both.

If we have a way to interrupt connections to Redis cleanly, Goose can have both long-polling & a clean graceful shutdown.

Support for Redis Cluster via Carmine Library

Context

Problem

Solution

  • Goose can have a small set of macros for the subset of Redis operations to map keys to the right cluster node. This macro will rewrite Redis calls to follow the cluster protocol of identifying the node and then query it
  • If these macros can be generalised, we can raise a PR to Carmine itself

Coordinate multiple-worker scheduler polling interval

Issue

For multiple worker processes, amplify & randomize the polling interval.
This reduces load on Redis and approximately achieves the configured scheduled-queue polling interval despite n workers.

Solution

  • Blocked on #28
  • Sleep for (* poll-interval process-count (rand))
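The sleep formula above can be wrapped in a small function. The function and argument names below are assumptions made for illustration. Because `(rand)` is uniform on [0, 1), each worker sleeps somewhere between 0 and poll-interval × process-count seconds, and on average half of that upper bound.

```clojure
(defn scheduler-sleep-sec
  "Randomized sleep duration for one worker process, so that n processes
  do not all poll the scheduled queue at the same moment."
  [poll-interval-sec process-count]
  (* poll-interval-sec process-count (rand)))

;; With a 5s interval and 3 worker processes, each sleep falls in [0, 15).
(scheduler-sleep-sec 5 3)
```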

[redis] Periodic Jobs Feature

Like perform-in-sec, schedule a job to run recurrently.

For example, a perform-every function that takes a cron expression as input.

Implementation details:

  1. Add a perform-every function to Goose Client that takes a cron expression
  2. Calculate the next date for the expression, say 3-Aug-1-PM
  3. Schedule the job to run at the next CRON time, 3-Aug-1-PM
  4. When the scheduler finds jobs due for execution, enqueue them to the front of the queue
  5. Along with enqueuing, also schedule the job back into the queue for recurring execution
  6. Enqueuing & re-scheduling should happen in 1 transaction so as to not lose the job
  7. Define the number of times a periodic job should run before it stops
  8. Limit :scheduler-polling-interval-sec config to 60s, as the minimum period of a periodic job is 1 min
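Steps 4 to 6 can be sketched as one pure state transition, so that enqueuing and re-scheduling either both happen or neither does. This is a simplified model, not Goose's implementation: an in-memory atom stands in for a Redis transaction, a fixed `:interval-sec` stands in for the cron expression, and all names are hypothetical.

```clojure
(defn enqueue-and-reschedule
  "If the periodic job is due, moves it to the front of the ready queue and
  sets its next run time in the same step. Otherwise returns state unchanged."
  [state job-id now-sec]
  (let [{:keys [run-at interval-sec] :as job} (get-in state [:scheduled job-id])]
    (if (and job (<= run-at now-sec))
      (-> state
          (update :ready #(into [job-id] %))
          (assoc-in [:scheduled job-id :run-at] (+ run-at interval-sec)))
      state)))

(def periodic-state
  (atom {:ready     ["other-job"]
         :scheduled {"report" {:run-at 100 :interval-sec 60}}}))

;; swap! applies the whole transition atomically, so the job is never
;; enqueued without also being re-scheduled.
(swap! periodic-state enqueue-and-reschedule "report" 100)
;; => {:ready ["report" "other-job"]
;;     :scheduled {"report" {:run-at 160, :interval-sec 60}}}
```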

Open questions:

  1. Should we reuse scheduled-jobs queue for periodic jobs?
  2. How will we calculate latency? Calculate the time difference from the previous cron schedule?
  3. Along with standard APIs, add an API to modify the CRON period?

0.2 Release Laundry List

  • Total process count for statsD & scheduler sleep time
  • Add docstrings for cljdocs
  • Update API, StatsD & Middleware logic as per Wiki
  • Inject error service config into error & death handlers
  • Update README & its badges
  • Add prefixed-queue to Job
  • Add Redis as default broker

[rmq] Implement Publisher Confirms

  • Enable publisher-confirm mode on client
    • If broker responds with basic.nack for 3 jobs in a row
      • Callback with failed jobs
      • switch to synchronous acks for future jobs
      • switch to async after 5 successful acks
    • Do acks async because latency can be few hundred millis
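The switching rule in the list above can be modelled as a small state machine. The sketch below only tracks the mode; it leaves out the callback with failed jobs and does not talk to RabbitMQ. The state keys and the `on-confirm` name are assumptions for illustration.

```clojure
(def initial-confirm-state
  {:mode :async :consecutive-nacks 0 :consecutive-acks 0})

(defn on-confirm
  "Updates the publisher state for one broker response (:ack or :nack).
  3 nacks in a row switch to :sync; 5 acks in a row switch back to :async."
  [{:keys [mode] :as state} confirm]
  (case confirm
    :nack (let [nacks (inc (:consecutive-nacks state))]
            (cond-> (assoc state :consecutive-nacks nacks :consecutive-acks 0)
              (and (= mode :async) (>= nacks 3)) (assoc :mode :sync)))
    :ack  (let [acks (inc (:consecutive-acks state))]
            (cond-> (assoc state :consecutive-acks acks :consecutive-nacks 0)
              (and (= mode :sync) (>= acks 5)) (assoc :mode :async)))))

(:mode (reduce on-confirm initial-confirm-state [:nack :nack :nack]))
;; => :sync
```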

References

0.3 Release Laundry List

  • Rename Goose Description?
    • s/sidekick for Clojure/Durable background job processing library
  • Change integration-test promises to atoms
    • Reset atoms as part of fixtures
  • s/prefixed-queue/ready-queue
  • Put log-on-exceptions macro inside while-loop macro
  • Add more integration tests:
    • heartbeat+orphan-checker
    • Middleware
    • Parallelism: a 5-threaded worker should complete 10 50ms tasks in 110ms
  • Improve exception handling in redis namespace
  • Re-format function args: either in 1 line or vertically aligned as per clojure style guide
  • Modify defs in tests to be within fixtures
  • rename fn inside broker protocol: start -> start-worker
  • s/:schedule/:schedule-run-at
  • s/dead-at/died-at
  • Rename metrics.protocol/Protocol to metrics/Metrics
  • periodic jobs
    • Return name & schedule when registering a cron-job
    • Accept cron-opts as single config-map in perform-every
    • add config for timezone
  • Add util for calculating sleep-duration & sleeping
  • Test error-service-config is injected in error & death handlers
  • [redis] Test job recovery by orphan-checker

Integration test for orphan-checker

Issue

  • How to kill a worker thread from inside a thread?
  • Calling .shutdownNow() causes in-progress job to fail and be scheduled for retry
  • Kill a worker thread & validate orphan job is re-picked by another worker thread
  • This implicitly tests heartbeat

Inject a logger in Goose

  • Check out timbre & tools.logging
  • Idiomatic/Functional way would be the server injecting a logger & Goose sending events to the logger.
  • The interface can be: log('time', 'level', 'msg', & params...)
    • try it out once with tools.logging
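One way to read the proposed interface is that the application passes a logging function in its options and Goose calls it with time, level, message and params. The sketch below assumes a hypothetical `:log` option and a hypothetical `run-with-logging` helper; neither is a real Goose API.

```clojure
(defn println-logger
  "A simple logger matching the proposed (time level msg & params) shape."
  [time level msg & params]
  (println time level (apply format msg params)))

(defn run-with-logging
  "Calls the injected :log function before running the zero-arg job fn."
  [{:keys [log]} job-id job-fn]
  (log (System/currentTimeMillis) :info "Executing job %s" job-id)
  (job-fn))

;; The application decides where logs go by choosing the :log function.
(run-with-logging {:log println-logger} "job-1" (fn [] :done))
```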

Internal Protocol for Multi-Broker support

  • This will be a first step in the direction of supporting RabbitMQ
  • The protocol will have the following functions:
    • enqueue
    • schedule
    • start
      • reify (goose.worker/stop)
    • middleware-chain?
    • APIs
      • enqueued
      • dead
      • scheduled
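A minimal sketch of what such a protocol could look like, with an in-memory broker standing in for Redis or RabbitMQ. The protocol names, the record and the state layout are assumptions for illustration; the middleware chain and the management APIs are left out.

```clojure
(defprotocol Broker
  (enqueue [this job] "Adds a job to the ready queue.")
  (schedule [this job run-at-ms] "Stores a job to be run at run-at-ms.")
  (start [this worker-opts] "Starts a worker and returns a Shutdown."))

(defprotocol Shutdown
  (stop [this] "Stops the worker returned by start."))

(defrecord InMemoryBroker [state]
  Broker
  (enqueue [_ job]
    (swap! state update :ready conj job)
    job)
  (schedule [_ job run-at-ms]
    (swap! state update :scheduled assoc run-at-ms job)
    job)
  (start [_ _worker-opts]
    (swap! state assoc :running? true)
    ;; start reifies the object that a stop call operates on.
    (reify Shutdown
      (stop [_] (swap! state assoc :running? false)))))

(defn in-memory-broker []
  (->InMemoryBroker
    (atom {:ready [] :scheduled (sorted-map) :running? false})))
```

Client and worker code would then depend only on `Broker`, and each backend would supply its own implementation.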

Improve async interface

Current interface

(async
  `fn-sym
  {:args '(1 2 3)
   :other-opts :other-vals})

Better interface

(async
  opts
  `fn-sym
  "variadic" :args 1.0 2 {:map :val} '("list") ["vector"])

Multiple worker threads

  • Client can configure number of worker processes to run in parallel
  • Stick to 1 process spawning n threads for now
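The "1 process spawning n threads" option maps naturally onto a fixed-size Java thread pool. The sketch below is an assumption about how that could look, not Goose's worker code; `run-jobs` is a hypothetical helper that takes zero-arg functions instead of pulling jobs from a broker.

```clojure
(import '(java.util.concurrent Callable ExecutorService Executors Future))

(defn run-jobs
  "Runs zero-arg job fns on a fixed pool of thread-count threads and
  returns their results in submission order."
  [thread-count jobs]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int thread-count))]
    (try
      (let [futures (mapv (fn [job] (.submit pool ^Callable job)) jobs)]
        ;; .get blocks until each job has finished.
        (mapv (fn [^Future f] (.get f)) futures))
      (finally
        (.shutdown pool)))))

(run-jobs 2 [(fn [] (+ 1 2)) (fn [] (* 2 5))])
;; => [3 10]
```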

Test statsd-metrics

  • When integration tests are run, listen on the configured StatsD port to verify stats are emitted as expected. Refer to this GitHub gist for a Datagram listener in Clojure.

Auto-stop Periodic Jobs after a certain count/time

When registering a periodic job, set a run-count or run-until.
The job will be deregistered from cron-schedule once the run-count has been reached.

nil run-count/run-until means the job will run indefinitely.
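The run-count rule can be sketched as a function applied after each execution. Only run-count is covered here and run-until is omitted; the `after-run` name and the shape of the cron entry are assumptions.

```clojure
(defn after-run
  "Returns the cron entry to keep after one execution, or nil when the
  job should be deregistered. A nil :run-count means run indefinitely."
  [{:keys [run-count] :as cron-entry}]
  (cond
    (nil? run-count) cron-entry
    (> run-count 1)  (update cron-entry :run-count dec)
    :else            nil))

(after-run {:name "report" :run-count 2})
;; => {:name "report", :run-count 1}
(after-run {:name "report" :run-count 1})
;; => nil
```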

Pre open-source Laundry List

Done

  • Add ADR to repository
    • Interface
    • Redis client library choice
    • Validations
    • logging
    • using deps, lower version of libraries, etc.
    • reliability
    • priority of queues
    • scheduling
  • Add Goose to clojure-toolbox https://github.com/weavejester/clojure-toolbox.com
  • Publish to Clojars

Deferred to 0.2

  • All exposed functions have a docstring and are present in the Clojure docs. Host on cljdoc once done
  • Add logo to README
  • Flow-chart for scheduler, enqueue, failed jobs, failed jobs with custom queue, dead jobs
  • Maintainability Badges
  • wiki for every feature
  • Changelog

Emit stats

Goose should emit stats for:

  • enqueue-execute time diff
  • schedule SLA diff
  • successful/failed/orphaned/dead jobs

Implement `worker` function

Acceptance Criteria:

  • Pull jobs from redis
  • Deserialize the arguments and execute the functions
  • Gracefully shutdown and enq in-progress jobs back

Stretch goal:

  • Multi-threaded workers

Reconsider number of threads long-polling redis

Issue

Goose polls redis n times for n threads.
To reduce load on redis, we might want to consider polling from just 1 thread, and enqueuing jobs' execution to the threadpool. To limit execution parallelism/concurrency to the user config, we can have an in-memory buffered queue.

N worker instances with T threads polling means O(N*T) operations per second (or long-polling if #65 gets resolved) slamming Redis.

What next?

  • Benchmark 2 approaches:
    • Polling redis n times
    • Polling redis once

While benchmarking, measure 2 things:

  • time taken to complete 1000 jobs averaging 50ms execution time
  • redis memory/CPU consumption

Maintain worker process count using heartbeat

Issue

A count of worker processes is needed to coordinate reliable processing of hanging jobs, scheduler-polling time, etc.

Solution

  • Generate process-id (based on VM host/container-id + random string)
    • create key with TTL of 1 min
    • Renew TTL every 30 sec
    • Run GC every 1 min to handle abrupt shutdowns
  • On startup, add to processes list
  • On shutdown, remove from list
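The heartbeat scheme can be modelled in memory as a map from process-id to expiry time. In the sketch below a plain map (held in an atom) stands in for Redis keys with TTLs, time is passed in as seconds, and all function names are hypothetical.

```clojure
(defn new-process-id
  "Host or container id plus a short random suffix."
  [host]
  (str host ":" (subs (str (java.util.UUID/randomUUID)) 0 8)))

(defn heartbeat
  "Records that process-id is alive until now-sec + ttl-sec."
  [processes process-id now-sec ttl-sec]
  (assoc processes process-id (+ now-sec ttl-sec)))

(defn gc-expired
  "Drops processes whose TTL has lapsed, e.g. after an abrupt shutdown."
  [processes now-sec]
  (into {} (filter (fn [[_ expires-at]] (> expires-at now-sec))) processes))

(defn process-count
  [processes now-sec]
  (count (gc-expired processes now-sec)))

(def process-registry (atom {}))
(swap! process-registry heartbeat "p1" 0 60)  ; startup
(swap! process-registry heartbeat "p2" 0 60)  ; startup
(swap! process-registry heartbeat "p1" 30 60) ; p1 renews, p2 has crashed
(process-count @process-registry 61)
;; => 1
```

A graceful shutdown would simply `dissoc` the process-id instead of waiting for the TTL to lapse.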

Generate smart job IDs

  • Instead of a random UUID, a job-id can have date+time of enqueuing, and other generic info limited to <60 characters
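A possible shape for such an id is a UTC timestamp, the queue name and a short random suffix. The layout below is an assumption, not a format Goose defines; a long queue name would need truncating to stay under 60 characters.

```clojure
(import '(java.time Instant ZoneOffset)
        '(java.time.format DateTimeFormatter))

(def ^DateTimeFormatter id-time-format
  (.withZone (DateTimeFormatter/ofPattern "yyyyMMdd'T'HHmmssSSS'Z'")
             ZoneOffset/UTC))

(defn smart-job-id
  "Builds an id of the form <utc-timestamp>-<queue>-<8 random hex chars>."
  [^Instant enqueued-at queue]
  (str (.format id-time-format enqueued-at)
       "-" queue
       "-" (subs (str (java.util.UUID/randomUUID)) 0 8)))

(smart-job-id (Instant/parse "2022-08-03T13:00:00Z") "default")
;; e.g. "20220803T130000000Z-default-" followed by 8 random hex characters
```

Because the timestamp comes first, ids sort by enqueue time when compared as strings.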

Benchmark performance

Outcomes

  • Finalize between 2 interfaces: pre-defined jobs using code OR resolving jobs at runtime
  • Stats for users
  • Comparison for newer releases

Schedule a job

Ability to enqueue a job to be picked up at a certain time in future.

Add extensive Documentation & Flow-charts

Create either a GitHub wiki, or leverage cljdoc.

It should have the following details:

  • docstring for public-facing functions
  • Broker config, Enqueuing/Dequeuing, Scheduling, Error-handling, API, etc.
  • Flow-charts for life-cycles of a job
    • Enqueue-Dequeue
    • Schedule-Enqueue-Dequeue
    • Enqueue-Fail-Schedule-Dequeue
    • Enqueue-Fail-Schedule-Dequeue-from-retry-queue
    • Enqueue-Fail-Dead

Implement `async` function

Acceptance Criteria:

Arguments:

  • Takes the job function
  • Arguments for the job function

Validations:

  • Ensure that the job-fn is serializable
  • Ensure that the namespace in the job-fn is present
  • Ensure that the args are edn-serializable
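These checks could be sketched as below, using only clojure.core and clojure.edn. The function names and error messages are hypothetical. The args check is a print-then-read round trip, which is one possible interpretation of "edn-serializable".

```clojure
(require '[clojure.edn :as edn])

(defn edn-round-trips?
  "True when args survive printing and reading back as edn."
  [args]
  (try
    (= args (edn/read-string (pr-str args)))
    (catch Exception _ false)))

(defn validate-job!
  "Throws ex-info describing the first failed check; returns true otherwise."
  [fn-sym args]
  (cond
    (not (qualified-symbol? fn-sym))
    (throw (ex-info "Job function must be a fully-qualified symbol" {:fn-sym fn-sym}))

    (nil? (resolve fn-sym))
    (throw (ex-info "Job function could not be resolved" {:fn-sym fn-sym}))

    (not (edn-round-trips? args))
    (throw (ex-info "Args are not edn-serializable" {:args args}))

    :else true))

(validate-job! 'clojure.core/println ["foo" :bar])
;; => true
```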

Implementation:

  • Puts them in redis to be later picked up by the worker

Choose job-enqueuing interface

At present, Goose has 2 options for interfaces:

  • Provide fully-qualified & resolvable function symbol
  • Predefine functions that can be enqueued, & their retry/schedule configs

Choose between the 2 based on #33

Mark a job as poisonous after n recoveries

Sometimes, executing a job might crash the worker process.

Due to orphan-checks, such jobs will be re-enqueued & retried.

Keep a note of recovered jobs. If a job is recovered more than n times, it can be assumed to be poisonous, i.e. it crashes the worker.

Add middleware support

  • Users should be able to inject code pre/post job execution
  • Pre/post Job enqueue isn't necessary because of Goose's interface

Intelligent Args Validation

Issues

Args validation just checks whether the args are edn-serializable. edn serializes anonymous & symbolized functions too, which Goose doesn't support, as they cannot be stored and then retrieved in a different JVM.

Possible solutions:

These are untested at the time of writing; try them out:

  • Check if parents of said object are serializable?
  • Define an exhaustive list of allowed types and validate against them
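The second option could be sketched as a recursive check against a fixed set of types. The set chosen here (nil, booleans, numbers, strings, keywords, chars and collections of those) is an assumption; symbols are rejected on purpose, on the reading that symbolized functions should not pass.

```clojure
(defn allowed-arg?
  "True when x is built only from the allowed scalar and collection types."
  [x]
  (cond
    (or (nil? x) (boolean? x) (number? x)
        (string? x) (keyword? x) (char? x)) true
    (map? x)  (every? (fn [[k v]] (and (allowed-arg? k) (allowed-arg? v))) x)
    (coll? x) (every? allowed-arg? x)
    :else     false))

(allowed-arg? ["foo" :bar 1.0 {:k [1 2]}])
;; => true
(allowed-arg? [(fn [] 1)])
;; => false
```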

API to manage jobs

An API that helps view:

  • List failed jobs in retry queue
  • List dead jobs (exhausted retries)
  • Retry dead jobs
  • Retry failed jobs now (instead of later)

Error handling & Retries

  • When a job throws an exception, worker re-enqueues with updated retry-count
  • Job is retried with an exponential back-off function. User can configure their own back-off function
  • Users can add an error service like Honeybadger or Sentry to report errors via email
  • If 0 retries, put in dead-letter queue
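The retry flow above can be sketched as a pure function from a failed job to its next state. The 10-second base, the option names and the job keys are assumptions for illustration; the backoff function is passed in so that users can supply their own.

```clojure
(defn default-backoff-sec
  "Exponential back-off: 10s, 20s, 40s, ... for retry-count 0, 1, 2, ..."
  [retry-count]
  (* 10 (long (Math/pow 2 retry-count))))

(defn on-failure
  "Returns the job marked for retry with an updated retry-count, or marked
  dead once retries are exhausted."
  [job {:keys [max-retries backoff-fn] :or {backoff-fn default-backoff-sec}} now-sec]
  (let [retry-count (:retry-count job 0)]
    (if (< retry-count max-retries)
      (assoc job
             :state :retrying
             :retry-count (inc retry-count)
             :retry-at (+ now-sec (backoff-fn retry-count)))
      (assoc job :state :dead :died-at now-sec))))

(on-failure {:id 1} {:max-retries 2} 100)
;; => {:id 1, :state :retrying, :retry-count 1, :retry-at 110}
```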

[rmq] Add enqueue-dequeue feature

Implementation details:

Client

  • connection factory
  • create a channel (it should be re-usable across threads)
  • create a queue (this operation should be memoized)
    • durable: true, auto-delete: false, exclusive: false
  • publish
    • persistent: true, priority: 0

Worker

Modify validations/assertions approach

Issue

The approach to validation doesn't feel functional. We avoided spec, :pre form in defn, expound, Metis, Validateur because of 2 reasons:

  • They simply print error statements. Client doesn't get an explicit error message describing what went wrong
  • Customized validations weren't possible

Unsolved requirement

  • A perfect solution would mean users generating/understanding function params just by looking at validations

FWIW, Claypoole, a popular Clojure library, also does validations exactly like Goose.

Possible solutions

Whichever solution gets chosen, be sure it follows the above 2 requirements.

  • Wrap spec inside exceptions like this gist

Todo

  • Use mocks to assert client/worker validates redis, queue, etc. To avoid duplication, we aren't validating redis from client tests, as redis tests already check that.
  • Use table-driven tests for validations

Client-side Callbacks when a Job executes

Should be picked up after #105 is completed

Allow clients to register for a callback when a Batch of Jobs is executed
Callback can contain either execution result (success scenario) or exception (failure scenario)

[rmq] Add Scheduling feature

Implementation details

  • If schedule-time is in the past, publish with priority: 1
    • Check if this is necessary, i.e. does a negative delay result in the job being enqueued to the head of the queue?
  • publish(x-delay: 123 ms)

[rmq] Ready it for Production usage

Checklist

  • transactions?
  • use pool/get-pool & pool/borrow
  • should ACK be a middleware? (it's used in 2 places: consumer & retry)
  • can we do DLX instead of bury-job?
  • use :handle-shutdown-signal-fn inside lc/subscribe
  • inject ExecutorService for subscribers
  • check if automatically-recover is always true, if not add it
  • Memoize queue creation & give control to users
  • add return-listener
  • add ShutdownListener
  • add dead-jobs/replay-n-jobs API
  • add payload/encode, payload/decode feature inside util
  • retry negative acks for sync publisher confirms strategy
