archie

archie architecture diagram

summary

A file archiver for Kubernetes built around MinIO and our scalable archie worker. The worker archives files from MinIO source buckets to any S3-compatible or Google Cloud Storage destination bucket by way of MinIO's bucket event notifications, NATS JetStream's durable streams, and KEDA auto-scaling.

app features:

  • replicate bucket data from minio sources
    • copy and remove
  • replicate bucket data to multiple destinations
    • minio or any aws s3 compatible
    • google-storage
  • async healthcheck server
  • prometheus metrics server
  • nats jetstream provisioning
  • efficient queue pull consumer
  • graceful shutdown wait timer
  • ignore lifecycle expirations
  • exclude paths with pcre regex
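As a rough illustration of the path-exclusion feature, the sketch below checks object keys against a list of patterns before archiving. It is not archie's implementation: the pattern list and the `is_excluded` helper are hypothetical, and Python's `re` module stands in for PCRE (the two agree on simple patterns like these but differ on some advanced syntax). The real option names are in CONFIGURE.md.

```python
import re

# Hypothetical exclusion patterns, for illustration only.
EXCLUDE_PATTERNS = [r"^tmp/", r"\.part$"]
_COMPILED = [re.compile(p) for p in EXCLUDE_PATTERNS]


def is_excluded(object_key: str) -> bool:
    """Return True when any exclusion pattern matches the object key."""
    return any(p.search(object_key) for p in _COMPILED)


if __name__ == "__main__":
    for key in ("tmp/upload.bin", "videos/clip.mp4.part", "videos/clip.mp4"):
        print(key, "->", "skip" if is_excluded(key) else "archive")
```

An excluded key would simply be acknowledged and dropped rather than copied or removed at the destination.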

detailed

A MinIO bucket can be configured to send bucket event notifications for put and delete activity to a stream on a NATS JetStream cluster. The stream queues the notification messages on persistent storage and guarantees exactly-once delivery to the archie worker pool using a server-side max timeout retry. The archie workers all use the same NATS JetStream durable consumer, and each worker requests a single event notification message at a time. In the event of a failure, archie informs the NATS server of the failure and the client side requests a retry using a more rapid exponential backoff. The process will only give up and stop retrying after maxRetries in these situations: the source file in MinIO doesn't exist on a copy, or the destination file doesn't exist on a delete. Any other error or timeout results in retrying forever, or until the message expires from the NATS JetStream stream.
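The retry rules described above can be sketched as a small decision function. This is a minimal model under stated assumptions, not archie's code: the `Action`, `Outcome`, and `Decision` names, the `decide` and `backoff_seconds` helpers, and the backoff base and cap values are all hypothetical, and `attempt` is assumed to be the 1-based delivery count of the message.

```python
from enum import Enum


class Action(Enum):
    COPY = "copy"
    REMOVE = "remove"


class Outcome(Enum):
    OK = "ok"
    SOURCE_MISSING = "source_missing"  # copy: object not found in the MinIO source
    DEST_MISSING = "dest_missing"      # remove: object not found at the destination
    OTHER_ERROR = "other_error"        # timeouts, network errors, and so on


class Decision(Enum):
    ACK = "ack"          # done, remove the message from the stream
    RETRY = "retry"      # ask the server to redeliver after a delay
    GIVE_UP = "give_up"  # stop retrying this message


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` (assumed values)."""
    return min(cap, base * 2 ** (attempt - 1))


def decide(action: Action, outcome: Outcome, attempt: int, max_retries: int) -> Decision:
    """Apply the rules from the text: only 'missing object' failures may give up."""
    if outcome is Outcome.OK:
        return Decision.ACK
    missing = (
        (action is Action.COPY and outcome is Outcome.SOURCE_MISSING)
        or (action is Action.REMOVE and outcome is Outcome.DEST_MISSING)
    )
    if missing and attempt >= max_retries:
        return Decision.GIVE_UP
    # Everything else retries until the stream itself expires the message.
    return Decision.RETRY


if __name__ == "__main__":
    print(decide(Action.COPY, Outcome.SOURCE_MISSING, attempt=3, max_retries=3))
    print(decide(Action.COPY, Outcome.OTHER_ERROR, attempt=99, max_retries=3))
    print(backoff_seconds(4))
```

Keeping the give-up path limited to missing objects means transient failures such as an unreachable destination never cause a notification to be dropped by the worker itself.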

notes

The MinIO server does not need to be in Kubernetes, but it does need to be able to communicate with the NATS cluster to deliver the event notifications to the stream. The NATS cluster could also exist outside of Kubernetes, but I have not tested it.

deploy

Check out the helm chart INSTALL.md

chart features:

  • archie worker deployment
  • keda ScaledObject deployment scaler
  • prometheus ServiceMonitor metrics
  • prometheus PrometheusRules alerts

usage

For CLI and config.yaml settings visit CONFIGURE.md

queue

Use NATS JetStream to queue bucket event notifications from MinIO.

autoscaling

Use KEDA's NATS JetStream Scaler to scale the workers.

development

Check out DEVELOPER.md

known issues

  • NATS JetStream stream's first sequence metric is unstable - TODO: Create PR (monitoring issue only)
  • PCRE Regex module somewhat limits our build OS and ARCH - INFO
  • KEDA needed a patch to fix the scaler for using jetstream in a cluster - PR #3564 (merged)
  • NATS-Exporter needed to pass the first_seq stream info - PR #190 (merged)
  • MinIO doesn't reconnect to NATS server if it is down for a while - PR #16050 (merged)
