stream-replicator's People

Contributors

bastelfreak, dependabot[bot], gerrity95, ploubser, ripienaar

stream-replicator's Issues

Should support recreating durables in order to copy entire stream

Sometimes, as with a KV bucket, you'd want to always copy the whole thing into the target, so that if a node gets recreated we don't later re-attach to a past durable and end up getting no data.

In fact the current model might just be bad for KV, so this needs a think.

Support HA deploys

We need to be able to run multiple instances of this on multiple nodes.

We can either do failover or active-active. With the strictly order preserving mode this is kind of pointless for active-active but we want to support partitions in time so might be worth investing in active-active now.

We can keep the idtrack synced either by using core NATS gossip or by updating a stream and loading that up at start.

Expand the state admin tool

The state admin tool should be able to list all hosts matching certain criteria:

  • regex against value
  • seen since duration
  • exclude ones advised
  • show only the names
  • Work on a directory or a specific file

Support skipping target checks in source initiated mode

In source initiated mode we get the stream info, create the stream if it does not exist, and then never use this data again. In cases where the account the target resides in does not actually have full JetStream and all you want is to do JetStream publishes to subjects, we should be able to avoid that initial overhead.

Heartbeat publishing bugs

During testing found a few bugs:

  • The paused metric is 1 when not paused (leadership was won)
  • The hbSubjects metric should not have the subject as a label
  • The heartbeat election name should be SR_SITE_HB and the value posted into the heartbeat should be the hostname; currently it's something like SITE_HB with the election name heartbeats

Support publishing heartbeat messages

When there's constant data monitoring SR is easy, but for idle streams or where messages come and go it's quite hard.

SR should support publishing heartbeat messages into a list of subjects. Each message should:

  • Have a timestamp body
  • Have an originator host header Choria-SR-Originator
  • Have a published subject header Choria-SR-Subject
  • Support arbitrary headers

The publisher should be leadership aware.

A sample configuration might look like this; it would go in the existing config file:

heartbeats:
  leader_election: true
  interval: 10s
  url: "nats://x.com:4222,nats://y.com:4222"
  tls: {} # connection properties that apply to all publishers
  choria: {} # connection properties that apply to all publishers
  headers:
     X: Y

  subjects:
     - subject: s.1
       interval: 20s # overrides
       headers:      # merges
          Y: Z
     - subject: s.2   # defaults to 10s and X:Y headers

This would spawn a single goroutine per subject that just publishes these messages continuously at the configured interval. It needs to use a JetStream publish and log on ack failures.

The election name would derive from the main ReplicatorName in the configuration and be something like {{ReplicatorName}}_HB. All publishers would share the same active/paused boolean managed by Leader Election.

We need to keep at least these prom stats:

  • hb_subjects number of subjects being published in the hb system with labels replicator and subject
  • hb_published_ctr with labels replicator and subject
  • hb_publish_error_ctr with labels replicator and subject
  • hb_publish_time indicating time taken to publish a message with labels replicator and subject
  • hb_paused indicating if it's paused under leader election or not with labels replicator

Add support to build el9 packages

At the moment we can build el7 and el8 compatible packages. We also want to build el9 packages, so we need to add support for building them and then add them to the nightly builds.

This can be done using the same process as the el8 packages, so we can copy the contents from there to generate the new el9 packager.

On busy streams pausing does not work

When a stream is always busy pausing never happens because we keep sending AckNxt - we check for pause on polls but not when we ack. On a busy stream we never poll.

Allow being used as a package

Generally this is usable as a package, but there are a few things we might need to export to help along.

  • configConfig#validate, for example, would be useful to call from a package.

Support in-process connections

For embedded scenarios it would be beneficial if we can do in-process connections for copiers, advisories and heartbeats.

Should support running in the target and using a push ephemeral for rapid copies

The typical source -> dest method in use is heavily penalised by latency, as every message carries a latency*2 cost (acks etc).

We can speed this up, but at the expense of sampling support, by running in the target, creating an ephemeral push consumer and then reading the streams (subject to FC and heartbeats). This is what a source does in JetStream proper.

Resuming is done by reading the messages in the destination and figuring out the resume position based on the last message found.

Support trimming subjects

Given a subject like choria.stream.input.registration.f.q.d.n, we now have a target_subject_prefix setting, but it would be good to also have a target_subject_remove setting that can take part of the subject out. So given target_subject_prefix: registration.SITE and target_subject_remove: .choria.stream.input.registration. we would see target subjects of registration.SITE.f.q.d.n

Be less aggressive on reconnects
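A minimal sketch of how the two settings might combine. `TargetSubject` is a hypothetical helper, and it assumes that leading and trailing dots in the remove value are ignored, that only the first whole-token match is removed, and that the prefix is joined to the remainder with a dot. None of these details are specified in the issue.

```go
package main

import (
	"fmt"
	"strings"
)

// TargetSubject removes the first whole-token occurrence of remove from
// subject and then prepends prefix. Surrounding dots in remove are ignored
// (an assumption of this sketch).
func TargetSubject(subject, prefix, remove string) string {
	rest := subject
	remove = strings.Trim(remove, ".")
	if remove != "" {
		// Wrap in dots so the match can only start and end on token boundaries.
		wrapped := "." + subject + "."
		wrapped = strings.Replace(wrapped, "."+remove+".", ".", 1)
		rest = strings.Trim(wrapped, ".")
	}

	prefix = strings.Trim(prefix, ".")
	switch {
	case prefix == "":
		return rest
	case rest == "":
		return prefix
	default:
		return prefix + "." + rest
	}
}

func main() {
	fmt.Println(TargetSubject(
		"choria.stream.input.registration.f.q.d.n",
		"registration.SITE",
		".choria.stream.input.registration.",
	))
}
```

Matching on token boundaries avoids accidentally cutting a token in half when the remove value is a substring of a longer token.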

We need to be less aggressive on reconnections. Today we back off to 10 seconds, but big sites can make many connections, which are then spread across a cluster and multiplied by many customer DCs.

When there is a broker outage the impact of reconnects is too large. Backoff should go up to 1 minute at least.

Support sampling other properties

In addition to JSON key in the payload we should support:

  • A Header value
  • The value of a token in the subject the message was received on
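Extracting the sampling value from these two new sources could look roughly like this. `HeaderValue` and `SubjectToken` are hypothetical helpers; headers are modelled as a plain map of string slices, and the token position is assumed to be a zero-based index.

```go
package main

import (
	"fmt"
	"strings"
)

// HeaderValue returns the first value of the named header, if present.
func HeaderValue(headers map[string][]string, name string) (string, bool) {
	vals, ok := headers[name]
	if !ok || len(vals) == 0 {
		return "", false
	}
	return vals[0], true
}

// SubjectToken returns the dot-separated token at the zero-based index
// of the subject the message was received on.
func SubjectToken(subject string, index int) (string, bool) {
	tokens := strings.Split(subject, ".")
	if index < 0 || index >= len(tokens) || tokens[index] == "" {
		return "", false
	}
	return tokens[index], true
}

func main() {
	headers := map[string][]string{"Sender": {"node1.example.net"}}
	if v, ok := HeaderValue(headers, "Sender"); ok {
		fmt.Println("header:", v)
	}
	if v, ok := SubjectToken("choria.stream.input.registration.node1", 4); ok {
		fmt.Println("token:", v)
	}
}
```

Returning a boolean alongside the value lets the caller decide what to do with messages that lack the sampling key, which the issue leaves open.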

Support ephemeral consumers

When doing a source to destination copy of a KV bucket, should the destination bucket be recreated or purged, the replication will never resume until after the next put, as the durable will show it's up to date. This is true for all kinds of stream really.

So for KV specifically it might make sense to just use an ephemeral to do the copying. This way, should the system restart and buckets get wiped, it will fetch the entire bucket again.

During a normal run there should not be resets, as we track the last seen sequence internally for resuming should the destination consumer fail.

Stop tracking lag as a metric

In HA mode or when using partitioned copies the lag number is entirely unreliable, confusing in graphs and hard to monitor.

For now we should remove it.
