goduck's Introduction

goDuck

This project provides an engine that abstracts message dispatching for workers built around either streams or pools. In other words, it reads a message from a stream or a pool, delivers it through the Processor interface, and interprets the return value.

It is important to note that, if the Process function returns an error, the engine won't Ack the message and therefore won't remove it from the queue or stream. The main idea is that the engine guarantees every message is processed without error at least once.
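The contract the engine expects is essentially the following (a sketch inferred from the samples below; the authoritative definition lives in the goduck package):

// Sketch of the processor contract, inferred from the samples below.
type Processor interface {
	// Process handles one message; a non-nil error prevents the engine
	// from Ack'ing it, so the message stays in the queue or stream.
	Process(ctx context.Context, message []byte) error
}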

Sample of a stream processor

import (
	"context"

	"github.com/arquivei/goduck"
	"github.com/arquivei/goduck/engine/streamengine"
)

// The engine requires a type that implements the Process function.
type processor struct{}

// Process receives each message pulled by the engine.
func (p processor) Process(ctx context.Context, message []byte) error {
	// ... decode the message and do the work ...
	err := serviceCall(args)
	// ...
	return err
}

func main() {
	// The call below returns a Kafka stream abstraction (a goduck.Stream).
	kafka := NewKafkaStream(<your-config>)
	engine := streamengine.New(processor{}, []goduck.Stream{kafka})
	engine.Run(context.Background())
}

Sample of a pool processor

import (
	"context"

	"github.com/arquivei/goduck/engine/jobpoolengine"
)

// The engine requires a type that implements the Process function.
type processor struct{}

// Process receives each message pulled by the engine.
func (p processor) Process(ctx context.Context, message []byte) error {
	// ... decode the message and do the work ...
	err := serviceCall(args)
	// ...
	return err
}

func main() {
	// The call below returns a pubsub queue abstraction (interface).
	pubsub, err := NewPubsubQueue(<your-config>)
	if err != nil {
		<handle err>
	}
	engine := jobpoolengine.New(pubsub, processor{}, 1)
	engine.Run(context.Background())
}

Important configuration

Kafka

  • Commit interval: set this to allow asynchronous message processing between commits. When no value is set, the engine commits after every message, i.e., each message must be acknowledged before a new one is retrieved. Be careful with large values when message reprocessing must be avoided: if there is a failure and the engine stops mid-processing, everything since the last commit is redelivered, so the larger the commit interval, the higher the chance of duplicated messages (see the sketch below).
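As an illustration only, a commit interval might be wired up as follows. The configuration type and field names here (KafkaConfig, Brokers, Topic, GroupID, CommitInterval) are assumptions, not goduck's actual API; the concrete shape depends on the stream implementation you pick:

// Hypothetical configuration sketch; the real field names depend on
// the goduck stream implementation in use.
kafka := NewKafkaStream(KafkaConfig{
	Brokers:        []string{"localhost:9092"},
	Topic:          "orders",
	GroupID:        "orders-worker",
	CommitInterval: 5 * time.Second, // commit asynchronously instead of per message
})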

To terminate the engine, cancel the context passed to Run; this performs a shutdown of the application.
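For example, wiring Run to OS signals with the standard library (reusing NewKafkaStream and processor from the stream sample above) gives a clean shutdown:

import (
	"context"
	"os"
	"os/signal"
	"syscall"

	"github.com/arquivei/goduck"
	"github.com/arquivei/goduck/engine/streamengine"
)

func main() {
	// The context is cancelled on SIGINT/SIGTERM, which makes Run
	// return and shuts the worker down.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	kafka := NewKafkaStream(<your-config>)
	engine := streamengine.New(processor{}, []goduck.Stream{kafka})
	engine.Run(ctx)
}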

goduck's People

Contributors

andremissaglia, dependabot[bot], eric-reis, joesantos418, leoeareis, marcosbmf, nandohenrique, renovate[bot], rilder-almeida, rjfonseca, victormn

goduck's Issues

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Repository problems

These problems occurred while renovating this repository.

  • WARN: Found renovate config warnings

Open

These updates have all been created already.

  • fix(deps): update all non-major dependencies (cloud.google.com/go/bigquery, cloud.google.com/go/pubsub, cloud.google.com/go/storage, github.com/IBM/sarama, github.com/arquivei/foundationkit, github.com/confluentinc/confluent-kafka-go/v2, github.com/googleapis/gax-go/v2, golang.org/x/sync)

Detected dependencies

github-actions
.github/workflows/go.yml
  • actions/checkout v4
  • actions/setup-go v5
.github/workflows/golangci-lint.yml
  • actions/setup-go v5
  • actions/checkout v4
  • golangci/golangci-lint-action v6
gomod
go.mod
  • go 1.21
  • cloud.google.com/go/bigquery v1.61.0
  • cloud.google.com/go/pubsub v1.38.0
  • github.com/IBM/sarama v1.43.2
  • github.com/arquivei/foundationkit v0.9.0
  • github.com/confluentinc/confluent-kafka-go/v2 v2.4.0
  • github.com/go-kit/kit v0.13.0
  • github.com/imkira/go-observer v1.0.3
  • github.com/olivere/elastic/v7 v7.0.32
  • github.com/rs/zerolog v1.33.0
  • github.com/segmentio/kafka-go v0.4.47
  • github.com/stretchr/testify v1.9.0
  • cloud.google.com/go/storage v1.41.0
  • github.com/googleapis/gax-go/v2 v2.12.4
  • golang.org/x/sync v0.7.0

Allow RawMessage to have metadata

A lot of messaging systems allow the user to attach custom metadata to a message. With the current goduck implementation, however, it is impossible to read those values, since the RawMessage interface only exposes the bytes of the message content.

Specifically for streams, there is usually a special piece of metadata called the "key", from which the ordering guarantee comes. For now, it is not possible to read a message's key, which can be limiting.
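One possible shape for such an extension (hypothetical: none of these methods exist in goduck today, and the payload accessor name is assumed):

// Hypothetical extension of RawMessage, sketched only to illustrate the proposal.
type RawMessageWithMetadata interface {
	Bytes() []byte               // current contract: the message payload
	Metadata() map[string]string // custom attributes set by the producer
	Key() []byte                 // ordering key, for stream backends
}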

Confluent kafka types external dependency

To configure goduck streams, one must instantiate the Kafka types (config and producer) using the same Kafka version as the goduck project, which generates overhead and forces structural changes in several projects whenever dependencies are updated.

Proposal for a solution: create a goduck.ConfigMap as a type alias of the confluent Kafka types, so that the dependency on confluent's Kafka types lives only on goduck's side and not in external projects, as in the sketch below.
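In Go, such an alias would look roughly like this (assuming confluent-kafka-go's kafka.ConfigMap is the wrapped type):

package goduck

import "github.com/confluentinc/confluent-kafka-go/v2/kafka"

// ConfigMap re-exports the confluent type so downstream projects can
// depend on goduck alone; as a type alias, existing values convert freely.
type ConfigMap = kafka.ConfigMap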

pubsubsink writes messages sequentially

pubsubsink waits for the last message to be completely delivered before sending the next one.

This is inefficient because, if the requests are not batched, throughput is limited to 1/write_latency.

A solution would be to call .Get on the publish results only once all messages have been published, as in the sketch below.
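With the cloud.google.com/go/pubsub client (already a goduck dependency), the proposed fix looks roughly like this sketch, where topic and msgs are assumed to already be in scope:

// Publish everything first; the client batches requests internally.
results := make([]*pubsub.PublishResult, 0, len(msgs))
for _, m := range msgs {
	results = append(results, topic.Publish(ctx, &pubsub.Message{Data: m}))
}

// Only then wait on the results, so many publishes are in flight at once.
for _, r := range results {
	if _, err := r.Get(ctx); err != nil {
		return err
	}
}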

Create TableEngine

As soon as #12 is implemented, it becomes possible to implement a custom engine, built on top of the StreamEngine, that transforms a (key, value) stream into an in-memory table.

This would be similar to the KTable in Kafka Streams. The stream is materialized into a RocksDB table, which can be stored both in memory and on disk (a sketch follows the list below).

Things to bear in mind:

  • The group id should be unique in each worker, or else each worker will hold only part of the data
  • The stream should disable commits, since we always want to read the full topic
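A minimal sketch of the materialization step (hypothetical: it assumes the keyed messages proposed in #12, with a plain map standing in for RocksDB):

import "sync"

// table materializes a keyed stream into an in-memory map.
type table struct {
	mu   sync.RWMutex
	data map[string][]byte
}

// upsert stores the latest value for a key; a nil value acts as a
// tombstone and deletes the entry, as in Kafka Streams' KTable.
func (t *table) upsert(key, value []byte) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if value == nil {
		delete(t.data, string(key))
		return
	}
	t.data[string(key)] = value
}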

Refactor kafka - segment

The first implementation of a stream backed by Kafka used segmentio's library and lives in the folder goduck/impl/implstream/. Now that there are other implementations, it should be moved to goduck/impl/implstream/kafkasegmentio.

This should be done in a backward-compatible way, marking the old code as deprecated before deletion.
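Backward compatibility can be kept by leaving the old package in place and using Go's standard "Deprecated:" doc-comment convention, roughly:

// Package implstream contains the original segmentio-backed Kafka stream.
//
// Deprecated: use goduck/impl/implstream/kafkasegmentio instead.
package implstream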
