learn_kafka's Introduction

Kafka

Some example code: https://github.com/confluentinc/confluent-kafka-dotnet/tree/master/examples/Web

What is Kafka

  • A high throughput distributed messaging system
  • "Distributed Commit Log"
  • Essentially a pub-sub system
  • Time line
    • Conceived at LinkedIn in 2009
    • First LinkedIn internal deployment in 2010
    • Released open source in 2011
    • 1.1 Trillion messages a day at LinkedIn
    • 2.0 released in 2018
  • Named after Franz Kafka
    • Kafkaesque: a nightmarish situation which most people can somehow relate to, although strongly surreal.
    • The name is meant to describe the situation they were trying to escape

Zookeeper

  • Centralized service for maintaining metadata about a cluster of distributed nodes
    • Configuration information
    • Health status
    • Group membership
  • Used by Hadoop, HBase, Mesos, Solr, Redis, and Neo4j
  • Distributed system consisting of multiple nodes

architecture

Topics

  • Central Kafka abstraction
  • Named feed or category of messages
    • Producers produce to a topic
    • Consumers consume from a topic
  • Logical entity
  • Physically represented as a log

Event Sourcing

Message

  • Time stamp: set by the producer when the message is created (the default), or by the broker on receipt when the topic uses LogAppendTime
  • Referenceable identifier
  • Binary payload

Offset

  • A placeholder
    • Last read message position
    • Maintained by consumers
    • Corresponds to a message identifier
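The relationship between a log, its messages, and consumer-held offsets can be sketched with a toy in-memory model. This is an illustration only, not Kafka client code: `PartitionLog`, `ToyConsumer`, and their methods are hypothetical names, and the sketch assumes `position` holds the offset of the next message to read.

```python
from dataclasses import dataclass, field
import time


@dataclass
class Message:
    offset: int        # position in the log; doubles as the message identifier
    timestamp: float   # seconds since the epoch
    payload: bytes     # opaque binary payload


@dataclass
class PartitionLog:
    """Toy append-only log standing in for one topic partition (hypothetical)."""
    messages: list = field(default_factory=list)

    def append(self, payload: bytes) -> int:
        # New messages always go at the end; the offset is the list index.
        offset = len(self.messages)
        self.messages.append(Message(offset, time.time(), payload))
        return offset

    def read_from(self, offset: int, max_messages: int = 10) -> list:
        return self.messages[offset:offset + max_messages]


@dataclass
class ToyConsumer:
    """Tracks its own position; the log keeps no per-reader state."""
    log: PartitionLog
    position: int = 0   # assumption: offset of the next message to read

    def poll(self, max_messages: int = 10) -> list:
        batch = self.log.read_from(self.position, max_messages)
        self.position += len(batch)
        return batch


log = PartitionLog()
for word in (b"a", b"b", b"c"):
    log.append(word)

# Two independent consumers read the same log at their own pace.
fast, slow = ToyConsumer(log), ToyConsumer(log)
fast_batch = fast.poll()
slow_batch = slow.poll(max_messages=1)
```

Because each consumer carries its own offset, reading never removes anything from the log, and a slow reader does not hold back a fast one.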

Kafka retains all messages for 7 days by default (log.retention.hours = 168), whether or not they have been consumed
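As a rough illustration of time-based retention, the sketch below drops messages older than the 7-day window. It is a simplification under stated assumptions: the `expire` function and the `(timestamp, payload)` tuple layout are hypothetical, and a real broker deletes whole log segments rather than individual messages.

```python
RETENTION_SECONDS = 7 * 24 * 60 * 60  # 7 days, i.e. 168 hours


def expire(messages, now):
    """Keep only (timestamp, payload) pairs still inside the retention window."""
    return [(ts, payload) for ts, payload in messages
            if now - ts <= RETENTION_SECONDS]


now = 1_000_000_000.0
day = 24 * 60 * 60
messages = [
    (now - 8 * day, b"too old"),
    (now - 6 * day, b"still kept"),
    (now, b"fresh"),
]
kept = expire(messages, now)
```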

Partitions

  • Each topic has 1 or more physical log files called partitions
  • Partitions are what make Kafka scalable, fault-tolerant, and capable of high throughput
  • Each partition is maintained on at least one broker (usually more)
  • Within a consumer group, the number of consumers should not exceed the number of partitions. Each partition is assigned to only one consumer in the group, so any extra consumers sit idle and receive no messages.
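The idle-consumer effect can be seen in a minimal round-robin assignment sketch. The `assign_round_robin` function is a hypothetical stand-in written for this note; Kafka ships several assignment strategies of its own, and this is not a reproduction of any of them.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out to the consumers of one group, one at a time."""
    if not consumers:
        return {}
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Fewer consumers than partitions: every consumer gets work.
balanced = assign_round_robin(partitions, ["c1", "c2"])

# More consumers than partitions: the surplus consumer owns nothing.
oversubscribed = assign_round_robin(partitions, ["c1", "c2", "c3", "c4"])
idle = [consumer for consumer, owned in oversubscribed.items() if not owned]
```

With three partitions and four consumers, `idle` contains the one consumer that was never handed a partition.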

Offset Management

  • Auto: enable.auto.commit = true
    • Analogous to garbage collection
  • Manual: enable.auto.commit = false
    • commitSync
    • commitAsync
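To show why the timing of a manual commit matters, here is a toy simulation of a consumer that crashes after processing a record but before committing it. `ToyGroupConsumer` and its `commit` and `restart` methods are hypothetical; `commit` only stands in for the idea behind commitSync and commitAsync and is not the client API.

```python
class ToyGroupConsumer:
    """Toy consumer over a list; offsets are committed by hand (hypothetical)."""

    def __init__(self, log):
        self.log = log
        self.committed = 0   # durable: survives a restart
        self.position = 0    # in memory only: lost on a crash

    def poll(self):
        if self.position >= len(self.log):
            return None
        record = self.log[self.position]
        self.position += 1
        return record

    def commit(self):
        # Record that everything before `position` has been handled.
        self.committed = self.position

    def restart(self):
        # After a crash, reading resumes from the last committed offset.
        self.position = self.committed


log = ["e0", "e1", "e2"]
processed = []
consumer = ToyGroupConsumer(log)

processed.append(consumer.poll())   # e0 is processed...
consumer.commit()                   # ...and committed
processed.append(consumer.poll())   # e1 is processed...
consumer.restart()                  # ...but the consumer crashes before committing

# After the restart, e1 is delivered again.
while (record := consumer.poll()) is not None:
    processed.append(record)
    consumer.commit()
```

Committing after processing gives at-least-once behavior: nothing is skipped, but `e1` appears twice in `processed`. Committing before processing would flip the trade-off and risk losing a record instead.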

Challenges of Event-Driven Architectures (EDA)

  • What is the source of truth?
  • How do we deal with duplicate events?
  • Complexity
  • Loss of transactions
  • Lineage - events can be lost or corrupted
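One common answer to the duplicate-event question is to make the handler idempotent. The sketch below is a minimal version of that idea, assuming every event carries a unique `id` field. The `IdempotentHandler` class and the event shape are hypothetical, and a production system would keep the seen IDs in durable storage rather than in memory.

```python
class IdempotentHandler:
    """Applies each event at most once by remembering event IDs (hypothetical)."""

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, event):
        """Return True if the event was applied, False if it was a duplicate."""
        event_id = event["id"]
        if event_id in self.seen_ids:
            return False
        self.balance += event["amount"]
        self.seen_ids.add(event_id)
        return True


events = [
    {"id": "evt-1", "amount": 10},
    {"id": "evt-2", "amount": 5},
    {"id": "evt-1", "amount": 10},  # redelivered duplicate
]
handler = IdempotentHandler()
results = [handler.handle(event) for event in events]
```

The redelivered `evt-1` is recognized and skipped, so the balance reflects each event exactly once.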

Reliability Validation

  • Reliable Configurations
    • VerifiableProducer
    • VerifiableConsumer
  • Test Scenarios
    • Consumer rebalancing
    • Leader re-election
    • [Consumer/Producer/Broker] rolling restart
  • Monitoring

Important Metrics:

  • Under-replicated Partitions
    • Reported by the leader broker
    • Any non-zero value indicates risk of data loss
  • Offline Partitions
    • Partitions with no leader
  • Active Controller Count
    • Should always be 1
  • All Topics Bytes In/Out
  • Partition Count
  • Leader Count
  • Request Metrics
  • Producers
    • record-error-rate
    • request-latency-avg
    • outgoing-byte-rate
    • record-send-rate
    • record-rate
    • record-queue-time-avg
  • Consumers
    • fetch-latency-avg
    • bytes-consumed-rate
    • records-consumed-rate
    • sync-time-avg
    • sync-rate
    • commit-latency-avg
    • assigned-partitions
    • Lag (github.com/linkedin/Burrow)
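Two of the checks above reduce to simple arithmetic, sketched here with made-up numbers. Both functions are hypothetical helpers, not part of Kafka or Burrow. The lag calculation assumes a partition with no committed offset counts from 0, which is a simplification; a real consumer's starting point depends on its reset policy.

```python
def broker_alerts(under_replicated, offline, active_controllers):
    """Flag the broker-level conditions listed above (cluster-wide totals)."""
    alerts = []
    if under_replicated > 0:
        alerts.append("under-replicated partitions")
    if offline > 0:
        alerts.append("offline partitions")
    if active_controllers != 1:
        alerts.append("active controller count is not 1")
    return alerts


def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: newest offset in the log minus the committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)  # assumption: default 0
        for partition, end in log_end_offsets.items()
    }


healthy = broker_alerts(under_replicated=0, offline=0, active_controllers=1)
degraded = broker_alerts(under_replicated=3, offline=0, active_controllers=2)

log_end_offsets = {0: 120, 1: 95, 2: 40}
committed_offsets = {0: 120, 1: 80}        # nothing committed yet for partition 2
lag = consumer_lag(log_end_offsets, committed_offsets)
total_lag = sum(lag.values())
```

A lag that keeps growing over time suggests the consumers cannot keep up with the producers, which is the trend a tool such as Burrow is meant to surface.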
