
Kafka Consumer For Splunk

Description

A Kafka consumer that uses a pykafka balanced consumer and Python multiprocessing to send messages to a Splunk HTTP Event Collector tier, designed with scalability, parallelism, and high availability in mind.

Compatibility

  • Splunk >= 6.4.0 (uses HEC Raw endpoint)
  • Kafka >= 0.8.2
  • Developed against Kafka 0.10.0.0
  • Requires Zookeeper to balance multiple consumers

Dependencies

Optional

Features

  • Multiprocessing for parallel consumption; scales vertically and horizontally across multiple instances
  • Support for Gzip and Snappy compression (with the optional python-snappy module installed)
  • Transparently handles nested Message Sets
  • Support for SSL-secured Kafka clusters
  • Ability to set per-topic sourcetype and source metadata values
  • Supports Kafka consumer groups for automatic load balancing of consumers across a topic
  • HTTP/HTTPS support for the Splunk HTTP Event Collector
  • Supports gzip compression when sending data to Splunk HEC
  • Configurable message batch size to reduce network overhead and increase throughput
  • Built-in retry with configurable parameters in the event of network issues or an outage
  • Offsets are committed only after a successful send to HEC (HTTP status code 200) to ensure delivery of every message in the topic
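The batch-and-commit discipline in the last bullet (offsets advance only after HEC returns HTTP 200) can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the `send_batch` and `pump` helpers, URL, and token are invented for the example, and `consumer` is assumed to expose pykafka's balanced-consumer interface (iteration plus `commit_offsets()`).

```python
import urllib.request


def send_batch(events, hec_url, token):
    """POST newline-delimited events to the HEC raw endpoint.

    Returns True only on HTTP 200, so the caller knows it is safe
    to commit offsets.
    """
    req = urllib.request.Request(
        hec_url,
        data="\n".join(events).encode("utf-8"),
        headers={"Authorization": "Splunk %s" % token},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status == 200
    except OSError:
        return False  # network issue: caller retries, offsets stay put


def pump(consumer, send, batch_size=100):
    """Drain a balanced consumer, committing offsets only after Splunk
    has accepted a batch (send(...) returned True)."""
    batch = []
    for message in consumer:
        if message is None:
            continue
        batch.append(message.value.decode("utf-8"))
        if len(batch) >= batch_size:
            if send(batch):
                consumer.commit_offsets()  # at-least-once delivery
                batch = []
```

Because offsets are never committed for a batch Splunk did not acknowledge, a crash or outage means those messages are re-consumed rather than lost.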

Limitations

Installation

$ sudo python setup.py install

Configuration

See comments in the sample YAML file for all available configuration options.
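To give a feel for the shape of the configuration, here is an illustrative fragment. The option names below are examples only, not the consumer's actual schema; the sample YAML file in the repository is authoritative.

```yaml
# Illustrative only -- these key names are invented to show the kinds of
# settings involved; consult the sample YAML file for the real options.
kafka:
  brokers: kafka01.example.com:9092
  zookeeper: zk01.example.com:2181
  topic: my_topic
  consumer_group: splunk_consumers
hec:
  url: https://splunk-vip.example.com:8088
  token: 00000000-0000-0000-0000-000000000000
  batch_size: 500
  retry_attempts: 5
```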

Usage

$ kafka_splunk_consumer -c <config.yml>

Deployment Guidance

This script can be run on as many servers as needed to scale out consumption of your Kafka topics. It uses Python multiprocessing to take advantage of multiple cores, so configure as many instances on as many servers as necessary to consume large volumes of messages. Do not configure more workers than the cores available on a given server. The total number of workers across all instances of the script should not exceed the number of partitions in a given topic; any workers beyond the partition count will sit idle and never be assigned a partition to consume.
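The sizing rule above (workers capped by local cores, total workers capped by topic partitions) can be expressed as a quick check. The `effective_workers` helper is made up for illustration; it is not part of this project.

```python
import multiprocessing


def effective_workers(requested, partitions, cores=None):
    """Cap a requested worker count at both the local core count and the
    topic's partition count: any worker beyond the number of partitions
    would never be assigned one and would sit idle."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return min(requested, cores, partitions)


# e.g. asking for 16 workers on an 8-core host against a 12-partition
# topic yields 8; against a 4-partition topic it yields only 4.
```

Remember the partition cap applies across all servers combined: two 8-worker hosts against a 12-partition topic still leave 4 workers idle.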

The Splunk HTTP Event Collector should be deployed as a tier of collectors behind a VIP or load balancer. See the links in the Limitations section above for architecture guidance.

For more information on the specifics of the pykafka balanced consumer and its benefits, see this section of the docs.

If you have a busy topic and you're not getting the throughput you hoped for, consider disabling HTTPS on your HTTP Event Collector tier to see if that speeds up ingest rates (see use_https).

If you're using Splunk 6.4, I suggest bumping up the max content length for the http input in limits.conf. It is set far too low by default (1MB); I'd raise it to the Splunk 6.5 default (800MB):

[http_input]
# The max request content length (800MB, to match HTTP server).
max_content_length = 838860800

Bugs & Feature Requests

Please feel free to file bugs or feature requests if something isn't behaving as expected or a feature is missing.
