fluent-plugin-scribe's Introduction

Scribe input plugin for Fluentd

Overview

This is a plugin for the fluentd data collector. It adds a Scribe-compatible interface to fluentd.

What’s Scribe?

Scribe is a server for aggregating log data streamed in real time from a large number of servers, developed at Facebook.

It uses Thrift, a cross-language RPC framework, to communicate between clients and servers.

What’s the Scribe plugin for fluentd?

The Scribe plugin for fluentd enables fluentd to speak the Scribe protocol. The Scribe protocol is defined as follows, in Thrift IDL format:

enum ResultCode
{
  OK,
  TRY_LATER
}

struct LogEntry
{
  1:  string category,
  2:  string message
}

service scribe extends fb303.FacebookService
{
  ResultCode Log(1: list<LogEntry> messages);
}

The category field is used as the fluentd ‘tag’.
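As a rough illustration of this mapping, the sketch below turns one LogEntry into a fluentd tag and record. The `LogEntry` struct, the `to_event` helper, the dot used to join the prefix, and the `message` record key are assumptions made for the example, not code taken from the plugin.

```ruby
# Hypothetical stand-in for the Thrift-generated LogEntry struct.
LogEntry = Struct.new(:category, :message)

# Map a Scribe entry to a [tag, record] pair.
# add_prefix mirrors the input option of the same name; joining with a dot
# and storing the text under 'message' are assumptions of this sketch.
def to_event(entry, add_prefix = nil)
  tag = add_prefix ? "#{add_prefix}.#{entry.category}" : entry.category
  [tag, { 'message' => entry.message }]
end

tag, record = to_event(LogEntry.new('access', 'GET /index.html'), 'scribe')
puts tag                # scribe.access
puts record['message']  # GET /index.html
```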

How to use?

fluent-plugin-scribe contains both an input plugin and an output plugin.

Scribe Input

Please add the following configuration to fluent.conf. This allows your Scribe clients to send logs to fluentd through port 1463.

# Scribe input
<source>
  type scribe
  port 1463
</source>

These options are supported.

  • port: port number (default: 1463)

  • bind: bind address (default: 0.0.0.0)

  • server_type: server architecture, one of ‘simple’, ‘threaded’, ‘thread_pool’, ‘nonblocking’ (default: nonblocking)

  • is_framed: use framed protocol or not (default: true)

  • add_prefix: prefix string, added to the tag (default: nil)

  • msg_format: format of the messages, one of ‘text’, ‘json’, ‘url_param’ (default: text)
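To make the three msg_format values concrete, here is a minimal sketch of how a message body could be turned into a record for each one. The `parse_message` helper and the `message` key used for plain text are hypothetical; the plugin's own parsing may differ in detail.

```ruby
require 'json'
require 'uri'

# Convert a raw Scribe message string into a record hash.
# The helper name and the 'message' key are assumptions of this sketch.
def parse_message(message, msg_format = 'text')
  case msg_format
  when 'text'
    # Keep the line as-is under a single key.
    { 'message' => message }
  when 'json'
    # Raises JSON::ParserError if the message is not valid JSON.
    JSON.parse(message)
  when 'url_param'
    # "a=1&b=two" becomes {"a"=>"1", "b"=>"two"}.
    URI.decode_www_form(message).to_h
  else
    raise ArgumentError, "unknown msg_format: #{msg_format}"
  end
end

puts parse_message('user=alice&action=login', 'url_param')['action']  # login
puts parse_message('{"status": 200}', 'json')['status']               # 200
```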

Scribe Output

Please add the following configuration to fluent.conf. This allows fluentd to forward its logs to another Scribe server. Note that fluentd conveys semi-structured data while Scribe conveys unstructured data, so the ‘field_ref’ parameter specifies which field of each record is sent.

# Scribe output
<match *>
  type scribe
  host scribed-host.local
  port 1463
  field_ref message
</match>

These options are supported.

  • host: host name or address (default: localhost)

  • port: port number (default: 1463)

  • field_ref: name of the field that is sent as the Scribe log message (default: message)

  • timeout: thrift protocol timeout (default: 30)

  • format_to_json: if true/yes, format the entire record as JSON and send it as the message (default: false)

  • remove_prefix: prefix string, removed from the tag (default: nil)
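The interplay of field_ref, format_to_json, and remove_prefix can be sketched as follows. The `to_scribe_entry` helper and its hash return value are hypothetical stand-ins for whatever the plugin does internally; only the option names come from the list above.

```ruby
require 'json'

# Build the category/message pair that would be sent to Scribe.
# Helper name, keyword arguments, and return shape are assumptions.
def to_scribe_entry(tag, record, field_ref: 'message', format_to_json: false, remove_prefix: nil)
  category = tag
  # Strip "prefix." from the front of the tag when it matches.
  if remove_prefix && tag.start_with?("#{remove_prefix}.")
    category = tag[(remove_prefix.length + 1)..-1]
  end
  # Either serialize the whole record or pick out a single field.
  message = format_to_json ? JSON.generate(record) : record[field_ref].to_s
  { category: category, message: message }
end

record = { 'message' => 'GET /', 'code' => 200 }
entry = to_scribe_entry('scribe.access', record, remove_prefix: 'scribe')
puts entry[:category]  # access
puts entry[:message]   # GET /
puts to_scribe_entry('access', record, format_to_json: true)[:message]
# {"message":"GET /","code":200}
```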

For Developers

To run fluentd with this plugin while making changes:

$ bundle # (or 'bundle update')
$ bundle exec fluentd -v -v -v -c example.conf

Then please execute the sample client.

$ bundle exec bin/fluent-scribe-remote

Copyright

Copyright © 2011 Treasure Data, Inc.

License

Apache License, Version 2.0

fluent-plugin-scribe's People

Contributors

frsyuki, hfwang, iyagi15, krobertson, kzk, repeatedly, tagomoris

fluent-plugin-scribe's Issues

Severe performance issues

We deployed fluentd to production using this plugin along with the out_redshift plugin.

Even during our initial benchmarks we saw that in_scribe gives far worse results than other input methods (for example, in_forward gave 18k msg/sec vs. 1k msg/sec with in_scribe). But when we pushed real production traffic with all the plugins set up (during the benchmark we used only in_scribe and out_file), it just couldn't handle the load (we're talking about ~300 msg/sec).

It looks like the culprit is that all message handling happens on the same thread that receives the Scribe messages, and Cool.io is not actually used. So very often, when processing gets delayed for some reason, the Scribe server gets a timeout and stops sending data until the retry period ends. But even then, after a minute or so, it dies again.

We worked around this issue by having in_scribe enqueue all messages into a Queue and having another thread call Engine.emit on the messages in the queue. But this is suboptimal and far from being "production ready".
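The shape of that workaround can be sketched roughly as below, using Ruby's built-in Queue. This is not the reporter's actual patch: the `emit` lambda stands in for Engine.emit, and the `:stop` sentinel and sample events are assumptions added so the example terminates on its own.

```ruby
# The receiving thread only enqueues; a separate worker thread drains
# the queue and emits, so slow processing does not block the receiver.
queue = Queue.new
emitted = []

# Hypothetical stand-in for Engine.emit; here it just records the event.
emit = ->(tag, record) { emitted << [tag, record] }

worker = Thread.new do
  # Queue#pop blocks until an item is available; :stop ends the loop.
  while (event = queue.pop) != :stop
    emit.call(*event)
  end
end

# What the Scribe handler would do for each incoming LogEntry:
queue << ['scribe.access', { 'message' => 'GET /' }]
queue << ['scribe.access', { 'message' => 'GET /about' }]

# Shut down cleanly once all pending events have been handled.
queue << :stop
worker.join
puts emitted.length  # 2
```

An unbounded queue hides back-pressure: if the worker falls behind, memory grows instead of the Scribe client being told to retry. A SizedQueue would be one way to bound that, at the cost of blocking the receiver again when it fills up.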

Parse JSON and Non-JSON for Scribe Input

Currently, I can only specify one msg_format in my config. Some of my entries are JSON and some are not. I'd like the ability to parse both.

Ideally it should try to parse the object via JSON, and then fall back to text if it can't.
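The requested behavior could look something like the following sketch. The `parse_with_fallback` helper and the `message` key for the text case are hypothetical; the plugin offers no such option as far as this issue indicates.

```ruby
require 'json'

# Try JSON first; fall back to treating the message as plain text.
# Valid JSON that is not an object (e.g. a bare number) is also
# treated as text, since a fluentd record needs to be a hash.
def parse_with_fallback(message)
  parsed = JSON.parse(message)
  parsed.is_a?(Hash) ? parsed : { 'message' => message }
rescue JSON::ParserError
  { 'message' => message }
end

puts parse_with_fallback('{"user": "alice"}')['user']   # alice
puts parse_with_fallback('plain log line')['message']   # plain log line
```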

Does the Scribe output plugin support multiple workers?

We are sending logs to both Elasticsearch and Scribe. For Elasticsearch, we are able to run multiple workers on the same machine. We get an error when trying to run multiple workers for Scribe.
