
Parsible

A tool to help you parse your log files, written in Python. The goal was to make a tool that does the grunt work of following your logs in real time and is easily extended via plugins. Check out our tech blog post on why we wanted Parsible here: http://tech.yipit.com/2012/08/03/parsible-straightforward-log-parsing/

Concepts

===========

There are a few core ideas we tried to bring together in Parsible:

  • Plugins: We wanted to make the parsers, processors, and outputs all customizable and easy to write. All plugins are autodiscovered on startup and have a very simple format for connecting them together.

  • Real Time: Parsible will tail your log file as the lines come in. We opted away from a stateful approach where new lines are read in batches, since we feel tailing simplifies the flow and reduces complexity.

  • Batch Processing: Parsible has a switch to modify the behavior so that it acts like a standard parser. Instead of tailing from the end of the file, it will start at the beginning and exit once it has reached the last line.

  • Generators: Log files can get big, really big. By leveraging generators, Parsible keeps its memory footprint small and independent of the size of the log file. There is no hard restriction on memory, disk, or CPU usage, so be careful when writing your custom plugins.

  • System Conventions: Since Parsible works with logs it is wise to follow Linux logging conventions. Parsible integrates easily with logrotate.

Plugins

===========

Parsers: We wanted to make it easy to write custom parsers for whatever type of log file you may be reading. Your custom parser has to follow a few conventions. A simple nginx parser has been included as an example.

  1. Your parser should live inside of the plugins/parsers directory

  2. Your parsing function should start with parse_; for example, the included nginx parser contains a function called parse_nginx

  3. The parsing method signature should take one parameter, which will be one line from the log file. You may parse the line however you see fit; we opted for a regex implementation since it fits nicely with our dictionary output format and we expect our nginx log data to be well structured.

  4. The parsing method can output whatever you like, as it will be fed directly into the processing functions. In our case we found that a dictionary works very well as lightweight storage for the parsed data although this is not required as you get to write the processing functions as well.

  5. Errors from a parse method are swallowed by the same try/except block that handles process methods due to lazy evaluation. Currently there is no recording of these occurrences although this behavior can be easily modified.
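
Putting the conventions above together, a minimal parser plugin might look like the following. The log format, regex, and function name `parse_simple` are hypothetical, not taken from the bundled nginx parser:

```python
import re

# Hypothetical log format: "<method> <path> <status>", e.g. "GET /api/users 200".
# This pattern is illustrative only, not the bundled nginx parser's regex.
LINE_PATTERN = re.compile(r'(?P<method>\S+) (?P<path>\S+) (?P<status>\d+)')

def parse_simple(line):
    # Conventions: lives in plugins/parsers, name starts with parse_,
    # takes one log line, returns whatever your processors expect
    # (a dictionary here, or None for lines that do not match).
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()
```

Whatever this function returns is handed as-is to every discovered processor, so the parser and processors only need to agree with each other on the shape of the data.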


Processors: Once a line of log data is parsed, it is time to do something useful with it. You can have your processors do whatever you wish, although it is suggested that they remain stateless so that you don't have any lingering effects from feeding on large log files over the course of a day. Some sample process methods can be found in plugins/processors/url.py.

  1. Your processors should live inside of the plugins/processors directory

  2. Your processing function should start with process_ so that the autoloader can find it. For example, the sample processor contains functions called process_api and process_business.

  3. Your processing method can take in whatever you please; the output of your parsing function will be fed directly to each processor that Parsible was able to discover.

  4. Output from processing methods is up to you. The suggested flow is to import any output functions you wish to use directly and call them as needed.

  5. Any errors from a process method are currently swallowed and left untracked, although it is very simple to modify this behavior if desired.
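
A processor following these conventions might look like the sketch below. The function name `process_api_calls` and the `send_count` output are illustrative; in a real setup you would import an output function from plugins/outputs instead of defining one inline:

```python
# Hypothetical in-memory output used so this sketch is self-contained;
# a real processor would import an output function from plugins/outputs.
counts = {}

def send_count(metric):
    counts[metric] = counts.get(metric, 0) + 1

def process_api_calls(parsed):
    # Conventions: lives in plugins/processors, name starts with process_,
    # receives whatever the parser returned (a dict or None here).
    if parsed is None:
        return
    # Only act on requests under /api, then hand off to an output function.
    if parsed.get('path', '').startswith('/api'):
        send_count('api.requests')
```

Keeping the function free of state between calls (the counter lives in the output layer here) follows the statelessness suggestion above.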


Outputs: Output functions are given their own directory to simplify the structure of Parsible. They should be called directly from your process methods; Parsible will not attempt to run any output functions itself, but keeping them separate inside the plugin system makes the logical structure clearer. For some example output functions check out plugins/outputs/statsd.py.
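
As an illustration, here is a sketch of a StatsD-style output function. The host, port, and "name:value|c" wire format follow StatsD conventions, but this is not the bundled plugins/outputs/statsd.py implementation:

```python
import socket

# Assumed StatsD endpoint; adjust for your environment.
STATSD_HOST = '127.0.0.1'
STATSD_PORT = 8125

def format_counter(metric, value=1):
    # StatsD counters are sent as "name:value|c".
    return '{0}:{1}|c'.format(metric, value)

def output_statsd(metric, value=1):
    # Fire-and-forget UDP send; output functions run inline with
    # processing, so they should stay cheap.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(metric, value).encode('utf-8'),
                    (STATSD_HOST, STATSD_PORT))
    finally:
        sock.close()
```

A processor would simply `import` this function and call `output_statsd('api.requests')` as lines match.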


System Conventions

======================

Log Rotate: We like our code to play nice with our systems, especially for a program like Parsible.

  • Parsible creates a PID file in /tmp/parsible.pid so that it is easy to find with other programs.

  • Parsible reloads the log file on receipt of the USR1 signal. This makes it easy to work with inside of logrotate scripts.
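
Your own long-running tools can follow the same two conventions. A minimal sketch, assuming a hypothetical PID file path (this is not Parsible's actual code):

```python
import os
import signal

PID_FILE = '/tmp/mytool.pid'  # hypothetical path, mirroring /tmp/parsible.pid

def write_pid_file(path=PID_FILE):
    # Record our PID so logrotate postrotate scripts can find us.
    with open(path, 'w') as f:
        f.write(str(os.getpid()))

def install_reload_handler(reload_log_file):
    # Call the supplied reload function whenever USR1 arrives, the same
    # convention Parsible uses after logrotate swaps the file out.
    def handler(signum, frame):
        reload_log_file()
    signal.signal(signal.SIGUSR1, handler)
```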

Here is our logrotate script before Parsible:

    postrotate
        [ ! -f /var/run/nginx.pid ] || kill -USR1 `cat /var/run/nginx.pid`
    endscript

And after:

    postrotate
        [ ! -f /var/run/nginx.pid ] || kill -USR1 `cat /var/run/nginx.pid`; \
        [ ! -f /tmp/parsible.pid ] || kill -USR1 `cat /tmp/parsible.pid`
    endscript

If you don't care to set up logrotate or logrotate does not apply, just use --auto-reload True and it will try to reload the log file after 10 seconds of inactivity.

Usage

=========

  1. Clone Parsible
  2. Write your parser (or use one someone else wrote!)
  3. Figure out how you want to process your log lines and write some processors
  4. Set up any outputs you want to use
  5. Run it! (We keep ours running under supervisord, although we have not had issues with crashes.)
    parsible.py --log-file /var/log/mylog --pid-file /tmp/parsible.pid --parser parse_nginx

To add debug messages about errors that may have been swallowed by Parsible, add the --debug True option to your command line arguments. This can be quite verbose, since it can create multiple messages per processed line, so it is not the recommended production configuration.

To enable batch processing mode, just append --batch-mode True to your command line invocation and Parsible will act as a standard parser that exits at the end of the file. This can be useful for backfilling data or doing ad hoc analysis of old files.

Requirements

================

  • Linux
  • Python 2.7+ (due to argparse)
  • Some tasty logs

Warnings

============

Parsible does not guarantee that every line of your log file will get parsed. When it is first started, Parsible seeks to the end of the log file. Additionally, whenever the USR1 signal is received, Parsible will attempt to load the file at the configured location; there is no logic to make sure the current file is fully parsed before switching, which can lead to some lines not being processed during the switchover. If this is a major issue for you, please feel free to submit a feature request.

Although Parsible is designed to be lightweight, it does not guarantee it. User-created plugins have no restrictions on their behavior and can monopolize resources as they see fit.

Parsible grabs a 'line' based on the return value of file.readline(). This means it usually won't handle multiline exceptions very well. Feel free to request the feature if you want it added.

The misspelling of the project's name (vs. Parsable) is intentional.

Contribute

==============

If you are interested in contributing to Parsible here are the steps:

  1. Fork Parsible from here: http://github.com/Yipit/parsible
  2. Clone your fork
  3. Hack away
  4. If you are adding new functionality, document it in the README
  5. If necessary, rebase your commits into logical chunks, without errors
  6. Push the branch up to GitHub
  7. Send a pull request to the Yipit/parsible project.
  8. We'll take a look and try to get your changes in!

Contributors

================

None for now, but feel free to check the commit history!

A special thanks to the fine folks at Etsy for publishing the StatsD project which gave me an excellent README to use as a template.


Issues

================

Auto-detect Log File Changes

There should be an option to autodetect when the underlying log file changes.

Should Parsible only check for this when it is not receiving new lines?

  • Lets you know you have completed the log file you were looking at
  • Could lead to you getting farther behind in the new file

When Parsible switches should it start at the end or beginning of the new file?

  • User initiated non-batch runs should go to the end of the file
  • Reloading a log file should have Parsible start at the beginning of the file.

Default setting?

  • Off for now, unsure of performance impact.

Modifications

Hello,

I have some similar internal tool that I wrote, but close to re-writing; I would like to use parsible instead. I would like to make the following modifications, let me know if you guys are alright with me committing these back to the code when ready and if they are in scope with the overall direction of the project.

  • Bulk indexing of logs will support input of a directory, all files in the directory will be indexed. Additionally, -l argument will support multiple files/directories as input.
  • Bulk indexing will support compressed files and re-compress them after parsing.
  • Keep track of which files were parsed by having control file in each bulk indexing directory.
  • Support breaking out bulk indexing into multiple processes that will index given files independently.

Thanks

File Reading Issues

Hello,

When parsing a log file, I randomly get the following error messages.

    [ERROR] 2016-09-03 17:41:49,379 - File Statistics: Current Byte Location 4515058
    [ERROR] 2016-09-03 17:41:49,379 - File Statistics: Current File Byte Size 26925736
    [ERROR] 2016-09-03 17:41:49,380 - File Statistics: Processed Percentage 16.77 %

In my parser, I'm only splitting by whitespace and my processor is simply printing lines. What could be causing this? If I need to provide more information, I'm happy to do so.

Understanding parser plugins

I'm playing around with this project for logfile parsing, and am struggling with having more than 1 parsing function, especially in more than 1 file. I'm not sure if I have something misconfigured, or if I just don't understand the underlying plugin design.

As an example, look at this simplified testcase

Essentially, I've created 2 files, with 2 dumb-as-can-be "parsing" functions in them. Each function prints out a simple debugging line, so we know what function is getting called.

Then I run using this syntax:

    python parsible.py --batch-mode true --log-file <(echo "test line")

And I get this result:

    DEBUGGING: function parse_test2 got line: test line

If I move the "parse_test2.py" file aside, the same syntax gives me this result:

    DEBUGGING: function parse_test4 got line: test line

So, I guess the question is this: Is this by design? Am I supposed to only have 1 parsing function total? Or do I have something misconfigured?

Thanks,
Lloyd

License not distributed with code.

Having the license distributed with the source code will enable me to use this code in our production environment. Would it be possible to add it explicitly? Thanks!

Custom Plugins Location

Allow users to specify a custom plugins directory for parsers, outputs and processors. This will make it easier to push Parsible into Pip since we will have decoupled the application code from the user generated code.

Start with option --debug True

    Traceback (most recent call last):
      File "parsible.py", line 256, in <module>
        p.main()
      File "parsible.py", line 189, in main
        for parsed_line in parsed_log_file:
      File "parsible.py", line 145, in follow
        self.logger.debug('Tick Tock, waited for {} iterations'.format(empty_iterations))
    ValueError: zero length field name in format
