SkaETL

SkaLogs ETL (SkaETL) is a real-time ETL designed for and dedicated to logs and events.

Core features:

  • Centralized Logstash Configuration
  • Log Parsing Simulations based on an extensive list of common pre-set patterns
  • Consumer Processes: Ingestion Pipeline handling through guided workflow
    • Ingestion (from specific Kafka topic)
    • Parsing: ability to handle multiple input formats:
      • CEF (HP Arcsight/MicroFocus),
      • Nitro (McAfee),
      • GROK,
      • CSV,
      • JSON as string
    • Parsing Simulations (ability to simulate multiple preset grok patterns on a JSON log)
    • Transformation: add csv lookup, add field, add geolocalization, capitalize, delete field, format boolean, format date, format double, format email, format geopoint, format ip, format long, hash, lookup external, lookup list, lower case, rename field, swap case, trim, uncapitalize, upper case.
    • Metrics
      • functions: count, count-distinct, sum, avg, min, max, stddev, mean,
      • window types: tumbling, hopping, session,
      • time units: seconds, minutes, hours, days,
      • join types: none, inner, outer, left.
    • Notifications
      • email
      • Slack
  • Build data referential on the fly based on events processed by SkaETL
  • Build metrics on the fly (standard statistical and count functions) before storing in ES, which avoids computations in ES and reduces the resources dedicated to the ES cluster
    • Create new mathematical functions to extend standard statistical metrics
  • Create thresholds and notifications
  • Preview live data (before storing and indexing in ES)
    • At ingestion in Kafka
    • After Parsing
    • After Transforming
  • Output: ES, Kafka
  • Notifications: email, Slack
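
To make the "tumbling" window type from the Metrics list concrete, here is a minimal sketch of a sum computed per fixed-size, non-overlapping window. It is not SkaETL's implementation or API: the class `TumblingWindowSketch` and its methods are hypothetical names, and events are reduced to a timestamp (epoch milliseconds) and a numeric value.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of SkaETL.
public class TumblingWindowSketch {

    /** Start (epoch millis) of the tumbling window that contains the timestamp. */
    public static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    /** Sums values per tumbling window; map keys are window start times. */
    public static Map<Long, Double> sumPerWindow(long[] timestamps, double[] values,
                                                 long windowSizeMillis) {
        Map<Long, Double> sums = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long start = windowStart(timestamps[i], windowSizeMillis);
            // Each event belongs to exactly one window, so one merge per event.
            sums.merge(start, values[i], Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long oneMinute = 60_000L;
        long[] timestamps = {5_000L, 59_999L, 60_000L, 125_000L};
        double[] values = {1.0, 2.0, 4.0, 8.0};
        // The first two events share the window starting at 0.
        System.out.println(sumPerWindow(timestamps, values, oneMinute));
    }
}
```

A hopping window would differ in that one event can fall into several overlapping windows, and a session window closes after a period of inactivity instead of at a fixed boundary.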

SkaETL parses and enhances data from Kafka topics and sends it to any of the following outputs:

  • Kafka (enhanced topics)
  • Elasticsearch
  • Notifications: email, Slack
  • more to come...
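
As a rough picture of what "enhancing" an event between the input topic and an output might involve, the sketch below chains a few of the transformations named under Core features (rename field, trim, lower case, add field) on an event held as a map. It assumes nothing about SkaETL's internals: `TransformationSketch`, its methods, and the field names `src`, `source_ip`, `user` and `project` are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of SkaETL.
public class TransformationSketch {

    /** "rename field": moves the value to a new key when the old key exists. */
    public static void renameField(Map<String, Object> event, String from, String to) {
        if (event.containsKey(from)) {
            event.put(to, event.remove(from));
        }
    }

    /** "trim": strips leading and trailing whitespace from a string field. */
    public static void trim(Map<String, Object> event, String field) {
        Object value = event.get(field);
        if (value instanceof String) {
            event.put(field, ((String) value).trim());
        }
    }

    /** "lower case": lower-cases a string field, independent of the default locale. */
    public static void lowerCase(Map<String, Object> event, String field) {
        Object value = event.get(field);
        if (value instanceof String) {
            event.put(field, ((String) value).toLowerCase(Locale.ROOT));
        }
    }

    /** Applies the chain to a copy, so the raw event is left untouched. */
    public static Map<String, Object> enhance(Map<String, Object> raw) {
        Map<String, Object> event = new LinkedHashMap<>(raw);
        renameField(event, "src", "source_ip");
        trim(event, "user");
        lowerCase(event, "user");
        event.put("project", "demo"); // "add field" with a constant value
        return event;
    }
}
```

Working on a copy keeps the raw event available, which fits the idea of previewing data both before and after transformation.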

Detailed features:

  • Real Time: real-time streaming (Kafka), transformation, analysis, standardization, calculations and visualization of all ingested and processed data
  • Guided Workflows:
    • "consumer processes" (data ingestion pipelines) to guide you through transformation, normalization, analysis - avoiding the tedious task of transforming different types of Logs via Logstash
    • Optional metrics computations via simple functions or complex customized functions via SkaLogs Language
    • Optional alerts and notifications
    • Referentials creation for further reuse
  • Logstash Configuration Generator: on the fly Logstash configuration generator
  • Parsing: grok, nitro, cef, with simulation tool
  • Error Retry Mechanism: automated mechanism for re-processing data ingestion errors
  • Referentials: create referentials for further reuse
  • CMDB: create IT inventory referential
  • Computations (Metrics): precompute all your metrics before storing your results in ES (reduces the use of ES resources and the number of ES licenses)
  • SkaLogs Language: Complex queries, complex computations, event correlations (SIEM) and calculations, with an easy-to-use SQL-like language
  • Monitoring - Alerts: Real-time monitoring, alerts and notifications based on events and thresholds
  • Visualization: dashboard to monitor in real time all your ingestion processes, metrics, referentials and the Kafka live stream
  • Output: Kafka, ES, email, Slack, more to come...
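
For readers unfamiliar with the CEF format mentioned under Parsing, the following sketch splits a CEF line into its seven pipe-delimited header fields plus the raw extension string. It is a simplified illustration and not SkaETL's parser: `CefHeaderSketch` and the map keys are hypothetical names, and the extension's key=value pairs are deliberately left unparsed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of SkaETL.
public class CefHeaderSketch {

    // Key names chosen for this sketch; the order follows the CEF header layout.
    private static final String[] HEADER_FIELDS = {
        "version", "deviceVendor", "deviceProduct", "deviceVersion",
        "deviceEventClassId", "name", "severity"
    };

    /** Splits on unescaped '|' until the header is complete; the rest is the extension. */
    public static Map<String, String> parse(String line) {
        if (!line.startsWith("CEF:")) {
            throw new IllegalArgumentException("Not a CEF line: " + line);
        }
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 4; // skip the "CEF:" prefix
        for (; i < line.length() && parts.size() < HEADER_FIELDS.length; i++) {
            char c = line.charAt(i);
            boolean escapes = c == '\\' && i + 1 < line.length()
                    && (line.charAt(i + 1) == '|' || line.charAt(i + 1) == '\\');
            if (escapes) {
                current.append(line.charAt(++i)); // keep the escaped character itself
            } else if (c == '|') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Be lenient when the line ends right after the severity, without a final '|'.
        if (parts.size() == HEADER_FIELDS.length - 1) {
            parts.add(current.toString());
        }
        if (parts.size() < HEADER_FIELDS.length) {
            throw new IllegalArgumentException("Incomplete CEF header: " + line);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int f = 0; f < HEADER_FIELDS.length; f++) {
            result.put(HEADER_FIELDS[f], parts.get(f));
        }
        result.put("extension", line.substring(i));
        return result;
    }
}
```

Calling `CefHeaderSketch.parse` on a line such as `CEF:0|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2` yields the seven header values and leaves `src=10.0.0.1 dst=2.1.2.2` as the extension.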

Requirements

  • Java >= 1.8
  • Kafka 1.0.0

Building the Source

SkaETL is built using Apache Maven.

Build the full project and run tests:

$ mvn clean install

Build without tests:

$ mvn clean install -DskipTests

License

SkaETL is released under Apache License 2.0.

Contributors

christophefromparis, davidsenouf, jeanlouisboudart, nicopoirier, themanfromearth
