gchq / stroom

Stroom is a highly scalable data storage, processing and analysis platform.

Home Page: https://gchq.github.io/stroom-docs/

License: Apache License 2.0

Languages: Java 60.27%, JavaScript 31.34%, TypeScript 4.35%, HTML 2.20%, CSS 0.71%, XSLT 0.40%, Shell 0.35%, SCSS 0.33%, Dockerfile 0.05%, Python 0.01%, Makefile 0.01%
Topics: xslt, lucene, big-data, pipeline-processor, data-analytics, enrichment, visualisation, dashboards, xml

stroom's Introduction

Stroom

Stroom is a data processing, storage and analysis platform. It is scalable: just add more CPUs or servers for greater throughput. It is suitable for processing high-volume data such as system logs, providing valuable insights into IT performance and usage.

Stroom provides a number of powerful capabilities:

  • Data ingest. Receive and store large volumes of data such as native-format logs. Ingested data is always available in its raw form (a hedged example of posting data follows this list).
  • Data transformation pipelines. Create sequences of XSLT and text operations in order to normalise or export data in any format. Data can be enriched using lookups and reference data.
  • Integrated transformation development. Easily add new data formats and debug the transformations if they don't work as expected.
  • Scalable Search. Create multiple indexes with different retention periods. These can be sharded across your cluster.
  • Dashboards. Run queries against your indexes or statistics and view the results within custom visualisations.
  • Statistics. Record counts or values of items over time, providing answers to questions such as "how many times has a specific machine provided data in the last hour/day/month?"
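
As an illustration of the data-ingest capability, the sketch below POSTs a log file to a Stroom data-receipt endpoint over HTTPS. The endpoint URL, feed name and the System/Environment headers are assumptions for illustration only; check the Stroom documentation for the exact datafeed URL and headers of your deployment.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hedged sketch: the URL, feed name and header values below are assumptions, not guaranteed
// to match your Stroom deployment. See the Stroom documentation for the real endpoint.
public class DatafeedPostExample {
    public static void main(String[] args) throws Exception {
        final HttpClient client = HttpClient.newHttpClient();

        final HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://stroom.example.com/stroom/datafeed")) // assumed endpoint
                .header("Feed", "EXAMPLE-EVENTS")      // target feed name (assumed)
                .header("System", "EXAMPLE_SYSTEM")    // meta data describing the source (assumed)
                .header("Environment", "DEV")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("example.log")))
                .build();

        final HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Receipt status: " + response.statusCode());
    }
}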


Get Stroom

To run Stroom in Docker, do the following:

# Download and extract Stroom v7.0 stack
bash <(curl -s https://gchq.github.io/stroom-resources/v7.0/get_stroom.sh)

# Navigate into the new stack directory
cd stroom_core_test/stroom_core_test*

# Start the stack
./start.sh

For more details on the commands above and any prerequisites, see Single Node Docker Installation.

For releases of the core Stroom product, see Stroom releases. For releases of the Docker application stacks, see Stroom-Resources releases.

Documentation

The Stroom application spans several repositories but we've bundled all the documentation into one Stroom Documentation site.

Contributing

If you'd like to make a contribution then the details for doing all of that are in CONTRIBUTING.md.

Repositories

Stroom and its associated libraries, services and content span several repositories:

  • stroom - The core Stroom application.
  • stroom-agent - An application for capturing and sending log files to Stroom.
  • stroom-auth - The OAuth2 authentication service used by Stroom.
  • stroom-clients - Various client libraries for sending logs to Stroom.
  • stroom-content - Packaged content packs for import into Stroom.
  • stroom-docs - Documentation for the Stroom family of products.
  • stroom-expression - An expression library used in Stroom's dashboards and query API.
  • stroom-headless - An example of how to run Stroom in headless mode from the command line.
  • stroom-proxy - An application that acts as a data receipt proxy for Stroom (legacy v5 only).
  • stroom-query - A library for querying Stroom's data sources.
  • stroom-resources - Configuration for orchestrating Stroom in Docker containers, and the released Docker stacks.
  • stroom-stats - An application for storing and querying aggregates of event data.
  • stroom-visualisations-dev - A set of visualisations for use in Stroom.
  • event-logging-schema - An XML Schema for describing auditable events.
  • event-logging - A JAXB API for the event-logging XML Schema.

stroom's People

Contributors

at055612, gcdev373, gchq-11, jabley, jc064522, jsoref, p-kimberley, stroomdev66, timyagan

stroom's Issues

NPE in upload dialog

In the Streams > Data tab:

  • Press the upload button.
  • When the dialog appears, click in the effective date box.
  • When the calendar appears, press Enter.
  • An NPE is thrown.

(In v4 the above steps cause a blank stream to be uploaded.)

Allow recursive delete of a folder

Currently, deleting a folder that contains entities fails with a foreign-key (FK) violation. The user needs to be given the option to either delete all of the contents as well (including any sub-folders) or cancel.
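
A minimal sketch of the kind of depth-first delete that would avoid the FK violation, assuming hypothetical ExplorerNode and NodeService types (illustrative stand-ins, not Stroom's actual classes):

import java.util.List;

// Illustrative sketch only: ExplorerNode and NodeService are hypothetical stand-ins.
public class RecursiveDeleter {

    public interface NodeService {
        List<ExplorerNode> getChildren(ExplorerNode parent);
        void delete(ExplorerNode node);
    }

    public static class ExplorerNode {
        private final String name;
        public ExplorerNode(final String name) { this.name = name; }
        public String getName() { return name; }
    }

    private final NodeService nodeService;

    public RecursiveDeleter(final NodeService nodeService) {
        this.nodeService = nodeService;
    }

    /** Deletes all descendants depth-first, then the node itself, so no FK constraint is violated. */
    public void deleteRecursively(final ExplorerNode node) {
        for (final ExplorerNode child : nodeService.getChildren(node)) {
            deleteRecursively(child);
        }
        nodeService.delete(node);
    }
}

Deleting children before their parent means every FK reference is gone by the time the parent is removed.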

Entity selection control does not show name for selected entity

If the selected entity has been changed since the DocRef was created, the name shown will be incorrect.
If the DocRef contains no name for the selected entity, it will appear as if no entity has been selected (blank text box) even though the selection is valid. This is especially confusing post-upgrade, where all of the entity references may appear as if they have been wiped out.

Playing forward does not work across multiple sub-sections of a "batch"

This relates to the case where a proxy combines several data sets into a "batch" of data.
When a batch contains multiple smaller data sets for the same FEED, only the first data set can be "debugged" in the play-forward function, where you select the RAW_FEED and then the pipeline you want to debug/test/develop.

Add window closing handler to prevent user from accidentally refreshing or leaving Stroom if they have unsaved content

Add a window closing handler to Stroom that checks whether any content tabs have dirty (unsaved) content and, if so, asks the user whether they really want to leave or refresh the page. It may also be worth asking the user even when there is no unsaved content, as they could be in the middle of a query or other activity.

An example of a window closing handler used for dashboards that could be added to ContentManager:

 Window.addWindowClosingHandler(event -> {
           if (dashboardPresenter.isDirty()) {
               String name = "";
               if (dashboard != null) {
                   name = "'" + dashboard.getName() + "'";
               }

               event.setMessage("Dashboard " + name + " has unsaved changes. Are you sure you want to close it?");
           }
        });

If the above is added, the KeyboardInterceptor can be changed to stop blocking the F5 and Backspace keys, as blocking them is crude and potentially problematic.
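
A hedged sketch of how a single handler could cover all open tabs rather than just dashboards. TabPresenter, isDirty() and getLabel() are hypothetical stand-ins for whatever ContentManager actually tracks:

import java.util.List;

import com.google.gwt.user.client.Window;

// Hypothetical sketch: TabPresenter is an illustrative stand-in, not Stroom's real class.
public class DirtyTabCloseGuard {

    public interface TabPresenter {
        boolean isDirty();
        String getLabel();
    }

    /** Registers a single window-closing handler that warns if any open tab has unsaved changes. */
    public void register(final List<TabPresenter> openTabs) {
        Window.addWindowClosingHandler(event -> {
            for (final TabPresenter tab : openTabs) {
                if (tab.isDirty()) {
                    event.setMessage("'" + tab.getLabel()
                            + "' has unsaved changes. Are you sure you want to leave?");
                    return;
                }
            }
        });
    }
}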

Auto import of content packs

Create a process that runs on Stroom startup to import all content packs found in a configured directory.

This is intended for use on new installations or Docker instances so that Stroom can start up with the required content, e.g. core schemas, template pipelines, etc.

  • Create a property to enable/disable this feature.
  • Mark the property as disabled after the first run.
  • Move successfully imported packs into a sub-directory and failed packs into a failed sub-directory.
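
A minimal sketch of the import loop, assuming content packs are zip files and using a hypothetical ContentPackImporter in place of Stroom's real import service:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch only: ContentPackImporter is a hypothetical stand-in.
public class ContentPackAutoImporter {

    public interface ContentPackImporter {
        void importPack(Path pack) throws IOException;
    }

    private final ContentPackImporter importer;

    public ContentPackAutoImporter(final ContentPackImporter importer) {
        this.importer = importer;
    }

    /** Imports every zip found in importDir, moving each to imported/ on success or failed/ on error. */
    public void importAll(final Path importDir) throws IOException {
        final Path importedDir = Files.createDirectories(importDir.resolve("imported"));
        final Path failedDir = Files.createDirectories(importDir.resolve("failed"));

        try (DirectoryStream<Path> packs = Files.newDirectoryStream(importDir, "*.zip")) {
            for (final Path pack : packs) {
                try {
                    importer.importPack(pack);
                    Files.move(pack, importedDir.resolve(pack.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                } catch (final Exception e) {
                    System.err.println("Failed to import " + pack + ": " + e.getMessage());
                    Files.move(pack, failedDir.resolve(pack.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}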

Auto creation of a default Volume

When a new instance is started up, the user must create a volume before they can store any data. For Docker instances and the startup of a new instance, it would be useful if Stroom created a default volume on the root filesystem.

This could be disabled via a property for production installations.
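
A minimal sketch of the startup check, assuming a hypothetical VolumeService and a boolean property; the property wiring and default path are illustrative only:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch only: VolumeService, the property flag and the default path are assumptions.
public class DefaultVolumeCreator {

    public interface VolumeService {
        boolean volumesExist();
        void createVolume(Path path);
    }

    private final VolumeService volumeService;
    private final boolean autoCreateEnabled; // e.g. driven by a configuration property

    public DefaultVolumeCreator(final VolumeService volumeService, final boolean autoCreateEnabled) {
        this.volumeService = volumeService;
        this.autoCreateEnabled = autoCreateEnabled;
    }

    /** On startup, create a default volume directory if none exist and auto-creation is enabled. */
    public void ensureDefaultVolume() throws IOException {
        if (!autoCreateEnabled || volumeService.volumesExist()) {
            return;
        }
        final Path defaultPath = Paths.get("/stroomdata/default_volume"); // illustrative path
        Files.createDirectories(defaultPath);
        volumeService.createVolume(defaultPath);
    }
}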

Feature request: Separate Raw Events and Events

I have several different Raw Events ([Audit Raw Events], [List Raw Events], etc.).
Currently all Raw Events and translated Events are stored under the same path.
I would like to store (any) Raw Events on a slow file system and all Event data on a fast file system such as an SSD.

I.e.
/stroomdata/Raw_Events/Audit_Raw_Events/hostname/etc
/stroomdata/Raw_Events/List_Raw_Events/hostname/etc
/stroomdata/Events/hostname
where /stroomdata/Raw_Events is mounted on a slow file system and
/stroomdata/Events is mounted on a fast file system.

Cheers

Add sorting of a series by the value of one of the fields in that series

In the visualisations, sorting of series is currently done by the name of the series. We need the ability to sort by one of the values within the series, e.g. if one of the fields is a count then sorting DESC by that count would give us a top-n type query.

The settings dialog would need a drop-down (or some other mechanism) to allow the user to specify the field that the sort will operate on.
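
A minimal sketch of the top-n behaviour described above, with Series as a simplified stand-in for a visualisation series and sortFieldValue as the user-selected field (both illustrative):

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: Series is a simplified stand-in for a visualisation series.
public class SeriesSorter {

    /** One series: its name plus the value of the field chosen in the settings dialog. */
    public record Series(String name, double sortFieldValue) {}

    /** Returns the top n series, ordered descending by the chosen field value (e.g. a count). */
    public static List<Series> topN(final List<Series> allSeries, final int n) {
        return allSeries.stream()
                .sorted(Comparator.comparingDouble((Series s) -> s.sortFieldValue()).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }
}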

Timezone data in dashboard does not work

When using a date field that contains zone data in tables and queries, neither queries, formatting of the field, nor formulas on the field work. If the zone data is UTC, i.e. a zero offset, it works fine.
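
This is not Stroom's actual date handling, but the snippet below illustrates the zone-aware parsing the dashboard needs: a value carrying a non-zero offset is parsed and normalised to UTC before it is formatted or compared.

import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Minimal illustration (not Stroom's code) of handling a date value with a non-zero zone offset.
public class ZonedDateExample {
    public static void main(String[] args) {
        final String withOffset = "2023-05-01T12:30:45+02:00";

        // Parse including the offset, then normalise to UTC so queries, formatting and
        // formulas all operate on the same instant.
        final OffsetDateTime parsed = OffsetDateTime.parse(withOffset);
        final OffsetDateTime utc = parsed.withOffsetSameInstant(ZoneOffset.UTC);

        System.out.println(utc.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME)); // 2023-05-01T10:30:45Z
        System.out.println(utc.toInstant().toEpochMilli()); // epoch millis for indexing/sorting
    }
}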

Allow dashboards to be passed parameters on open

Allow dashboards to be created with conditions in the query for which the user will be prompted when the dashboard is run.

This would make it easier to do things like getting a quick view of events from a system, e.g. run the dashboard, enter the name of the system (and optionally the event type), then see the results. It would feel less kludgy than changing an existing dashboard.

Ideally:
  • Values that are not filled in by the user would have their conditions removed before running, i.e. no error (see the sketch after this list).
  • Default values could be assigned during dashboard creation.
  • When the query is re-run, the previously used values are reused, perhaps with a "don't show this dialog again" option to always use the same values for subsequent runs.
  • The name of the dashboard on the tab would be composed from the supplied parameters, e.g. to make it easy to refer back to tabs if several dashboards have been run up from the same template.
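
A hedged sketch of the parameter handling described above, modelling the query as a list of terms; Term and the ${name} placeholder syntax are illustrative assumptions, not Stroom's actual expression API:

import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Illustrative sketch only: Term is a simplified stand-in for a query expression term.
public class DashboardParamResolver {

    public record Term(String field, String condition, String value) {}

    /** Substitutes ${name} parameters and drops any term whose parameter was left blank. */
    public static List<Term> resolve(final List<Term> terms, final Map<String, String> params) {
        return terms.stream()
                .map(term -> {
                    final String value = term.value();
                    if (value.startsWith("${") && value.endsWith("}")) {
                        final String name = value.substring(2, value.length() - 1);
                        final String supplied = params.get(name);
                        if (supplied == null || supplied.isBlank()) {
                            return null; // unfilled parameter: drop the condition rather than error
                        }
                        return new Term(term.field(), term.condition(), supplied);
                    }
                    return term;
                })
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }
}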

Feature request: Improved filtering in Feeds

At the moment it is possible to filter on several different fields such as stream ID, date, etc., and there is an area where it is possible to filter on some of the meta data.
Please add the ability to filter on any meta data supplied along with a batch of data sent to a feed, e.g. it would be really handy to be able to filter on the sending host, the UUID of the dataset, etc.
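
A minimal sketch of the requested filtering, modelling each stream's meta data as a key/value map; the attribute names a user might filter on (sending host, upload UUID, etc.) would be whatever was supplied with the batch:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: a stream's meta data is modelled as a simple key/value map.
public class MetaFilter {

    /** Keeps only the streams whose meta data contains every requested attribute value. */
    public static List<Map<String, String>> filter(final List<Map<String, String>> streams,
                                                   final Map<String, String> required) {
        return streams.stream()
                .filter(meta -> required.entrySet().stream()
                        .allMatch(e -> e.getValue().equals(meta.get(e.getKey()))))
                .collect(Collectors.toList());
    }
}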
