gchq / stroom

Stroom is a highly scalable data storage, processing and analysis platform.

Home Page: https://gchq.github.io/stroom-docs/

License: Apache License 2.0

Languages: Java 60.27%, JavaScript 31.34%, TypeScript 4.35%, HTML 2.20%, CSS 0.71%, XSLT 0.40%, Shell 0.35%, SCSS 0.33%, Dockerfile 0.05%, Python 0.01%, Makefile 0.01%
Topics: xslt, lucene, big-data, pipeline-processor, data-analytics, enrichment, visualisation, dashboards, xml

stroom's Introduction

Stroom

Stroom is a data processing, storage and analysis platform. It is scalable: just add more CPUs or servers for greater throughput. It is suitable for processing high-volume data such as system logs, providing valuable insights into IT performance and usage.

Stroom provides a number of powerful capabilities:

  • Data ingest. Receive and store large volumes of data such as native-format logs. Ingested data is always available in its raw form (a hedged example of posting data follows this list).
  • Data transformation pipelines. Create sequences of XSLT and text operations in order to normalise or export data in any format. Data can be enriched using lookups and reference data.
  • Integrated transformation development. Easily add new data formats and debug the transformations if they don't work as expected.
  • Scalable Search. Create multiple indexes with different retention periods. These can be sharded across your cluster.
  • Dashboards. Run queries against your indexes or statistics and view the results within custom visualisations.
  • Statistics. Record counts or values of items over time, providing answers to questions such as "how many times has a specific machine provided data in the last hour/day/month?"
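
As an illustration of the data-ingest capability, the sketch below POSTs a log file to a Stroom data-receipt endpoint over HTTPS. The endpoint URL, feed name and the System/Environment headers are assumptions for illustration only; check the Stroom documentation for the exact datafeed URL and headers of your deployment.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hedged sketch: the URL, feed name and header values below are assumptions, not guaranteed
// to match your Stroom deployment. See the Stroom documentation for the real endpoint.
public class DatafeedPostExample {
    public static void main(String[] args) throws Exception {
        final HttpClient client = HttpClient.newHttpClient();

        final HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://stroom.example.com/stroom/datafeed")) // assumed endpoint
                .header("Feed", "EXAMPLE-EVENTS")      // target feed name (assumed)
                .header("System", "EXAMPLE_SYSTEM")    // meta data describing the source (assumed)
                .header("Environment", "DEV")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("example.log")))
                .build();

        final HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Receipt status: " + response.statusCode());
    }
}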


Get Stroom

To run Stroom in Docker, do the following:

# Download and extract Stroom v7.0 stack
bash <(curl -s https://gchq.github.io/stroom-resources/v7.0/get_stroom.sh)

# Navigate into the new stack directory
cd stroom_core_test/stroom_core_test*

# Start the stack
./start.sh

For more details on the commands above and any prerequisites, see Single Node Docker Installation.

For releases of the core Stroom product, see Stroom releases. For releases of the Docker application stacks, see Stroom-Resources releases.

Documentation

The Stroom application spans several repositories but we've bundled all the documentation into one Stroom Documentation site.

Contributing

If you'd like to make a contribution then the details for doing all of that are in CONTRIBUTING.md.

Repositories

Stroom and its associated libraries, services and content span several repositories:

  • stroom - The core Stroom application.
  • stroom-agent - An application for capturing and sending log files to Stroom.
  • stroom-auth - The OAuth2 authentication service used by Stroom.
  • stroom-clients - Various client libraries for sending logs to Stroom.
  • stroom-content - Packaged content packs for import into Stroom.
  • stroom-docs - Documentation for the Stroom family of products.
  • stroom-expression - An expression library used in Stroom's dashboards and query API.
  • stroom-headless - An example of how to run Stroom in headless mode from the command line.
  • stroom-proxy - An application that acts as a data receipt proxy for Stroom (legacy v5 only).
  • stroom-query - A library for querying Stroom's data sources.
  • stroom-resources - Configuration for orchestrating Stroom in Docker containers, and the released Docker stacks.
  • stroom-stats - An application for storing and querying aggregates of event data.
  • stroom-visualisations-dev - A set of visualisations for use in Stroom.
  • event-logging-schema - An XML Schema for describing auditable events.
  • event-logging - A JAXB API for the event-logging XML Schema.

stroom's People

Contributors

at055612, gcdev373, gchq-11, jabley, jc064522, jsoref, p-kimberley, stroomdev66, timyagan

stroom's Issues

NPE in upload dialog

In the Streams > Data tab:

  • Press the upload button.
  • When the dialog appears, click in the effective date box.
  • When the calendar appears, press Enter.
  • An NPE is thrown.

(In v4 the above steps cause a blank stream to be uploaded.)

Allow recursive delete of a folder

Currently, deleting a folder that contains entities fails with a foreign-key (FK) violation. The user needs to be given the option to either delete all of the contents as well (including any sub-folders) or cancel.
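
A minimal sketch of the kind of depth-first delete that would avoid the FK violation, assuming hypothetical ExplorerNode and NodeService types (illustrative stand-ins, not Stroom's actual classes):

import java.util.List;

// Illustrative sketch only: ExplorerNode and NodeService are hypothetical stand-ins.
public class RecursiveDeleter {

    public interface NodeService {
        List<ExplorerNode> getChildren(ExplorerNode parent);
        void delete(ExplorerNode node);
    }

    public static class ExplorerNode {
        private final String name;
        public ExplorerNode(final String name) { this.name = name; }
        public String getName() { return name; }
    }

    private final NodeService nodeService;

    public RecursiveDeleter(final NodeService nodeService) {
        this.nodeService = nodeService;
    }

    /** Deletes all descendants depth-first, then the node itself, so no FK constraint is violated. */
    public void deleteRecursively(final ExplorerNode node) {
        for (final ExplorerNode child : nodeService.getChildren(node)) {
            deleteRecursively(child);
        }
        nodeService.delete(node);
    }
}

Deleting children before their parent means every FK reference is gone by the time the parent is removed.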

Entity selection control does not show name for selected entity

If the selected entity has been changed since the DocRef was created, the name shown will be incorrect.
If the DocRef contains no name for the selected entity, it will appear as if no entity has been selected (blank text box) even though the selection is valid. This is especially confusing post-upgrade, where all of the entity references may appear as if they have been wiped out.

Playing forward does not work across multiple sub-sections of a "batch"

This relates to the case where a proxy combines several data sets into a "batch" of data.
When a batch contains multiple smaller data sets for the same FEED, only the first data set can be "debugged" in the play-forward function, where you select the RAW_FEED and then the pipeline you want to debug/test/develop.

Add window closing handler to prevent user from accidentally refreshing or leaving Stroom if they have unsaved content

Add a window closing handler to Stroom that checks whether any content tabs have dirty (unsaved) content and, if so, asks the user whether they really want to leave or refresh the page. It may also be worth asking the user even when there is no unsaved content, as they could be in the middle of a query or other activity.

An example of a window closing handler used for dashboards that could be added to ContentManager:

 Window.addWindowClosingHandler(event -> {
           if (dashboardPresenter.isDirty()) {
               String name = "";
               if (dashboard != null) {
                   name = "'" + dashboard.getName() + "'";
               }

               event.setMessage("Dashboard " + name + " has unsaved changes. Are you sure you want to close it?");
           }
        });

If the above is added, the KeyboardInterceptor can be changed to stop blocking the F5 and Backspace keys, as blocking them is crude and potentially problematic.
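
A hedged sketch of how a single handler could cover all open tabs rather than just dashboards. TabPresenter, isDirty() and getLabel() are hypothetical stand-ins for whatever ContentManager actually tracks:

import java.util.List;

import com.google.gwt.user.client.Window;

// Hypothetical sketch: TabPresenter is an illustrative stand-in, not Stroom's real class.
public class DirtyTabCloseGuard {

    public interface TabPresenter {
        boolean isDirty();
        String getLabel();
    }

    /** Registers a single window-closing handler that warns if any open tab has unsaved changes. */
    public void register(final List<TabPresenter> openTabs) {
        Window.addWindowClosingHandler(event -> {
            for (final TabPresenter tab : openTabs) {
                if (tab.isDirty()) {
                    event.setMessage("'" + tab.getLabel()
                            + "' has unsaved changes. Are you sure you want to leave?");
                    return;
                }
            }
        });
    }
}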

Auto import of content packs

Create a process that runs on Stroom startup to import all content packs found in a configured directory.

This is intended for use on new installations or Docker instances so that Stroom can start up with the required content, e.g. core schemas, template pipelines, etc.

  • Create a property to enable/disable this feature.
  • Mark the property as disabled after the first run.
  • Move successfully imported packs into a sub-directory and failed packs into a failed sub-directory.
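
A minimal sketch of the import loop, assuming content packs are zip files and using a hypothetical ContentPackImporter in place of Stroom's real import service:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch only: ContentPackImporter is a hypothetical stand-in.
public class ContentPackAutoImporter {

    public interface ContentPackImporter {
        void importPack(Path pack) throws IOException;
    }

    private final ContentPackImporter importer;

    public ContentPackAutoImporter(final ContentPackImporter importer) {
        this.importer = importer;
    }

    /** Imports every zip found in importDir, moving each to imported/ on success or failed/ on error. */
    public void importAll(final Path importDir) throws IOException {
        final Path importedDir = Files.createDirectories(importDir.resolve("imported"));
        final Path failedDir = Files.createDirectories(importDir.resolve("failed"));

        try (DirectoryStream<Path> packs = Files.newDirectoryStream(importDir, "*.zip")) {
            for (final Path pack : packs) {
                try {
                    importer.importPack(pack);
                    Files.move(pack, importedDir.resolve(pack.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                } catch (final Exception e) {
                    System.err.println("Failed to import " + pack + ": " + e.getMessage());
                    Files.move(pack, failedDir.resolve(pack.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}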

Auto creation of a default Volume

When a new instance is started up, the user must create a volume before they can store any data. For Docker instances and the startup of a new instance, it would be useful if Stroom created a default volume on the root filesystem.

This could be disabled via a property for production installations.
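
A minimal sketch of the startup check, assuming a hypothetical VolumeService and a boolean property; the property wiring and default path are illustrative only:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch only: VolumeService, the property flag and the default path are assumptions.
public class DefaultVolumeCreator {

    public interface VolumeService {
        boolean volumesExist();
        void createVolume(Path path);
    }

    private final VolumeService volumeService;
    private final boolean autoCreateEnabled; // e.g. driven by a configuration property

    public DefaultVolumeCreator(final VolumeService volumeService, final boolean autoCreateEnabled) {
        this.volumeService = volumeService;
        this.autoCreateEnabled = autoCreateEnabled;
    }

    /** On startup, create a default volume directory if none exist and auto-creation is enabled. */
    public void ensureDefaultVolume() throws IOException {
        if (!autoCreateEnabled || volumeService.volumesExist()) {
            return;
        }
        final Path defaultPath = Paths.get("/stroomdata/default_volume"); // illustrative path
        Files.createDirectories(defaultPath);
        volumeService.createVolume(defaultPath);
    }
}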

Feature request: Separate Raw Events and Events

I have several different Raw Events ([Audit Raw Events], [List Raw Events], etc.).
Currently all Raw Events and translated Events are stored under the same path.
I would like to store (any) Raw Events on a slow file system and all Event data on a fast file system such as an SSD.

I.e.
/stroomdata/Raw_Events/Audit_Raw_Events/hostname/etc
/stroomdata/Raw_Events/List_Raw_Events/hostname/etc
/stroomdata/Events/hostname
where /stroomdata/Raw_Events is mounted on a slow file system and
/stroomdata/Events is mounted on a fast file system.

Cheers

Add sorting of a series by the value of one of the fields in that series

In the visualisations, sorting of series is currently done by the name of the series. We need the ability to sort by one of the values within the series, e.g. if one of the fields is a count then sorting DESC by that count would give us a top-n type query.

The settings dialog would need a drop-down (or some other mechanism) to allow the user to specify the field that the sort will operate on.
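
A minimal sketch of the top-n behaviour described above, with Series as a simplified stand-in for a visualisation series and sortFieldValue as the user-selected field (both illustrative):

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: Series is a simplified stand-in for a visualisation series.
public class SeriesSorter {

    /** One series: its name plus the value of the field chosen in the settings dialog. */
    public record Series(String name, double sortFieldValue) {}

    /** Returns the top n series, ordered descending by the chosen field value (e.g. a count). */
    public static List<Series> topN(final List<Series> allSeries, final int n) {
        return allSeries.stream()
                .sorted(Comparator.comparingDouble((Series s) -> s.sortFieldValue()).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }
}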

Timezone data in dashboard does not work

When using a date field that contains zone data in tables and queries, neither queries, formatting of the field, nor formulas on the field work. If the zone data is UTC, i.e. a zero offset, it works fine.
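
This is not Stroom's actual date handling, but the snippet below illustrates the zone-aware parsing the dashboard needs: a value carrying a non-zero offset is parsed and normalised to UTC before it is formatted or compared.

import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Minimal illustration (not Stroom's code) of handling a date value with a non-zero zone offset.
public class ZonedDateExample {
    public static void main(String[] args) {
        final String withOffset = "2023-05-01T12:30:45+02:00";

        // Parse including the offset, then normalise to UTC so queries, formatting and
        // formulas all operate on the same instant.
        final OffsetDateTime parsed = OffsetDateTime.parse(withOffset);
        final OffsetDateTime utc = parsed.withOffsetSameInstant(ZoneOffset.UTC);

        System.out.println(utc.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME)); // 2023-05-01T10:30:45Z
        System.out.println(utc.toInstant().toEpochMilli()); // epoch millis for indexing/sorting
    }
}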

Allow dashboards to be passed parameters on open

Allow dashboards to be created with conditions in the query for which the user will be prompted when the dashboard is run.

This would make it easier to do things like getting a quick view of events from a system, e.g. run the dashboard, enter the name of the system (and optionally the event type), then see the results. It would feel less kludgy than changing an existing dashboard.

Ideally:
  • Values that are not filled in by the user would have their conditions removed before running, i.e. no error (see the sketch after this list).
  • Default values could be assigned during dashboard creation.
  • When the query is re-run, the previously used values are reused, perhaps with a "don't show this dialog again" option to always use the same values for subsequent runs.
  • The name of the dashboard on the tab would be composed from the supplied parameters, e.g. to make it easy to refer back to tabs if several dashboards have been run up from the same template.
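
A hedged sketch of the parameter handling described above, modelling the query as a list of terms; Term and the ${name} placeholder syntax are illustrative assumptions, not Stroom's actual expression API:

import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Illustrative sketch only: Term is a simplified stand-in for a query expression term.
public class DashboardParamResolver {

    public record Term(String field, String condition, String value) {}

    /** Substitutes ${name} parameters and drops any term whose parameter was left blank. */
    public static List<Term> resolve(final List<Term> terms, final Map<String, String> params) {
        return terms.stream()
                .map(term -> {
                    final String value = term.value();
                    if (value.startsWith("${") && value.endsWith("}")) {
                        final String name = value.substring(2, value.length() - 1);
                        final String supplied = params.get(name);
                        if (supplied == null || supplied.isBlank()) {
                            return null; // unfilled parameter: drop the condition rather than error
                        }
                        return new Term(term.field(), term.condition(), supplied);
                    }
                    return term;
                })
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }
}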

Feature request: Improved filtering in Feeds

At the moment it is possible to filter on several different fields such as stream ID, date, etc., and there is an area where it is possible to filter on some of the meta data.
Please add the ability to filter on any meta data supplied along with a batch of data sent to a feed, e.g. it would be really handy to be able to filter on the sending host, the UUID of the dataset, etc.
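
A minimal sketch of the requested filtering, modelling each stream's meta data as a key/value map; the attribute names a user might filter on (sending host, upload UUID, etc.) would be whatever was supplied with the batch:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: a stream's meta data is modelled as a simple key/value map.
public class MetaFilter {

    /** Keeps only the streams whose meta data contains every requested attribute value. */
    public static List<Map<String, String>> filter(final List<Map<String, String>> streams,
                                                   final Map<String, String> required) {
        return streams.stream()
                .filter(meta -> required.entrySet().stream()
                        .allMatch(e -> e.getValue().equals(meta.get(e.getKey()))))
                .collect(Collectors.toList());
    }
}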
