
NeXus Streamer

PLEASE USE https://github.com/ess-dmsc/nexus-streamer-python INSTEAD

This repository will be archived once the new implementation supports all used features

Stream event data from a NeXus file to an Apache Kafka cluster. Each message sent over Kafka comprises the event data from a single neutron pulse. Data in NXlogs, for example sample environment data, are also published. Histogram data from NeXus files recorded at ISIS can also be streamed by setting --histogram-update-period to a value greater than 0.

Part of the ESS data streaming pipeline.

Geometry

A JSON description of the NeXus file can be provided using --json-description. This can include full geometry information about the instrument, which can be used by Mantid. Further documentation and a utility for automatically generating the JSON description are included here. A simple example NeXus file with geometry for a source, sample and detector is included at data/SANS2D_minimal_with_geometry.nxs

Getting Started

Prerequisites

Dependencies are managed by Conan. Conan can be installed using pip, and CMake handles running Conan. The following remote repositories must be configured:

You can add them by running

conan remote add <local-name> <remote-url>

where <local-name> must be replaced with a locally unique name. Configured remotes can be listed with conan remote list.

If conan does not pick up your compiler settings, you can manually specify these by editing your conan profile.

For example, to build with GCC 8.3 on CentOS 7:

[settings]
os=Linux
os_build=Linux
arch=x86_64
arch_build=x86_64
compiler=gcc
compiler.version=8.3
compiler.libcxx=libstdc++11
build_type=Release
[options]
[scopes]
[env]

Building

As usual for a CMake project:

cmake <path-to-source>
make

There are some useful python scripts in the data directory for creating test data such as truncating large NeXus files or generating a detector-spectrum map file.

Running the tests

Build the CMake UnitTests target. Then use as follows:

UnitTests <OPTIONS>

Options:
  -h,--help                     Print this help message and exit
  -d,--data-path TEXT REQUIRED  Path to data directory

Running via docker

The docker-compose script can be used to launch a single-broker Kafka cluster and the NeXus Streamer. Run the following in the root directory of the repository to launch the containers.

docker-compose up

By default the streamer publishes some test data using the instrument name TEST. The Kafka broker is accessible at localhost:9092. Note the SEND_GEOMETRY option in docker-compose.yml; set it to 1 to automatically generate the JSON description of the NeXus file and include it in the run start message sent to Mantid.

Pre-built containers are available at Docker Hub tagged by the last commit on master at the time of building.

Built With

  • CMake - Cross platform makefile generation
  • Conan - Package manager for C++
  • Docker - Container platform

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the BSD 2-Clause License - see the LICENSE.md file for details.


Issues

Add Windows build to Jenkins pipeline

Already builds on Windows, but requires the third-party dependencies repo from ScreamingUdder. First get the ESS conan packages working on Windows (googletest and ??).

File reader should check if the file is an ISIS file

In NexusFileReader.cpp:

For ISIS files the names of NXlogs are held by the parent groups. We should add a check for ISIS files and set the name of the log from the log object's parent instead of the log itself. ESS files use the name of the log object itself.
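A minimal sketch of the naming rule described above (the helper names are illustrative, not from NexusFileReader.cpp):

```cpp
#include <string>

// Return the last component of an HDF5 path, e.g. "value_log" from
// "/raw_data_1/selog/Temp/value_log".
inline std::string lastComponent(const std::string &path) {
  auto pos = path.find_last_of('/');
  return (pos == std::string::npos) ? path : path.substr(pos + 1);
}

// Return the name of the parent group, e.g. "Temp" from the path above.
inline std::string parentComponent(const std::string &path) {
  auto pos = path.find_last_of('/');
  if (pos == std::string::npos || pos == 0)
    return "";
  return lastComponent(path.substr(0, pos));
}

// ISIS files: use the parent group's name; ESS files: use the log's own name.
inline std::string logName(const std::string &logPath, bool isIsisFile) {
  return isIsisFile ? parentComponent(logPath) : lastComponent(logPath);
}
```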

First frame event data IDs are set to 0

When using SANS_test.nxs to stream event data to a topic, all the detector IDs in the first frame are set to 0. This is due to the memcpy call in the flatbuffers builder stage.

Support different types and units

Currently makes assumptions based on format used at ISIS, for example that event_time_zero is a double in units of seconds.

Checking for units of ns or nanoseconds, and falling back on ISIS behaviour if not, would be enough to support ISIS and example ESS files. I think this is sufficient for now, as supporting all possible types and units which NeXus allows would add significant complexity.
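A sketch of the proposed fallback (the helper name and the extra microsecond/millisecond cases are ours, not from the code base): recognise a few common time units and otherwise assume the ISIS convention of seconds.

```cpp
#include <string>

// Convert an event_time_zero value to nanoseconds based on its units
// attribute, falling back on the ISIS assumption (double, seconds) when the
// units are absent or unrecognised.
inline double toNanoseconds(double value, const std::string &units) {
  if (units == "ns" || units == "nanoseconds")
    return value; // already nanoseconds (example ESS files)
  if (units == "us" || units == "microseconds")
    return value * 1e3;
  if (units == "ms" || units == "milliseconds")
    return value * 1e6;
  // Fall back on ISIS behaviour: assume seconds.
  return value * 1e9;
}
```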

Ensure works with example ESS NeXus files

  • Don't assume raw_data_1 as the NXentry name.
  • Don't rely on there being a good_frames dataset in the file
  • Find sample environment information in NXlogs (remove dependence on finding selog group)
  • Remove dependence on detector_1_events/total_counts, as total_counts is not required by the NeXus standard; use the size of one of the other datasets in the NXevent_data group instead.
  • Generally try to search for particular NX_class attributes, rather than assuming a particular group name.

Edit: streaming event data from ESS files should now be in a working state; we just need to support reading sample environment logs from NXlogs.

Problem with h5cpp failing to bring in boost_filesystem

An error is displayed at runtime with the containerised version.

/nexus_streamer/bin/nexus-streamer: error while loading shared libraries: libboost_filesystem.so.1.65.1: cannot open shared object file: No such file or directory

All timestamps should simulate live data source

Pulse times in event data messages should look like "live" data, not be based on the offset attribute of the frame times in the NeXus file. This will make it more straightforward to use NeXus-Streamer with the NeXus File Writer which checks the pulse timestamp is within its start-stop time range.
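An illustrative sketch of the idea (not the streamer's implementation): rebase the file's pulse times onto the wall clock so the first pulse is "now", preserving the spacing between pulses.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Rebase pulse times (nanoseconds since the Unix epoch) so they look like
// live data while keeping the original inter-pulse spacing.
inline std::vector<uint64_t>
rebasePulseTimes(const std::vector<uint64_t> &fileTimesNs) {
  std::vector<uint64_t> liveTimes;
  if (fileTimesNs.empty())
    return liveTimes;
  const uint64_t nowNs =
      std::chrono::duration_cast<std::chrono::nanoseconds>(
          std::chrono::system_clock::now().time_since_epoch())
          .count();
  liveTimes.reserve(fileTimesNs.size());
  for (auto t : fileTimesNs)
    liveTimes.push_back(nowNs + (t - fileTimesNs.front()));
  return liveTimes;
}
```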

Easy performance gains with flatbuffers

Profiling shows a lot of time is spent in flatbuffers::createVector(). Jonas suggests instead using CreateUninitializedVector. Maybe something like this:

// Requires <cstdint>, <cstring>, <vector> and a flatbuffers::FlatBufferBuilder
// named builder in scope; std::uint32_t stands in for the element type.
std::vector<std::uint32_t> dataVector = {0, 1, 2, 3, 4, 5};
std::size_t dataSizeBytes = dataVector.size() * sizeof(std::uint32_t);
std::uint8_t *tempPtr;
auto payload = builder.CreateUninitializedVector(dataSizeBytes, &tempPtr);
std::memcpy(tempPtr, dataVector.data(), dataSizeBytes);

Ensure histogram data matches what is expected by Mantid

Initial histogram support was implemented in #90 but this needs to be tested against the histogram listener in Mantid. Also check for consistency with just-bin-it and maybe how the schema is used at PSI too?
For example dimension names need to match what is expected by the listener in Mantid. May also require particular type for time-of-flight bin edges, or other fields?

Use H5cpp library

Use h5cpp rather than the HDF5 library directly. This should reduce the verbosity of NeXusFileReader.cpp.
It will also make it easier to create in-memory test "files" for unit testing the changes in #3.

Improve conan and build documentation

Document what to do if conan doesn't pick up compiler settings.
Include compiler version requirements in the build information.

Requested by Torben 05/09/2018

Support streaming NeXus geometry

If geometry is available in the stream, Mantid would make use of this information instead of loading an IDF for the instrument.

This is part of the work required to support streaming geometry for Torben. We also need to convert the IDF from McStas to NeXus geometry, or directly implement NeXus geometry (plus NXevent_data) in McStas?

Make docker image build faster

By putting the Conan installation in a separate step, before the source is copied, the dependencies will only be rebuilt when outdated instead of every time the source code changes.

Use spdlog

"Unsupported datatype" should be warnings for example.

Generate detector-spectrum map if none provided

We don't use spectrum numbers for ESS files (yet, at least), so generate a 1-to-1 mapping (in other words, treat the IDs in the NeXus files as detector IDs rather than spectrum numbers) if no file is provided.
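A hedged sketch of the proposed behaviour (the function name is hypothetical): build an identity map so each spectrum number equals its detector ID.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// When no detector-spectrum map file is provided, generate a 1-to-1 mapping
// so the IDs in the NeXus file are treated directly as detector IDs.
inline std::map<int32_t, int32_t>
identityDetSpecMap(const std::vector<int32_t> &detectorIds) {
  std::map<int32_t, int32_t> spectrumToDetector;
  for (auto id : detectorIds)
    spectrumToDetector[id] = id; // spectrum number == detector ID
  return spectrumToDetector;
}
```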

Data from different NXdetector groups should be published to different partitions

This is not currently high priority.

This would simulate the situation that a different EFU is running for different detector panels of an instrument.

Consider if it should be an option rather than always being the case.
Would require the topic to already exist with the correct number of partitions, or if topic did not exist it could be created with the new Admin API functionality (which I think requires the next release of librdkafka).

Document in readme + cli11 which args are optional

Requested by Torben 05/09/2018

Note: mandatory arguments can be marked with CLI11:

  App.add_option("-f,--filename", settings.filename,
                 "Full path of the NeXus file")
      ->check(CLI::ExistingFile)
      ->required();
  App.add_option("-d,--det_spec_map", settings.detSpecFilename,
                 "Full path of the detector-spectrum map")
      ->check(CLI::ExistingFile)
      ->required();
  App.add_option("-b,--broker", settings.broker,
                 "Hostname or IP of Kafka broker")
      ->required();
  App.add_option("-i,--instrument", settings.instrumentName,
                 "Used as prefix for topic names")
      ->required();
  App.add_option("-m,--compression", settings.compression,
                 "Compression option for Kafka messages");
  App.add_option("-e,--fake_events_per_pulse", settings.fakeEventsPerPulse,
                 "Generates this number of fake events per pulse instead of "
                 "publishing real data from file");
  App.add_flag("-s,--slow", settings.slow,
               "Publish data at approx realistic rate (detected from file)");
  App.add_flag("-q,--quiet", settings.quietMode, "Less chatty on stdout");
  App.add_flag(
      "-z,--single_run", settings.singleRun,
      "Publish only a single run (otherwise repeats until interrupted)");
  App.set_config("-c,--config_file", "", "Read configuration from an ini file",
                 false)
      ->check(CLI::ExistingFile);

Reduce file requirements

The streamer should not require good_frames dataset or total_counts dataset.
It should warn but not cause a problem if it does not find sample env logs in the file.

Allow proton_charge and period_number and run_state to be missing from file, in which case don't add ISIS-specific data to event data messages.

This will make life easier for Torben and co using it for the DMSC integration pipeline.
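One possible pattern for the fallback described above (illustrative only; the names are not from the code base): treat datasets such as good_frames as optional and warn rather than fail when they are missing.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>

// If good_frames was found in the file, use it; otherwise warn and fall back
// on the size of another dataset (e.g. event_time_zero).
inline int64_t framesOrFallback(std::optional<int64_t> goodFrames,
                                int64_t eventTimeZeroSize) {
  if (goodFrames)
    return *goodFrames;
  std::cerr << "Warning: good_frames not found, "
               "using size of event_time_zero instead\n";
  return eventTimeZeroSize;
}
```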

Support publishing data from file with multiple NXdetector groups

Currently only data from the first NXevent_data group that the NeXus Streamer finds is published to Kafka. It should publish from all of them. Keep in mind that it should eventually have the option to publish data from different NXevent_data groups to different partitions, but that is not required for this ticket.

This ticket is a request from Lamar 17/01/2019.
