tunnell / wax

(Abandoned) Generic software trigger, event building, and output for particle physics applications.

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.26% Python 17.04% C++ 13.70% C 68.00%

wax's Introduction

This code provides a framework for software triggering in particle and astroparticle physics experiments with pre-trigger data rates of 300 MB/s. It reaches 300 MB/s by distributing work with Celery across many computers; the goal is for a single thread to process a few MB/s.

Free software: BSD license
Documentation: http://tunnell.github.io/wax/

Usage

To run wax:

$ pip install git+https://github.com/tunnell/wax.git
$ event-builder --help

See the usage section in the documentation for more information.

Features

Processing of CAEN V1724 blocks
Scalable with the Celery distributed task queue
Flexible trigger windows and thresholds
MongoDB data backend
First open-source software trigger framework
Online (wax-on) and offline (wax-off) data analysis

Overview

The program wax consists of two main components: data processing (wax process) and data recording (wax file builder).

wax's Issues

GridFS as File Builder output

Check run doc for compression option

We basically always run the DAQ reader with compression on, but there is an option to turn it off. If that option is ever used, wax should handle it instead of crashing.

The toggle is in the run document under run['reader']['compressed']. It is a bool (or an int, if you'd like, with 0=false and 1=true).
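A minimal sketch of reading this flag defensively so that an uncompressed run does not crash wax. It assumes the run document is a plain dict (as pymongo returns it), treats a missing flag as "compressed" because compression is normally on, and takes the decompression routine as a parameter since the codec is not named here; `zlib.decompress` is only a stand-in. `is_compressed` and `payload_bytes` are hypothetical names, not part of wax.

```python
import zlib
from typing import Any, Callable, Mapping


def is_compressed(run_doc: Mapping[str, Any], default: bool = True) -> bool:
    """Read run['reader']['compressed'] as a bool.

    Accepts a bool or an int (0 = False, 1 = True). A missing field falls
    back to `default` (assumed True, since the DAQ reader normally runs
    with compression on).
    """
    value = run_doc.get('reader', {}).get('compressed', default)
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return value == 1
    raise ValueError(f"unexpected compression flag: {value!r}")


def payload_bytes(raw: bytes, compressed: bool,
                  decompress: Callable[[bytes], bytes]) -> bytes:
    """Return the payload, decompressing it only if the run says so."""
    return decompress(raw) if compressed else raw


if __name__ == "__main__":
    # zlib is a stand-in codec here, not necessarily what the reader uses.
    data = b'\x01\x02\x03\x04'
    run_on = {'reader': {'compressed': True}}
    run_off = {'reader': {'compressed': 0}}
    print(payload_bytes(zlib.compress(data), is_compressed(run_on), zlib.decompress))
    print(payload_bytes(data, is_compressed(run_off), zlib.decompress))
```

Passing the codec in keeps the flag handling independent of whichever compression library the reader actually uses.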

For Mongo input, read compression flag from run db

This should work similarly to the injector class.

Make jargon page in docs

Additional datapoints in sum

In the 1.8.8 version some extra data seems to be collected in the sum waveform.

The attached zoomed plots of XENON100 data show that:

Event 0 has a higher baseline
All events seem to have 'additional' peaks in their sum
All real peaks are also in the sum
Events 1 through 4 do not have a higher baseline
No difference is seen between the sum and the PMT sum data.

I/O using BSONDump format .bson

Consult:

https://groups.google.com/forum/#!topic/bson/XvobOF8yQiA
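For reference, a .bson file written by mongodump is BSON documents concatenated back to back, and every BSON document starts with a little-endian int32 holding its total length in bytes. The sketch below relies only on that framing to split a file into raw documents without decoding them; `iter_raw_bson` is a hypothetical helper, not part of wax. For decoding into dicts, pymongo's `bson.decode_file_iter` performs the same walk.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_raw_bson(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each raw BSON document from a mongodump-style .bson stream.

    A BSON document starts with a little-endian int32 giving its total size
    in bytes (including those 4 bytes) and ends with a 0x00 terminator.
    """
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated BSON length header")
        (length,) = struct.unpack('<i', header)
        if length < 5:  # the smallest valid document, {}, is 5 bytes
            raise ValueError(f"invalid BSON document length: {length}")
        body = stream.read(length - 4)
        if len(body) != length - 4 or body[-1:] != b'\x00':
            raise ValueError("truncated or malformed BSON document")
        yield header + body


if __name__ == "__main__":
    empty_doc = b'\x05\x00\x00\x00\x00'  # {}
    # {'a': 1}: int32 element (type 0x10), key "a", value 1, terminator.
    a_doc = b'\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00'
    docs = list(iter_raw_bson(io.BytesIO(empty_doc + a_doc)))
    print(len(docs))  # 2
```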

Reimplement celery

Get Celery working again

Read trigger mode field and adjust options accordingly

There is a trigger mode field in the runs db. Looks like:

        run['trigger']['mode'] = string field

Wax should use this mode field to decide what type of data is being written by the run and adjust itself accordingly.

Right now we want three modes for testing:

'bern_test_daq' - Normal peak finding. Data is usually NaI pulses.
'bern_test_led' - No peak finding. Triggered data. Group occurrences that arrive at the same time and pass them straight to the output. If that's difficult, you don't have to group occurrences for now (we will ask the LED group later what they want the data to look like).
'bern_test_off' - Don't look for a run in the buffer DB. There isn't one. We're testing something that doesn't need the event builder but will still put an entry in the runs DB.
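One way to sketch this is a lookup table from the mode string to processing options, read once per run. The option names (`read_buffer`, `peak_finding`) and `settings_for_run` are hypothetical; wax's real configuration may be organized differently, and grouping of LED occurrences is left out.

```python
from typing import Any, Dict, Mapping

# Hypothetical per-mode options mirroring the three test modes.
MODE_SETTINGS: Dict[str, Dict[str, bool]] = {
    # Normal peak finding on buffered data (usually NaI pulses).
    'bern_test_daq': {'read_buffer': True, 'peak_finding': True},
    # Triggered LED data: no peak finding, occurrences go to the output.
    'bern_test_led': {'read_buffer': True, 'peak_finding': False},
    # No run exists in the buffer DB, so there is nothing to read.
    'bern_test_off': {'read_buffer': False, 'peak_finding': False},
}


def settings_for_run(run_doc: Mapping[str, Any]) -> Dict[str, bool]:
    """Return a copy of the options for run['trigger']['mode']."""
    mode = run_doc['trigger']['mode']
    try:
        return dict(MODE_SETTINGS[mode])
    except KeyError:
        raise ValueError(f"unknown trigger mode: {mode!r}") from None
```

Failing loudly on an unknown mode avoids silently processing a run with the wrong assumptions.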

Input DB Interface: get_max_time

This should be told to cito through a control document.

Feature to find duplicates

Find duplicate data payloads.
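A minimal sketch, assuming each input document can be reduced to an (id, payload bytes) pair: hash every payload and report hashes that occur more than once. SHA-256 collisions are negligible in practice, though a byte-for-byte comparison within each group would make the check exact. `find_duplicate_payloads` is a hypothetical helper, and its memory use grows with the number of distinct payloads.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def find_duplicate_payloads(
        docs: Iterable[Tuple[Hashable, bytes]]) -> Dict[str, List[Hashable]]:
    """Return {sha256 hex digest: [doc ids]} for payloads seen more than once."""
    groups: Dict[str, List[Hashable]] = defaultdict(list)
    for doc_id, payload in docs:
        groups[hashlib.sha256(payload).hexdigest()].append(doc_id)
    # Keep only digests shared by two or more documents.
    return {digest: ids for digest, ids in groups.items() if len(ids) > 1}
```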

Make extensions for other trigger types

This could be useful if we wanted to, for example, trigger on some different signal. It could be an extension so people didn't have to modify the main code base.

Most likely overkill.

http://stevedore.readthedocs.org/

Syncdelay

Maybe syncdelay will help with jumps in I/O speeds?

http://docs.mongodb.org/manual/reference/parameters/#param.syncdelay

Implement C++ friendly output format: HDF5, Bson, etc

Currently, the data is sent to the output database after processing. There is also a file builder, whose purpose is to make it easy to store the data, e.g., on tape for later reprocessing. Currently, the file builder just pickles the data, and pickle is a Python-specific serialization format that can't easily be read from C++.

What is better for longer term storage? Maybe another file format. The options I see are:

Output to an HDF5 file with the file builder.
- Pros:
  - Standardized
  - C++ and Python bindings
  - Commonly used for data analysis
  - Can be converted to a ROOT file
- Cons:
  - Extra dependency
  - The API seems jargon-heavy; it wasn't immediately obvious how to use it
Have no file builder and just use mongodump.
- Pros:
  - Easy API in Python (C++ also?)
  - Fewer dependencies
  - https://groups.google.com/forum/#!topic/bson/XvobOF8yQiA
- Cons:
  - Less standardized and less documented than common file formats
Use Protocol Buffers (by Google).
- Pros:
  - Easy API in Python and C++
- Cons:
  - Yet another format
  - Not typically used for data analysis

The main goal is to avoid creating our own file format (as some other experiments have done).

InputDBInterface.get_data_from_doc and Logic.get_samples do similar things

Combine? Avoid two numpy casts.

Mongo Mock special case in XeDB

XeDB.get_min_time checks whether the database is a mongomock instance, to work around limitations in mongomock. This is a bug.

Combine samples and indices

Make this a 2xN array. This requires fiddling with PeakFinding and math. However, it should greatly simplify some logic where the index within the array is used.
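A minimal NumPy sketch of the proposed layout, with indices in row 0 and samples in row 1, assuming both arrive as 1-D sequences of equal length. A 2xN array has a single dtype, so the samples are upcast to the index dtype (int64 here); a NumPy structured array would be an alternative if that memory cost matters. `combine_samples_and_indices` is a hypothetical name.

```python
import numpy as np


def combine_samples_and_indices(indices, samples) -> np.ndarray:
    """Stack indices (row 0) and samples (row 1) into one 2xN int64 array."""
    indices = np.asarray(indices, dtype=np.int64)
    samples = np.asarray(samples, dtype=np.int64)
    if indices.ndim != 1 or indices.shape != samples.shape:
        raise ValueError("indices and samples must be 1-D and equally long")
    return np.vstack((indices, samples))


if __name__ == "__main__":
    combined = combine_samples_and_indices([10, 11, 12, 13], [3, 7, 9, 2])
    # A boolean mask on the sample row selects whole columns, so each
    # selected sample keeps its index without separate bookkeeping.
    above = combined[:, combined[1] > 5]
    print(above)  # row 0: indices 11, 12; row 1: samples 7, 9
```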

Explore circular buffers with capped collections

http://docs.mongodb.org/manual/tutorial/use-capped-collections-for-fast-writes-and-reads/

wax .ini files should be storable as MongoDB documents

It would be nice if the event builder .ini objects were stored in the same DB as the reader .ini objects. A reader .ini object would then point to the event builder .ini document to use for that data by writing that document's name to the run['trigger']['mode'] field. Hopefully the same will be done for the data processor.

Right now reader .ini files are in online.run_modes. Trigger ones could go in online.trigger_modes or something like that.

Just a wishlist thing for unification of these parts, so no hurry on this.
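A minimal sketch of the conversion step using Python's standard configparser. The document layout (a 'name' field plus a 'sections' sub-document) is an assumption; with pymongo such a dict could then be inserted into a collection like online.trigger_modes, and the reader's run['trigger']['mode'] would hold its 'name'. `ini_to_document` and the example section and option names are hypothetical.

```python
import configparser
from typing import Any, Dict


def ini_to_document(name: str, ini_text: str) -> Dict[str, Any]:
    """Convert .ini text into a dict that can be stored as a MongoDB document.

    Sections become nested dicts under 'sections'. Values stay strings, as
    configparser reads them, and keys from [DEFAULT] are merged into every
    section. configparser also lowercases option names by default.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    sections = {s: dict(parser.items(s)) for s in parser.sections()}
    return {'name': name, 'sections': sections}
```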

Delete follows progress

Delete used time ranges after processing rather than dropping collection at end.
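A minimal sketch, assuming each input document carries a numeric time field (called 'time' here, a hypothetical name) and that ranges are half-open, so adjacent ranges neither overlap nor leave gaps. pymongo's `Collection.delete_many` takes a filter document like the one built below; both function names are hypothetical.

```python
from typing import Any, Dict


def processed_range_filter(t_start: int, t_end: int,
                           time_field: str = 'time') -> Dict[str, Any]:
    """MongoDB filter for documents with t_start <= time < t_end."""
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    return {time_field: {'$gte': t_start, '$lt': t_end}}


def delete_processed_range(collection, t_start: int, t_end: int,
                           time_field: str = 'time') -> int:
    """Delete one processed range and return how many documents were removed.

    `collection` is assumed to be a pymongo Collection; delete_many() returns
    a DeleteResult whose deleted_count is the number of removed documents.
    """
    result = collection.delete_many(
        processed_range_filter(t_start, t_end, time_field))
    return result.deleted_count
```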

db purge clears mongo

    # TODO: Maybe purge Celery too?
    # from celery.task.control import discard_all  # older Celery API
    # discard_all()
    # (In newer Celery versions the equivalent call is app.control.purge().)