xenon1t / hax
Handy Analysis for XENON (reduce processed data)
Right now hax does a lot of stuff on import:
It would be better to have a hax.init(config_file, **options) that you always call before using hax. This way you can ensure the right pax class gets loaded and the right data dir gets used. Currently you have to modify internal hax data structures, then call some internal functions, etc.
Moreover, if the default pax event class isn't the one you want to load, you need to load the right one yourself, and you get a whole bunch of warnings about loading everything twice.
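A minimal sketch of the proposed pattern (illustrative only, not hax's actual internals): nothing heavy happens at import time, and all configuration is deferred to an explicit init() call.

```python
# Sketch of an explicit-init module: importing it does no work; users
# must call init() once before anything else. Names and defaults here
# are illustrative assumptions, not hax's real configuration keys.

_config = None  # module-level state, empty until init() is called


def init(config_file=None, **options):
    """Load configuration once, before any other function is used."""
    global _config
    _config = {'main_data_paths': ['.'], 'pax_version_policy': 'latest'}
    # a config file would be parsed here; keyword options override it
    _config.update(options)
    return _config


def get_config():
    """Fail loudly if the module is used without init()."""
    if _config is None:
        raise RuntimeError("call init() before using this module")
    return _config
```

With this pattern there is a single, obvious place to ensure the right pax class and data directory are selected, instead of poking at internal data structures.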
It would be nice if we could query multiple runs at a time with get_run_info, to write something like
lifetimes = hax.runs.get_run_info(my_list_of_run_numbers, 'processor.DEFAULT.electron_lifetime_liquid')
The tag version printing logic introduced in hax.runs.tags_selection (https://github.com/XENON1T/hax/blob/master/hax/runs.py#L342) assumes there is always an include tag. If you try e.g.
hax.runs.tags_selection(exclude=['bad', 'worse', 'terrible'])
you now get an error (TypeError: 'NoneType' object is not iterable).
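The defensive fix is small: treat a missing include list as empty instead of iterating over None. A minimal standalone sketch (the function name is illustrative, not hax's actual code):

```python
# Sketch of the fix: normalize None to an empty list before iterating,
# so an exclude-only selection no longer raises
# TypeError: 'NoneType' object is not iterable.

def describe_selection(include=None, exclude=None):
    include = include or []
    exclude = exclude or []
    return "include: %s, exclude: %s" % (
        ', '.join(include) or '(none)',
        ', '.join(exclude) or '(none)')
```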
Since LargestPeakProperties is missing 'peaks.reconstructed_positions*' in its extra_branches list, its code for finding peak positions (https://github.com/XENON1T/hax/blob/master/hax/treemakers/common.py#L244) never succeeds. Consequently the branches s2_x etc. created by this treemaker are always nan.
Incidentally, this minitree has some branches that clash with Basics (e.g. s2_area_fraction_top), but they do not have the same meaning (in Basics the s2 refers to the main S2, in LargestPeakProperties the largest S2).
When you load minitrees with multiple processes using the num_workers option, and the load crashes or is interrupted, some processes can apparently remain alive even after restarting your notebook kernel. This might be one of the reasons many people have a lot of processes open on the jupyterhub.
I'm not sure what we can do about this, except perhaps investigate and try to isolate the issue. If we have a clear example we can report it upstream to dask or jupyterhub (wherever the problem seems to lie).
Hi all,
I have installed pax with the following packages and it works perfectly, but hax now throws a segmentation violation. Do I need to install a special PyROOT or root-numpy version?
wget https://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh
bash Anaconda3-4.3.0-Linux-x86_64.sh
export PATH=/home/l-althueser/anaconda3/bin:$PATH
conda config --add channels defaults
conda config --add channels http://conda.anaconda.org/NLeSC
conda create -q -n pax python=3.4 root=6.04 toolz numpy scipy matplotlib pandas cython h5py numba pip python-snappy pytables scikit-learn rootpy psutil jupyter root_pandas
source activate pax
pip install coveralls nose coverage
pip install mongodbproxy
git clone https://github.com/XENON1T/pax.git
cd pax
python setup.py develop
paxer --version
conda install root_numpy
git clone https://github.com/XENON1T/hax.git
cd hax
python setup.py develop
haxer --version
$ haxer --version
ERROR:ROOT.TUnixSystem.DispatchSignals] segmentation violation
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/bin/haxer", line 6, in <module>
ERROR:stack] exec(compile(open(__file__).read(), __file__, 'exec'))
ERROR:stack] File "/home/l-althueser/hax/bin/haxer", line 6, in <module>
ERROR:stack] import hax
ERROR:stack] File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
ERROR:stack] File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1129, in _exec
ERROR:stack] File "<frozen importlib._bootstrap>", line 1471, in exec_module
ERROR:stack] File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ERROR:stack] File "/home/l-althueser/hax/hax/__init__.py", line 13, in <module>
ERROR:stack] from . import misc, minitrees, paxroot, pmt_plot, raw_data, runs, utils, treemakers, data_extractor, \
ERROR:stack] File "<frozen importlib._bootstrap>", line 2284, in _handle_fromlist
ERROR:stack] File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ERROR:stack] File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
ERROR:stack] File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1129, in _exec
ERROR:stack] File "<frozen importlib._bootstrap>", line 1471, in exec_module
ERROR:stack] File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ERROR:stack] File "/home/l-althueser/hax/hax/minitrees.py", line 15, in <module>
ERROR:stack] from .paxroot import loop_over_dataset, function_results_datasets
ERROR:stack] File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
ERROR:stack] File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1129, in _exec
ERROR:stack] File "<frozen importlib._bootstrap>", line 1471, in exec_module
ERROR:stack] File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ERROR:stack] File "/home/l-althueser/hax/hax/paxroot.py", line 14, in <module>
ERROR:stack] from pax.plugins.io.ROOTClass import load_event_class, load_pax_event_class_from_root, ShutUpROOT
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/ROOT.py", line 301, in _importhook
ERROR:stack] return _orig_ihook( name, *args, **kwds )
ERROR:stack] File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
ERROR:stack] File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1129, in _exec
ERROR:stack] File "<frozen importlib._bootstrap>", line 1471, in exec_module
ERROR:stack] File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ERROR:stack] File "/home/l-althueser/pax/pax/plugins/io/ROOTClass.py", line 17, in <module>
ERROR:stack] from pax import plugin, datastructure, exceptions
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/ROOT.py", line 301, in _importhook
ERROR:stack] return _orig_ihook( name, *args, **kwds )
ERROR:stack] File "<frozen importlib._bootstrap>", line 2284, in _handle_fromlist
ERROR:stack] File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/ROOT.py", line 301, in _importhook
ERROR:stack] return _orig_ihook( name, *args, **kwds )
ERROR:stack] File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
ERROR:stack] File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1129, in _exec
ERROR:stack] File "<frozen importlib._bootstrap>", line 1471, in exec_module
ERROR:stack] File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ERROR:stack] File "/home/l-althueser/pax/pax/plugin.py", line 16, in <module>
ERROR:stack] from pax import dsputils
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/ROOT.py", line 301, in _importhook
ERROR:stack] return _orig_ihook( name, *args, **kwds )
ERROR:stack] File "<frozen importlib._bootstrap>", line 2284, in _handle_fromlist
ERROR:stack] File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/ROOT.py", line 301, in _importhook
ERROR:stack] return _orig_ihook( name, *args, **kwds )
ERROR:stack] File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
ERROR:stack] File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
ERROR:stack] File "<frozen importlib._bootstrap>", line 1129, in _exec
ERROR:stack] File "<frozen importlib._bootstrap>", line 1471, in exec_module
ERROR:stack] File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
ERROR:stack] File "/home/l-althueser/pax/pax/dsputils.py", line 9, in <module>
ERROR:stack] nopython=True)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/decorators.py", line 176, in wrapper
ERROR:stack] disp.compile(sig)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/dispatcher.py", line 532, in compile
ERROR:stack] cres = self._compiler.compile(args, return_type)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/dispatcher.py", line 81, in compile
ERROR:stack] flags=flags, locals=self.locals)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/compiler.py", line 693, in compile_extra
ERROR:stack] return pipeline.compile_extra(func)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/compiler.py", line 350, in compile_extra
ERROR:stack] return self._compile_bytecode()
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/compiler.py", line 658, in _compile_bytecode
ERROR:stack] return self._compile_core()
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/compiler.py", line 645, in _compile_core
ERROR:stack] res = pm.run(self.status)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/compiler.py", line 228, in run
ERROR:stack] stage()
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/compiler.py", line 583, in stage_nopython_backend
ERROR:stack] self._backend(lowerfn, objectmode=False)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/compiler.py", line 538, in _backend
ERROR:stack] lowered = lowerfn()
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/compiler.py", line 525, in backend_nopython_mode
ERROR:stack] self.flags)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/compiler.py", line 811, in native_lowering_stage
ERROR:stack] lower.lower()
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/lowering.py", line 141, in lower
ERROR:stack] self.library.add_ir_module(self.module)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/targets/codegen.py", line 158, in add_ir_module
ERROR:stack] self.add_llvm_module(ll_module)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/targets/codegen.py", line 170, in add_llvm_module
ERROR:stack] self._optimize_functions(ll_module)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/numba/targets/codegen.py", line 88, in _optimize_functions
ERROR:stack] fpm.run(func)
ERROR:stack] File "/home/l-althueser/anaconda3/envs/pax_p34/lib/python3.4/site-packages/llvmlite/binding/passmanagers.py", line 127, in run
ERROR:stack] return ffi.lib.LLVMPY_RunFunctionPassManager(self, function)
Seeing this error running within cax in the pax_v6.2.0 environment (hax v1.2.0):
Traceback (most recent call last):
File "/project/lgrandi/anaconda3/envs/pax_v6.2.0/lib/python3.4/site-packages/cax-4.10.5-py3.4.egg/cax/main.py", line 131, in main
task.go(args.run)
File "/project/lgrandi/anaconda3/envs/pax_v6.2.0/lib/python3.4/site-packages/cax-4.10.5-py3.4.egg/cax/task.py", line 65, in go
self.each_run()
File "/project/lgrandi/anaconda3/envs/pax_v6.2.0/lib/python3.4/site-packages/cax-4.10.5-py3.4.egg/cax/tasks/process_hax.py", line 101, in each_run
self.run_doc['detector'])
File "/project/lgrandi/anaconda3/envs/pax_v6.2.0/lib/python3.4/site-packages/cax-4.10.5-py3.4.egg/cax/tasks/process_hax.py", line 53, in _process_hax
init_hax(in_location, pax_version, out_location) # may initialize once only
File "/project/lgrandi/anaconda3/envs/pax_v6.2.0/lib/python3.4/site-packages/cax-4.10.5-py3.4.egg/cax/tasks/process_hax.py", line 25, in init_hax
minitree_paths = [out_location])
File "/project/lgrandi/anaconda3/envs/pax_v6.2.0/lib/python3.4/site-packages/hax-1.2.0-py3.4.egg/hax/__init__.py", line 66, in init
update_treemakers()
File "/project/lgrandi/anaconda3/envs/pax_v6.2.0/lib/python3.4/site-packages/hax-1.2.0-py3.4.egg/hax/minitrees.py", line 123, in update_treemakers
__import__('hax.treemakers.%s' % module_name, globals=globals())
File "/project/lgrandi/anaconda3/envs/pax_v6.2.0/lib/ROOT.py", line 301, in _importhook
return _orig_ihook( name, *args, **kwds )
File "/project/lgrandi/anaconda3/envs/pax_v6.2.0/lib/python3.4/site-packages/hax-1.2.0-py3.4.egg/hax/treemakers/trigger.py", line 6, in <module>
from hax.trigger_data import get_aqm_pulses
ImportError: cannot import name 'get_aqm_pulses'
As it says here: https://github.com/XENON1T/hax/blob/master/hax/pmt_plot.py#L8, if you use plot_on_pmts with the physical geometry, the color scale shown in the color bar applies to only one of the plots. The other plot has its own color scale, which can be different. The current workaround is to specify vmin and vmax manually, but this should be fixed.
Moreover, the API for pmt_plot leaves a lot to be desired (e.g. have to specify color and size, can't specify scalar for one and array for the other, etc.).
I found this in pmt_plot.py:
## Known issue's I'm to lazy to fix right now:
## - on physical layout, color and/or size probably not on same scale in two subplots
## unless vmin&vmax are specified explicitly.
## - Have categorical labels event if _channel present. Make digitizer obey _channel suffix convention.
but don't recall if I since fixed these :-) Worth checking at some point.
from hax import slow_control
slow_control.init_sc_interface()
Gives:
KeyError: 'sc_variable_list'
Here's a brand new bug report :)
https://gist.github.com/ErikHogenbirk/5da9801d81b253ef7717
Seems it asks for hax CONFIG, but is that still in the latest hax version?
Currently only some very basic info is queried from the db -- name, number and source. We should get several more fields for the analyst to use, such as timestamp, tags, duration, ... . There should also be a utility function to get all the info from a specific run (the complete json).
We want to start migrating corrections from pax to hax wherever possible. Since it seems like pax is skipping the x-y correction anyway, we can start with that one as a test case.
This update should set a basic framework for applying up-to-date corrections similar to how it is done in cax. This should be relatively easy for all the multiplicative area corrections.
The 'data' from hax now has duplicate columns named 'index'. When trying to convert the pandas dataframe to numpy this gives an error, which can be worked around with:
# Convert dataframe to numpy array, so we don't need .values all the time
data = data.T.groupby(level=0).first().T
data = data.to_records(index=False)
Probably every hax.minitrees.TreeMaker one uses adds an 'index' column. There should only ever be one 'index' column in 'data'.
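The problem and the workaround can be reproduced on a toy frame (column names below are invented; only the duplicated 'index' name matters):

```python
import pandas as pd

# Build a frame, then simulate the clash by renaming two columns to
# 'index'. to_records() refuses duplicated field names, so we keep only
# the first column per name via a transpose + groupby round-trip.
df = pd.DataFrame([[0, 1.5, 0], [1, 2.5, 1]],
                  columns=['index', 's1_area', 'index2'])
df.columns = ['index', 's1_area', 'index']   # simulate the name clash

deduped = df.T.groupby(level=0).first().T    # one column per name
records = deduped.to_records(index=False)    # now succeeds
```

A cleaner long-term fix would be for the minitree loader to drop each TreeMaker's 'index' column before concatenating.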
Pax patch releases don't always introduce new processing functionality. Hence, you would like to be allowed to mix patch versions -- 6.1.0 and 6.1.1 for example -- by saying pax_version_policy='6.1'.
However, currently you must either choose an exact pax version with e.g. pax_version_policy='6.1.0' (excluding 6.1.1 datasets) or take the latest available with the default pax_version_policy='latest' (including pax 5.x.x datasets for some runs).
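The matching rule itself is simple: compare version components, so a policy is a prefix of the dataset's version. A standalone sketch (function name is illustrative):

```python
# Sketch of prefix-style version matching: '6.1' accepts any 6.1.x,
# while a full version like '6.1.0' still matches exactly. Comparing
# split components (not raw string prefixes) avoids '6.10' matching
# a '6.1' policy.

def version_matches(version, policy):
    """True if `version` satisfies a prefix policy like '6.1'."""
    v_parts = version.split('.')
    p_parts = policy.split('.')
    return v_parts[:len(p_parts)] == p_parts
```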
When making minitrees of pax v6.1.0 data on midway with LargestPeakProperties the *_x and *_y fields are missing.
To reproduce:
import hax
hax.init(pax_version_policy='6.1.0')
data = hax.minitrees.load(2047, treemakers=['LargestPeakProperties'], force_reload=True, num_workers=1)
data.keys()
The lone_hit_* branches are missing in some LargestPeakProperties minitrees, causing problems in the MC workflow. Can reproduce with files on Midway in:
/project/lgrandi/pdeperio/161206-hax_debug
by running:
source activate pax_v6.1.1
HAXPYTHON="import hax; "
HAXPYTHON+="hax.init(main_data_paths=['/project/lgrandi/pdeperio/161206-hax_debug'], minitree_paths=['.'], pax_version_policy = 'loose'); "
HAXPYTHON+="hax.minitrees.load('Xenon1T_TPC_Kr83m_00000_g4mc_NEST_Patch_pax', ['LargestPeakProperties']);"; python -c "${HAXPYTHON}" # Seems OK
HAXPYTHON+="hax.minitrees.load('Xenon1T_TPC_Kr83m_00183_g4mc_NEST_Patch_pax', ['LargestPeakProperties']);"; python -c "${HAXPYTHON}" # Missing branches
producing the two files:
Xenon1T_TPC_Kr83m_00000_g4mc_NEST_Patch_pax_LargestPeakProperties.root # Seems OK
Xenon1T_TPC_Kr83m_00183_g4mc_NEST_Patch_pax_LargestPeakProperties.root # Missing branches
@tunnell and @coderdj would like this to avoid the dependency on hax in lax due to XENON1T/lax#43. As with any other change to a minitree, we should first update all minitrees using hax in a branch, then copy them over and only then merge.
Most jobs are hanging after hax completes with error:
slurmstepd-midway2-0091: error: Exceeded step memory limit at some point.
for example in this log:
/project2/lgrandi/xenon1t/cax/5803_v6.4.2/5803_v6.4.2_24432875.log
causing them to occupy the batch queue for much longer than necessary.
The minitrees were created successfully, so seems like an issue with clearing memory. Will also contact RCC for suggestions.
Right now the minitrees get created in the folder where hax is called. To keep a nice and clean workspace it would be better if one could specify where hax creates the minitrees. The standard option would still be this folder, but the user could specify one folder that holds all the minitrees they use.
Hax would also need to find the minitrees in the user-specified folder.
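The lookup side is a search-path walk; other reports here suggest newer hax versions expose a minitree_paths option in hax.init for exactly this. A standalone sketch of the idea (not hax's actual implementation):

```python
import os

# Toy minitree search path: create files in the first configured
# directory, but look in every configured directory when loading.

def find_minitree(filename, search_paths):
    """Return the first existing copy of `filename`, or None."""
    for path in search_paths:
        candidate = os.path.join(path, filename)
        if os.path.exists(candidate):
            return candidate
    return None
```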
There is an option for hax.init() to not use the runs database. However, this means that update_datasets does not do anything, so datasets will be None. Perhaps I am missing something here, but it seems you can specify not to use the runs database, yet if you then attempt to load anything, hax gives you an error since there is no variable listing the datasets.
For now, I made a workaround that builds this variable from the files in the main_data_paths, using the file name as the run name. I can implement this and create a pull request, but maybe I am missing some secret configuration... Any thoughts?
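The described workaround can be sketched standalone: scan the configured data paths for processed files and use the file names as run names (function and field names are illustrative, not hax's internals):

```python
import os

# Build a minimal dataset list from .root files found in the data
# paths, for use when the runs database is disabled.

def datasets_from_paths(main_data_paths):
    datasets = []
    for path in main_data_paths:
        for fn in sorted(os.listdir(path)):
            if fn.endswith('.root'):
                datasets.append({'name': fn[:-len('.root')],
                                 'location': os.path.join(path, fn)})
    return datasets
```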
I'm getting an error when loading some minitrees...
https://gist.github.com/ErikHogenbirk/38f8c047932023b0520f521b834fe42c
Hax is complaining about RuntimeError: failed to load the library for 'std::vector<Hit>' @ 9ed875416084b362. I have seen this error with simulated data, but also in real data. Anyone have any idea what might be the problem?
mc v 0.1.7, pax 6.2.1, hax 1.3.0
We must first make the minitrees for the first dataset given to hax.minitrees.load, since the dask multiprocessing only works if it knows which variables to expect. Since both steps get the force_reload option, using a single dataset with minitrees.load and force_reload causes a double remake.
Not a big issue, but it hints at a bigger problem: having to make the minitrees to know which variables they will produce. This is inconvenient for other reasons too (see e.g. #47).
Currently TotalProperties computes the total peak area of an event by looping over all peaks: https://github.com/XENON1T/hax/blob/master/hax/treemakers/common.py#L176
This includes "peaks" from acquisition monitor channels, in particular the analog summed bottom array, which has peaks with a non-negligible area. This is bad, because it makes a quantity like (s1+s2)/total_area dependent on whether the analog sum waveform triggered or not, and causes nonlinear behaviour near the threshold.
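The fix amounts to filtering on the peak's detector before summing. A standalone sketch with toy peak dicts standing in for pax's peak objects (the 'sum_wv' label and areas are invented):

```python
# Only count TPC peaks toward the event's total area, so acquisition
# monitor "peaks" (e.g. the analog summed bottom array) no longer bias
# quantities like (s1+s2)/total_area.

def total_tpc_area(peaks):
    return sum(p['area'] for p in peaks if p['detector'] == 'tpc')


peaks = [
    {'detector': 'tpc', 'area': 100.0},      # S1-like peak
    {'detector': 'tpc', 'area': 2500.0},     # S2-like peak
    {'detector': 'sum_wv', 'area': 300.0},   # analog sum waveform "peak"
]
```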
Implement bumpversion to keep track of versions.
Following error in a few runs causing jobs to hang:
/project/lgrandi/anaconda3/envs/pax_v6.4.2/lib/python3.4/site-packages/pandas/computation/align.py:98: RuntimeWarning: divide by zero encountered in log10
ordm = np.log10(abs(reindexer_size - term_axis_size))
in e.g. run 5414:
/project2/lgrandi/xenon1t/cax/5414_v6.4.2/5414_v6.4.2_24265319.log
When we want to do a single scatter cut, we mostly cut on the S2 size in some way. However, based on explorations in this note by Tianyu:
https://xecluster.lngs.infn.it/dokuwiki/lib/exe/fetch.php?media=xenon:xenon1t:sim:notes:jhowlett:main_singlescatter_simplified_copy_feb_20_2017.html
it seems that a lot of the second-largest S2s could just be single-electron pile-up, and that they can be identified by their width. I could imagine a single scatter cut where we cut not only on the second S2 size, but also on its width. For this it would be great if a property largest_other_s2_width could be added. It's not critical, it's just a nuisance if we want to use this parameter and have to add it by hand all the time. Let me know what you think. If you disagree I'll drop the issue.
The trigger produces a trigger data file (which will change format soon, see XENON1T/pax#343) with useful information such as the dark rate in each PMT over time. It would be useful to have some common access tools for this file. For example we may want to have a cut/TreeMaker which tells you if the PMTs contributing to the main S1 have an unusually high dark rate.
This removes lax dependency on hax. @coderdj
https://gist.github.com/ErikHogenbirk/663a3511a272c12098ce
Just pulled the latest version this afternoon, using examples/07_check_data.ipynb, only first cell.
Relevant code:
import hax # (works fine)
hax.init()
Get this error:
KeyError: 'rz_position_distortion_map'
using hax v1.4.3 with pax_v6.2.0. I guess it's because that map wasn't implemented back then.
Can we try to specify version requirements for all our packages (hax, lax, cax) more stringently?
The XENON100 examples we used at the analysis workshops are out of date by now; it would be nice to distill a few new examples from some of the notes that people have published.
When you use some of the hax.raw_data functions, you often have to specify a lot of custom config options to avoid pax making a root file or erroring on event proxies. E.g.:
config = dict(pax=dict(pre_output=[], encoder_plugin=None, output='Dummy.DummyOutput'))
for event in hax.raw_data.process_events(some_run_name, config_override=config):
...
For the Muon veto this is even worse, currently the XENON1T (i.e. TPC) config is used automatically even if hax.init(detector='muon_veto'). You can work around this with e.g.
config = dict(pax=dict(pre_output=[], dsp=[], encoder_plugin=None, output='Dummy.DummyOutput'))
from copy import deepcopy
config = deepcopy(hax.utils.combine_pax_configs(mv_config, config))
for event in hax.raw_data.process_events(some_run_name, config_override=config):
...
but even this gives errors if you did not blank the dsp group. Also the use of deepcopy is tricky: if you don't, pax will modify the config as it loads it (to insert the encoder plugin group) and you eventually end up in a mess.
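The deepcopy point can be demonstrated standalone. The sketch below mimics combining two nested configs (the function name is illustrative; hax's real helper is hax.utils.combine_pax_configs): without the deep copy, a later in-place edit of the combined config, like pax inserting its encoder plugin group, would leak back into your original override.

```python
from copy import deepcopy

# Merge two {section: {key: value}} configs, override winning, without
# sharing any mutable state with either input.

def combine_configs(base, override):
    result = deepcopy(base)
    for section, values in override.items():
        result.setdefault(section, {}).update(deepcopy(values))
    return result
```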
It would be nice if there was a quick way to get hitfinder diagnostic plots (i.e. the "PMT waveforms", but also with the hitfinder's interpretation) directly in hax.
This tool is a nice one for debugging, but it would be nice if the return object of get_aqm_pulses could include an event number, for example a tuple like: return {k: (event_number, np.array(v)) for k, v in aqm_signals.items()}.
It would be nice to include a bit of code to extract all peaks, hits, etc. from a file, with an extra column which tells you which event they came from. This doesn't have to be very fast, as it is not a very common analysis task, though ideally it should not fry your RAM (so maybe allow for some chunking).
The peaks in particular are useful to plot in exploratory analyses (see e.g. the graphs in Erik's GXe note, even though those were only produced for the largest peaks since that's what hax currently allows you to do conveniently). The hits and pulses are useful for gain calibrations and hitfinder efficiency studies.
You can write your own code to do this of course, e.g. with hax.paxroot.looproot; I and others have some code like that -- a nice version of this would be a good addition to hax.
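The core of such an extractor can be sketched standalone: walk events, flatten every peak into a row tagged with its event number, and flush in chunks to keep memory bounded. Events and peaks are plain dicts here; a real version would loop with hax.paxroot over pax event objects.

```python
import pandas as pd

# Flatten all peaks in a sequence of events into one dataframe, with an
# extra event_number column telling you where each peak came from.

def extract_peaks(events, chunk_size=1000):
    chunks, rows = [], []
    for event in events:
        for peak in event['peaks']:
            rows.append(dict(peak, event_number=event['event_number']))
        if len(rows) >= chunk_size:          # flush to bound memory use
            chunks.append(pd.DataFrame(rows))
            rows = []
    if rows:
        chunks.append(pd.DataFrame(rows))
    return pd.concat(chunks, ignore_index=True)
```

The same pattern would work for hits and pulses, which is what the gain calibration and hitfinder efficiency studies need.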
as suggested by @JelleAalbers in XENON1T/cax#116 (comment)
I've used the cache_file option in minitrees.load, and I was thinking that this is maybe something you'd also want for a normal minitree. For instance, suppose you apply a bunch of cuts and keep only a small number of events (like the NR single scatter band), and you want to share this with a colleague (or some other script if you don't have any friends). It'd be nice to be able to load and store minitrees in a standard haxy way.
I think it would be quite easy to make, if people agree that this could be useful I'll make the code and make a pull request. If not then I'll shut up about it and just use pickle for my own code :)
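At its simplest this is a thin wrapper around pandas' own round-trip serialization. A sketch of how such a "haxy" save/load could look (function names are invented):

```python
import pandas as pd

# Save and reload a reduced, post-cuts dataframe via pandas' standard
# pickle round-trip; a real hax version could also record metadata like
# the cut history and pax version.

def save_reduced(df, path):
    df.to_pickle(path)


def load_reduced(path):
    return pd.read_pickle(path)
```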
Seems to be a benign error running on MC (with pax_version_policy = 'loose'):
Could not find run number for Xenon1T_TPC_Kr83m_96.000000_g4mc_NEST_Patch_pax, got exception <class 'IndexError'>: index 0 is out of bounds for axis 0 with size 0. Setting run number to 0.
Would be nice to suppress/fix this to clean up logs and facilitate (real) error searching.
Not much more to say. This also applies to inspect_events_from_dataframe.
When calling hax.raw_data.inspect_events(198, 0) I get an error about the name of the trigger data file:
ValueError: Invalid file name: /xetemp/xenon1t/160425_1103/trigger_monitor_data.zip.
Should be tpcname-something-firstevent-lastevent-nevents.zip
I guess this is due to the recent addition of trigger zipped bsons?
pax 4.9.1, hax 0.2, running on xecluster06, trying to read from /xetemp/xenon1t/
I'm not really sure if this is a pax or hax issue, but I'm unable to plot from my notebook. Will try plotting directly from pax.
Input: https://gist.github.com/ErikHogenbirk/25d4f8808ce5fb70b51fcadca99f4a2a
Output: https://gist.github.com/ErikHogenbirk/33b0e735ac0677a6abdeb39912c1ca3d
It seems the data is found and read; there is a checkpulses warning at the beginning. Then it throws some ROOT error and everything goes wrong.
pax 4.8.0
hax 0.1
I tried installing hax on my xecluster account, but I got two errors:
with
git clone http://github.com/XENON1T/hax.git
I get
Initialized empty Git repository in /home/wittweg/hax/.git/
error: Failed connect to github.com:443; Operation now in progress while accessing https://github.com/XENON1T/hax.git/info/refs
fatal: HTTP request failed
So I tried a workaround by just doing scp from my local pc. If I run
python setup.py develop
the installation fails with:
running develop
error: can't create or remove files in install directory
The following error occurred while trying to add or remove files in the
installation directory:
[Errno 13] Permission denied: '/archive_lngs/common/anaconda3/envs/pax440r5/lib/python3.4/site-packages/test-easy-install-16436.write-test'
The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:
/archive_lngs/common/anaconda3/envs/pax440r5/lib/python3.4/site-packages/
Perhaps your account does not have write access to this directory? If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account. If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.
For information on other options, you may wish to consult the
documentation at:
https://pythonhosted.org/setuptools/easy_install.html
Please make the appropriate changes for your system and try again.
In hax/hax/slow_control.py, around line 176:
In the for loop, the variable entry is often not a dict. The code expects a tuple-like record in which the key 'timestampseconds' (datetime.utcfromtimestamp(entry['timestampseconds'])) is present, but very often this is not true. I don't understand the reason (some errors in the slow control database entries?). I worked around the problem by inserting a check on the entry type in the for loop:
if not isinstance(entry, dict):
    continue
This fixes the problem, but it's not the only one.
Several times when I try to process a dataset with pax_v6.5.0, cax uses this hax function to connect to the slow control database, and only after many attempts (40-50) is it able to connect and read the voltage values of each PMT in the AddGains function. After a few tries cax starts to process the run without error messages.
For people who want to run hax locally on their laptop without using the rundb info, it would be nice to have an init option for hax to work without the db and thus without setting a mongodb password. (This works with experiment = 'XENON100', but not anymore with experiment = 'XENON1T'.)
Even though hax was created with analysis facilities in mind, a small option that lets you work locally is still a very nice feature that I and others use.
When looking for the file 'xenon.root', hax successfully finds the file when the notebook is run in directory /my/dir/, but not when run in /my/dir/subdir/. There are no event class files or minitrees in either folder. Here is the traceback of the error.
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-4-5d76aadf63af> in <module>()
1 # dataset = 'radon_cut_xe100_110423_1252'
----> 2 data = hax.minitrees.load('xenon', treemakers=['Basics'], force_reload= True);
/project/lgrandi/anaconda3/envs/pax_head/lib/python3.4/site-packages/hax-0.2-py3.4.egg/hax/minitrees.py in load(datasets, treemakers, force_reload)
151 dataframes = []
152 for dataset in datasets:
--> 153 minitree_path = get(dataset, treemaker, force_reload=force_reload)
154 new_df = pd.DataFrame.from_records(root_numpy.root2rec(minitree_path))
155 dataframes.append(new_df)
/project/lgrandi/anaconda3/envs/pax_head/lib/python3.4/site-packages/hax-0.2-py3.4.egg/hax/minitrees.py in get(dataset, treemaker, force_reload)
115 # We have to make the minitree file
116 # This will raise FileNotFoundError if the root file is not found
--> 117 skimmed_data = treemaker().get_data(dataset)
118 print("Created minitree %s for dataset %s" % (treemaker.__name__, dataset))
119
/project/lgrandi/anaconda3/envs/pax_head/lib/python3.4/site-packages/hax-0.2-py3.4.egg/hax/minitrees.py in get_data(self, dataset)
44 """Return data extracted from running over dataset"""
45 loop_over_dataset(dataset, self.process_event,
---> 46 branch_selection=hax.config['basic_branches'] + list(self.extra_branches))
47 self.check_cache(force_empty=True)
48 if not hasattr(self, 'data'):
/project/lgrandi/anaconda3/envs/pax_head/lib/python3.4/site-packages/hax-0.2-py3.4.egg/hax/paxroot.py in loop_over_datasets(datasets_names, event_function, branch_selection)
84 except Exception as e:
85 rootfile.Close()
---> 86 raise e
87
88 # For backward compatibility
/project/lgrandi/anaconda3/envs/pax_head/lib/python3.4/site-packages/hax-0.2-py3.4.egg/hax/paxroot.py in loop_over_datasets(datasets_names, event_function, branch_selection)
79 t.GetEntry(event_i)
80 event = t.events
---> 81 event_function(event)
82 except StopEventLoop:
83 rootfile.Close()
/project/lgrandi/anaconda3/envs/pax_head/lib/python3.4/site-packages/hax-0.2-py3.4.egg/hax/minitrees.py in process_event(self, event)
38
39 def process_event(self, event):
---> 40 self.cache.append(self.extract_data(event))
41 self.check_cache()
42
/project/lgrandi/anaconda3/envs/pax_head/lib/python3.4/site-packages/hax-0.2-py3.4.egg/hax/treemakers/common.py in extract_data(self, event)
84 continue
85 if p.detector == 'tpc':
---> 86 peak_type = p.type
87 else:
88 # Lump all non-lone-hit veto peaks together as 'veto'
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf2 in position 0: invalid continuation byte
Why split the variables up?
OSError: File b'/Users/tunnell/Work/anaconda/envs/xenon_stack/lib/python3.4/site-packages/hax-1.3.0-py3.4.egg/hax/sc_variables.csv' does not exist
I don't think you included this file in the setup.py @JelleAalbers
It would be really nice if we could leverage the batch queue to make minitrees for us... with an easy flag like batch_queue=True or something rather than asking around who has the latest version of the script with the right qsub/bash incantations.
We need at least "last_busy_type" = on/off and "last_hev_type" = on/off.
We might also want nearest/previous type for convenience.
This should not be needed for all good data. But for old data with DAQ bug we need it to reject partial events.
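A "state in effect at this event" lookup can be implemented with a sorted-search over the toggle times. A standalone sketch of the idea (the data and function name are invented; real inputs would come from the trigger/acquisition monitor data):

```python
import numpy as np

# Given sorted times at which the busy/HEV signal toggled and the state
# after each toggle, find the state in effect at each event time.

def state_at(event_times, toggle_times, states, initial='off'):
    # index of the last toggle at or before each event time
    idx = np.searchsorted(toggle_times, event_times, side='right') - 1
    return [states[i] if i >= 0 else initial for i in idx]
```

Returning the nearest/previous toggle time as well would cover the convenience case mentioned above.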
Getting error:
ValueError: field 'index' occurs more than once
when using load_single_dataset in lax.
Should this hack in load() to remove the index column go in load_single_dataset() instead?