
lstMCpipe


Scripts to ease the reduction of MC data on the LST cluster at La Palma. With this package, the analysis/creation of R1/DL0/DL1/DL2/IRFs can be orchestrated.

Contacts: Thomas Vuillaume (thomas.vuillaume [at] lapp.in2p3.fr), Enrique Garcia (garcia [at] lapp.in2p3.fr), Lukas Nickel (lukas.nickel [at] tu-dortmund.de)

Cite us 📝

If lstMCpipe was used for your analysis, please cite:

https://doi.org/10.48550/arXiv.2212.00120

@misc{garcia2022lstmcpipe,
      title={The lstMCpipe library},
      author={Enrique Garcia and Thomas Vuillaume and Lukas Nickel},
      year={2022},
      eprint={2212.00120},
      archivePrefix={arXiv},
      primaryClass={astro-ph.IM}
}

in addition to the DOI of the exact lstMCpipe version used, which you can find at https://doi.org/10.5281/zenodo.6460727

You may also want to include the config file with your published code for reproducibility.

Install 💻

As a user:

For lstmcpipe >= 0.10.3, the preferred installation method is conda:

conda install lstmcpipe

For former versions:

VERSION=0.10.1  # change as desired
wget https://raw.githubusercontent.com/cta-observatory/lstmcpipe/$VERSION/environment.yml
conda env create -f environment.yml
conda activate lstmcpipe
pip install lstmcpipe==$VERSION

As a developer:

git clone https://github.com/cta-observatory/lstmcpipe.git
cd lstmcpipe
conda env create -n lstmcpipe_dev -f environment.yml
conda activate lstmcpipe_dev
pip install -e .
pre-commit install

This will set up a pre-commit hook: provided you are in the right environment, it will run black on the files you are about to commit and format them. (You need to stage the changes again after that.) This ensures the code formatting follows our guidelines and reduces the work needed to satisfy the code checker in the CI.
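If you want to check the formatting before committing, you can also run the hooks manually on all files (standard pre-commit usage, not specific to lstmcpipe):

pre-commit run --all-files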

Requesting an MC analysis 📊

A longer, more detailed version of these steps can be found in our documentation.

The list of already run productions is also in the documentation. Please check this list to make sure the production you are about to request does not already exist!

To request a MC analysis:

  1. Make sure to be part of the GitHub cta-observatory/lst-dev team. If not, ask one of the admins.
  2. Clone the repository on the cluster at La Palma.
  3. Create a new branch named after your prodID.
  4. Make a new directory named date_ProdID in the production_configs dir (have a look at production_configs/template_prod as an example); see the example commands after this list.
  5. Generate your config (see below).
  6. Open a pull request into lstMCpipe with a clear description (probably the same as in the readme of your config dir).
  7. The requested config dir must contain:
  • an lstchain config file (please provide an exhaustive config; it will help others and gives more explicit provenance information)
  • an lstmcpipe config file (to generate it, please refer to the documentation)
  • a readme with a short description of why you need this analysis to be run. Do not add information that should not appear publicly (such as source names) here. If you are requesting a production for a specific new source, please edit this table on the LST wiki. Also add the command line used to generate the lstmcpipe config; it will help with debugging.
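For example, steps 2 to 4 boil down to standard git/shell commands (the branch and directory names below are placeholders; use your own date and prodID):

git clone https://github.com/cta-observatory/lstmcpipe.git
cd lstmcpipe
git checkout -b my_prodID
mkdir production_configs/20240101_my_prodID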

The proposed configuration will be tested for validity by the continuous integration tests and we will interact with you to run the analysis on the cluster at La Palma.

Depending on the number of requests, we may have to prioritise them.

Need help? Join the CTA North Slack and ask for help there.

Launch jobs 🚀

To generate your lstmcpipe configuration file, use the lstmcpipe_generate_config command. If the type of production you want is not among the existing ones, you may create your own PathConfig class from an existing one, or generate a config from an existing prod type and edit the file manually.
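For instance, a sketch of a typical invocation (the PathConfigProd5Trans80 class name and the --prod_id option are shown for illustration only; run the --help command to see the production types and options actually available in your version):

lstmcpipe_generate_config --help
lstmcpipe_generate_config PathConfigProd5Trans80 --prod_id 20240101_my_prod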

Once you have your configuration file, you may launch the pipeline with the stages described in the config using:

lstmcpipe -c config_MC_prod.yml -conf_lst lstchain_*.json [-conf_cta CONFIG_FILE_CTA] [-conf_rta CONFIG_FILE_RTA] [--debug] [--log-file LOG_FILE]

lstmcpipe is the orchestrator of the pipeline: it schedules the stages specified in the config_MC_prod.yml file. All the configuration related to the MC pipeline must be declared in this file (stages, particles to be analysed, zenith, pointing, type of MC production, ...).

Pipeline-specific configuration options (such as cleaning or model parameters) are declared in a different configuration file, which is passed via the options -conf_lst/-conf_cta/-conf_rta.

Note: You can always launch this command without fear; there is an intermediate step that verifies and shows the configuration that you are passing to the pipeline.

Note that a complete pipeline still requires quite a lot of resources. Think about other LP-IT cluster users.

lstmcpipe's People

Contributors

aaguasca, alvmas, chaimain, dependabot[bot], francacassol, gabriele-panebianco-inaf, garciagenrique, giorgio-pirola, jmendezgallego, joteros, jurysek, katagirihide, leolemoign, lukasnickel, mdebony, mireianievas, misabelber, moralejo, morcuended, seiyanozaki, toratherese, vuillaut


lstmcpipe's Issues

DL2 training files produced

Hi,
Looking at the last productions, I saw that DL2 training files are produced and it is not clear to me why that is. RFs are trained on DL1; only DL2 test files should be produced, no?

trainpipe memory limit

If images are processed, the standard memory limit might not be enough.
Maybe we should apply some heuristic or just set some large value.

Reenable N_R0_PER_DL1_JOB

As far as I can tell, this is not actually used when calling r0 to dl3.
It might be useful to set this in the config in order to balance the load on the cluster.

Misleading information about directories being removed/created

Dear developers,

The information that appears before confirming the execution of the lstmcpipe command is misleading: the directories it reports as being removed/created do not match the directories actually being removed/created.

For instance, if I use only stages up to DL1, the message says that RF, DL2, and IRF directories will also be removed/created.

Merging of large files

The full workflow crashes because of an error that is raised at lstchain_merge_files (it is indeed an lstchain issue).

Use ctapipe for dl1 merging

I don't know if the lstchain tool supports more than the ctapipe merge tool, but if everything we need is in ctapipe, it might be worth switching tools?

core scripts

Trying to run lstmcpipe_lst_core_r0_dl1 standalone, the script returns this error:

@LukasNickel could you remind me please why we are using this variable?

$ lstmcpipe_lst_core_r0_dl1 -c /fefs/home/enrique.garcia/software/lstmcpipe/lstmcpipe/lstchain_standard_config_v082_MC_Tel1.json  -f /fefs/aswg/workspace/enrique.garcia/workflow_r0_dl2_lst/running_analysis/20200629_prod5_trans_80/electron/zenith_20deg/south_pointing/20220114_v0.8.4_prod5_trans_80_test_v084_test_merge/file_lists_training/training_0.list --output_dir /fefs/aswg/workspace/enrique.garcia/workflow_r0_dl2_lst/running_analysis/20200629_prod5_trans_80/electron/zenith_20deg/south_pointing/20220114_v0.8.4_prod5_trans_80_test_v084_test_merge/DL1/training
Traceback (most recent call last):
  File "/fefs/aswg/software/conda/envs/lstchain-v0.8.4/bin/lstmcpipe_lst_core_r0_dl1", line 8, in <module>
    sys.exit(main())
  File "/home/enrique.garcia/.local/lib/python3.8/site-packages/lstmcpipe/scripts/script_batch_filelist_lst.py", line 40, in main
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
  File "/fefs/aswg/software/conda/envs/lstchain-v0.8.4/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'SLURM_ARRAY_TASK_ID'
(lstchain-v0.8.4) enrique.garcia@cp01 job_logs $ ipython
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.0.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import os

In [2]: from os import environ

In [3]: environ["SLURM_ARRAY_TASK_ID"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [3], in <module>
----> 1 environ["SLURM_ARRAY_TASK_ID"]

File /fefs/aswg/software/conda/envs/lstchain-v0.8.4/lib/python3.8/os.py:675, in _Environ.__getitem__(self, key)
    672     value = self._data[self.encodekey(key)]
    673 except KeyError:
    674     # raise KeyError with the original key value
--> 675     raise KeyError(key) from None
    676 return self.decodevalue(value)

KeyError: 'SLURM_ARRAY_TASK_ID'
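For reference, a minimal sketch (not the project's actual fix) of how the lookup could be made defensive, assuming the script is meant to be driven by a SLURM job array:

import os

# SLURM only sets SLURM_ARRAY_TASK_ID for jobs submitted as an array (sbatch --array=...),
# so a standalone run needs an explicit error message instead of a bare KeyError.
task_id = os.environ.get("SLURM_ARRAY_TASK_ID")
if task_id is None:
    raise SystemExit("SLURM_ARRAY_TASK_ID is not set: run this script as part of a SLURM job array")
task_id = int(task_id)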

Move or fix scripts for substages

Currently it is expected that onsite_mc_r0_to_dl3 is used for all stages.
That's not really apparent from the naming, and if the other scripts are not meant to be used independently, they probably don't need that much argparse boilerplate code.

Stage "r0_to_dl1" tries to find merged files despite not using stage "merge_and_copy_dl1"

Dear developers, I ran lstmcpipe using only the stage "r0_to_dl1" in the config file. But the following error appeared:

Traceback (most recent call last):
  File "/fefs/aswg/workspace/arnau.aguasca/anaconda3/envs/lstmcpipe_dev/bin/lstmcpipe", line 33, in <module>
    sys.exit(load_entry_point('lstmcpipe', 'console_scripts', 'lstmcpipe')())
  File "/fefs/aswg/workspace/arnau.aguasca/github_repo/lstmcpipe/lstmcpipe/lstmcpipe_start.py", line 232, in main
    dl1_output_dir, all_particles, gamma_offs
  File "/fefs/aswg/workspace/arnau.aguasca/github_repo/lstmcpipe/lstmcpipe/workflow_management.py", line 84, in create_dl1_filenames_dict
    ] = next(Path(dl1_directory.format(particle)).glob("*training*.h5")
StopIteration

I think this error appears because create_dl1_filenames_dict tries to find the merged DL1 files, but as I only ran the stage "r0_to_dl1", they are not produced. Is this the expected behaviour of the script?

Track problems to solve when merging rta-workflow

  • Tons of duplicated code. Make a main code that calls each stage separately and that can correctly manage whether it is hiperta or lstchain.
  • Development suggestion @vuillaut: should I
    • put everything in data_management.py?
  • Add lstchain.__version__ to the prod_id to track lstchain modifications.
  • Modify the way the global PROD_ID is passed (two files need it and the code is duplicated/redundant - prone to errors).
  • Use the latest version of the standard config files.
  • Standardise global paths and create an easy way of passing ALL the global info --> a single dict?
    • Where are the r1*.h5 files going to be stored?
  • Move SLURM jobs as well as logs once the workflow has finished.
  • Copy both configs to analysis_logs. Currently only the first one used (hiperta) is copied.

"Standardise" source environment

The r0_dl1 stage source environment is set through the core_list.sh script, and the user has no knowledge of this. Standardise environments by modifying the .sh script with a grep/sed every time the full workflow is run? (Any other suggestion, @vuillaut?)

Multiple sky pointings

We should make it possible to run on multiple zeniths at once instead of starting multiple PRs.
Not super trivial in the current structure, but it will be needed at some point anyway.

Clean up config parser args

The r0 to dl3 script defines an lstchain config and a hiperta config, which are both used for r0 to dl1 processing depending on the pipeline. I think it is cleaner to give the config paths per step (e.g. an r0_to_dl1 config, a dl1_to_dl2 config), or maybe just put the paths in the main config as opposed to CLI arguments.

reorganizer for the RTA

Can it be removed from here?
Is this the right place for it?

There are issues I solved in the hiperta_stream repo that I would have to solve here as well otherwise...

Support different pipelines

Currently lstchain is supported officially. hiperta is included as well and ctapipe will be added soon.
Instead of duplicating code (like it is done with the hiperta code currently), we should unite it into one r0_to_dl1 part, as the differences are very minor.

Define a test/demo case

The current code seems to assume that all MC files will be processed, and they are all processed.
A common test folder might be useful.

Correctly declare directories for mchdf5 files

MC files converted to hdf5 are R1, but do they need to be stored in a /R1/ directory? If so, the r1_to_dl1 (RUNNING_DIR var) code needs to be changed and set as agreed --> (R1 for mchdf5 and DL0 for simtel MCs).

Parse RTA version somehow

  • See with @devine73 (Pierre Aubert)
  • Make the RTA version (r0_dl1) purely lstchain-independent --> see how the lstchain version can be parsed before launching the full pipeline

`n_files_per_dl1` not working as intended

I think the intention was to do:
r0_to_dl1 list_of_r0 with len(list_of_r0) = n_files_per_dl1 and get a single merged dl1.
At least this is what the name suggests.

The way it works right now rather corresponds to an n_files_per_job - we still get 1 dl1 file for each r0 file and then merge them.

To be discussed and corrected.
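For illustration, a minimal sketch of the intended grouping (a hypothetical helper, not the actual lstmcpipe code):

# Hypothetical helper: group the R0 list into chunks of n_files_per_dl1,
# so each r0_to_dl1 job would produce a single merged DL1 file from its chunk.
def chunk_r0_files(r0_files, n_files_per_dl1):
    for i in range(0, len(r0_files), n_files_per_dl1):
        yield r0_files[i:i + n_files_per_dl1]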

Bug checking merge_and_copy_dl1 jobs

Launching both the prod5 and prod5_trans_80 pipelines, a check is triggered.

Find out why and correct it, as it should not check this (not this way at least).

There are errors in the following log files:
 ['job27_test.e', 'job21_train.e', 'job29_test.e', 'job41_train.e', 'job24_train.e', 'job25_test.e', 'job23_test.e', 'job6_test.e', 
'job18_train.e', 'job18_test.e', 'job17_train.e', 'job30_train.e', 'job22_train.e', 'job37_test.e', 'job5_test.e', 'job42_test.e', 
'job39_train.e', 'job37_train.e', 'job30_test.e', 'job45_train.e', 'job1_test.e', 'job17_test.e', 'job43_test.e', 'job49_train.e', 
'job20_train.e', 'job34_test.e', 'job42_train.e', 'job32_test.e', 'job3_train.e', 'job9_train.e', 'job36_test.e', 'job47_test.e', 
'job49_test.e', 'job11_test.e', 'job35_train.e', 'job45_test.e', 'job16_train.e', 'job7_train.e', 'job23_train.e', 'job46_test.e', 
'job26_test.e', 'job40_test.e', 'job43_train.e', 'job10_train.e', 'job13_test.e', 'job47_train.e', 'job32_train.e', 'job20_test.e', 
'job25_train.e', 'job16_test.e', 'job14_test.e', 'job39_test.e', 'job15_train.e', 'job40_train.e', 'job7_test.e', 'job4_test.e', 
'job2_test.e', 'job10_test.e', 'job27_train.e', 'job3_test.e', 'job15_test.e', 'job5_train.e', 'job0_test.e', 'job12_test.e', 'job44_test.e', 
'job28_train.e', 'job0_train.e', 'job4_train.e', 'job33_train.e', 'job41_test.e', 'job34_train.e', 'job31_test.e', 'job21_test.e', 
'job24_test.e', 'job48_test.e', 'job19_test.e', 'job8_train.e', 'job38_train.e', 'job44_train.e', 'job1_train.e', 'job36_train.e', 
'job13_train.e', 'job35_test.e', 'job19_train.e', 'job33_test.e', 'job9_test.e', 'job28_test.e', 'job26_train.e', 'job31_train.e', 
'job22_test.e', 'job38_test.e', 'job48_train.e', 'job2_train.e', 'job29_train.e', 'job6_train.e', 'job14_train.e', 'job11_train.e', 
'job46_train.e', 'job8_test.e', 'job12_train.e']
 Are you sure you want to continue? [y/N]

Reprocess dl1 files

Is there any way to reprocess dl1 files (e.g. with different cleaning)? ctapipe-stage1 can be used on dl1 files the same way you would use it on r0 files, so it's easy in that case. I don't know about lstchain and hiperta though.

redundant libraries

Why use from lstchain.io.data_management import * when the data_management library of the LST_scripts repo is going to be used anyway?
Just in case I'm missing a point - can this be modified so that only the library from LST_scripts is used?

Fix installation

After installation, the onsite scripts are available, but they expect some files, e.g. core_list.sh, to lie in the working directory.
I am not sure if there is a way to link them in the installation process or if they would need to be generated by the Python scripts (the source env is already changed that way, so that's probably the easiest way).

batch_dl1_utils-merge_and_copy.py, on the other hand, just needs a function that can be imported.

Training job_logs not saved correctly

Both the training and testing job logs are named in the same way (i.e., job#.e and job#.o).

Training jobs are sent first and there is no counter that takes into account the n_num jobs already sent, so when the testing jobs are sent, they overwrite the training logs.

Will take care of updating the scripts.
