bes-edgeml-models's Introduction

BES EdgeML project

[This repo is archived. The active version is at https://github.com/PlasmaControl/bes-ml.]

The BES EdgeML project is an effort to develop machine learning (ML) models for the real-time identification of edge-localized-mode (ELM) events and the turbulence properties of confinement regimes using the 2D Beam Emission Spectroscopy (BES) system at DIII-D. The "edge ML" models will be deployed on a high-throughput FPGA accelerator for integration in the real-time plasma control system (PCS).

The project is structured as follows:

[Figure: project structure]

The models/ directory contains the PyTorch implementations of the various models.

data_preprocessing/ contains the scripts that prepare raw BES data for training and evaluation.

The notebooks/ directory contains Jupyter notebooks used to experiment with the data preprocessing pipelines and to calculate ROC plots.

The src/ directory contains utility functions, data preprocessing scripts, and boilerplate code for training and evaluation. A key feature of the data preprocessing pipelines is how inputs and labels are created. Inputs are 3D tensors: the leading dimension holds signal_window_size time steps, and the last two dimensions hold the 8x8 BES spatial data from the 64 channels of the detector. The following figure, created by Dr. David Smith, shows this in more detail.

[Figure: signal window]
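The windowing described above can be sketched in NumPy. This is an illustration under stated assumptions, not the code in src/: the function name make_signal_windows is hypothetical, and the rule that each window takes the label found label_look_ahead steps after its last time step is an assumed convention that the repository may implement differently.

```python
import numpy as np


def make_signal_windows(signals, labels, signal_window_size, label_look_ahead):
    """Cut a BES recording into overlapping windows (hypothetical helper).

    signals: array of shape (n_time, 64), one column per detector channel.
    labels:  array of shape (n_time,), one label per time step.
    Returns windows of shape (n_windows, signal_window_size, 8, 8) and
    one label per window.
    """
    n_time = signals.shape[0]
    # Arrange the 64 channels on the 8x8 detector grid.
    frames = signals.reshape(n_time, 8, 8)

    # Last valid start index such that the look-ahead label still exists.
    last_start = n_time - signal_window_size - label_look_ahead
    if last_start < 0:
        raise ValueError("recording is too short for this window and look-ahead")

    starts = np.arange(last_start + 1)
    windows = np.stack([frames[s:s + signal_window_size] for s in starts])

    # Assumed convention: label taken label_look_ahead steps after the
    # final time step of each window.
    label_idx = starts + signal_window_size - 1 + label_look_ahead
    return windows, labels[label_idx]
```

Each element of the returned windows array is one 3D input of shape (signal_window_size, 8, 8), matching the layout described above.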

The options/ directory contains helper classes for the command-line arguments that control parameters ranging from data preprocessing to model training and inference.
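As a rough idea of what such a helper class might look like, here is a minimal argparse sketch. The class name BaseArguments and all default values are assumptions made for illustration. The flag names are taken from the example commands in this README, and only a subset is shown.

```python
import argparse


class BaseArguments:
    """Hypothetical argument helper; not the class shipped in options/."""

    def __init__(self):
        self.parser = argparse.ArgumentParser(
            description="BES EdgeML training and analysis options (sketch)"
        )
        self._add_arguments()

    def _add_arguments(self):
        p = self.parser
        # Defaults below are illustrative assumptions.
        p.add_argument("--input_file", type=str, default="labeled-elm-events.hdf5")
        p.add_argument("--device", type=str, default="cpu")
        p.add_argument("--model_name", type=str, default="multi_features")
        p.add_argument("--signal_window_size", type=int, default=512)
        p.add_argument("--label_look_ahead", type=int, default=500)
        p.add_argument("--n_epochs", type=int, default=5)
        p.add_argument("--lr", type=float, default=0.0003)
        # Boolean switches are off unless the flag is given.
        p.add_argument("--normalize_data", action="store_true")
        p.add_argument("--dry_run", action="store_true")

    def parse(self, argv=None):
        # argv=None makes argparse read sys.argv[1:].
        return self.parser.parse_args(argv)
```

A script would then call BaseArguments().parse() once at startup and pass the resulting namespace to the data and training code.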

model_checkpoints/ contains the saved models that can be used for inference.

The archives/ directory contains earlier code implemented in TensorFlow, along with plots generated earlier using PyTorch. It is kept for reference only and is not in active development. archives/model_tools/ is a Python module and the primary set of tools for training ML models. archives/hpo/ contains Python modules and Slurm scripts for hyperparameter optimization with Optuna. archives/multitrain/ is out of date but similar to hpo/. The scripts and Python modules in multi-train/ are intended to perform multiple training runs for a single set of model parameters, for example, the "best" parameters from HPO.

train.py and analyze.py are the main scripts for training and inference, respectively.

Getting started

The training and analysis scripts take a number of command-line arguments. To list them, run

python train.py --help

or

python analyze.py --help

Train the model

To train a model, run a command like the following from the project directory:

 python train.py --input_file labeled-elm-events.hdf5 --device cuda --model_name multi_features --data_preproc unprocessed --signal_window_size 512 --label_look_ahead 500 --normalize_data --n_epochs 5 --max_elms -1 --filename_suffix _dwt_db4_low_lr --raw_num_filters 48 --fft_num_filters 48 --wt_num_filters 48 --dwt_wavelet db4 --dwt_level 9 --lr 0.0003 --weight_decay 0.0025

The train.py script expects the input .hdf5 file to be stored in the data/ directory.
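A quick way to check this before starting a long run is to resolve the path yourself. The helper below is a hypothetical sketch that assumes the value of --input_file is a bare file name looked up under data/ in the project directory. It does not reproduce how train.py builds the path.

```python
from pathlib import Path


def resolve_input_file(input_file, project_dir="."):
    """Return the expected location of the input file (hypothetical helper)."""
    # Assumption: the file name is resolved relative to <project_dir>/data/.
    path = Path(project_dir) / "data" / input_file
    if not path.is_file():
        raise FileNotFoundError(f"Expected input file at {path}")
    return path
```

For example, resolve_input_file("labeled-elm-events.hdf5") raises FileNotFoundError unless data/labeled-elm-events.hdf5 exists under the current directory.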

Test the model

Testing works similarly. The command-line arguments look like this:

 python analyze.py --device cuda --model_name multi_features --data_preproc unprocessed --signal_window_size 512 --label_look_ahead 500 --truncate_inputs --normalize_data --n_epochs 20 --max_elms -1 --multi_features --use_fft --plot_data --show_metrics

To run train.py or analyze.py without saving any output files or plots, add the --dry_run flag to either script.

Tracking the model and experimentation

All the parameters of interest during training are logged to a pickle file, which can be used with Weights and Biases to track experiments. More details can be found in train.py and wandb_manual_logs.py.
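The general pattern of writing a log to a pickle file and replaying it later can be sketched as follows. The dictionary layout (per-epoch lists under "train_loss" and "valid_loss") and the function names are assumptions made for illustration. The actual log structure is defined in train.py and wandb_manual_logs.py.

```python
import pickle


def save_training_log(log, path):
    """Write the log dictionary to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(log, f)


def load_training_log(path):
    """Read a log dictionary back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def iter_epoch_metrics(log):
    """Yield one flat metrics dict per epoch.

    Assumes the log holds equal-length per-epoch lists under
    "train_loss" and "valid_loss" (hypothetical layout).
    """
    pairs = zip(log["train_loss"], log["valid_loss"])
    for epoch, (train_loss, valid_loss) in enumerate(pairs, start=1):
        yield {"epoch": epoch, "train_loss": train_loss, "valid_loss": valid_loss}
```

With the wandb package installed and a run started through wandb.init(), each dictionary yielded by iter_epoch_metrics could be passed to wandb.log() to upload the metrics after training has finished.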

bes-edgeml-models's People

Contributors

lakshyamalhotra, drsmith48, jeff-zimmerman

bes-edgeml-models's Issues

refactored multi_feature_ds model

@LakshyaMalhotra - When you get a chance, please explore the refactored multi_features_ds model in the branch drsmith. I forked the files train_ds.py and multi_features_ds_model.py from the versions in main. All FFT and DWT calculations are performed at runtime in the forward() call, so preprocessing and storing CWT is no longer necessary. The DWT is efficient like FFT and has minimal memory footprint. I commented out all logic for arguments --multi_features and --use_fft, including in trainer.py and base_arguments.py. I added arguments for the multi_features_ds model in base_arguments.py. Raw, FFT, and DWT features are on by default, and they are turned off by setting, for example, --dwt_num_filters 0. For a small problem size (5 ELM events, signal_window_size=128, and 16 features each for raw/fft/dwt), the FFT and DWT calculations had no impact on epoch elapsed time and ran without a problem on my 16 GB Mac.

The DWT calculations use pytorch_wavelets. It's not on conda-forge, so do git clone and pip install.

sample data file for labeled ELM events

@jeff-zimmerman It appears that the sample data file sample_labeled_elm_events.hdf5 is missing from the refactor branch, as best I can tell. Can you add it back, and can you investigate how the tests passed with the data file missing? Thanks.
