Install this package as a Python module with pip via:
pip install git+https://github.com/erikbuh/erum_data_data.git
The essential function so far is the "load" function, which loads the training and testing datasets. The dataset features "X" are returned as a list of numpy arrays; the labels "y" are returned directly as a numpy array.
import erum_data_data as edd
# loading training data into RAM (downloads dataset first time)
X_train, y_train = edd.load('top', dataset='train', cache_dir='./', cache_subdir='datasets')
# loading test data into RAM (downloads dataset first time)
X_test, y_test = edd.load('top', dataset='test', cache_dir='./', cache_subdir='datasets')
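Since the features come back as a list of numpy arrays (one array per input branch), a typical first step is to inspect the shapes. A minimal sketch of what this looks like, using made-up stand-in arrays in place of an actual edd.load result (the shapes below are illustrative, not the real dataset shapes):

```python
import numpy as np

# Stand-in for the (X, y) structure returned by edd.load:
# a list of per-branch feature arrays plus one label array.
X_train = [np.zeros((1000, 200, 4)), np.zeros((1000, 10))]
y_train = np.zeros(1000)

for i, x in enumerate(X_train):
    print(f"input branch {i}: shape {x.shape}")
print("labels:", y_train.shape)
```

The first axis of every array is the example index, so all branches and the label array share the same length.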
This creates a subfolder ./datasets. The datasets take up about 2.4 GB of disk space in total. Loading a training dataset requires at least 5 GB of free RAM (depending on the dataset).
Datasets currently included, with their tags:
1: 'top', 2: 'spinodal', 3: 'EOSL', 4: 'airshower', 5: 'belle'
A description of a dataset can be printed via the function:
edd.print_description('top')
Some example plots can be found in the notebooks in the example folder.
A simple model implementation can be found in the folder 'simple_model'. To run the notebook one additionally needs at least tensorflow >= 2.0 and scikit-learn >= 0.22 installed.
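Many simple models expect a single 2-D feature matrix rather than a list of arrays, so a common preprocessing step is to flatten each branch per example and concatenate them. A hedged sketch of such a helper (flatten_inputs and the example shapes are illustrative, not part of the package):

```python
import numpy as np

def flatten_inputs(X):
    """Flatten each per-branch array to 2-D and concatenate along the
    feature axis, giving one (n_examples, n_features) matrix."""
    return np.concatenate([x.reshape(x.shape[0], -1) for x in X], axis=1)

# Example: two branches with 5 examples each.
X = [np.ones((5, 3, 2)), np.ones((5, 4))]
flat = flatten_inputs(X)
print(flat.shape)  # (5, 10): 3*2 + 4 features per example
```

The resulting matrix can be fed directly to a scikit-learn classifier or a dense network input layer.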
The original datasets can be found here: