
move's Introduction

MOVE (Multi-Omics Variational autoEncoder)


The code in this repository can be used to run our Multi-Omics Variational autoEncoder (MOVE) framework for integration of omics and clinical variables spanning both categorical and continuous data. Our approach includes training ensemble VAE models and using in silico perturbation experiments to identify cross-omics associations. The manuscript has been published in Nature Biotechnology:

Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. et al. Discovery of drug–omics associations in type 2 diabetes with generative deep-learning models. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-022-01520-x

We developed the method based on a Type 2 Diabetes cohort from the IMI DIRECT project containing 789 newly diagnosed T2D patients. The cohort and data creation are described in Koivula et al. and Wesolowska-Andersen et al. For the analysis we included the following data:

Multi-omics data sets:

Genomics
Transcriptomics
Proteomics
Metabolomics
Metagenomics

Other data sets:

Clinical data (blood measurements, imaging data, ...)
Questionnaire data (diet, etc.)
Accelerometer data
Medication data

Installation

Installing MOVE package

MOVE is written in Python and can therefore be installed using pip:

pip install move-dl

Requirements

MOVE should run in any environment where Python is available. The variational autoencoder architecture is implemented in PyTorch.

The VAEs can be trained using CPUs only or with GPU acceleration. If you do not have powerful GPUs available, running on CPUs alone is feasible: for instance, the tutorial data set, consisting of simulated drug, metabolomics, and proteomics data for 500 individuals, runs fine on a standard MacBook.

Note: the pip installation of move-dl does not set up your local GPU automatically.
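If you are unsure whether PyTorch can see your GPU, a minimal check (generic PyTorch, not MOVE-specific) is:

import torch

if torch.cuda.is_available():
    print("GPU available:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected; training will run on CPU.")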

The MOVE pipeline

MOVE has five main steps, plus an optional sixth:

01. Encode the data into a format that can be read by MOVE
02. Find the right architecture of the network, focusing on reconstruction accuracy
03. Find the right architecture of the network, focusing on stability of the model
04. Use the model determined in steps 02-03 to create and analyze the latent space
05. Identify associations between categorical and continuous datasets:
05a. using an ensemble of VAEs with the t-test approach
05b. using an ensemble of VAEs with the Bayesian decision theory approach
06. If both 05a and 05b were run, select the overlap between them

How to run MOVE

Please refer to our documentation for examples and tutorials on how to run MOVE.

Additionally, you can copy this notebook and follow its instructions to get familiar with our pipeline.
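For orientation, below is a hedged sketch of invoking the pipeline from the command line on the tutorial data set. It uses only task names that appear in the tutorial and the issues further down this page; task names for the remaining steps follow the same pattern, so check the documentation for your version:

cd tutorial                                                  # parent directory of the config folder
move-dl data=random_small task=encode_data                   # step 01: encode the data
move-dl data=random_small task=identify_associations_ttest   # step 05a: t-test associations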

Data sets

DIRECT data set

The data used in the notebooks are not available for testing due to the informed consent given by study participants, the various national ethical approvals for the study, and the European General Data Protection Regulation (GDPR). Therefore, individual-level clinical and omics data cannot be transferred from the centralized IMI-DIRECT repository. Requests for access to summary statistics of IMI-DIRECT data, including those presented here, can be made to [email protected]. Requesters will be informed on how summary-level data can be accessed via the DIRECT secure analysis platform following submission of an appropriate application. The IMI-DIRECT data access policy is available here.

Simulated and publicly available data sets

We have therefore provided two data sets to test the workflow: a simulated data set and a publicly available maize rhizosphere microbiome data set.

Citation

To cite MOVE, use the following information:

Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. et al. Discovery of drug–omics associations in type 2 diabetes with generative deep-learning models. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-022-01520-x

move's People

Contributors

enryh, jakobnissen, ri-heme, rosaallesoe, simonrasmu, valentas1


move's Issues

02_optimize_reconstruction write out results

In 02_optimize_reconstruction, make it write out a table listing the best models. Currently these are only saved as numpy .npy files. As the plotting often fails, this table could be where one can see which models were actually the best.

Clean-up config files

Moving to the task-based format introduced in #37, the configuration files need to be cleaned up:

  • Remove unused fields in move/conf/data/base_data (currently marked with # DEPRECATE).
  • Remove unused config groups: tuning_reconstruction, tuning_stability, training_latent, training_association *.
  • Remove any duplicate fields.
  • Remove any other unused fields (e.g., name in main).

* These will be re-implemented as configs of the task config group. (See #38 and #40)

Error during __tune_reconstruction: score in calculate_accuracy (metrics.py) cannot be calculated

I'm currently training MOVE on proteomics data in combination with lots of categorical data (with a few missing values). My input data is structured as instructed (1 Feature/File, missing values = NA).

When MOVE tries to calculate the score during reconstruction tuning, it struggles with the missing values, since num_features has the original length (including masked entries) while y_true and y_pred have length n - n_masked. Excluding all categorical features containing missing values results in a successful run. What is the correct way to fix this error? (See analysis\metrics.py; a hypothetical sketch follows the traceback below.)

The error thrown is below:

Error executing job with overrides: ['task.batch_size=10', 'task.model.num_hidden=[500]', 'task.training_loop.num_epochs=40', 'experiment=mpn__tune_reconstruction']
Traceback (most recent call last):
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\__main__.py", line 38, in main
    move.tasks.tune_model(config)
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\tasks\tune_model.py", line 249, in tune_model
    _tune_reconstruction(task_config)
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\tasks\tune_model.py", line 216, in _tune_reconstruction
    accuracy = calculate_accuracy(cat[mask], cat_recon)
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\analysis\metrics.py", line 36, in calculate_accuracy
    scores = np.ma.compressed(np.sum(y_true == y_pred, axis=1)) / num_features
ValueError: operands could not be broadcast together with shapes (118,) (131,)
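A hypothetical sketch of one way to make the division mask-aware (an assumption, not the project's confirmed fix): divide each sample's count of correct predictions by the number of unmasked features in that same sample, so the two arrays have matching shapes by construction.

import numpy as np

def masked_accuracy(y_true: np.ma.MaskedArray, y_pred: np.ndarray) -> np.ndarray:
    correct = np.ma.sum(y_true == y_pred, axis=1)  # correct predictions per sample
    counts = np.ma.count(y_true, axis=1)           # unmasked features per sample
    return correct / counts                        # masked where a sample has no observed features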

Two functions in 05_identify_associations do not work

Functions/calls

make_files(collected_overlap, groups, con_list_concat, processed_data_path, recon_average_corr_all_indi_new, con_names, continuous_names, drug_h, drug, all_hits, types, version)

and

df_indi_var = get_inter_drug_variation(con_names, drug_h, recon_average_corr_all_indi_new, groups, collected_overlap, drug, con_list_concat, processed_data_path, types)

currently do not work.

05_identify_associations: write results in tsv table format

Currently, 05_identify_associations does not write out significant hits in a table format, but rather as individual files. Change to a single table format that could include the following columns (see the sketch below):

Drug feature, omics dataset, omics feature, cor p-value, estimated change, confidence interval change
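A hedged sketch of writing such a table with pandas; the column names follow the list above, and collecting the hits as a list of tuples is an assumption for illustration:

import pandas as pd

columns = ["drug_feature", "omics_dataset", "omics_feature",
           "p_value", "estimated_change", "ci_change"]
hits = []  # e.g., hits.append(("drug_1", "proteomics", "protein_42", 1e-4, 0.3, 0.1))
pd.DataFrame(hits, columns=columns).to_csv("significant_hits.tsv", sep="\t", index=False)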

Organize code

I suggest the following way to organize the files and folders of the project. As discussed earlier with @valentas1, we can move the different code snippets into their own modules (e.g., move.analysis and move.models).

.
└── move/
    ├── .github/workflows   <= CI workflows to lint and test code
    ├── conf/               <= Configuration files
    │   ├── data/           <= Default data configuration (e.g., batch size)
    │   ├── model/          <= Default model configuration (e.g., layers)
    │   └── training/       <= Default model training (e.g., epochs, steps)
    │
    ├── notebooks/          <= Jupyter notebooks (step-by-step tutorials)
    ├── src/                <= Source code
    │   └── move/
    │       ├── analysis/   <= Scripts for post-analysis (e.g. feature
    │       │                  importance)
    │       ├── data/       <= Scripts to encode data, create datasets, and 
    │       │                  data loaders
    │       ├── models/     <= Architectures and custom layers
    │       └── viz/        <= Scripts to create visualization
    │
    ├── tests/              <= Unit tests
    ├── LICENSE             <= License
    ├── README.md           <= README.md
    ├── requirements.txt    <= Requirements file for reproducing the analysis
    └── setup.py            <= Setup script

QoL changes

The following is a list of low-priority changes to make to MOVE:

  • #39
  • Rename processed_data_path in config to results_path
  • Add type hints to methods in vae.py module

Add more user info on step 05

Add more information when testing associations in step 05, e.g. "Testing: <feature X>", where in our case the features would be drugs

NA class is created even without NA in categorical data

César and I are applying MOVE to a new dataset.
For the categorical data, we are having issues with the function encode_cat(). We realized the following:

Not an issue:

  1. The class NA (0,...,0) is added even though there are no NAs.

Issues:
  2. The function np.isnan(), which is used to check whether the array of unique classes contains a NaN class, does not work if the class labels are passed as strings. (TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')
  3. If one class is not represented in the data of a new user (e.g., no patient of class 3 in a class 1-9 setting), the one-hot encoding is rearranged. This leads to the creation of keys 1-8 plus the NaN key with the vector (0,...,0) assigned. This causes problems at the end of the function when calling, e.g., encodings[lab] for a patient belonging to class 9 (lab=9): the key 9 no longer exists in the encodings dict.

Thanks!
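A hypothetical sketch of an encoding that avoids both problems (the helper one_hot_fixed is illustrative, not MOVE's encode_cat): fixing the full category set up front keeps the one-hot layout stable even when a class is absent from the data, and string labels never reach np.isnan.

import numpy as np
import pandas as pd

def one_hot_fixed(values, categories):
    codes = pd.Categorical(values, categories=categories).codes  # -1 marks NA/unknown
    out = np.zeros((len(values), len(categories)), dtype=np.float32)
    valid = codes >= 0
    out[np.flatnonzero(valid), codes[valid]] = 1.0               # NA rows stay all-zero
    return out

# e.g., one_hot_fixed(["1", "9", None], categories=[str(i) for i in range(1, 10)])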

Re-structure module 4 (analyze latent)

To match the style of modules 1 and 5 (encode data and identify associations, respectively), this module needs to be refactored.

Task List

  • Define config schema (add visualization targets as fields)
  • Create new module and function
  • Create example config YAML
  • Add to main entry point

Set-up CI workflow

Set up a CI workflow using GitHub Actions to automatically lint code and do unit testing.

Use Hydra multirun mode to do optimization tasks

The hyperparameter optimization can be re-adapted to make use of Hydra's multirun mode (and/or sweeper).

Overview of tasks:

Below is a list of tasks to convert these two modules from their current state:

Module and schema

  • Define new schemas for this task: OptimizeTaskConfig (inherits from TaskConfig) class in move.conf.schema.
  • Create new move.tasks.optimize_hyperparameters module and function.
  • Create example experiment config for hyperparameter search. See example YAML in hydra-app-example.

Function

For the function optimize_hyperparameters:

  • Log job number
  • Create objective function TSV (if it doesn't exist)
  • Load pre-processed data
  • Split data into training/test sets
  • Make dataloaders
  • Train model
  • Record values of objective function (append to TSV)

Misc.

  • Repeat for the second optimization, taking a similar approach to the "identify associations" task (see move.tasks.identify_associations), i.e., detect the type of task and change the value of the objective function (from accuracy to stability).
  • Add to move.__main__.
  • Re-format tutorial files for the random_small dataset.

Open Questions

  • Is the best set of hyperparameters automatically selected based on the objective function value (e.g., reconstruction accuracy)?
    • If so, I would suggest we also implement the Optuna plugin with a smarter sampler than greedy grid search.
    • If not, then I suggest just saving the results, and then providing some visualization functions so the users can decide on their hyperparameter set.
  • #22


04_analyze_latent fails due to hardcoded values

Names are hardcoded (Clinical, Genomics, etc.) and should instead be taken from the data input (see the sketch after the traceback below)

python -m move.04_analyze_latent

Error executing job with overrides: []
Traceback (most recent call last):
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/move/04_analyze_latent/__main__.py", line 74, in main
    plot_reconstruction_distribs(processed_data_path, cat_total_recon, all_values)
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/move/utils/visualization_utils.py", line 285, in plot_reconstruction_distribs
    df = pd.DataFrame(cat_total_recon + all_values, index = ['Clinical\n(categorical)', 'Genomics', 'Drug data', 'Clinical\n(continuous)', 'Diet +\n wearables','Proteomics','Targeted\nmetabolomics','Untargeted\nmetabolomics', 'Transcriptomics', 'Metagenomics'])
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/pandas/core/frame.py", line 729, in __init__
    mgr = arrays_to_mgr(
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 125, in arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 628, in _homogenize
    com.require_length_match(val, index)
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/pandas/core/common.py", line 557, in require_length_match
    raise ValueError(
ValueError: Length of values (3) does not match length of index (10)
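A hedged fix sketch: build the row labels from the configured dataset names so their length always matches the supplied data. The field names are assumptions taken from the Hydra config dump shown further down this page (data.categorical_names, data.continuous_names):

import pandas as pd

# config is the loaded Hydra configuration; cat_total_recon and all_values
# are the per-dataset values as in the traceback above
labels = list(config.data.categorical_names) + list(config.data.continuous_names)
df = pd.DataFrame(cat_total_recon + all_values, index=labels)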

Remove number of samples from categorical shapes

MOVEDataset stores categorical_shapes and continuous_shapes. The former is a list of tuples containing three numbers: number of samples, number of features, and number of categories per feature. The first item is not needed, and it is inconsistent with continuous_shapes (which stores only the number of features, not the number of samples).

Several functions will be affected by this change.
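A minimal sketch of the proposed change (variable names are illustrative):

# drop the leading sample count so categorical_shapes matches the
# convention of continuous_shapes
categorical_shapes = [(n_features, n_categories)
                      for (_n_samples, n_features, n_categories) in categorical_shapes]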

Simplify plotting functions

Is it possible to simplify the plotting functions so they work for all cases? I.e., one approach could be to make the plots as individual plots rather than composite figures

Best latent representation

We should discuss/analyse how the best latent representation is chosen at the end of step 03 (stability)

Expected MOVE tutorial runtime?

Hi, congrats on this great tool.

I am currently following the tutorial and trying to familiarize myself with MOVE.
How long should I expect the tutorial runtime to be using the random_small dataset?

System specifications:
RAM: 16 GB
Processor: 11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz, 2611 Mhz, 4 Core(s), 8 Logical Processor(s)

Many thanks,
Foteini

About the perturbation

Hello, I saw in your article that samples that have already received a drug are excluded when perturbing that drug. What I want to ask is: if I use the MOVE tool to perturb, do I need to remove the samples that received a certain drug myself, or will the current code automatically remove them for me?
Thanks.

Use of log-file

We should write most information to a log file instead of the screen.
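A minimal sketch using Python's standard logging module; the file name, location, and format are assumptions, not the final setup:

import logging

logging.basicConfig(
    filename="logs/move.log",  # assumed location; the logs/ directory must exist
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger(__name__).info("Training started")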

NNFC Workshop

  • First give an overview of the steps that the student will do

    • Ie which tasks and what they do
  • Are they using CPU or GPUs? Can they use GPUs?

  • Mention that we only use a subset of the original data

  • Explicitly write (in the beginning) what is it that we want to achieve. (Learning objectives)

  • You sometimes use hmp2 and other times ibd for data and configs. This is confusing

  • 2. Inspecting the data

    • 1. How can I inspect the data? Please give directions
    • 2. In the data folder there are both ibd and hmp2 files (MGX, MBX and MTX). Explain whether they are for two different experiments (if they are)?
    • 3. Indicate when you make people run commands what it will output (e.g. 2 figures below etc)
    • 4. What are the differences of the two Mutual Information plots? And you only show for hmp2.mbx data whereas the PCC plots are ibd.mbx?
  • 3. Encoding the data

    • 1. Be more informative on how the data is encoded. For instance, most people will not know what a "binary bit flag" is. Instead write for instance, 0 and 1.
    • 2. Before/after normalization. Give more explanation of what you see and what the effect of this is. Why does it help to do the normalization? Also, it is not possible to see the legend on the z-axis
    • 3. Explain what is meant by the shape of the datasets and what the output means, e.g. "(283, 1, 2)"
  • 4. Hyperparameter optimization

    • 1. Specify which hyperparameters you are testing and why
    • 2. Indicate where this can be changed
    • 3. "The output of the previous command is a TSV table called ..." Can you indicate how to inspect it?
    • 4. The output of the hyperparameter training - why and what does the plots mean? Ie. What is reconstruction accuracy and what do we expect?
    • 5. Summarize this a bit. Which hyperparameters do we end up using?
  • 5. Latent space ...

    • 1. Which architecture is used for training? Is it the architecture from point 4? Also what are the actual numbers?
    • 2. Indicate expected runtime when expected to be longer than just very quick, e.g. for model training
    • 3. A bit more explanation of the plots. Can you print out in the output a brief title/explanation of the plots?
    • 4. Reconstruction plot comes after latent scatter plots in the output, but are described before in the text above
    • 5. Much more explanation of the individual plots, you must guide the user
    • 6. Also what does SHAP mean (intuitively, not in mathematical terms)
  • 6. Identifying associations between features

    • 1. Mention that you only use the Bayes form? (if that is the only one you run)
    • 2. Print associations in a cell as a table
    • 3. Which perturbation are you doing? Please indicate this before
  • 8. "Possible Issues" is numbered 8, but comes after point 9

Shape of original_input and reconstruction do not match

Running MOVE with two continuous datasets works, but adding a third results in the error below (created with the maize dataset). Adding the values of the third file to the second file runs without error.

Error executing job with overrides: ['task.batch_size=10', 'task.model.num_hidden=[500]', 'task.training_loop.num_epochs=40', 'experiment=maize__tune_reconstruction']
Traceback (most recent call last):
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\__main__.py", line 38, in main
    move.tasks.tune_model(config)
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\tasks\tune_model.py", line 249, in tune_model
    _tune_reconstruction(task_config)
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\tasks\tune_model.py", line 230, in _tune_reconstruction
    cosine_sim = calculate_cosine_similarity(con[mask], con_recon)
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\analysis\metrics.py", line 55, in calculate_cosine_similarity
    raise ValueError(
ValueError: Original input (4251, 716) and reconstruction (4251, 713) shapes do not match.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I reproduced the error with the Maize dataset by adding a third dataset constructed with R:

### Testing maize (does not work with maize_rnorm.tsv file)
maize_ids <- read.table('MOVE_tutorial/maize/data/maize_ids.txt')
maize_ids$V2 <- rnorm(nrow(maize_ids),10, 2)
maize_ids$V3 <- rpois(nrow(maize_ids),100)

write.table(maize_ids, 'MOVE_tutorial/maize/data/maize_rnorm.tsv', row.names = F, quote = F, sep = '\t')

#Adding similar values to existing file works
maize_microbiome <- read.table('MOVE_tutorial/maize/data/maize_metadata.tsv')
maize_microbiome$V2 <- rnorm(nrow(maize_microbiome),10, 2)
maize_microbiome$V3 <- rpois(nrow(maize_microbiome),100)

write.table(maize_microbiome , 'MOVE_tutorial/maize/data/maize_metadata2.tsv', quote = F, sep = '\t')

Reduce memory of step 05

Reduce the memory use of step 05 by changing from dicts to another data structure. The problem seems to be in the function train_model_association when it saves the reconstruction results:

# Works:
with open(path + "results/results_" + version + ".npy", 'wb') as f:
    np.save(f, results)

# File is truncated:
with open(path + "results/results_recon_" + version + ".npy", 'wb') as f:
    np.save(f, recon_results)

with open(path + "results/results_groups_" + version + ".npy", 'wb') as f:
    np.save(f, groups)
with open(path + "results/results_recon_mean_baseline_" + version + ".npy", 'wb') as f:
    np.save(f, mean_bas)
with open(path + "results/results_recon_no_corr_" + version + ".npy", 'wb') as f:
    np.save(f, recon_results_1)

I.e., it fails when it tries to save recon_results as results/results_recon_v1.npy. When I try to load it, I get an EOFError:

recon_results = np.load(processed_data_path + "results/results_recon_" + version + ".npy", allow_pickle=True).item()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kjv627/miniforge3/envs/move/lib/python3.9/site-packages/numpy/lib/npyio.py", line 430, in load
    return format.read_array(fid, allow_pickle=allow_pickle,
  File "/Users/kjv627/miniforge3/envs/move/lib/python3.9/site-packages/numpy/lib/format.py", line 747, in read_array
    array = pickle.load(fp, **pickle_kwargs)
EOFError: Ran out of input

So it seems it wasn't written correctly.
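A hedged workaround sketch (an assumption, not a confirmed fix): pickle the dictionary directly with protocol 4, which supports objects larger than 4 GiB, instead of routing a Python dict through np.save.

import pickle

# save (path, version, and recon_results as in the snippet above)
with open(path + "results/results_recon_" + version + ".pkl", "wb") as f:
    pickle.dump(recon_results, f, protocol=4)

# load
with open(path + "results/results_recon_" + version + ".pkl", "rb") as f:
    recon_results = pickle.load(f)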

tuning_stability.yaml

When step 02 is complete, add repeats: 5 to tuning_stability.yaml; this could potentially be decreased to 3 or 4 to save computational time.

Could the learning rate be reduced to only 0.0001, thus removing 1e-5?

Setting groups

Currently there is no way of loading in groups. These are used to define the grouping of drugs for the association analysis. Probably not necessary for other datasets, but needed for reproduction of the DIRECT results.

Update tutorial text

Changes to tutorial:

  • Indicate that one should do hyperparameter optimization before running analyze latent?
  • How to download tutorial data if not cloning via git (perhaps from link/wget on Zenodo or similar)
  • Indicate where and which output files one should look at. E.g. from Tuning the hyperparameters indicate where the output will be
  • In a how-to-run-on-my-data section: indicate what should be done to run on one's own data; as much help as possible will make it easier for beginners to run it
  • Indicate how long it took you to run the tutorial using CPUs, so users know they don't need a GPU to run
  • Add that log-file with more details is written as logs/identify_associations.log (for instance)
  • Indicate that identify_associations_ttest will run 4 models each with different latent size. Each model is run 10 times for a total of 40 models
  • Is it possible to add a small helper script to check the overlap of significant features and the truth data (probably done in notebook)
  • Note that running bayes association overwrites the output of ttest association
  • Front readme: Indicate how to install from git clone

plot_reconstruction

The plot_reconstruction function in visualization_utils.py has the headers hard-coded, so it won't work if other data or only a subset of data are used:

  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/move/04_analyze_latent/__main__.py", line 69, in main
    plot_reconstruction_distribs(processed_data_path, cat_total_recon, all_values)
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/move/utils/visualization_utils.py", line 285, in plot_reconstruction_distribs
    df = pd.DataFrame(cat_total_recon + all_values, index = ['Clinical\n(categorical)', 'Genomics', 'Drug data', 'Clinical\n(continuous)', 'Diet +\n wearables','Proteomics','Targeted\nmetabolomics','Untargeted\nmetabolomics', 'Transcriptomics', 'Metagenomics'])
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/pandas/core/frame.py", line 729, in __init__
    mgr = arrays_to_mgr(
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 125, in arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 628, in _homogenize
    com.require_length_match(val, index)
  File "/Users/kjv627/miniforge3/envs/move_dev/lib/python3.9/site-packages/pandas/core/common.py", line 557, in require_length_match
    raise ValueError(
ValueError: Length of values (3) does not match length of index (10)

"move-dl" cannot be found after installing MOVE successfully

Dear Sir,
I encountered an issue at a certain step below, where the terminal responded with an error message stating that it could not find "move-dl". I would be immensely grateful if you could provide some guidance or solutions to rectify this issue.
From here:
on the parent directory of the config folder (in this example, it is the tutorial folder), and proceed to run:

cd tutorial
move-dl data=random_small task=encode_data — Cannot find move-dl

Your help would significantly contribute to my understanding and application of your work. I appreciate your time in advance.

Set up configuration system

I suggest using Hydra to set up a configuration file system, so users can easily modify and keep track of hyperparameters and other settings with a file (instead of manually typing them on the command line or a notebook).
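A minimal Hydra sketch of the suggested pattern (file names and fields are hypothetical): settings live in a YAML file, and any field can be overridden on the command line, e.g. python train.py model.lrate=0.001.

import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="main", version_base=None)
def main(config: DictConfig) -> None:
    # values come from conf/main.yaml or from command-line overrides
    print(config.model.lrate)

if __name__ == "__main__":
    main()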


defining entrypoints

I tried out of curiosity to define an entrypoint in setup.cfg, and I ran into an error regarding the uncommon module naming convention:

[options.entry_points]
console_scripts =
    move-encode-data = move.01_encode_data.__main__:main
move-encode-data --help
Traceback (most recent call last):
  File "C:\Users\enryh\anaconda3\envs\move\lib\runpy.py", line 189, in _run_module_as_main
    mod_name, mod_spec, code = _get_main_module_details(_Error)
  File "C:\Users\enryh\anaconda3\envs\move\lib\runpy.py", line 223, in _get_main_module_details
    return _get_module_details(main_name)
  File "C:\Users\enryh\anaconda3\envs\move\lib\runpy.py", line 129, in _get_module_details
    spec = importlib.util.find_spec(mod_name)
  File "C:\Users\enryh\anaconda3\envs\move\lib\importlib\util.py", line 103, in find_spec
    return _find_spec(fullname, parent_path)
  File "<frozen importlib._bootstrap>", line 945, in _find_spec
  File "<frozen importlib._bootstrap_external>", line 1439, in find_spec
  File "<frozen importlib._bootstrap_external>", line 1411, in _get_spec
  File "<frozen zipimport>", line 170, in find_spec
  File "<frozen importlib._bootstrap>", line 431, in spec_from_loader
  File "<frozen importlib._bootstrap_external>", line 741, in spec_from_file_location
  File "<frozen zipimport>", line 229, in get_filename
  File "<frozen zipimport>", line 760, in _get_module_code
  File "<frozen zipimport>", line 689, in _compile_source
  File "C:\Users\enryh\anaconda3\envs\move\Scripts\move-encode-data.exe\__main__.py", line 4
    from move.01_encode_data import main
                ^
SyntaxError: invalid decimal literal

Changing the module name from 01_encode_data to encode_data (and changing setup.cfg accordingly) solves the problem:
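The corrected declaration would then read:

[options.entry_points]
console_scripts =
    move-encode-data = move.encode_data.__main__:main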

move-encode-data --help
C:\Users\enrhy\Documents\repos\MOVE\src\move\encode_data\__main__.py:6: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="../conf", config_name="main")
__main__ is powered by Hydra.

== Configuration groups ==
Compose your configuration from those groups (group=option)

data: main
model: vae
training: main
training_association: main
training_latent: main
tuning_reconstruction: main
tuning_stability: main


== Config ==
Override anything in the config (foo.bar=value)

name: MOVE
seed: 123456
data:
  user_config: data.yaml
  na_value: NA
  raw_data_path: data/
  interim_data_path: interim_data/
  processed_data_path: processed_data/
  headers_path: headers/
  version: v1
  ids_file_name: baseline_ids.txt
  ids_has_header: true
  ids_colname: 0
  categorical_inputs:
  - name: diabetes_genotypes
    weight: 1
  - name: baseline_drugs
    weight: 1
  - name: baseline_categorical
    weight: 1
  continuous_inputs:
  - name: baseline_continuous
    weight: 2
  - name: baseline_transcriptomics
    weight: 1
  - name: baseline_diet_wearables
    weight: 1
  - name: baseline_proteomic_antibodies
    weight: 1
  - name: baseline_target_metabolomics
    weight: 1
  - name: baseline_untarget_metabolomics
    weight: 1
  - name: baseline_metagenomics
    weight: 1
  data_of_interest: baseline_drugs
  categorical_names: ${names:${data.categorical_inputs}}
  continuous_names: ${names:${data.continuous_inputs}}
  categorical_weights: ${weights:${data.categorical_inputs}}
  continuous_weights: ${weights:${data.continuous_inputs}}
  data_features_to_visualize_notebook4:
  - drug_1
  - clinical_continuous_2
  - clinical_continuous_3
  write_omics_results_notebook5:
  - baseline_target_metabolomics
  - baseline_untarget_metabolomics
model:
  _target_: move.models.vae.VAE
  user_config: model.yaml
  seed: 1
  cuda: false
  lrate: 0.0001
  num_epochs: 500
  patience: 100
  kld_steps:
  - 20
  - 30
  - 40
  - 90
  batch_steps:
  - 50
  - 100
  - 150
  - 200
  - 250
  - 300
  - 350
  - 400
  - 450
tuning_reconstruction:
  user_config: tuning_reconstruction.yaml
  num_hidden:
  - 500
  - 1000
  num_latent:
  - 20
  - 50
  num_layers:
  - 1
  - 2
  dropout:
  - 0.1
  - 0.2
  beta:
  - 1.0e-05
  - 0.0001
  batch_sizes:
  - 10
  repeats: 1
  max_param_combos_to_save: 12
tuning_stability:
  user_config: tuning_stability.yaml
  num_hidden:
  - 500
  - 1000
  num_latent:
  - 20
  - 50
  num_layers:
  - 1
  dropout:
  - 0.1
  - 0.2
  beta:
  - 1.0e-05
  batch_sizes:
  - 10
  repeats: 5
  tuned_num_epochs: 250
training_latent:
  user_config: training_latent.yaml
  num_hidden: 500
  num_latent: 20
  num_layers: 1
  dropout: 0.1
  beta: 1.0e-05
  batch_sizes: 10
  tuned_num_epochs: 250
training_association:
  user_config: training_association.yaml
  num_hidden: 500
  num_latent:
  - 150
  - 200
  - 250
  - 300
  num_layers: 1
  dropout: 0.1
  beta: 1.0e-05
  batch_sizes: 10
  repeats: 10
  tuned_num_epochs: 250


Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help
