MoLeR: A Model for Molecule Generation

This repository contains training and inference code for the MoLeR model introduced in Learning to Extend Molecular Scaffolds with Structural Motifs. We also include our implementation of CGVAE, but without integration with the high-level model interface.

Quick start

molecule_generation can be installed via pip, but it additionally depends on rdkit and (if one wants to use a GPU) on setting up CUDA libraries. One can get both through conda:

conda env create -f environment.yml
conda activate moler-env

Our package was tested with python>=3.7, tensorflow>=2.1.0 and rdkit>=2020.09.1; see the environment*.yml files for the exact configurations tested in CI.

To then install the latest release of molecule_generation, run

pip install molecule-generation

Alternatively, pip install -e . within the root folder installs the latest state of the code, including changes that were merged into main but not yet released.

A MoLeR checkpoint trained using the default hyperparameters is available here. This file needs to be saved in a fresh folder MODEL_DIR (e.g., /tmp/MoLeR_checkpoint) and be renamed to have the .pkl ending (e.g., to GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl). Then you can sample 10 molecules by running

molecule_generation sample MODEL_DIR 10

See below for how to train your own model and run more advanced inference.

Troubleshooting

Q: Installing tensorflow on my system does not work, or it works but GPU is not being used.

A: Please refer to the tensorflow website for guidelines. In particular, with recent versions of tensorflow one may get a "libdevice not found" error; in that case please follow the instructions at the bottom of this page.

Q: My particular combination of dependency versions does not work.

A: Please submit an issue and default to using one of the pinned configurations from environment-py*.yml in the meantime.

Q: I am in China and so the figshare checkpoint link does not work for me.

A: You can try this link instead.

Workflow

Working with MoLeR can be roughly divided into four stages:

  • data preprocessing, where a plain text list of SMILES strings is turned into *.pkl files containing descriptions of the molecular graphs and generation traces;
  • training, where MoLeR is trained on the preprocessed data until convergence;
  • inference, where one loads the model and performs batched encoding, decoding or sampling; and (optionally)
  • fine-tuning, where a previously trained model is fine-tuned on new data.

Additionally, you can visualise the decoding traces and internal action probabilities of the model, which can be useful for debugging.
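
For orientation, the full pipeline can be sketched as a sequence of CLI calls (directory names are placeholders; each command is described in detail below):

molecule_generation preprocess INPUT_DIR OUTPUT_DIR TRACE_DIR
molecule_generation train MoLeR TRACE_DIR
molecule_generation sample MODEL_DIR 10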

Data Preprocessing

To run preprocessing, your data has to follow a simple GuacaMol format (files train.smiles, valid.smiles and test.smiles, each containing SMILES strings, one per line). Then, you can preprocess the data by running

molecule_generation preprocess INPUT_DIR OUTPUT_DIR TRACE_DIR

where INPUT_DIR is the directory containing the three *.smiles files, OUTPUT_DIR is used for intermediate results, and TRACE_DIR for final preprocessed files containing the generation traces. Additionally, the preprocess command accepts command-line arguments to override various preprocessing hyperparameters (notably, the size of the motif vocabulary). This step roughly corresponds to applying Algorithm 2 from our paper to each molecule in the input data.
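
If you need to create the input files programmatically, the following is a minimal sketch (the directory name and SMILES strings are placeholders):

from pathlib import Path

input_dir = Path("./guacamol_input")
input_dir.mkdir(exist_ok=True)

splits = {
    "train.smiles": ["c1ccccc1", "CCO", "CC(=O)O"],
    "valid.smiles": ["CCN"],
    "test.smiles": ["CCC"],
}

for file_name, smiles_list in splits.items():
    # One SMILES string per line, as expected by `molecule_generation preprocess`.
    (input_dir / file_name).write_text("\n".join(smiles_list) + "\n")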

After running the preprocess command, you should see an output similar to

2022-03-10 11:22:15,927 preprocess.py:239 INFO 1273104 train datapoints, 79568 validation datapoints, 238706 test datapoints loaded, beginning featurization.
2022-03-10 11:22:15,927 preprocess.py:245 INFO Featurising data...
2022-03-10 11:22:15,927 molecule_dataset_utils.py:261 INFO Turning smiles into mol
2022-03-10 11:22:15,927 molecule_dataset_utils.py:79 INFO Initialising feature extractors and motif vocabulary.
2022-03-10 11:44:17,864 motif_utils.py:158 INFO Motifs in total: 99751
2022-03-10 11:44:25,755 motif_utils.py:182 INFO Removing motifs with less than 3 atoms
2022-03-10 11:44:25,755 motif_utils.py:183 INFO Motifs remaining: 99653
2022-03-10 11:44:25,764 motif_utils.py:190 INFO Truncating the list of motifs to 128 most common
2022-03-10 11:44:25,764 motif_utils.py:192 INFO Motifs remaining: 128
2022-03-10 11:44:25,764 motif_utils.py:199 INFO Finished creating the motif vocabulary
2022-03-10 11:44:25,764 motif_utils.py:200 INFO | Number of motifs: 128
2022-03-10 11:44:25,764 motif_utils.py:203 INFO | Min frequency: 3602
2022-03-10 11:44:25,764 motif_utils.py:204 INFO | Max frequency: 1338327
2022-03-10 11:44:25,764 motif_utils.py:205 INFO | Min num atoms: 3
2022-03-10 11:44:25,764 motif_utils.py:206 INFO | Max num atoms: 10
2022-03-10 11:44:25,862 preprocess.py:255 INFO Completed initializing feature extractors; featurising and saving data now.
 Wrote 1273104 datapoints to /guacamol/output/train.jsonl.gz.
 Wrote 79568 datapoints to /guacamol/output/valid.jsonl.gz.
 Wrote 238706 datapoints to /guacamol/output/test.jsonl.gz.
 Wrote metadata to /guacamol/output/metadata.pkl.gz.
(...proceeds to compute generation traces...)

After the preprocessed graphs are saved into OUTPUT_DIR, they will be turned into concrete generation traces, which is typically the most compute-intensive part of preprocessing. During that part, the preprocessing code may print errors, noting molecules that could not be parsed or that failed other assertions; MoLeR's preprocessing is robust to such cases, and will simply skip any problematic samples.

Training

Having stored some preprocessed data under TRACE_DIR, MoLeR can be trained by running

molecule_generation train MoLeR TRACE_DIR

The train command accepts many command-line arguments to override training and architectural hyperparameters, most of which are accessed through passing --model-params-override. For example, the following trains a MoLeR model using GGNN-style message passing (instead of the default GNN_Edge_MLP) and using fewer layers in both the encoder and the decoder GNNs:

molecule_generation train MoLeR TRACE_DIR \
    --model GGNN \
    --model-params-override '{"gnn_num_layers": 6, "decoder_gnn_num_layers": 6}'

As tf2-gnn is highly flexible, MoLeR supports a vast space of architectural configurations.

After running molecule_generation train, you should see an output similar to

(...tensorflow messages, hyperparameter dump...)
Initial valid metric:
Avg weighted sum. of graph losses:  122.1728
Avg weighted sum. of prop losses:   0.4712
Avg node class. loss:                 35.9361
Avg first node class. loss:           27.4681
Avg edge selection loss:              1.7522
Avg edge type loss:                   3.8963
Avg attachment point selection loss:  1.1227
Avg KL divergence:                    7335960.5000
Property results: sa_score: MAE 11.23, MSE 1416.26 (norm MAE: 13.89) | clogp: MAE 10.87, MSE 4620.69 (norm MAE: 5.98) | mol_weight: MAE 407.42, MSE 185524.38 (norm MAE: 3.70).
   (Stored model metadata and weights to trained_model/GNN_Edge_MLP_MoLeR__2022-03-01_18-15-14_best.pkl).
(...training proceeds...)

By default, training proceeds until there is no improvement in validation loss for 3 consecutive mini-epochs, where a mini-epoch is defined as 5000 training steps; this can be controlled through the --patience flag and the num_train_steps_between_valid model parameter, respectively.
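
For example, the following (with illustrative values) validates every 1000 steps and stops after 5 mini-epochs without improvement:

molecule_generation train MoLeR TRACE_DIR \
    --patience 5 \
    --model-params-override '{"num_train_steps_between_valid": 1000}'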

Inference

After a model has been trained and saved under MODEL_DIR, we provide two ways to load it: from the CLI or directly from Python. Currently, CLI-based loading does not expose all useful functionality, and is mostly meant for simple tests.

To sample molecules from the model using the CLI, simply run

molecule_generation sample MODEL_DIR NUM_SAMPLES

and, similarly, to encode a list of SMILES stored under SMILES_PATH into latent vectors and store them under OUTPUT_PATH, run

molecule_generation encode MODEL_DIR SMILES_PATH OUTPUT_PATH

In all cases MODEL_DIR denotes the directory containing the model checkpoint, not the path to the checkpoint itself. The model loader only looks at *.pkl files under MODEL_DIR, and expects exactly one such file, corresponding to the trained checkpoint.

You can load a model directly from Python via

from molecule_generation import load_model_from_directory

model_dir = "./example_model_directory"
example_smiles = ["c1ccccc1", "CNC=O"]

with load_model_from_directory(model_dir) as model:
    embeddings = model.encode(example_smiles)
    print(f"Embedding shape: {embeddings[0].shape}")

    # Decode without a scaffold constraint.
    decoded = model.decode(embeddings)

    # The i-th scaffold will be used when decoding the i-th latent vector.
    decoded_scaffolds = model.decode(embeddings, scaffolds=["CN", "CCC"])

    print(f"Encoded: {example_smiles}")
    print(f"Decoded: {decoded}")
    print(f"Decoded with scaffolds: {decoded_scaffolds}")

which should yield an output similar to

Embedding shape: (512,)
Encoded: ['c1ccccc1', 'CNC=O']
Decoded: ['C1=CC=CC=C1', 'CNC=O']
Decoded with scaffolds: ['C1=CC=C(CNC2=CC=CC=C2)C=C1', 'CNC(=O)C(C)C']

As shown above, MoLeR is loaded through a context manager. Behind the scenes, the following things happen:

  • First, an appropriate wrapper class is chosen: if the provided directory contains a MoLeRVae checkpoint, the returned wrapper will support encode, decode and sample, while MoLeRGenerator will only support sample.
  • Next, parallel workers are spawned, which await queries for encoding/decoding; these processes continue to live as long as the context is active. The degree of parallelism can be configured using a num_workers argument (see the sketch below).
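
For instance, a minimal sketch of requesting a specific degree of parallelism (the value 4 is illustrative):

from molecule_generation import load_model_from_directory

with load_model_from_directory("./example_model_directory", num_workers=4) as model:
    # Both wrapper classes support sampling.
    print(model.sample(10))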

Fine-tuning

Fine-tuning proceeds similarly to training from scratch, with a few adjustments. First, data intended for fine-tuning has to be preprocessed accordingly, by running

molecule_generation preprocess INPUT_DIR OUTPUT_DIR TRACE_DIR \
    --pretrained-model-path CHECKPOINT_PATH

where CHECKPOINT_PATH points to the checkpoint file (not a directory) of the model that will later be fine-tuned.

The --pretrained-model-path argument is necessary, as otherwise preprocessing would infer various metadata (e.g. set of atom/motif types) solely from the provided set of SMILES, whereas for fine-tuning this has to be aligned with the metadata that the model was originally trained with.

After preprocessing, fine-tuning is run as

molecule_generation train MoLeR TRACE_DIR \
    --load-saved-model CHECKPOINT_PATH \
    --load-weights-only

When fine-tuning on a small dataset, it may not be desirable to update the model until convergence. Training duration can be capped by passing --model-params-override '{"num_train_steps_between_valid": 100}' (to shorten the mini-epochs) and --max-epochs (to limit the number of mini-epochs).
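
Putting this together, a capped fine-tuning run could look as follows (the flag values are illustrative):

molecule_generation train MoLeR TRACE_DIR \
    --load-saved-model CHECKPOINT_PATH \
    --load-weights-only \
    --max-epochs 5 \
    --model-params-override '{"num_train_steps_between_valid": 100}'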

Visualisation

We support two subtly different modes of visualisation: decoding a given latent vector, and decoding a latent vector created by encoding a given SMILES string. In the former case, the decoder runs as normal during inference; in the latter case we know the ground-truth input, so we teacher-force the correct decoding decisions.

To enter the visualiser, run either

molecule_generation visualise cli MODEL_DIR SMILES_OR_SAMPLES_PATH

to get the result printed as plain text in the CLI, or

molecule_generation visualise html MODEL_DIR SMILES_OR_SAMPLES_PATH OUTPUT_DIR

to get the result saved under OUTPUT_DIR as a static HTML webpage.

Code Structure

All of our models are implemented in Tensorflow 2, and are meant to be easy to extend and build upon. We use tf2-gnn for the core Graph Neural Network components.

The MoLeR model itself is implemented as a MoLeRVae class, inheriting from GraphTaskModel in tf2-gnn; that base class encapsulates the encoder GNN. The decoder GNN is instantiated as an external MoLeRDecoder layer; it also includes batched inference code, which forces the maximum likelihood choice at every step.

Authors

Note: as git history was truncated at the point of open-sourcing, GitHub's statistics do not reflect the degree of contribution from some of the authors. All listed above had an impact on the code, and are (approximately) ordered by decreasing contribution.

The code is maintained by the Generative Chemistry group at Microsoft Research, Cambridge, UK. We are hiring.

MoLeR was created as part of our collaboration with Novartis Research. In particular, its design was guided by Nadine Schneider, Finton Sirockin, Nikolaus Stiefl, as well as others from Novartis.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Style Guide

  • For code style, use black and flake8.
  • For commit messages, use imperative style and follow the semantic commit messages template; e.g.

    feat(moler_decoder): Improve masking of invalid actions

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

molecule-generation's Issues

Warning when using load_model_from_directory(dir)

"WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.data_structures has been moved to tensorflow.python.trackable.data_structures. The old module will be deleted in version 2.11."

I would like my scripts to not just stop working one day.

Where are the training datasets?

Can you tell me where the training datasets are, or share a link to them (the files train.smiles, valid.smiles and test.smiles)? Thank you very much!

Decode more than one molecule from one latent

Hello, thank you for sharing your work.

I have a question regarding the decoding process. I'm curious about how to decode more than one molecule from a specific latent space, especially those that are neighbors with high log likelihood relative to the returned molecule.

Currently, it seems that for the same latent input, the decoded output remains unchanged, and the sampling process doesn't seem to support starting from a specified latent position:

def sample(self, num_samples: int) -> List[str]:
    """Sample SMILES strings from the model.

    Args:
        num_samples: Number of samples to return.

    Returns:
        List of SMILES strings.
    """
    return self.decode(self.sample_latents(num_samples))

I was considering customizing the sample method of the GeneratorWrapper to initiate from a specific latent point instead of starting from zeros. However, the provided checkpoint (GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl) is configured for the VaeWrapper, not the GeneratorWrapper.

Note: I have taken into consideration your suggestion to add small noise to address this issue, as discussed in issue 40. However, my primary interest lies in exploring a more refined solution, specifically through adjusting the num_samples parameter here:

num_samples = min(num_samples, num_choices)  # Handle cases where we only have few candidates
if sampling_mode == DecoderSamplingMode.GREEDY:
    # Note that this will return the top num_samples indices, but not in order:
    picked_indices = np.argpartition(logprobs, -num_samples)[-num_samples:]
elif sampling_mode == DecoderSamplingMode.SAMPLING:
    p = np.exp(logprobs)  # Convert to probabilities
    # We can only sample values with non-zero probabilities
    num_choices = np.sum(p > 0)
    num_samples = min(num_samples, num_choices)

Thank you in advance for your assistance.
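
For reference, a minimal sketch of the small-noise workaround mentioned above, built on the public wrapper API (the noise scale and neighbour count are illustrative):

import numpy as np
from molecule_generation import load_model_from_directory

with load_model_from_directory("./example_model_directory") as model:
    [latent] = model.encode(["c1ccccc1"])
    # Decode several slightly perturbed copies of the same latent vector.
    noisy_latents = latent + np.random.normal(scale=0.1, size=(8,) + latent.shape)
    print(model.decode(list(noisy_latents)))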

Properly support CGVAE

Currently CGVAE is not properly tested and not supported by the inference server used by MoLeR. Even though CGVAE works worse than MoLeR across the board, it may still be worthwhile to add full support for it for any future comparisons, as generalizing the inference server to support both CGVAE and MoLeR shouldn't be too hard.

Unpin dependencies to support all modern Python versions

Due to various dependency issues we pin tensorflow to 2.1.0, which is not compatible with Python 3.8+, meaning that only Python 3.7 is supported. It would be great to unpin or at least update some of the dependencies, so that we can support all modern versions of Python.

Sample molecules based on a specific scaffold

Thanks for this excellent work !

In the README inference part, "molecule_generation encode MODEL_DIR SMILES_PATH OUTPUT_PATH" is used to generate molecules. How can I sample molecules based on a specific scaffold?

Looking forward to your reply

Advice on the evaluation options defined by sklearn.metrics

I have some experience designing model-based algorithms, like the frequently used MLP. When building a pipeline that extracts data from arbitrary sources (e.g. numpy.ndarray), the most important step is to unify the in-memory format with the format expected by the models of a released package. Since molecule_generation uses tensorflow as its main runtime, I would recommend a package compatible with sklearn.metrics that reduces the hassle of format conversion (tf.Tensor → numpy.ndarray).

from sklearn import metrics

mae = metrics.mean_absolute_error(y_true=labels, y_pred=predictions)
mse = metrics.mean_squared_error(y_true=labels, y_pred=predictions)
max_err = metrics.max_error(y_true=labels, y_pred=predictions)
expl_var = metrics.explained_variance_score(y_true=labels, y_pred=predictions)
r2_score = metrics.r2_score(y_true=labels, y_pred=predictions)

These are all standard evaluation forms. If the model generalises to high-dimensional feature datasets, these evaluations give an objective picture of its behaviour. The metrics above are computed by sklearn.metrics, rather than by monitoring the model through tensorflow APIs such as tf.keras.callbacks.ModelCheckpoint, tf.keras.callbacks.EarlyStopping and tf.keras.callbacks.CSVLogger.

MoLeR inference server hangs on invalid SMILES

If MoLeRInferenceServer receives an invalid SMILES string (e.g. through molecule_generation encode), it just hangs, likely because the corresponding process failed, and the server ends up waiting for it indefinitely. This is far from ideal.

I see two kinds of behaviour we could aim for instead when encoding receives an invalid molecule:

  • The top-level process itself also fails with an exception.
  • The encoding carries on, returning None for the invalid molecules.

If possible, it could be nice to support both by having __init__ accept a fail_on_invalid_smiles: bool flag.
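
Until then, a caller-side guard is possible; a minimal sketch that filters unparseable SMILES with RDKit before encoding (the model directory is a placeholder):

from rdkit import Chem
from molecule_generation import load_model_from_directory

smiles = ["c1ccccc1", "this-is-not-a-smiles"]
# Keep only strings that RDKit can parse into a molecule.
valid_smiles = [s for s in smiles if Chem.MolFromSmiles(s) is not None]

with load_model_from_directory("./example_model_directory") as model:
    embeddings = model.encode(valid_smiles)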

Make CLI support all relevant model types

Currently all CLI entry points only work for MoLeRVae as they use VaeWrapper, while e.g. molecule_generation sample could also support MoLeRGenerator. While generalizing things, we may also want to rethink the choice to do model type discovery based on filenames (@sarahnlewis may have thoughts on this).

Add tests covering CLI and the model wrappers

There are no tests covering the CLI entry points or the model wrappers, making it easy to break the wrappers during refactoring. While there is one integration test, it's slightly lower-level and uses the MoLeRInferenceServer directly. We should add some higher-level end-to-end tests.

Computing likely next actions

Hi! I'm trying to compute the top k most likely next actions given a partially generated molecule. Each "next action" consists of first selecting an atom/motif via PickAtomOrMotif, then picking an attachment and bond via PickAttachment and PickBond (assuming END_GEN isn't sampled). The most likely actions can be determined by the conditional probability of observing each selection (atom/motif, then attachment, then bond) given the initial scaffold. How would I go about doing this with the current API?

Thanks for your help!

Optimising latent vectors for objective

Hello,

I was wondering if you have any examples of doing the multiple swarm optimisation with latent vectors for an arbitrary multiobjective optimisation task.

I can't seem to find this in either the README or the code.

Best,
Min Htoo

Improve and clean up the visualisers

The visualisers were recently improved in #10, but several confusing aspects still remain. This issue is meant for tracking all these quirks while they are being cleaned up.

  • In the pictures of the partial molecules produced in HTML mode, only some of the atoms are displayed with IDs. However, atoms without IDs can still later appear in edge prediction steps, at which point it's not possible to match the IDs shown in the table to atoms in the picture.
  • The atom/motif choice step omits the "no more nodes" class, which can be especially confusing at the last step when this is the ground-truth class.
  • Steps are not numbered consistently (depending on the mode, either 0-based or 1-based), and the first node prediction step is not numbered at all. Moreover, the division of the different choices into steps is centered around edge/attachment decisions, with node decisions being glued to one of these, and this can sometimes lead to confusing outputs. In particular, in the "decode from samples" mode, in some cases the first two node selection steps are shown before the "Step 1" heading is rendered. It would be clearer to just show one step for every node/edge/attachment decision. One complication is that some of these decisions are skipped in the decoder if there is only one choice to be made (e.g. choosing an attachment point in a benzene ring); ideally, the visualisers would explicitly call it out and label it as "skipped".

Provided model cannot be used with the new Tensorflow pickle loader. (module tensorflow.python.training.tracking missing)

Hello,

I tried using the trained model GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl you provided. Trying to generate 10 molecules failed when loading it. It appears that this pkl version is no longer handled by the new Tensorflow version. Please see the Python trace below.

Do you have another trained model to try, or should we downgrade to an older version of Tensorflow?

Thank you,

-- Mario

molecule_generation sample /home/azureuser/molecule-generation/model 10
2023-12-08 01:49:50.037579: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/azureuser/miniconda3/envs/moler-env/bin/molecule_generation", line 8, in <module>
    sys.exit(main())
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in main
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in <lambda>
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/sample.py", line 30, in run_from_args
    print_samples(
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/sample.py", line 13, in print_samples
    with load_model_from_directory(model_dir, **model_kwargs) as model:
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/wrapper.py", line 187, in load_model_from_directory
    model_class = get_model_class(ModelWrapper._get_model_file(model_dir))
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/utils/model_utils.py", line 74, in get_model_class
    data_to_load = pickle.load(in_file)
ModuleNotFoundError: No module named 'tensorflow.python.training.tracking'

Data Preprocessing

Could you please provide the train.smiles, valid.smiles and test.smiles files used in data preprocessing?

Tensorflow warnings are not actually suppressed

We intended to turn off tensorflow warnings (such as those about TensorRT mentioned in #18), but currently that doesn't seem to be working. The problem is likely that we're setting the appropriate environment variable after we import tensorflow (see cli/ and utils/cli_utils.py).

Also, the supress_tensorflow_warnings function has a typo in "suppress".
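
A minimal sketch of the intended ordering (TF_CPP_MIN_LOG_LEVEL is the standard variable for this; its use here is an assumption, not a quote from the codebase):

import os

# The environment variable must be set before tensorflow is first imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import tensorflow as tf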

libdevice not found during training using the default conda environment on Ubuntu 22.04.2 with an RTX A4000

Hello, just to let you know that running molecule_generation train following the README, with the default conda environment, on Ubuntu 22.04.2 with an RTX A4000, fails by not finding libdevice; log below.

I've found that pinning Tensorflow to version 2.10, instead of 2.11 (the latest version, installed automatically at the time of writing), fixes it, as per this stackoverflow question.

If you wish, I can open a PR to pin the TF version to 2.10 or lower until this is fixed upstream, as this was also cited as a solution for #56; in any case, I'm posting this here so that other people can find the error and its solution more easily.
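
For reference, one way to apply the pin (the exact version spec is illustrative):

pip install "tensorflow==2.10.*"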

Error Log
Avg weighted sum. of graph losses:  291.5334
Avg weighted sum. of prop losses:   0.5965
Avg node class. loss:                 71.0492
Avg first node class. loss:           40.7059
Avg edge selection loss:              1.7546
Avg edge type loss:                   4.0202
Avg attachment point selection loss:  1.1500
Avg KL divergence:                    6981316.0000
Property results: sa_score: MAE 10.77, MSE 3818.02 (norm MAE: 13.31) | clogp: MAE 23.54, MSE 13726.24 (norm MAE: 12.95) | mol_weight: MAE 393.53, MSE 168733.92 (norm MAE: 3.57).
   (Stored model metadata and weights to ~/data/moler/saved/GNN_Edge_MLP_MoLeR__2023-06-21_09-51-05_best.pkl).
2023-06-21 09:52:54.588760: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.595713: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.612998: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.618529: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.647620: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.663816: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.683986: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.702780: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.723474: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.741439: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "~/miniconda3/envs/moler-env/bin/molecule_generation", line 8, in <module>
    sys.exit(main())
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in main
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in <lambda>
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/train.py", line 179, in run_from_args
    trained_model_path = train(
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/train.py", line 274, in train
    train_loss, train_speed, train_results = model.run_on_data_iterator(
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/models/moler_base_model.py", line 244, in run_on_data_iterator
    task_metrics = self._run_step(batch_features, batch_labels, training)
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 336, in _run_step
    return self._fast_run_step(batch_features_tuple, batch_labels_tuple, training)
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'cond/StatefulPartitionedCall_122' defined at (most recent call last):
    File "~/miniconda3/envs/moler-env/bin/molecule_generation", line 8, in <module>
      sys.exit(main())
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in main
      run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
      func()
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in <lambda>
      run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/train.py", line 179, in run_from_args
      trained_model_path = train(
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/train.py", line 252, in train
      _, _, initial_valid_results = model.run_on_data_iterator(
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/models/moler_base_model.py", line 244, in run_on_data_iterator
      task_metrics = self._run_step(batch_features, batch_labels, training)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 336, in _run_step
      return self._fast_run_step(batch_features_tuple, batch_labels_tuple, training)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 363, in _fast_run_step
      tf.cond(training, true_fn=_training_update, false_fn=_no_op)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 357, in _training_update
      self._apply_gradients(zip(gradients, self.trainable_variables))
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 324, in _apply_gradients
      self._optimizer.apply_gradients(gradient_variable_pairs)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'cond/StatefulPartitionedCall_122'
libdevice not found at ./libdevice.10.bc
	 [[{{node cond/StatefulPartitionedCall_122}}]] [Op:__inference__fast_run_step_84892]
Conda Environment before pip install

When I re-created the environment without the restriction, this is the dependency list shown before installing molecule-generation:

# packages in environment at ~/miniconda3/envs/moler-env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   1.4.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.4           py310h2372a71_1    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     23.1.0             pyh71513ae_1    conda-forge
blinker                   1.6.2              pyhd8ed1ab_0    conda-forge
boost                     1.78.0          py310hc4a4660_4    conda-forge
boost-cpp                 1.78.0               h6582d0a_3    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
brotlipy                  0.7.0           py310h5764c6d_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.19.1               hd590300_0    conda-forge
ca-certificates           2023.5.7             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.3.0              pyhd8ed1ab_0    conda-forge
cairo                     1.16.0            hbbf8b49_1016    conda-forge
certifi                   2023.5.7           pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py310h255011f_3    conda-forge
charset-normalizer        3.1.0              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
contourpy                 1.1.0           py310hd41b1e2_0    conda-forge
cryptography              41.0.1          py310h75e40e8_0    conda-forge
cuda-version              11.8                 h70ddcb2_2    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudnn                     8.8.0.121            h0800d71_1    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
flatbuffers               23.3.3               hcb278e6_1    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.40.0          py310h2372a71_0    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
frozenlist                1.3.3           py310h5764c6d_0    conda-forge
gast                      0.4.0              pyh9f0ad1d_0    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
google-auth               2.20.0             pyh1a96a4e_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
greenlet                  2.0.2           py310hc6cd4ac_1    conda-forge
grpcio                    1.51.1          py310h4a5735c_1    conda-forge
h5py                      3.9.0           nompi_py310h367e799_100    conda-forge
hdf5                      1.14.0          nompi_hb72d44e_103    conda-forge
icu                       72.1                 hcb278e6_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.7.0              pyha770c72_0    conda-forge
keras                     2.11.0             pyhd8ed1ab_0    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4           py310hbf28c38_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lcms2                     2.15                 haa2dc70_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h05df665_6    conda-forge
libaec                    1.0.6                hcb278e6_1    conda-forge
libblas                   3.9.0           17_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           17_linux64_openblas    conda-forge
libcurl                   8.1.2                h409715c_0    conda-forge
libdeflate                1.18                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgfortran-ng            13.1.0               h69a702a_0    conda-forge
libgfortran5              13.1.0               h15d22d2_0    conda-forge
libglib                   2.76.3               hebfc3b9_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libgrpc                   1.51.1               h4fad500_1    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.5.1              h0b41bf4_0    conda-forge
liblapack                 3.9.0           17_linux64_openblas    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.23          pthreads_h80387f5_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libtiff                   4.5.1                h8b53f26_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.0                h0b41bf4_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markdown                  3.4.3              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.3           py310h2372a71_0    conda-forge
matplotlib-base           3.7.1           py310he60537e_0    conda-forge
multidict                 6.0.4           py310h1fa729e_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
nccl                      2.18.3.1             h12f7317_0    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
numpy                     1.25.0          py310ha4c1d20_0    conda-forge
oauthlib                  3.2.2              pyhd8ed1ab_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.1.1                hd590300_1    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
pandas                    2.0.2           py310h7cbd5c2_0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pillow                    9.5.0           py310h582fbeb_1    conda-forge
pip                       23.1.2             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
platformdirs              3.6.0              pyhd8ed1ab_0    conda-forge
pooch                     1.7.0              pyha770c72_3    conda-forge
protobuf                  4.21.12         py310heca2aa9_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycairo                   1.24.0          py310hda9f760_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.7.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 23.2.0             pyhd8ed1ab_1    conda-forge
pyparsing                 3.1.0              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.11         he550d4f_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-flatbuffers        23.5.26            pyhd8ed1ab_0    conda-forge
python-tzdata             2023.3             pyhd8ed1ab_0    conda-forge
python_abi                3.10                    3_cp310    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
rdkit                     2023.03.2       py310h399bcf7_0    conda-forge
re2                       2023.02.01           hcb278e6_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
reportlab                 3.6.13          py310h1a56a1c_0    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scipy                     1.10.1          py310ha4c1d20_3    conda-forge
setuptools                67.7.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
sqlalchemy                2.0.16          py310h2372a71_0    conda-forge
tensorboard               2.11.2             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.6.1           py310h600f1e7_4    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.11.1          cuda112py310he87a039_0    conda-forge
tensorflow-base           2.11.1          cuda112py310h4c92a00_0    conda-forge
tensorflow-estimator      2.11.1          cuda112py310h37add04_0    conda-forge
termcolor                 2.3.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
typing-extensions         4.6.3                hd8ed1ab_0    conda-forge
typing_extensions         4.6.3              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
unicodedata2              15.0.0          py310h5764c6d_0    conda-forge
urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
werkzeug                  2.3.6              pyhd8ed1ab_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.15.0          py310h1fa729e_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.6                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yarl                      1.9.2           py310h2372a71_0    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge
Conda environment after pip install

And after running pip install molecule-generation:

# packages in environment at ~/miniconda3/envs/moler-env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   1.4.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.4           py310h2372a71_1    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     23.1.0             pyh71513ae_1    conda-forge
azure-core                1.27.1                   pypi_0    pypi
azure-identity            1.13.0                   pypi_0    pypi
azure-storage-blob        12.16.0                  pypi_0    pypi
blinker                   1.6.2              pyhd8ed1ab_0    conda-forge
boost                     1.78.0          py310hc4a4660_4    conda-forge
boost-cpp                 1.78.0               h6582d0a_3    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
brotlipy                  0.7.0           py310h5764c6d_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.19.1               hd590300_0    conda-forge
ca-certificates           2023.5.7             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.3.0              pyhd8ed1ab_0    conda-forge
cairo                     1.16.0            hbbf8b49_1016    conda-forge
certifi                   2023.5.7           pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py310h255011f_3    conda-forge
charset-normalizer        3.1.0              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
contourpy                 1.1.0           py310hd41b1e2_0    conda-forge
cryptography              41.0.1          py310h75e40e8_0    conda-forge
cuda-version              11.8                 h70ddcb2_2    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudnn                     8.8.0.121            h0800d71_1    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
docopt                    0.6.2                    pypi_0    pypi
dpu-utils                 0.6.1                    pypi_0    pypi
expat                     2.5.0                hcb278e6_1    conda-forge
flatbuffers               23.3.3               hcb278e6_1    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.40.0          py310h2372a71_0    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
frozenlist                1.3.3           py310h5764c6d_0    conda-forge
gast                      0.4.0              pyh9f0ad1d_0    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
google-auth               2.20.0             pyh1a96a4e_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
greenlet                  2.0.2           py310hc6cd4ac_1    conda-forge
grpcio                    1.51.1          py310h4a5735c_1    conda-forge
h5py                      3.9.0           nompi_py310h367e799_100    conda-forge
hdf5                      1.14.0          nompi_hb72d44e_103    conda-forge
icu                       72.1                 hcb278e6_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.7.0              pyha770c72_0    conda-forge
isodate                   0.6.1                    pypi_0    pypi
joblib                    1.2.0                    pypi_0    pypi
keras                     2.11.0             pyhd8ed1ab_0    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4           py310hbf28c38_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lcms2                     2.15                 haa2dc70_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h05df665_6    conda-forge
libaec                    1.0.6                hcb278e6_1    conda-forge
libblas                   3.9.0           17_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           17_linux64_openblas    conda-forge
libcurl                   8.1.2                h409715c_0    conda-forge
libdeflate                1.18                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgfortran-ng            13.1.0               h69a702a_0    conda-forge
libgfortran5              13.1.0               h15d22d2_0    conda-forge
libglib                   2.76.3               hebfc3b9_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libgrpc                   1.51.1               h4fad500_1    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.5.1              h0b41bf4_0    conda-forge
liblapack                 3.9.0           17_linux64_openblas    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.23          pthreads_h80387f5_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libtiff                   4.5.1                h8b53f26_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.0                h0b41bf4_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markdown                  3.4.3              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.3           py310h2372a71_0    conda-forge
matplotlib-base           3.7.1           py310he60537e_0    conda-forge
molecule-generation       0.4.0                    pypi_0    pypi
more-itertools            9.1.0                    pypi_0    pypi
msal                      1.22.0                   pypi_0    pypi
msal-extensions           1.0.0                    pypi_0    pypi
multidict                 6.0.4           py310h1fa729e_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
nccl                      2.18.3.1             h12f7317_0    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
numpy                     1.25.0          py310ha4c1d20_0    conda-forge
oauthlib                  3.2.2              pyhd8ed1ab_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.1.1                hd590300_1    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
pandas                    2.0.2           py310h7cbd5c2_0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pillow                    9.5.0           py310h582fbeb_1    conda-forge
pip                       23.1.2             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
platformdirs              3.6.0              pyhd8ed1ab_0    conda-forge
pooch                     1.7.0              pyha770c72_3    conda-forge
portalocker               2.7.0                    pypi_0    pypi
protobuf                  3.20.3                   pypi_0    pypi
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycairo                   1.24.0          py310hda9f760_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.7.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 23.2.0             pyhd8ed1ab_1    conda-forge
pyparsing                 3.1.0              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.11         he550d4f_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-flatbuffers        23.5.26            pyhd8ed1ab_0    conda-forge
python-tzdata             2023.3             pyhd8ed1ab_0    conda-forge
python_abi                3.10                    3_cp310    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
rdkit                     2023.03.2       py310h399bcf7_0    conda-forge
re2                       2023.02.01           hcb278e6_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
regex                     2023.6.3                 pypi_0    pypi
reportlab                 3.6.13          py310h1a56a1c_0    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scikit-learn              1.2.2                    pypi_0    pypi
scipy                     1.10.1          py310ha4c1d20_3    conda-forge
sentencepiece             0.1.99                   pypi_0    pypi
setsimilaritysearch       1.0.1                    pypi_0    pypi
setuptools                67.7.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
sqlalchemy                2.0.16          py310h2372a71_0    conda-forge
tensorboard               2.11.2             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.6.1           py310h600f1e7_4    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.11.1          cuda112py310he87a039_0    conda-forge
tensorflow-base           2.11.1          cuda112py310h4c92a00_0    conda-forge
tensorflow-estimator      2.11.1          cuda112py310h37add04_0    conda-forge
termcolor                 2.3.0              pyhd8ed1ab_0    conda-forge
tf2-gnn                   2.13.0                   pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tk                        8.6.12               h27826a3_0    conda-forge
tqdm                      4.65.0                   pypi_0    pypi
typing-extensions         4.6.3                hd8ed1ab_0    conda-forge
typing_extensions         4.6.3              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
unicodedata2              15.0.0          py310h5764c6d_0    conda-forge
urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
werkzeug                  2.3.6              pyhd8ed1ab_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.15.0          py310h1fa729e_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.6                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yarl                      1.9.2           py310h2372a71_0    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge

Tasks

TensorFlow warnings when using encode

from molecule_generation import load_model_from_directory

with load_model_from_directory(model_dir) as model:
    embeddings = model.encode(smiles)

"WARNING:tensorflow:From ****\tensorflow\python\util\deprecation.py:576: calling function (from tensorflow.python.eager.polymorphic_function.polymorphic_function) with experimental_relax_shapes is deprecated and will be removed in a future version.
Instructions for updating:
experimental_relax_shapes is deprecated, use reduce_retracing instead"

and

"WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.data_structures has been moved to tensorflow.python.trackable.data_structures. The old module will be deleted in version 2.11."

I would like my scripts to not just stop working one day.
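
For what it's worth, these are warnings from TensorFlow itself rather than errors, so a minimal workaround is to lower TensorFlow's log level. The sketch below assumes you are happy to hide all TensorFlow warnings (TF_CPP_MIN_LOG_LEVEL and tf.get_logger() are standard TensorFlow settings; model_dir and smiles are placeholders):

import os

# Hide C++-level INFO/WARNING logs; must be set before TensorFlow is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow as tf
from molecule_generation import load_model_from_directory

# Hide Python-level TensorFlow warnings, including the deprecation notices quoted above.
tf.get_logger().setLevel("ERROR")

model_dir = "/tmp/MoLeR_checkpoint"  # placeholder: folder containing the renamed *.pkl checkpoint
smiles = ["c1ccccc1"]  # placeholder input

with load_model_from_directory(model_dir) as model:
    embeddings = model.encode(smiles)

This silences the messages but does not address the underlying deprecations, so pinning the TensorFlow version is still the safer way to keep scripts working.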

Query about data split!

Hello,

I have a question about splitting the SMILES data into training, validation and test sets.
Can I split the data randomly, or are there set rules for how the split should be done?
Also, what is the minimum number of SMILES data points required to train the model?

Thanks
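
For reference, a minimal sketch of a random split into the three files that preprocessing expects; the 90/5/5 ratio, the fixed seed, and the all.smiles input name are illustrative assumptions rather than rules from the authors:

import random

# Read one SMILES per line, shuffle, and write the GuacaMol-style split files.
with open("all.smiles") as f:  # hypothetical input: one SMILES string per line
    smiles = [line.strip() for line in f if line.strip()]

random.seed(0)  # fixed seed so the split is reproducible
random.shuffle(smiles)

n = len(smiles)
n_train, n_valid = int(0.9 * n), int(0.05 * n)
splits = {
    "train.smiles": smiles[:n_train],
    "valid.smiles": smiles[n_train : n_train + n_valid],
    "test.smiles": smiles[n_train + n_valid :],
}
for filename, subset in splits.items():
    with open(filename, "w") as out:
        out.write("\n".join(subset) + "\n")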

Missing dependency for GPU

Hi,

First of all, I want to thank you guys for making this available for the community. This is a great step forward in molecular generation.

However, I noticed a slight problem when using the GPU. After strictly following the installation steps and saving the pretrained model in the PRETRAINED_MODEL dir, I got the results below. It does still work, but there's something weird going on with TensorFlow.

I tried installing tensorflow-gpu==2.1.0, but the message persists. What am I missing?

I'm working on a RHEL8 station with CUDA-10.2.

All the best,
Gustavo.

$ molecule_generation sample PRETRAINED_MODEL 10
2022-05-04 16:12:42.673853: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/amber/current/lib:/usr/local/cuda-10.2/lib64:/opt/amber/current/lib:/usr/local/cuda-10.2/lib64::/usr/lib64/openmpi/lib:/usr/lib64/openmpi/lib
2022-05-04 16:12:42.673995: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/amber/current/lib:/usr/local/cuda-10.2/lib64:/opt/amber/current/lib:/usr/local/cuda-10.2/lib64::/usr/lib64/openmpi/lib:/usr/lib64/openmpi/lib
2022-05-04 16:12:42.674011: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Loading a trained model from: PRETRAINED_MODEL/GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl
2022-05-04 16:12:44,365 trace_dataset.py:44 INFO Initialising TraceDataset.
2022-05-04 16:12:44,392 trace_dataset.py:44 INFO Initialising TraceDataset.
2022-05-04 16:12:44,403 trace_dataset.py:44 INFO Initialising TraceDataset.
2022-05-04 16:12:44,412 trace_dataset.py:44 INFO Initialising TraceDataset.
2022-05-04 16:12:44,423 trace_dataset.py:44 INFO Initialising TraceDataset.
2022-05-04 16:12:44,434 trace_dataset.py:44 INFO Initialising TraceDataset.
O=C1C2=CC=C(C3=CC=CC=C3)C=C=C2OC2=CC=CC=C12
CC(=O)NC1=NC2=CC(OCC3=CC=CN(CC4=CC=C(Cl)C=C4)C3=O)=CC=C2N1
CCN1C(=O)C2=CC=CC=C2N=C1NC(C)C(=O)NCC(=O)N=[N+]=[N-]
CC(=O)N1CCCC1C1=NC2=CC=C(C(C)(C)CCCC(C)C)C=C2NC1=NC1=CC=C(O)C=C1
N=C(N)NCCCCOC1=CC=C(Br)C(Cl)=N1
O=C1C2=CC=C(C3=NN=CO3)C=C2N=CN1CC1=CC=C(C2=CC=CC=C2)C=C1
O=CCCCCCN1C=CC2=CC=CC=C21
CCOC(=O)C1=CC2=CC(CC(C)C)=CC=C2N=C1C1=CC=C(Br)C=C1
CC1=C(C#N)C=C(NC(=O)NC2CCCCC2)N1CC#N
CC1=CNN=C1NC(=O)COC1=CC=C(Cl)C=C1Cl

M1 Mac problem

Hi,
I am trying to run the command

molecule_generation sample MODEL_DIR 10

On my M1 Mac, running this with the current install instructions leads to the following error:

zsh: illegal hardware instruction molecule_generation sample MODEL_DIR 10

Do you know any fix for this?

Thanks,
Mike

Memory overflow when preprocessing a large dataset

Dear authors, I am trying to train the model on over 10 million datapoints. Even though I set --num-processes to 3 via

molecule_generation preprocess data/merged_lib results/merged_lib_full traces/merged_lib_full --pretrained-model-path xxx_best.pkl --num-processes 3

the memory keeps growing until it overflows.

Is there any way to reduce memory usage for extremely large datasets?
Thanks!

Clarification: correct_edge_choices is an array of all zeros, while valid_edge_choices has a few candidates

Hi,

After running the preprocessing script and looking into some of the individual generation trace steps, I found that some steps have valid_edge_choices as a non-empty array of possible edges, while the corresponding correct_edge_choices for the same step is an array of all zeros. Could you clarify what this means in this context? Does it mean that during this step we shouldn't be adding an edge to the molecular graph, and should instead be adding an atom/motif? Thanks!

How can I generate a large number of SMILES, for example 100000000?

How can I generate a large number of SMILES, for example 100000000? Here is what I tried:

import numpy as np
from molecule_generation import VaeWrapper

model_dir = "./example_model_directory"
scaffold = "C1=CC=CC=C1"
init_mol = "O=CNC1=CC=CC=C1"

with VaeWrapper(model_dir) as model:
    [latent_center] = model.encode([init_mol])
    latents = latent_center + 0.5 * np.random.randn(100000000, latent_center.shape[0]).astype(np.float32)
    for idx, smiles in enumerate(model.decode(latents, scaffolds=[scaffold] * len(latents))):
        print(f"Result #{idx + 1}: {smiles}")

But this code needs too much time, and my computer does not have enough memory. Can you help me, or suggest how I should run it?
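
A minimal sketch of a chunked variant of the loop above, which only ever materialises one small batch of latents at a time; the chunk size of 10000 and the samples.smiles output file are arbitrary choices, and decoding 100000000 molecules will still take a very long time regardless of memory:

import numpy as np
from molecule_generation import VaeWrapper

model_dir = "./example_model_directory"
scaffold = "C1=CC=CC=C1"
init_mol = "O=CNC1=CC=CC=C1"

total, chunk_size = 100_000_000, 10_000  # decode in small batches instead of one giant array
rng = np.random.default_rng(0)

with VaeWrapper(model_dir) as model, open("samples.smiles", "w") as out:
    [latent_center] = model.encode([init_mol])
    for start in range(0, total, chunk_size):
        n = min(chunk_size, total - start)
        # Only this chunk of latents is ever held in memory.
        noise = rng.standard_normal((n, latent_center.shape[0])).astype(np.float32)
        latents = latent_center + 0.5 * noise
        for smiles in model.decode(latents, scaffolds=[scaffold] * n):
            out.write(smiles + "\n")

Writing results to a file rather than printing them also avoids keeping all outputs around.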

Preprocessing takes too long, and how do I use my CSV files to generate train.smiles and valid.smiles?

I ran molecule_generation preprocess ./mol ./output ./trace.

It has been running for 16 hours but has only written 2% of metadata.pkl.gz. In the paper I see that preprocessing took about one day on a K80; I am using a 3090, because the V100's 24G memory is too small. What could be the reason, and how can I reduce the time this takes?

And another question: how can I use my train.csv and valid.csv (which together contain 4,000,000 SMILES) to generate train.smiles and valid.smiles? Thanks!
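
On the second question, a minimal sketch of the CSV conversion, assuming each file has a column literally named smiles (adjust the column name to match your CSVs):

import pandas as pd

# Turn CSV files with a "smiles" column into the one-SMILES-per-line
# format that `molecule_generation preprocess` expects.
for csv_name, smiles_name in [("train.csv", "train.smiles"), ("valid.csv", "valid.smiles")]:
    df = pd.read_csv(csv_name)
    df["smiles"].dropna().to_csv(smiles_name, index=False, header=False)

Note that preprocessing also expects a test.smiles file in the same directory, so a third split may be needed.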

Large amount of error messages when using decode

My script works, but a large number of error messages is shown:

"RDKit runtime error on base molecule, with message:
unsupported operand type(s) for -: '_vectclass std::vector<int,class std::allocator >' and '_vectclass std::vector<int,class std::allocator >'"

"WARNING:tensorflow:5 out of the last 5 calls to <function MLP.call at 0x00000232D2F44D30> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details."

and

"WARNING:tensorflow:6 out of the last 6 calls to <function MLP.call at 0x00000232D2F44EE0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details."

When decoding 20 embedding arrays, I get more than 700 rows of errors. Surprisingly, the 20 molecules are then printed correctly, but only after this wall of errors. I would like my scripts not to trigger errors at all.

Motif embeddings

Hi,

It is mentioned in the paper that motif embeddings of size 64 were learned and concatenated to the atom input. Could I check how these learned embeddings are generated? Or are they already included in the partial node features when we run the preprocessing command using the CLI?

Additionally, what does partial_node_categorical_features refer to in the input data? Thanks!

Question about node_type_predictor_class_loss_weight_factor

Hi,

In this line, node_type_predictor_class_loss_weight_factor is read from the metadata, but I couldn't find where it is generated in the preprocessing script, and it also doesn't appear after running preprocessing. Has this attribute been deprecated so that it is no longer used in this repo, or did I miss something in the preprocessing steps? Thanks!

IndexError: pop from empty list

Hello, after preprocessing the data and running training as described in the documentation, I immediately get the following error during data loading. Do you have any insight into why this happens? Thank you!

Traceback (most recent call last):
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/bin/molecule_generation", line 8, in <module>
    sys.exit(main())
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/cli/cli.py", line 35, in main
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/cli/cli.py", line 35, in <lambda>
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/cli/train.py", line 140, in run_from_args
    loaded_model_dataset = training_utils.get_model_and_dataset(
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/tf2_gnn/cli_utils/model_utils.py", line 299, in get_model_and_dataset
    model = get_model(
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/tf2_gnn/cli_utils/model_utils.py", line 224, in get_model
    return model_cls(
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/models/moler_vae.py", line 89, in __init__
    super().__init__(params, dataset, **kwargs)
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/models/moler_base_model.py", line 60, in __init__
    super().__init__(params, dataset, **kwargs)
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/tf2_gnn/models/graph_task_model.py", line 55, in __init__
    batch_description = dataset.get_batch_tf_data_description()
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/dataset/trace_dataset.py", line 376, in get_batch_tf_data_description
    base_description = super().get_batch_tf_data_description()
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/tf2_gnn/data/graph_dataset.py", line 259, in get_batch_tf_data_description
    "node_features": (None,) + self.node_feature_shape,
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/dataset/trace_dataset.py", line 517, in node_feature_shape
    return self._get_cached_property("_node_feature_shape")
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/dataset/trace_dataset.py", line 510, in _get_cached_property
    self._load_feature_shapes_from_data()
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/dataset/trace_dataset.py", line 500, in _load_feature_shapes_from_data
    datum = self._load_one_sample(data_fold)
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/dataset/jsonl_abstract_trace_dataset.py", line 108, in _load_one_sample
    return next(iter(graph_it))
  File "/zfsauton2/home/chenghuz/miniconda3/envs/moler-env/lib/python3.9/site-packages/molecule_generation/utils/sharded_data_reader.py", line 92, in __next__
    next_datum = self._current_file_data.pop()
IndexError: pop from empty list

Help with fine-tuning the generator

Hi,

I'm trying an experiment: fine-tuning the generator on a small set of molecules with specific properties (so that it generates new molecules with similar properties), but I'm running into some errors that I have been unable to solve. I'd really appreciate it if anyone could shed some light on what I'm doing wrong.

What I'm doing:

  1. Split a set of 10 molecules 80:10:10 into train:valid:test and put the files into the folder finetune_moler/input.
  2. Run MoLeR in pre-process mode:
    $ molecule_generation preprocess finetune_moler/input finetune_moler/output finetune_moler/trace
  3. Then try to fine-tune the provided pre-trained model on the small set of molecules above:
    $ molecule_generation train MoLeR finetune_moler/trace \
        --load-saved-model ./PRETRAINED_MODEL/GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl \
        --load-weights-only \
        --save-dir finetune_moler/tuned_model

The preprocessing step seems to run just fine, but in the fine-tuning step I get the following error:

(dumps a lot of informational messages)
Traceback (most recent call last):
  File "/opt/miniconda3/envs/moler-env/bin/molecule_generation", line 33, in <module>
    sys.exit(load_entry_point('molecule-generation', 'console_scripts', 'molecule_generation')())
  File "/home/seabra/work/source/repos/microsoft/molecule-generation/molecule_generation/cli/cli.py", line 35, in main
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "/home/seabra/work/source/repos/microsoft/molecule-generation/molecule_generation/cli/cli.py", line 35, in <lambda>
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/home/seabra/work/source/repos/microsoft/molecule-generation/molecule_generation/cli/train.py", line 140, in run_from_args
    loaded_model_dataset = training_utils.get_model_and_dataset(
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/cli_utils/model_utils.py", line 319, in get_model_and_dataset
    load_weights_verbosely(trained_model_file, model)
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/cli_utils/model_utils.py", line 148, in load_weights_verbosely
    K.batch_set_value(tfvar_weight_tuples)
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 911, in assign
    raise ValueError(
ValueError: Cannot assign value to variable ' decoder/node_categorical_features_embedding/categorical_features_embedding:0': Shape mismatch.The variable shape (98, 64), and the assigned value shape (166, 64) are incompatible.

Could someone point out what I'm doing wrong? Would it be possible to get an example of successfully fine-tuning the model?

Thanks a lot!
Gustavo.
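
For context, the two mismatched shapes above are embedding tables with 98 vs. 166 rows (both of width 64, matching the motif embedding size discussed earlier), which suggests the fine-tuning data was preprocessed with its own freshly extracted motif vocabulary rather than the checkpoint's. A plausible fix, assuming the --pretrained-model-path flag accepted by the preprocess command (it appears in the memory-overflow issue above), is to re-run preprocessing against the checkpoint so the vocabularies match:

molecule_generation preprocess finetune_moler/input finetune_moler/output finetune_moler/trace \
    --pretrained-model-path ./PRETRAINED_MODEL/GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl

This is a sketch of the idea rather than a confirmed recipe.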
