nvidia / cheminformatics
Facilitates searching, screening, and organizing large chemical databases
I tried following the instructions shown in the megamolbart/README, but that does not work for me:
--(Wed Apr 06|15:25 [master]$)- ./launch.sh dev 2
sourcing environment from ./.env
+ local CONTAINER_OPTION=2
+ local CONT=nvcr.io/nvstaging/clara/cheminformatics_demo:latest
+ [[ 2 -eq 2 ]]
+ DOCKER_CMD='docker run --rm --network host --runtime=nvidia -p :8888 -p 9001:9001 -p 5000:5000 -v /home/muammar/git/cheminformatics:/workspace -v /home/muammar/git/cheminformatics/data/data:/data -u 1000:1000 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -e HOME=/workspace -e TF_CPP_MIN_LOG_LEVEL=3 -w /workspace -v /home/muammar/git/cheminformatics/megamolbart/models:/models/megamolbart/'
+ DOCKER_CMD='docker run --rm --network host --runtime=nvidia -p :8888 -p 9001:9001 -p 5000:5000 -v /home/muammar/git/cheminformatics:/workspace -v /home/muammar/git/cheminformatics/data/data:/data -u 1000:1000 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -e HOME=/workspace -e TF_CPP_MIN_LOG_LEVEL=3 -w /workspace -v /home/muammar/git/cheminformatics/megamolbart/models:/models/megamolbart/ -w /workspace/megamolbart/'
+ CONT=nvcr.io/nvstaging/clara/megamolbart:latest
+ docker run --rm --network host --runtime=nvidia -p :8888 -p 9001:9001 -p 5000:5000 -v /home/muammar/git/cheminformatics:/workspace -v /home/muammar/git/cheminformatics/data/data:/data -u 1000:1000 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -e HOME=/workspace -e TF_CPP_MIN_LOG_LEVEL=3 -w /workspace -v /home/muammar/git/cheminformatics/megamolbart/models:/models/megamolbart/ -w /workspace/megamolbart/ -it nvcr.io/nvstaging/clara/megamolbart:latest bash
WARNING: Published ports are discarded when using host network mode
=============
== PyTorch ==
=============
NVIDIA Release 20.11 (build 17345815)
PyTorch Version 1.8.0a0+17f8c32
Container image Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
Copyright (c) 2014-2020 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.
NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
ERROR: No supported GPU(s) detected to run this container
NOTE: MOFED driver for multi-node communication was not detected.
Multi-node communication performance may be reduced.
(base) bash-4.4$
After getting into the shell, I run:
(base) bash-4.4$ python launch.py &
[1] 54
(base) bash-4.4$ INFO:megamolbart:Maximum decoded sequence length is set to 512
INFO:megamolbart:Triggering model download...
Downloading model megamolbart to /models/megamolbart...
++ wget -q --show-progress --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/clara/megamolbart/versions/0.1/zip -O /models/megamolbart/megamolbart_0.1.zip
/models/megamolbart/megamolbart_0.1.zip: Permission denied
++ mkdir /models/megamolbart
mkdir: cannot create directory ‘/models/megamolbart’: File exists
++ unzip -q /models/megamolbart/megamolbart_0.1.zip -d /models/megamolbart
unzip: cannot find or open /models/megamolbart/megamolbart_0.1.zip, /models/megamolbart/megamolbart_0.1.zip.zip or /models/megamolbart/megamolbart_0.1.zip.ZIP.
INFO:megamolbart:Model download result: None
INFO:megamolbart:Model download result: None
Traceback (most recent call last):
File "launch.py", line 98, in <module>
main()
File "launch.py", line 94, in main
Launcher()
File "launch.py", line 71, in __init__
self.download_megamolbart_model()
File "launch.py", line 92, in download_megamolbart_model
raise Exception('Error downloading model')
Exception: Error downloading model
The user created in the container does not have permission to write to /models/megamolbart. I want to feed some SMILES strings to MegaMolBART and generate embeddings. How can I achieve that? I would appreciate any help you could provide. Thanks.
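To confirm the permission diagnosis above, one option is to inspect the mounted directory from inside the container. The following is a minimal sketch, not part of the repository: the helper name `describe_write_access` is hypothetical, and the fallback to a temporary directory is only there so the snippet also runs outside the container.

```python
import os
import tempfile


def describe_write_access(path):
    """Return a short report on whether the current user can write to path."""
    st = os.stat(path)
    return {
        "path": path,
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mode": oct(st.st_mode & 0o777),
        "current_uid": os.getuid(),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # /models/megamolbart is the mount point shown in the log above.
    # Assumption: fall back to the temp dir when that path does not exist.
    target = "/models/megamolbart"
    if not os.path.isdir(target):
        target = tempfile.gettempdir()
    print(describe_write_access(target))
```

If `writable` is False and `owner_uid` differs from `current_uid` (the container is started with `-u 1000:1000` in the command above), adjusting the ownership or permissions of the host directory that is mounted at /models/megamolbart is one possible fix.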
Hi, I got the MegaMolBART docker container up and running with the following command:
$docker run --name megamolbart --gpus all --rm -v $(pwd)/megamolbart_v0.1/:/models/megamolbart -v $(pwd)/shared/:/shared nvcr.io/nvidia/clara/megamolbart:latest &
I git cloned this repository in shared/ but can't find a way to even test the model.
In particular, I get the following error when trying to run test_megamolbart.py:
root@5f1951df00f5:/shared# mv cheminformatics/megamolbart/megamolbart/ . && mv cheminformatics/megamolbart/tests/test_megamolbart.py .
root@13b98eed1cbd:/shared# python test_megamolbart.py
using world size: 1 and model-parallel size: 1
using torch.float32 for parameters ...
-------------------- arguments --------------------
adam_beta1 ...................... 0.9
adam_beta2 ...................... 0.999
adam_eps ........................ 1e-08
adlr_autoresume ................. False
adlr_autoresume_interval ........ 1000
apply_query_key_layer_scaling ... False
apply_residual_connection_post_layernorm False
attention_dropout ............... 0.1
attention_softmax_in_fp32 ....... False
batch_size ...................... None
bert_load ....................... None
bias_dropout_fusion ............. False
bias_gelu_fusion ................ False
block_data_path ................. None
checkpoint_activations .......... False
checkpoint_in_cpu ............... False
checkpoint_num_layers ........... 1
clip_grad ....................... 1.0
contigious_checkpointing ........ False
cpu_optimizer ................... False
cpu_torch_adam .................. False
data_impl ....................... infer
data_path ....................... None
dataset_path .................... None
DDP_impl ........................ local
deepscale ....................... False
deepscale_config ................ None
deepspeed ....................... False
deepspeed_activation_checkpointing False
deepspeed_config ................ None
deepspeed_mpi ................... False
distribute_checkpointed_activations False
distributed_backend ............. nccl
dynamic_loss_scale .............. True
eod_mask_loss ................... False
eval_interval ................... 1000
eval_iters ...................... 100
exit_interval ................... None
faiss_use_gpu ................... False
finetune ........................ False
fp16 ............................ False
fp16_lm_cross_entropy ........... False
fp32_allreduce .................. False
gas ............................. 1
hidden_dropout .................. 0.1
hidden_size ..................... 256
hysteresis ...................... 2
ict_head_size ................... None
ict_load ........................ None
indexer_batch_size .............. 128
indexer_log_interval ............ 1000
init_method_std ................. 0.02
layernorm_epsilon ............... 1e-05
lazy_mpu_init ................... None
load ............................ /models/megamolbart/checkpoints
local_rank ...................... None
log_interval .................... 100
loss_scale ...................... None
loss_scale_window ............... 1000
lr .............................. None
lr_decay_iters .................. None
lr_decay_style .................. linear
make_vocab_size_divisible_by .... 128
mask_prob ....................... 0.15
max_position_embeddings ......... 512
merge_file ...................... None
min_lr .......................... 0.0
min_scale ....................... 1
mmap_warmup ..................... False
model_parallel_size ............. 1
no_load_optim ................... False
no_load_rng ..................... False
no_save_optim ................... False
no_save_rng ..................... False
num_attention_heads ............. 8
num_layers ...................... 4
num_unique_layers ............... None
num_workers ..................... 2
onnx_safe ....................... None
openai_gelu ..................... False
override_lr_scheduler ........... False
param_sharing_style ............. grouped
params_dtype .................... torch.float32
partition_activations ........... False
pipe_parallel_size .............. 0
profile_backward ................ False
query_in_block_prob ............. 0.1
rank ............................ 0
report_topk_accuracies .......... []
reset_attention_mask ............ False
reset_position_ids .............. False
save ............................ None
save_interval ................... None
scaled_masked_softmax_fusion .... False
scaled_upper_triang_masked_softmax_fusion False
seed ............................ 1234
seq_length ...................... None
short_seq_prob .................. 0.1
split ........................... 969, 30, 1
synchronize_each_layer .......... False
tensorboard_dir ................. None
titles_data_path ................ None
tokenizer_type .................. GPT2BPETokenizer
train_iters ..................... None
use_checkpoint_lr_scheduler ..... False
use_cpu_initialization .......... False
use_one_sent_docs ............... False
vocab_file ...................... /models/megamolbart/bart_vocab.txt
warmup .......................... 0.01
weight_decay .................... 0.01
world_size ...................... 1
zero_allgather_bucket_size ...... 0.0
zero_contigious_gradients ....... False
zero_reduce_bucket_size ......... 0.0
zero_reduce_scatter ............. False
zero_stage ...................... 1.0
---------------- end of arguments ----------------
> initializing torch distributed ...
Traceback (most recent call last):
File "test_megamolbart.py", line 16, in <module>
wf = MegaMolBART()
File "/shared/megamolbart/inference.py", line 71, in __init__
initialize_megatron(args_defaults=args, ignore_unknown_args=True)
File "/opt/conda/lib/python3.6/site-packages/megatron/initialize.py", line 77, in initialize_megatron
finish_mpu_init()
File "/opt/conda/lib/python3.6/site-packages/megatron/initialize.py", line 59, in finish_mpu_init
_initialize_distributed()
File "/opt/conda/lib/python3.6/site-packages/megatron/initialize.py", line 156, in _initialize_distributed
init_method=init_method)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 448, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 133, in _tcp_rendezvous_handler
store = TCPStore(result.hostname, result.port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
I am interested in getting the embeddings for a set of molecules. Any suggestions?
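The "Address already in use" error above is raised while torch.distributed opens its TCP rendezvous store, so one plausible cause is that another process in the container already holds that port. A minimal sketch of a workaround follows, assuming the installed Megatron version takes the rendezvous address from the `MASTER_ADDR` and `MASTER_PORT` environment variables; that assumption should be checked against megatron/initialize.py in the container. The helper name `find_free_port` is hypothetical.

```python
import os
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port that is currently unused on host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


# Assumption: the distributed initialization honors these variables.
# They must be set before the model class is instantiated.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = str(find_free_port())
print("MASTER_PORT =", os.environ["MASTER_PORT"])
```

The port is released before it is reused, so another process could in principle claim it in between; for a single interactive test that race is usually acceptable.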
Thank you for this tutorial! I encountered an error in step 5 of the "Generating Novel Compounds" section.
I pressed "Generate" and got this Error box message:
"Command '['bash', '-c', 'mkdir -p /data/mounts/cddd && cd /data/mounts/cddd; /tmp/download_default_model.sh']' returned non-zero exit status 9."
The following is the message in the terminal. I think there's something wrong with my path setting, but I can't figure out what it is. Please give me some advice. Thank you in advance.
=============
cuchemUI_1 | WARNING:cuchemcommon.context:data_mount_path not found, returing default.
cuchemUI_1 | % Total % Received % Xferd Average Speed Time Time Time Current
cuchemUI_1 | Dload Upload Total Spent Left Speed
100 2219 0 2219 0 0 5790 0 --:--:-- --:--:-- --:--:-- 5778
cuchemUI_1 | Archive: default_model.zip
cuchemUI_1 | End-of-central-directory signature not found. Either this file is not
cuchemUI_1 | a zipfile, or it constitutes one disk of a multi-part archive. In the
cuchemUI_1 | latter case the central directory and zipfile comment will be found on
cuchemUI_1 | the last disk(s) of this archive.
cuchemUI_1 | unzip: cannot find zipfile directory in one of default_model.zip or
cuchemUI_1 | default_model.zip.zip, and cannot find default_model.zip.ZIP, period.
cuchemUI_1 | Traceback (most recent call last):
cuchemUI_1 | File "/workspace/cuchem/cuchem/utils/init.py", line 41, in func_wrapper
cuchemUI_1 | return func(*args, **kwargs)
cuchemUI_1 | File "/workspace/cuchem/cuchem/interactive/chemvisualize.py", line 271, in handle_generation
cuchemUI_1 | generative_wf = wf_class()
cuchemUI_1 | File "/opt/nvidia/cheminfomatics/common/cuchemcommon/utils/singleton.py", line 25, in call
cuchemUI_1 | *args, **kwargs)
cuchemUI_1 | File "/workspace/cuchem/cuchem/wf/generative/cddd.py", line 20, in init
cuchemUI_1 | self.default_model_loc = download_cddd_models()
cuchemUI_1 | File "/workspace/cuchem/cuchem/utils/data_peddler.py", line 35, in download_cddd_models
cuchemUI_1 | check=True)
cuchemUI_1 | File "/opt/conda/envs/rapids/lib/python3.7/subprocess.py", line 512, in run
cuchemUI_1 | output=stdout, stderr=stderr)
cuchemUI_1 | subprocess.CalledProcessError: Command '['bash', '-c', 'mkdir -p /data/mounts/cddd && cd /data/mounts/cddd; /tmp/download_default_model.sh']' returned non-zero exit status 9.
cuchemUI_1 | INFO:werkzeug:192.168.0.1 - - [02/Mar/2022 03:19:49] "POST /_dash-update-component HTTP/1.1" 200 -
cuchemUI_1 | INFO:werkzeug:192.168.0.1 - - [02/Mar/2022 03:19:49] "POST /_dash-update-component HTTP/1.1" 200 -
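The log above shows only 2219 bytes being downloaded before unzip fails, which suggests that the file saved as default_model.zip may be an error or redirect page rather than the model archive. A minimal sketch for checking this is shown below; the helper names `looks_like_zip` and `preview` are hypothetical, and the file location is taken from the command in the error message.

```python
import os
import zipfile


def looks_like_zip(path):
    """Return True if path is a readable zip archive."""
    return zipfile.is_zipfile(path)


def preview(path, limit=200):
    """Return the first bytes of a file, decoded leniently, for inspection."""
    with open(path, "rb") as fh:
        return fh.read(limit).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Directory taken from the failing command; adjust if the mount differs.
    candidate = "/data/mounts/cddd/default_model.zip"
    if os.path.exists(candidate):
        print("valid zip:", looks_like_zip(candidate))
        print(preview(candidate))
    else:
        print("file not found:", candidate)
```

If the preview shows HTML or a short text message, the download URL used by /tmp/download_default_model.sh is probably no longer serving the archive directly, and the path setting is not the cause.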
When I run launch.sh start, the cuchemui container fails to launch with the following error:
ImportError: cannot import name 'get_current_traceback' from 'werkzeug.debug.tbtools' (/opt/conda/envs/rapids/lib/python3.7/site-packages/werkzeug/debug/tbtools.py)
This seems to be the same as this Issue.
Workaround:
Add the following line to cuchem/requirements.txt, then run launch.sh build again:
werkzeug==2.0.0
When attempting to build the containers, the cuchem build fails with this error:
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease' is not signed.
It seems related to this issue about NVIDIA rotating their GPG keys.
One can either remove the sources.list or possibly upgrade the base container.
Is there a function that returns only the last embedding?
E.g., for an output of shape [245, 1, 256], take only the final 256-element embedding rather than the full 245x256 set.
Best regards
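The slicing asked about above can be sketched with NumPy. This assumes the three axes of the [245, 1, 256] output are (sequence position, batch, hidden size), which the original post does not confirm; the array below is a stand-in, not real model output.

```python
import numpy as np

# Stand-in for a model output of shape (sequence_length, batch, hidden_size).
embedding = np.arange(245 * 1 * 256, dtype=np.float32).reshape(245, 1, 256)

# Vector at the final sequence position for the single molecule in the batch.
last_position = embedding[-1, 0, :]

# Alternative: average over all 245 positions to get one vector per molecule.
mean_pooled = embedding.mean(axis=0)[0]

print(last_position.shape, mean_pooled.shape)
```

Both results have 256 elements. If the sequence is padded, the final position may correspond to a padding token, in which case a masked mean over the real tokens is likely a better summary than the last position.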
I use zsh rather than bash. When trying ./launch.config config on the dev branch, the script assumes bash. Do you plan to support other shells?
Hi,
Is there a way to get MegaMolBART embeddings from SMILES, using the model as a pretrained encoder together with its associated tokenizer if needed?
Thanks a lot
The FROM statement points to an image that appears to be private. Please advise on the location of a public alternative.