drorlab / atom3d

ATOM3D: tasks on molecules in three dimensions

Home Page: https://www.atom3d.ai

License: MIT License

Makefile 0.46% Python 99.54%

atom3d's Introduction

ATOM3D: Tasks On Molecules in 3 Dimensions

ATOM3D enables machine learning on three-dimensional molecular structure.

Features

Access to several datasets involving 3D molecular structure.
LMDB data format for storing lots of molecules (and associated metadata).
Utilities for splitting/filtering data based on many criteria.

For more detailed information, read the documentation.

Installation

Install with:

pip install atom3d

To use rdkit functionality, please install within conda:

conda create -n atom3d python=3.6 pip rdkit
conda activate atom3d
pip install atom3d

Usage

Downloading a dataset

From python:

import atom3d.datasets as da
da.download_dataset('lba', PATH_TO_DATASET) # Download LBA dataset.

Or, download and unzip from the website.

Loading a dataset

From python:

import atom3d.datasets as da
dataset = da.load_dataset(PATH_TO_DATASET, 'lmdb')  # filetype: one of 'lmdb', 'pdb', 'silent', 'sdf', 'xyz', 'xyz-gdb'
print(len(dataset))  # Print length
print(dataset[0].keys())  # Print keys

LMDB datasets

LMDB allows for compressed, fast, random access to your structures, all within a single database. Currently, we support creating LMDB datasets from PDB files, silent files, and xyz files.

Creating an LMDB dataset

From command line:

python -m atom3d.datasets PATH_TO_PDB_DIR PATH_TO_DATASET --filetype {pdb,silent,xyz,xyz-gdb}

For more usage, please see the documentation.

Contribute

As a living repository, we welcome contributions of additional datasets, methods, and functionality! See the Contributing section of the documentation for details.

Support

For support, please file an issue at https://github.com/drorlab/atom3d/issues.

License

The project is licensed under the MIT license.

Reference

We provide an overview on ATOM3D and details on the preparation of all datasets in our preprint:

R. J. L. Townshend, M. Vögele, P. Suriana, A. Derry, A. Powers, Y. Laloudakis, S. Balachandar, B. Jing, B. Anderson, S. Eismann, R. Kondor, R. B. Altman, R. O. Dror "ATOM3D: Tasks On Molecules in Three Dimensions", arXiv:2012.04035

Please cite this work if some of the ATOM3D code or datasets are helpful in your scientific endeavours. For specific datasets, please also cite the respective original source(s), given in the preprint.

atom3d's Issues

Importing xyz file

I was trying to make an LMDB file from an xyz file through atom3d.datasets, but I keep receiving this error:
File "$DIR/python3.7/site-packages/atom3d/util/formats.py", line 128, in df_to_bps
atom['element'])
File "$DIR/python3.7/site-packages/Bio/PDB/Atom.py", line 96, in __init__
assert not element or element == element.upper(), element
AssertionError: Si

The first line of my xyz file is: Si -0.04553300 -1.14436300 0.73426900
As far as I know, Si is a valid element. How can I fix this issue?
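The assertion comes from Biopython's `Bio.PDB.Atom`, which requires element symbols to be uppercase, as they appear in PDB files. The traceback suggests the atoms are held in a pandas DataFrame with an `element` column. Assuming that, one workaround is to uppercase that column before conversion. Where to hook this into atom3d's xyz reading is left open here, and `uppercase_elements` is a hypothetical helper.

```python
import pandas as pd


def uppercase_elements(atoms_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of atoms_df whose 'element' column is uppercase.

    Bio.PDB.Atom asserts element == element.upper(), so mixed-case
    symbols such as 'Si' (common in xyz files) fail without this.
    """
    fixed = atoms_df.copy()
    fixed['element'] = fixed['element'].str.upper()
    return fixed


atoms = pd.DataFrame({'element': ['Si', 'O', 'H'],
                      'x': [-0.045533, 0.0, 0.0]})
print(uppercase_elements(atoms)['element'].tolist())  # ['SI', 'O', 'H']
```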

Missing labels in res-del dataset

Thanks for the great work on this package!
I downloaded the LMDB dataset for residue deletion, which unzipped into the following folder:
/raw/RES/data/
data.mdb lock.mdb

When I look at the dataframes for each protein structure, the labels are missing.
dataset = da.load_dataset(lmdb_path, 'lmdb')
dataset.get('100d')
{'atoms':      ensemble  subunit structure  model chain hetero insertion_code  ...      x      y       z element  name  fullname  serial_number
0    100d.pdb        0  100d.pdb      0     A                        ... -4.549  5.095   4.262       O   O5'       O5'              1
1    100d.pdb        0  100d.pdb      0     A                        ... -4.176  6.323   3.646       C   C5'       C5'              2

[408 rows x 20 columns], 'id': '100d', 'file_path': '/oak/stanford/groups/rbaltman/aderry/graph-pdb/data/raw/100d.pdb', 'labels': Empty DataFrame
Columns: [subunit, label, x, y, z]
Index: [], 'subunit_indices': [], 'types': {'atoms': "<class 'pandas.core.frame.DataFrame'>", 'id': "<class 'str'>", 'file_path': "<class 'str'>", 'labels': "<class 'pandas.core.frame.DataFrame'>", 'subunit_indices': "<class 'list'>", 'types': "<class 'dict'>"}}

Is the idea that one downloads this slightly reformatted PDB data and then runs some feature-generation code (e.g., generating voxels for a 3D CNN) on top of it? Could you please point to the code that does this? (The current code in this repo still seems to use shards rather than the LMDB format.)

Thanks.

Missing atom3d.shard package in benchmarking/pytorch-geometric dataloaders.

I was hoping to recreate the PPI benchmarking example for GNNs. It seems that ppi_dataloader.py and most of the non-QM9 data loaders import an atom3d.shard package, which no longer exists. What is the best way to recreate the PPI experiment for GNNs?

Thank you!

Multiple sequences for one protein

I'm using the recently uploaded LBA dataset in LMDB format. I found that in many examples there is more than one string in the 'seq' attribute, which I understand to be the amino acid sequence of the protein in the complex. Can you explain why there can be multiple sequences for one protein?

DB5 dataset for Protein Interface Prediction is empty

Hi, thanks for this amazing and comprehensive work!

After downloading the full dataset from https://www.atom3d.ai/pip.html, I loaded the DB5 dataset and printed its length, which is 0. The DIPS portion of the data is correct.

Question on Data Splitting

Hi there, I have some questions about the data splitting on LBA.

I'm not sure whether these three index .txt files are already generated in the previous steps (https://github.com/drorlab/atom3d/blob/master/examples/lba/dataset/prepare_lmdb.py#L156-L161).

Follow-up:
I checked the code base again and found the identity split function, but I haven't found the scripts that run it. The remaining question is how to set the blast_db variable.

Questions about the LEP dataset

Hi, dear authors of atom3d, thanks for providing this data collection. I'm having trouble understanding the LEP dataset.

If my understanding is correct, each data point has an 'atoms_active' and an 'atoms_inactive'. These correspond to two different protein-ligand pairs, one with a positive label and one with a negative label. However, there is also another key named 'label', which takes one of two values: A or I.

I guess A stands for active and I for inactive? If so, this seems contradictory: what is this 'label' used for?

lba bugs

When I use the ENN on LBA, it seems that "cgprod_bounded" is not defined in the argparse arguments, and I get errors. Thanks for your time!

`res` dataset source appears to be broken

When I try to pull the res dataset I get:

>>> da.download_dataset(out_path="/path/to/data/atom3d-data", name="res", split="cath-topology")
--2022-02-05 12:50:33--  http://1rjeayyofn0y6pgnljyg0fy5fkqaopoqc/
Resolving 1rjeayyofn0y6pgnljyg0fy5fkqaopoqc (1rjeayyofn0y6pgnljyg0fy5fkqaopoqc)... failed: Name or service not known.
wget: unable to resolve host address ‘1rjeayyofn0y6pgnljyg0fy5fkqaopoqc’

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now

This breaks with or without splits. I also pulled the link from the source file and checked it; it is broken too.

PSR dataset improvement

Hello, thank you for providing such a great dataset collection to the community. I was recently working on the PSR dataset and noticed some possible improvements, mainly concerning maintainability and some additional information.

It would be beneficial to keep the dataset up to date. For example, at the time of publication, CASP11 stage 2 was selected as the test set; I think this could be updated to a more recent CASP. This may require dataset-level versioning to keep things consistent, with each version named after the most recent complete CASP.
Related to the previous point, it is currently challenging to extend the dataset while keeping it consistent. Some guidelines might help.
I couldn't find a way to tell whether a sample is from stage 1 or stage 2. Unless I'm missing something, this information is not available, which makes custom splits difficult.

Again, thank you for sharing your great work.

Benchmark for Atom3D

Hi @drorlab,

Thank you for contributing Atom3D. Where can we find benchmark results for these datasets? Are you going to maintain a benchmark like OGB's?
Thank you!

LBA Dataset Confirmation

Hi there,

Thank you for providing the code base. I have one question about LBA dataset.

After downloading, I found 4,463 data points in the folder pdbbind_2019-refined-set. However, the PDBbind website shows that the pdbbind_2018 version has the same number of data points. So I just want to double-check: which version are you using?

ModuleNotFoundError

ModuleNotFoundError: No module named 'atom3d.datasets.ppi'

Atom3D for a reaction dataset

I was wondering if it is possible to use atom3D package to predict the properties of a reaction, where the input data could be the structure of the reactant and the product molecules, and the output is a property of the reaction itself.

Graphs are not bidirectional

Hi,

prot_df_to_graph and mol_df_to_graph result in graphs with connections that go only one way, which is due to the output of scipy's query_pairs. A possible fix would be to use:

edges = torch.cat((edges, edges.flip(dims=(0,))), dim=1)
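Here is a NumPy sketch of the same idea, for illustration only. `radius_edges` is a hypothetical helper that mimics the one-directional `i < j` pairs returned by `scipy.spatial.cKDTree.query_pairs`. `make_bidirectional` is the NumPy analogue of the `torch.cat` line above. Any per-edge features, such as distances, would have to be duplicated in the same order.

```python
import numpy as np


def radius_edges(coords, cutoff):
    """Pairs (i, j) with i < j and distance <= cutoff, as a (2, E) array."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Upper triangle only (k=1 skips self-pairs): one direction per pair,
    # which is what makes the resulting graph one-way.
    i, j = np.nonzero(np.triu(dist <= cutoff, k=1))
    return np.stack([i, j])


def make_bidirectional(edges):
    """Append each edge reversed; NumPy analogue of
    torch.cat((edges, edges.flip(dims=(0,))), dim=1)."""
    return np.concatenate([edges, edges[::-1]], axis=1)


coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
edges = radius_edges(coords, cutoff=1.5)
print(edges.tolist())                      # [[0], [1]]
print(make_bidirectional(edges).tolist())  # [[0, 1], [1, 0]]
```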

issue about tr.GraphTransform

Hi, I have a question about da.load_dataset.

I prepared a new LMDB dataset from PDB files and added labels following the tutorial. However, after reading it with da.load_dataset, 'y' is not a single value but a DataFrame (see below). Could you advise me on how to deal with this?

train_dataset = da.load_dataset(PATH_TO_LMDB_OUTPUT, 'lmdb',  transform=tr.GraphTransform(atom_key='atoms', label_key='label'))
for i in train_dataset[0]:
    print(i)

('y', label
0 5.72556)

As a result, batch.y in the following loop is not a tensor:

for batch in train_loader:
    print(batch.y)

The way I generated the LMDB dataset is:

import os
import pandas as pd
import atom3d.datasets as da

def add_label(item):
    # Remove the file ending ".pdb" from the ID
    name = item['id'][:-4]
    # Get label data
    label_file = os.path.join(PATH_TO_LABELS_DIR, name + '.csv')
    # Add label data in form of a data frame
    item['label'] = pd.read_csv(label_file)
    return item

# Load dataset from directory of PDB files
dataset = da.load_dataset(PATH_TO_INPUT_DIR, 'pdb', transform=add_label)
# Create LMDB dataset from PDB dataset
da.make_lmdb_dataset(dataset, PATH_TO_LMDB_OUTPUT)

If possible, could you give an example of the content in the csv file for add_label? I'd just like to make sure the format is right.
Thank you in advance for looking into this question.
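The printed `('y', label / 0 5.72556)` suggests the CSV holds a `label` header and a single value row, and that the whole DataFrame is being carried into `y`. One hedged fix is to store a plain float in `add_label` instead of a DataFrame. The stdlib sketch below reads such a file. `read_scalar_label` and the file layout are assumptions. With pandas, `float(pd.read_csv(label_file)['label'].iloc[0])` would do the same.

```python
import csv
import io

# Hypothetical label file layout, matching the printed output above:
#   label
#   5.72556
EXAMPLE_CSV = "label\n5.72556\n"


def read_scalar_label(f, column='label'):
    """Return `column` from the first data row of a CSV file object as a float."""
    first_row = next(csv.DictReader(f))
    return float(first_row[column])


label = read_scalar_label(io.StringIO(EXAMPLE_CSV))
print(label)  # 5.72556
```

Inside `add_label`, this would become `with open(label_file, newline='') as f: item['label'] = read_scalar_label(f)`.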

Loading data with da.load_dataset()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 16: invalid continuation byte

ENN does not use the GPU even though 'torch.cuda.is_available()' returns True

I followed https://github.com/drorlab/atom3d/blob/master/examples/lba/enn/README.rst and ran:
cd atom3d/examples/lba/enn
python train.py --target neglog_aff --load \
    --prefix lba-id30_cutoff-06_maxnumat-600 \
    --datadir $LMDBDIR --format lmdb \
    --cgprod-bounded \
    --radius 6 --maxnum 600 \
    --batch-size 1 --num-epoch 150

My terminal and log file show "Beginning training on CUDA/GPU! Device: 0", but nvidia-smi shows "No running processes found".
The same thing happened when I tried the LEP example. However, in the same virtual env, I ran python examples/train_qm9.py from https://github.com/drorlab/cormorant, and it worked well.

How do I get processed data for ENN training?

Dear Authors,

I ran into problems preparing the input for ENN training on SMP, as shown in your example. Could you share the preprocessing script? I have read your excellent documentation, but I still need help with preprocessing for ENN on SMP.

In addition, do you run "python train.py" directly in bash?

Much appreciated!

Dataset Preparation

If I want to prepare a new dataset for the LBA task, what should I do with my data?

I prepared protein PDB files with corresponding ligand SDF files for this task.

I see that atom3d/atom3d/datasets/lba/process_pdbbind.py generates three files in the formats (sdf, cif, cif), but prepare_lmdb.py needs files in the formats (sdf, pdb, pdb), respectively.

As I understand it, I should first transform my dataset into (ligand, pocket, protein) file sets with process_pdbbind.py and then use prepare_lmdb.py to generate the final hdf5 file for loading with LMDBDataset. But clearly I cannot do it that way at the moment.

There are also other scripts, such as create_hdf5.py, that seem to offer another way to generate an hdf5/LMDB file.

So how exactly should I prepare my data?

I'm finding it very confusing to prepare a new dataset and run a specific task with atom3d.

Trouble reproducing GNN results on LBA using provided code

Hi, I'm unable to reproduce the results quoted in the paper for the performance achieved by the baseline GNN model on the LBA dataset with 30% sequence identity split. I'm following atom3d/examples/lba/gnn to download the dataset and train the model with hyperparameters given in the README. Over 6 runs with different seeds, I'm getting 1.58 +- 0.04 for test RMSE, 0.53 +- 0.03 for test Pearson and 0.54 +- 0.04 for test Spearman. Only the RMSE is consistent with what's reported in the paper. Can the authors confirm that the hyperparameters and code given in atom3d/examples/lba/gnn are identical to what's used to produce the results in the paper? Thanks!

model.to(device)

Dear Authors,

I am using the LBA model and trying "model = model.to(device)", but I get the error

"device, dtype, non_blocking = torch._C._nn._parse_to(*args, **kwargs)
ValueError: too many values to unpack (expected 3)"

Do you know how to change the device (e.g., from CPU to GPU) for the CGModule model?

Thanks a lot!

Questions about targets of LBA in atom3d paper

Hi, as described in the ATOM3D paper, the target for the LBA dataset is -log(K). Do we need to compute the negative log ourselves, or is the value of item['scores']['neglog_aff'] already preprocessed?

Also, I notice there are two versions of each protein: the full protein and the pocket. Can I use the pocket-ligand pair to predict the binding affinity, since the pocket contains far fewer atoms?
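For reference, the usual convention for such targets is pK = -log10(K), with K (Kd or Ki) in molar units. Whether `neglog_aff` is already on this scale is exactly what the issue asks. The sketch below only illustrates the conversion under that assumption, and `neglog_affinity` is a hypothetical name.

```python
import math


def neglog_affinity(k_molar: float) -> float:
    """Convert a dissociation/inhibition constant in molar (M) to -log10(K)."""
    if k_molar <= 0:
        raise ValueError("K must be positive")
    return -math.log10(k_molar)


print(neglog_affinity(1e-9))  # 1 nM -> 9.0
```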

example for LBA

Dear Authors,

I saw that you provide examples for SMP, which are super helpful. I'm studying LBA and wonder how the ENN models LBA, especially how it represents proteins. Will you release the example code soon? Thanks

Questions on `chain` in LEP

Hi there,

Thank you for providing the code base.

I have some questions about the LEP dataset. It seems that the chain column of atoms takes multiple values; e.g., I see the following 5 sets of chains across all atoms:

{'L', 'A'}
{'L', 'D', 'E', 'B', 'G'}
{'C', 'A', 'B', 'L'}
{'L', 'A', 'B'}
{'L', 'D', 'C'}

'L' stands for the ligand, but what about the others? And according to this function, any chain that is not 'L' is treated as part of the pocket, including 'A', 'B', 'C', 'D', 'E', and 'G', right?

Just want to double-check to better understand the dataset. Any help is appreciated.
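The question proposes that chain 'L' is the ligand and every other chain is the pocket. Under that reading, the split is a pair of boolean masks on the atoms DataFrame. Below is a pandas sketch; `split_ligand_pocket` is a hypothetical helper, not part of atom3d.

```python
import pandas as pd


def split_ligand_pocket(atoms: pd.DataFrame, ligand_chain: str = 'L'):
    """Split an atoms DataFrame into (ligand, pocket) by chain ID.

    Assumes chain `ligand_chain` holds the ligand and every other chain
    ('A', 'B', ...) belongs to the pocket.
    """
    is_ligand = atoms['chain'] == ligand_chain
    return atoms[is_ligand], atoms[~is_ligand]


atoms = pd.DataFrame({'chain': ['A', 'B', 'L', 'L'],
                      'element': ['C', 'N', 'C', 'O']})
ligand, pocket = split_ligand_pocket(atoms)
print(sorted(pocket['chain'].unique()))  # ['A', 'B']
```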

Providing data access in standard formats

Hello! I was wondering whether you would consider providing your datasets in standard formats for ease of integration into existing frameworks, outside of your API + LMDB files. In particular, I'm looking to evaluate on the LEP dataset with code that operates on PDBbind-like structures (i.e. raw PDB file for protein, mol2/sdf for ligand). Let me know, and thank you!

Training 3D CNN on residue deletion

Thanks for your really cool work and for sharing the repository; your NeurIPS paper's results looked very interesting. I am trying to retrain the 3D CNN model on the residue deletion task using the data splits you provided. After some changes to the code, I am stuck at the data loader. The data files you provide look like:

data/residue_deletion/split/train_envs_0000_1000.h5
data/residue_deletion/split/train_envs_0001_1000.h5

Whereas the dataloader in benchmarking/cnn3d/train_resdel.py currently expects the format in ResDel_Dataset_PT to be the following:
data = torch.load(os.path.join(self.path, f'data_t_{idx}.pt'))

Could you please explain how to convert the h5 files to the format expected by the data loader? I looked at atom3d/datasets/res/convert_resdel_from_hdf5_to_npz.py, but it doesn't seem to be the right conversion.

Any pointers to get the data loading to work would greatly help.

Thanks,
Meghana.

What does the edge feature stand for?

Hello.

I'm doing an SMP experiment, and I found that the edge feature dimension is 4.

However, I couldn't figure out what each element stands for.

Could you give me some instruction about the features of edges of molecule?

Thank you.

question on documentation

Hi, on Example section of this page https://atom3d.readthedocs.io/en/latest/using_datasets.html

When demonstrating how to extract all atoms within 5 Angstroms of a ligand, you have the following code:

lig_coords = fo.get_coordinates_from_df(atoms_df[atoms_df['subunit']=='LIG']) # get coords of ligand
df_filtered = distance_filter(atoms_df, lig_coords, dist=5.0)

But I find that the data type of subunit is int64, so this line of code looks wrong to me. Am I missing something?

Thanks in advance!
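How the ligand atoms are selected is the open question here. Setting that aside, the distance step itself can be sketched in NumPy. This illustrates the technique and is not atom3d's `distance_filter` implementation. `within_distance` is a hypothetical name.

```python
import numpy as np


def within_distance(coords, ref_coords, dist):
    """Boolean mask: True for each row of `coords` (N, 3) that lies within
    `dist` of at least one row of `ref_coords` (M, 3)."""
    coords = np.asarray(coords, dtype=float)
    ref_coords = np.asarray(ref_coords, dtype=float)
    diff = coords[:, None, :] - ref_coords[None, :, :]  # (N, M, 3)
    d = np.linalg.norm(diff, axis=-1)                    # (N, M)
    return (d <= dist).any(axis=1)


protein = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
ligand = np.array([[1.0, 0.0, 0.0]])
print(within_distance(protein, ligand, 5.0))  # [ True  True False]
```

With a DataFrame, the mask would be computed from the `x`, `y`, `z` columns and applied as `atoms_df[mask]`.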

Processing pdb data to get data for 3D CNN

Thanks for the nice benchmark!

I am wondering how to prepare data for the 3D CNN. Is there a function that generates the density grids for it?
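One simple way to turn atom coordinates into a density-like 3D CNN input is to count atoms per element on a cubic grid. The sketch below is a generic illustration under assumed parameters: the channel list, a 20-voxel box, and 1 Å resolution. It is not necessarily the featurization used in the ATOM3D 3D CNN examples, which may use smoothed densities instead. `voxelize` is a hypothetical name.

```python
import numpy as np


def voxelize(coords, elements, channels=('C', 'N', 'O', 'S'),
             grid_size=20, resolution=1.0, center=None):
    """Count atoms per element channel in a cubic grid centred on `center`.

    Returns an array of shape (len(channels), grid_size, grid_size, grid_size).
    Atoms outside the box, or with elements not in `channels`, are dropped.
    """
    coords = np.asarray(coords, dtype=float)
    if center is None:
        center = coords.mean(axis=0)
    grid = np.zeros((len(channels), grid_size, grid_size, grid_size),
                    dtype=np.float32)
    # Voxel index of each atom, with `center` mapped to the middle of the box.
    idx = np.floor((coords - center) / resolution + grid_size / 2).astype(int)
    for (i, j, k), element in zip(idx, elements):
        inside = all(0 <= v < grid_size for v in (i, j, k))
        if inside and element in channels:
            grid[channels.index(element), i, j, k] += 1.0
    return grid


coords = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
elements = ['C', 'O', 'H', 'C']
grid = voxelize(coords, elements, grid_size=4, center=np.zeros(3))
print(grid.shape, grid.sum())  # (4, 4, 4, 4) 2.0
```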

Trained models available?

Hello!
Are any of the trained models (model parameters/weights) available? For instance the 3D CNN trained on the residue deletion task.

Thanks.

Loading LBA dataset.

I'd like to load the LBA dataset and get the following:

Amino acid sequences
SMILES strings for the molecules
Binding affinity values

I downloaded the LBA dataset and loaded it in python with:

import atom3d.datasets as da

# Load dataset from directory of PDB files
dataset = da.load_dataset("ligand_binding_affinity", 'pdb')

But how can I get the amino acid sequences, SMILES strings, and binding affinity from this?

Thanks for your time.

Incorrect URL to Download RES Dataset

I am trying to download the RES dataset using atom3d but the url is incorrect. Following the documentation, I run the following code:

import atom3d.datasets.datasets as da
da.download_dataset('res', 'data')

However, I get the following error message:

--2021-08-03 15:59:02-- http://1nmsnqayokof9-76l4gyqvodsehnzlxv7/
Resolving 1nmsnqayokof9-76l4gyqvodsehnzlxv7 (1nmsnqayokof9-76l4gyqvodsehnzlxv7)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘1nmsnqayokof9-76l4gyqvodsehnzlxv7’

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now

When I looked at the source code of the download_dataset() function, I noticed that all the other datasets have URLs of the form https://zenodo.org/record/.... However, the RES dataset tries to download from 1nmsnqayokof9-76l4gyqvodsehnzlxv7/, which is not a proper URL. I believe this is the problem.

When you get a moment, can you please fix this and provide the correct URL?

Also, this is my first time opening an issue so if more information or context is needed, please let me know!

Error downloading LBA dataset

Hi, I downloaded and unzipped the dataset from https://zenodo.org/record/4914718#.YPFQNegzZPY. However, I found no LMDB files, which I need in order to run:

from atom3d.datasets import LMDBDataset
dataset = LMDBDataset(PATH_TO_LMDB)
print(len(dataset))  # Print length
print(dataset[0])  # Print 1st entry
labels = [item['scores']['neglog_aff'] for item in dataset]  # Get all labels

Can you give me guidance on how to load the LBA dataset correctly?

How to evaluate and tune each model?

For the LBA task, how should I tune each model (cnn3d, enn, and gnn) if I use my own data? How can I evaluate the models and make predictions? Examples would be a great help. Many thanks

How to train the cnn model?

I ran
python train.py \
    --data_dir data/split-by-sequence-identity-30/data/ \
    --mode train \
    --batch_size 16 \
    --num_epochs 50 \
    --learning_rate 1e-4
(very similar to the gnn and enn training commands)
but I got:

Traceback (most recent call last):
File "train.py", line 173, in <module>
parser.add_argument('--data_dir', type=str, default=os.environ['LBA_DATA'])
File "/home/xzhang/miniconda3/envs/atom3d/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'LBA_DATA'
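The `KeyError` occurs because `os.environ['LBA_DATA']` is evaluated when the argument is defined, before `--data_dir` is parsed. Passing the flag therefore does not help while the variable is unset. Exporting the variable (`export LBA_DATA=...`) should avoid the error. Alternatively, here is a sketch of a more forgiving parser. It is a suggested change, not the repository's code.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # os.environ.get returns None instead of raising KeyError when LBA_DATA
    # is unset, so an explicit --data_dir still works.
    parser.add_argument('--data_dir', type=str,
                        default=os.environ.get('LBA_DATA'))
    args = parser.parse_args(argv)
    if args.data_dir is None:
        parser.error('pass --data_dir or set the LBA_DATA environment variable')
    return args


args = parse_args(['--data_dir', 'data/split-by-sequence-identity-30/data/'])
print(args.data_dir)
```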

Question for protein sequence in LBA dataset

Hi there,
Thank you for the nice open-source datasets and useful util functions in Atom3D.
I have some questions about how to obtain the protein sequences in the LBA datasets. After downloading the dataset from Zenodo, I found that the key 'seq' in each item is an empty list. I also tried get_chain_sequences but again got an empty list. How can I get the amino acid sequences for the LBA datasets?

Severe contradiction over the LBA dataset at 60% identity

Hi, dear atom3d,

I successfully used the LBA dataset with the 30% identity split. However, there is a serious discrepancy in the dataset with the 60% identity split. Specifically, as written in the paper, the 60% identity split yields training, validation, and test sets of sizes 3678, 460, and 460, respectively. However, there are only 3563 samples in the training set and 452 in the test set.

Can you please take a look at the splitting setting again and see whether there was a mistake?

Best,

GNN PSR benchmark

When I trained the GNN model provided in examples/psr/gnn/model.py using the hyperparameters specified in examples/psr/gnn, I'm getting a per-target Spearman's rho of 0.503 +/- 0.013 on the validation set, about what is reported in the paper. However, I'm only getting a per-target Spearman's rho of 0.405 +/- 0.013 on the test set. For reference, I'm seeing 0.582 +/- 0.007 for the per-target Spearman on the training data after 50 epochs.

how to specify the relative position?

In PPI and LBA, how do you specify the relative coordinates of two 3D graphs (e.g., two proteins, or a ligand and its target)?

Facing "No module named 'data_qm9_for_ptgeom'" when training on QM9

Following the README.md, I ran the command below and hit the following error.
The whole log is below:
(env) [ pytorch_geometric]$ python train_qm9.py --target 7 --prefix qm9-u0
Traceback (most recent call last):
File "train_qm9.py", line 10, in <module>
from data_qm9_for_ptgeom import GraphQM9
ModuleNotFoundError: No module named 'data_qm9_for_ptgeom'

I had some difficulties installing and using your library

First of all, thank you so much for your great repo!

Could you provide an installation tutorial? When I followed your instructions, I found I also had to install pyrr and torch-geometric.
I couldn't find atom3d.models.ff when I ran from atom3d.models.ff import FeedForward.
When I load a dataset using dataset = da.dataset('data/test_lmdb', 'lmdb', transform=tr.graph-transform), I get: module 'atom3d.util.transforms' has no attribute 'graph_transform'.
Which directory should I put the downloaded dataset into?
Could you add a README.rst to /examples/lba/cnn3d, like the ones in /enn and /gnn?

I think you are changing the data loading code; there were some conflicts. If you could provide some demos, that would be a great help.
Many thanks

Cannot find lmdb files in LBA dataset

I cannot find any lmdb files from the LBA dataset. I downloaded the dataset from the link posted here, and it only contained pdbbind_3dcnn.h5 and some txt files. I wonder how I can access the lmdb dataset.

Identity30_splits for PDBBind

Hi, thank you for the effort towards a common resource for structural biology benchmarks. I am currently working with the PDBBind dataset and wanted to test a model on the splits you provided. From the dataset page (https://www.atom3d.ai/lba.html), I was able to find the identity_60 splits, but not the identity_30 splits.

Could you please send me a link to them?

Hyperparameters to reproduce the GCN performance on LBA

I'd like to know which batch size and number of epochs were used to produce the GCN result on LBA in the paper. In train_pdbbind.py, the batch size is set to 1 and the number of epochs to 100. However, the test RMSE I got with this setting is more than ten times higher than the one reported in the paper.

Can't execute create_hdf5.py

It seems we are trying to import from util instead of from atom3d.util:

Traceback (most recent call last):
  File "atom3d/datasets/lba/create_hdf5.py", line 11, in <module>
    from util import datatypes as dt
ImportError: cannot import name 'datatypes'

I changed the imports to:

import atom3d.util.formats as dt
import atom3d.util.file as fi

Thanks!
Allan

Loading PPI Dataset with PyTorch Geometric

Hello.

First off, I want to say that it is a sight for sore eyes seeing a repository as well documented as this. I am very impressed given the scope of it.

On another note, I encountered an odd ModuleNotFoundError when trying to run the train_ppi.py script locally in the provided "geometric" Conda environment:

:665: in _load_unlocked
???
../../../../../anaconda3/envs/geometric/lib/python3.6/site-packages/_pytest/assertion/rewrite.py:170: in exec_module
exec(co, module.__dict__)
train_ppi.py:11: in <module>
import ppi_dataloader as dl
ppi_dataloader.py:15: in <module>
import atom3d.torch.graph as gr
E ModuleNotFoundError: No module named 'atom3d.torch'

It appears as though a directory (atom3d.torch) did not get added to version control and, as such, is not appearing remotely. Was this omission intentional?

Thank you for your time!

Training GNN on PDBBind

Hi,

I was really intrigued by this work when I heard the presentation at NeurIPS and wanted to try setting it up by running the pytorch-geometric GNN on PDBBind, but wasn't able to due to various issues. Could you please add some documentation on how to download the dataset, how to generate splits and how to train a model? Various functions seemed to have been moved or renamed (maybe some tests would help to ensure this doesn't happen?).

I think most of the pieces are there and this project has a lot of potential, so I'm looking forward to coming back to it.

Which version of rdkit is required?

Traceback (most recent call last):
File "data_TL.py", line 4, in <module>
from rdkit import Chem
File "/home/panfulu/anaconda3/envs/atom3d/lib/python3.6/site-packages/rdkit/Chem/__init__.py", line 18, in <module>
from rdkit import DataStructs
File "/home/panfulu/anaconda3/envs/atom3d/lib/python3.6/site-packages/rdkit/DataStructs/__init__.py", line 13, in <module>
from rdkit.DataStructs import cDataStructs
ImportError: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by /home/panfulu/anaconda3/envs/atom3d/lib/python3.6/site-packages/rdkit/DataStructs/../../../../libRDKitDataStructs.so.1)

When I use rdkit 2020.09, I get this error. Which version of rdkit should I use?

Guidance on running the examples

Dear Authors,

Thanks for your great work.

I tried to use this repo but ran into some problems running the example scripts.

The first problem is how to create a dataset from scratch. For example, what is "PATH_TO_INPUT_DIR" in the README file? Should I download the raw data and put it into PATH_TO_INPUT_DIR, or is it already in the repo?

Then, after processing the dataset, should I run "python data.py" and "python train.py" directly in the ./example folder?

Thanks in advance for your time!
