molbert's People

Contributors

bai-eng, bfabiandev

molbert's Issues

Dataset size and creation

Hi, first of all congrats on your article and the NeurIPS workshop.

I have a few questions:

  1. Regarding fine-tuning: do you update the pre-trained encoder or do you freeze it?
  2. You say that any molecule with an ECFP4 similarity higher than 0.323 to 10 drugs was discarded. I assume this was done to support generalisation. However, what type of similarity did you use (Tanimoto, Dice, etc.), and why 0.323? Also, have you performed any similarity-based clustering of the final dataset to ensure that the parsed chemical space is balanced?
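The choice of coefficient matters because the same fingerprint pair gets different scores. A minimal sketch, not the authors' code, assuming each fingerprint is represented as a set of on-bit indices (for example, the bits of an RDKit Morgan fingerprint with radius 2, the usual ECFP4 analogue). Reading "higher than 0.323 to 10 drugs" as "higher than 0.323 to any of the reference drugs" is an assumption, and `filter_similar` is a hypothetical helper.

```python
from typing import Callable, List, Sequence, Set

Fingerprint = Set[int]  # assumption: a fingerprint is the set of its on-bit indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: Fingerprint, b: Fingerprint) -> float:
    """Dice coefficient: 2 |A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def filter_similar(
    candidates: Sequence[Fingerprint],
    references: Sequence[Fingerprint],
    threshold: float = 0.323,
    metric: Callable[[Fingerprint, Fingerprint], float] = tanimoto,
) -> List[Fingerprint]:
    """Keep only candidates whose similarity to every reference is <= threshold."""
    return [
        fp for fp in candidates
        if all(metric(fp, ref) <= threshold for ref in references)
    ]
```

For the same pair, Dice equals 2T / (1 + T) where T is the Tanimoto score, so Dice is never lower than Tanimoto and a fixed 0.323 cutoff discards more molecules under Dice.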

Reproducing MolBERT results on QSAR tasks

Hi MolBERT team,

First of all, thank you for releasing this repository and providing the scripts to reproduce your paper; it is deeply appreciated!

I am having trouble reproducing the QSAR results from Table 3 in the paper for MolBERT and MolBERT (finetune), as detailed below:

  1. I can exactly reproduce the Table 3 entries for RDKit and ECFC4 using scripts/run_qsar_test_molbert.py, so that is reassuring.
  2. The MolBERT featurizer, however, yields lower AUROCs: for BACE I get 0.835 vs. 0.849 in the paper, and for BBBP I get 0.744 vs. 0.750.
  3. Similarly, for MolBERT (finetune) using scripts/run_finetuning.py, I get 0.751 on BBBP vs. 0.762 reported in the paper.
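The gaps above are between 0.006 and 0.014 AUROC, so it helps to know how much a single run varies from seed to seed before blaming the weights. A minimal pure-Python sketch, assuming binary 0/1 labels: `auroc` computes the rank-based AUROC (the same quantity as scikit-learn's `roc_auc_score` for binary labels), and `summarize` is a hypothetical helper for reporting mean and sample standard deviation over repeated runs. This is not the repository's evaluation code.

```python
from statistics import mean, stdev
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(values: Sequence[float]) -> str:
    """Format mean +/- sample standard deviation over runs (e.g. different seeds)."""
    if not values:
        raise ValueError("no values to summarize")
    if len(values) == 1:
        return f"{values[0]:.3f}"
    return f"{mean(values):.3f} ± {stdev(values):.3f}"
```

If the standard deviation across seeds is of the same order as the gap to the paper, the difference may not point to a problem with the downloaded weights.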

The pre-trained model I am using is the one linked in the README, i.e. https://ndownloader.figshare.com/files/25611290

Could it be that I am using the wrong weights, or that the wrong weights were uploaded to figshare? This would affect the results in both 2. and 3. above, so it would explain the discrepancy.
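One way to rule out a corrupted or different download is to compare a checksum of the archive with someone who obtained the reported numbers. A minimal sketch using only the standard library; the file name in the usage line is a hypothetical placeholder.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly large) file in 1 MiB chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python checksum.py molbert_100epochs.zip   (hypothetical file name)
    for p in sys.argv[1:]:
        print(p, sha256_of_file(p))
```

Matching hashes would point away from the weights and towards differences in environment, seeds, or hyperparameters.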

Finally, the parameters I have been using for the fine-tuning are the following:

  • freeze_level = 0, taken from the answer in #3
  • learning_rate = 3e-5, taken from the paper (although I could only find the value for pre-training, not for fine-tuning)
  • batch_size = 16

All other arguments are left at the defaults provided in the code. Should the above arguments reproduce results similar to those in the paper?
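To make a run like this easy to share and compare, the full command can be generated from a single dictionary. A sketch under stated assumptions: the first seven flags are copied from the `molbert/apps/finetune.py` invocation shown elsewhere on this page, while `--freeze_level`, `--learning_rate` and `--batch_size` are assumed to be the command-line spellings of the parameters listed above; file paths and the label column name are hypothetical placeholders.

```python
import shlex
from typing import Dict, List, Union

Value = Union[str, int, float]


def build_finetune_cmd(params: Dict[str, Value]) -> List[str]:
    """Turn {name: value} into ['python', 'molbert/apps/finetune.py', '--name', 'value', ...]."""
    cmd = ["python", "molbert/apps/finetune.py"]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


params: Dict[str, Value] = {
    "train_file": "train.csv",  # placeholder paths
    "valid_file": "valid.csv",
    "test_file": "test.csv",
    "mode": "classification",  # BBBP is a binary classification task
    "output_size": 1,
    "pretrained_model_path": "molbert_100epochs/checkpoints/last.ckpt",
    "label_column": "label",  # hypothetical column name
    "freeze_level": 0,  # ASSUMED flag name
    "learning_rate": 3e-5,  # ASSUMED flag name
    "batch_size": 16,  # ASSUMED flag name
}

# Print a copy-pasteable, shell-quoted command line.
print(shlex.join(build_finetune_cmd(params)))
```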

Thanks in advance!

Tom

Loading finetuned model from checkpoint for inference

Hello @bfabiandev,

Thanks for open-sourcing this great codebase!

I trained a model for a regression task on my own dataset and now have the checkpoints, which I would like to deploy. How could we go about running the model in inference mode? Do you happen to have a prediction script?
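Framework details aside, batch inference with a PyTorch model usually means calling `model.eval()` once, running forward passes under `torch.no_grad()`, and concatenating the outputs in input order. A framework-agnostic sketch of the batching part, where `predict_batch` is a hypothetical stand-in for whatever wraps the loaded checkpoint; it is not a MolBERT API.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_all(
    smiles: Sequence[str],
    predict_batch: Callable[[Sequence[str]], List[float]],  # hypothetical model wrapper
    batch_size: int = 32,
) -> List[float]:
    """Run predict_batch over all SMILES in fixed-size batches, preserving input order."""
    predictions: List[float] = []
    for batch in batched(smiles, batch_size):
        out = predict_batch(batch)
        if len(out) != len(batch):
            raise RuntimeError("predict_batch must return one value per input")
        predictions.extend(out)
    return predictions


# Usage with a dummy predictor that returns the SMILES length as the "prediction".
print(predict_all(["C", "CC", "CCC"], lambda b: [float(len(s)) for s in b], batch_size=2))
```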

Install error

I am a newbie. I am installing molbert from a Jupyter notebook.

I downloaded the code as a zip file. After unzipping it, I created a notebook in the same folder.

I used the following commands:
!conda activate molbert
!pip install --user -e .

The error message is too long, and I don't know how to fix it.

error: subprocess-exited-with-error

pip subprocess to install build dependencies did not run successfully.
exit code: 1

[603 lines of output]
Ignoring numpy: markers 'python_version == "3.5" and platform_system != "AIX"' don't match your environment
Ignoring numpy: markers 'python_version == "3.6" and platform_system != "AIX"' don't match your environment
Ignoring numpy: markers 'python_version == "3.5" and platform_system == "AIX"' don't match your environment
Ignoring numpy: markers 'python_version == "3.6" and platform_system == "AIX"' don't match your environment
Ignoring numpy: markers 'python_version >= "3.7" and platform_system == "AIX"' don't match your environment
Collecting wheel
Using cached wheel-0.40.0-py3-none-any.whl (64 kB)
Collecting setuptools
Using cached setuptools-67.7.2-py3-none-any.whl (1.1 MB)
Collecting Cython>=0.29.2
Using cached Cython-0.29.34-py2.py3-none-any.whl (988 kB)
Collecting numpy==1.14.5
Using cached numpy-1.14.5.zip (4.9 MB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: numpy
Building wheel for numpy (setup.py): started
Building wheel for numpy (setup.py): finished with status 'error'
error: subprocess-exited-with-error

python setup.py bdist_wheel did not run successfully.
exit code: 1

[264 lines of output]
Running from numpy source directory.
C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\misc_util.py:464: SyntaxWarning: "is" with a literal. Did you mean "=="?
  return is_string(s) and ('*' in s or '?' is s)
blas_opt_info:
blas_mkl_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries mkl_rt not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

blis_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries blis not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

openblas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries openblas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
get_default_fcompiler: matching types: '['gnu', 'intelv', 'absoft', 'compaqv', 'intelev', 'gnu95', 'g95', 'intelvem', 'intelem', 'flang']'
customize GnuFCompiler
Could not locate executable g77
Could not locate executable f77
customize IntelVisualFCompiler
Could not locate executable ifort
Could not locate executable ifl
customize AbsoftFCompiler
Could not locate executable f90
customize CompaqVisualFCompiler
Could not locate executable DF
customize IntelItaniumVisualFCompiler
Could not locate executable efl
customize Gnu95FCompiler
Could not locate executable gfortran
Could not locate executable f95
customize G95FCompiler
Could not locate executable g95
customize IntelEM64VisualFCompiler
customize IntelEM64TFCompiler
Could not locate executable efort
Could not locate executable efc
customize PGroupFlangCompiler
Could not locate executable flang
don't know how to compile Fortran code on platform 'nt'
  NOT AVAILABLE

atlas_3_10_blas_threads_info:
Setting PTATLAS=ATLAS
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries tatlas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

atlas_3_10_blas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries satlas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

atlas_blas_threads_info:
Setting PTATLAS=ATLAS
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries ptf77blas,ptcblas,atlas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

atlas_blas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries f77blas,cblas,atlas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Atlas (http://math-atlas.sourceforge.net/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [atlas]) or by setting
    the ATLAS environment variable.
  self.calc_info()
blas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries blas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.
  self.calc_info()
blas_src_info:
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Blas (http://www.netlib.org/blas/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [blas_src]) or by setting
    the BLAS_SRC environment variable.
  self.calc_info()
  NOT AVAILABLE

'svnversion' is not recognized as an internal or external command,
operable program or batch file.
non-existing path in 'numpy\\distutils': 'site.cfg'
'svnversion' is not recognized as an internal or external command,
operable program or batch file.
F2PY Version 2
lapack_opt_info:
lapack_mkl_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries mkl_rt not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

openblas_lapack_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries openblas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

openblas_clapack_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries openblas,lapack not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

atlas_3_10_threads_info:
Setting PTATLAS=ATLAS
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries tatlas,tatlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries tatlas,tatlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries tatlas,tatlas not found in C:\Users\onsky\anaconda3\libs
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\libs
<class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
  NOT AVAILABLE

atlas_3_10_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries satlas,satlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries satlas,satlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries satlas,satlas not found in C:\Users\onsky\anaconda3\libs
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\libs
<class 'numpy.distutils.system_info.atlas_3_10_info'>
  NOT AVAILABLE

atlas_threads_info:
Setting PTATLAS=ATLAS
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries ptf77blas,ptcblas,atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries ptf77blas,ptcblas,atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries ptf77blas,ptcblas,atlas not found in C:\Users\onsky\anaconda3\libs
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\libs
<class 'numpy.distutils.system_info.atlas_threads_info'>
  NOT AVAILABLE

atlas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries f77blas,cblas,atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries f77blas,cblas,atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries f77blas,cblas,atlas not found in C:\Users\onsky\anaconda3\libs
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\libs
<class 'numpy.distutils.system_info.atlas_info'>
  NOT AVAILABLE

lapack_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Lapack (http://www.netlib.org/lapack/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [lapack]) or by setting
    the LAPACK environment variable.
  self.calc_info()
lapack_src_info:
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Lapack (http://www.netlib.org/lapack/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [lapack_src]) or by setting
    the LAPACK_SRC environment variable.
  self.calc_info()
  NOT AVAILABLE

C:\Users\onsky\anaconda3\lib\site-packages\setuptools\_distutils\dist.py:265: UserWarning: Unknown distribution option: 'define_macros'
  warnings.warn(msg)
running bdist_wheel
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building py_modules sources
creating build
creating build\src.win-amd64-3.1
creating build\src.win-amd64-3.1\numpy
creating build\src.win-amd64-3.1\numpy\distutils
building library "npymath" sources
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for numpy
Running setup.py clean for numpy
error: subprocess-exited-with-error

python setup.py clean did not run successfully.
exit code: 1

[10 lines of output]
Running from numpy source directory.

`setup.py clean` is not supported, use one of the following instead:

  - `git clean -xdf` (cleans all files)
  - `git clean -Xdf` (cleans all versioned files, doesn't touch
                      files that aren't checked into the git repo)

Add `--force` to your command to use it anyway if you must (unsupported).

[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed cleaning build dir for numpy

Failed to build numpy
Installing collected packages: wheel, setuptools, numpy, Cython
Running setup.py install for numpy: started
Running setup.py install for numpy: finished with status 'error'
error: subprocess-exited-with-error

Running setup.py install for numpy did not run successfully.
exit code: 1

[269 lines of output]
Running from numpy source directory.

Note: if you need reliable uninstall behavior, then install
with pip instead of using `setup.py install`:

  - `pip install .`       (from a git repo or downloaded source
                           release)
  - `pip install numpy`   (last NumPy release on PyPi)


blas_opt_info:
blas_mkl_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries mkl_rt not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

blis_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries blis not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

openblas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries openblas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
get_default_fcompiler: matching types: '['gnu', 'intelv', 'absoft', 'compaqv', 'intelev', 'gnu95', 'g95', 'intelvem', 'intelem', 'flang']'
customize GnuFCompiler
Could not locate executable g77
Could not locate executable f77
customize IntelVisualFCompiler
Could not locate executable ifort
Could not locate executable ifl
customize AbsoftFCompiler
Could not locate executable f90
customize CompaqVisualFCompiler
Could not locate executable DF
customize IntelItaniumVisualFCompiler
Could not locate executable efl
customize Gnu95FCompiler
Could not locate executable gfortran
Could not locate executable f95
customize G95FCompiler
Could not locate executable g95
customize IntelEM64VisualFCompiler
customize IntelEM64TFCompiler
Could not locate executable efort
Could not locate executable efc
customize PGroupFlangCompiler
Could not locate executable flang
don't know how to compile Fortran code on platform 'nt'
  NOT AVAILABLE

atlas_3_10_blas_threads_info:
Setting PTATLAS=ATLAS
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries tatlas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

atlas_3_10_blas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries satlas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

atlas_blas_threads_info:
Setting PTATLAS=ATLAS
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries ptf77blas,ptcblas,atlas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

atlas_blas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries f77blas,cblas,atlas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Atlas (http://math-atlas.sourceforge.net/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [atlas]) or by setting
    the ATLAS environment variable.
  self.calc_info()
blas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries blas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.
  self.calc_info()
blas_src_info:
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Blas (http://www.netlib.org/blas/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [blas_src]) or by setting
    the BLAS_SRC environment variable.
  self.calc_info()
  NOT AVAILABLE

'svnversion' is not recognized as an internal or external command,
operable program or batch file.
non-existing path in 'numpy\\distutils': 'site.cfg'
'svnversion' is not recognized as an internal or external command,
operable program or batch file.
F2PY Version 2
lapack_opt_info:
lapack_mkl_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries mkl_rt not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

openblas_lapack_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries openblas not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

openblas_clapack_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries openblas,lapack not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

atlas_3_10_threads_info:
Setting PTATLAS=ATLAS
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries tatlas,tatlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries tatlas,tatlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries tatlas,tatlas not found in C:\Users\onsky\anaconda3\libs
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\libs
<class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
  NOT AVAILABLE

atlas_3_10_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries satlas,satlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries satlas,satlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries satlas,satlas not found in C:\Users\onsky\anaconda3\libs
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\libs
<class 'numpy.distutils.system_info.atlas_3_10_info'>
  NOT AVAILABLE

atlas_threads_info:
Setting PTATLAS=ATLAS
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries ptf77blas,ptcblas,atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries ptf77blas,ptcblas,atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries ptf77blas,ptcblas,atlas not found in C:\Users\onsky\anaconda3\libs
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\libs
<class 'numpy.distutils.system_info.atlas_threads_info'>
  NOT AVAILABLE

atlas_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries f77blas,cblas,atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\lib
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries f77blas,cblas,atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries f77blas,cblas,atlas not found in C:\Users\onsky\anaconda3\libs
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack_atlas not found in C:\Users\onsky\anaconda3\libs
<class 'numpy.distutils.system_info.atlas_info'>
  NOT AVAILABLE

lapack_info:
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
  libraries lapack not found in ['C:\\Users\\onsky\\anaconda3\\lib', 'C:\\', 'C:\\Users\\onsky\\anaconda3\\libs']
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Lapack (http://www.netlib.org/lapack/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [lapack]) or by setting
    the LAPACK environment variable.
  self.calc_info()
lapack_src_info:
  NOT AVAILABLE

C:\Users\onsky\AppData\Local\Temp\pip-install-9yrprwod\numpy_b427a95ae1794d528bdd41165161d396\numpy\distutils\system_info.py:624: UserWarning:
    Lapack (http://www.netlib.org/lapack/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [lapack_src]) or by setting
    the LAPACK_SRC environment variable.
  self.calc_info()
  NOT AVAILABLE

C:\Users\onsky\anaconda3\lib\site-packages\setuptools\_distutils\dist.py:265: UserWarning: Unknown distribution option: 'define_macros'
  warnings.warn(msg)
running install
C:\Users\onsky\anaconda3\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building py_modules sources
building library "npymath" sources
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.

error: legacy-install-failure

Encountered error while trying to install package.

numpy

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

pip subprocess to install build dependencies did not run successfully.
exit code: 1

See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
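Each line starting with `!` in a Jupyter notebook runs in its own subshell, so `!conda activate molbert` most likely has no effect on the `!pip` line that follows; the paths in the log above (C:\Users\onsky\anaconda3\lib\site-packages) point to the base Anaconda installation rather than a molbert environment. The log also shows pip falling back to the numpy-1.14.5.zip source archive, which needs a C compiler (hence the Microsoft Visual C++ 14.0 error); the `win-amd64-3.1` build directory plausibly indicates Python 3.10 or newer, though that reading is an assumption. A minimal check of which interpreter the notebook kernel actually uses:

```python
import sys
from pathlib import Path


def describe_interpreter() -> dict:
    """Report the Python executable, prefix, version and env folder of the running kernel."""
    v = sys.version_info
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": f"{v.major}.{v.minor}.{v.micro}",
        "env_name": Path(sys.prefix).name,  # e.g. 'anaconda3' for base, 'molbert' for envs\molbert
    }


def in_conda_env(expected: str) -> bool:
    """True if the kernel's prefix folder is named like the expected conda env."""
    return Path(sys.prefix).name == expected


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
print("running in 'molbert' env:", in_conda_env("molbert"))
```

If this reports the base environment, running the install from an Anaconda Prompt after `conda activate molbert`, or using IPython's `%pip install -e .` magic (which targets the kernel's own interpreter) inside a kernel started from that environment, avoids the mismatch.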

symbol and char in the elements.txt

Hi,

Thank you for providing the source code for MolBERT; it is great work!

I have two questions on the elements.txt.

  1. Why only 'se' is denoted as AromaticSe? How about aromatic C/N/S, etc?
  2. Why only @@ is recorded for chirality? Don't we also need to record @ for counter clockwise spiral, which is a common symbol is SMILES strings...

Many thanks in advance!! =)
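One plausible explanation for both observations, which is an assumption and not something confirmed by the MolBERT authors: if the tokenizer splits SMILES character by character and only needs a lookup list for multi-character symbols, then single-character atoms such as aromatic c, n and s, and the single @, need no entry, while two-character symbols such as se and @@ do. The sketch below illustrates that idea with a generic greedy longest-match tokenizer. MULTI_CHAR_SYMBOLS is a hypothetical subset, not the contents of elements.txt, and tokenize is not MolBERT's own function.

```python
from typing import List, Sequence

# HYPOTHETICAL subset of multi-character symbols; NOT the real elements.txt.
MULTI_CHAR_SYMBOLS = ("Cl", "Br", "se", "@@")


def tokenize(smiles: str, multi: Sequence[str] = MULTI_CHAR_SYMBOLS) -> List[str]:
    """Greedy longest-match tokenization with a single-character fallback."""
    # Try longer symbols first so that "@@" is matched before a lone "@".
    symbols = sorted(multi, key=len, reverse=True)
    tokens: List[str] = []
    i = 0
    while i < len(smiles):
        for sym in symbols:
            if smiles.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            # Every other character (c, n, s, @, [, (, =, digits, ...)
            # becomes a token on its own, so it needs no list entry.
            tokens.append(smiles[i])
            i += 1
    return tokens
```

For example, tokenize("C[C@H]Cl") gives ['C', '[', 'C', '@', 'H', ']', 'Cl']: the single @ is still a valid token even though only @@ appears in the list.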

molbertfeaturizer with a finetuned model

Hi,
I tried to fine-tune the 100-epoch model on my own dataset with the following command:

python molbert/apps/finetune.py \
    --train_file train.csv \
    --valid_file valid.csv \
    --test_file test.csv \
    --mode regression \
    --output_size 1 \
    --pretrained_model_path molbert_100epochs/checkpoints/last.ckpt \
    --label_column mylabel \
    --default_root_dir output/ \
    --num_workers 4 &> out.txt

When I try to load the fine-tuned model with MolBertFeaturizer, I get this error:

from molbert.utils.featurizer.molbert_featurizer import MolBertFeaturizer

mycheckpoint='MolBERT/output/lightning_logs/version_0/checkpoints/last.ckpt'
molbert = MolBertFeaturizer(mycheckpoint)

--------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/cecilepereira/opt/anaconda3/envs/molbert/lib/python3.7/site-packages/pytorch_lightning/utilities/parsing.py in __getattr__(self, key)
    113         try:
--> 114             return self[key]
    115         except KeyError:

KeyError: 'named_descriptor_set'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-64-046163d76401> in <module>()
      1 from molbert.utils.featurizer.molbert_featurizer import MolBertFeaturizer
----> 2 molbert = MolBertFeaturizer(mycheckpoint)

molbert/utils/featurizer/molbert_featurizer.py in __init__(self, checkpoint_path, device, embedding_type, max_seq_len, permute)
     63         # load model
     64         self.config = Namespace(**config_dict)
---> 65         self.model = SmilesMolbertModel(self.config)
     66         self.model.load_from_checkpoint(self.checkpoint_path, hparam_overrides=self.model.__dict__)
     67 

molbert/models/base.py in __init__(self, args)
    125         self._datasets = None
    126 
--> 127         self.config = self.get_config()
    128         self.tasks = self.get_tasks(self.config)
    129         if len(self.tasks) == 0:

molbert/models/smiles.py in get_config(self)
     38                 max_position_embeddings=self.hparams.max_position_embeddings,
     39                 num_physchem_properties=self.hparams.num_physchem_properties,
---> 40                 named_descriptor_set=self.hparams.named_descriptor_set,
     41                 is_same_smiles=self.hparams.is_same_smiles,
     42             )

/Users/username/opt/anaconda3/envs/molbert/lib/python3.7/site-packages/pytorch_lightning/utilities/parsing.py in __getattr__(self, key)
    114             return self[key]
    115         except KeyError:
--> 116             raise AttributeError(f'Missing attribute "{key}"')
    117 
    118     def __setattr__(self, key, val):

AttributeError: Missing attribute "named_descriptor_set"

Could you help?
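The traceback shows that MolBertFeaturizer builds Namespace(**config_dict) and that SmilesMolbertModel.get_config then reads named_descriptor_set (and is_same_smiles) from it, so the configuration saved by the fine-tuning run appears to lack keys that the pre-training configuration had. A possible workaround, sketched here as an assumption and not as a fix confirmed by the maintainers, is to fill in the missing keys before the Namespace is built. patch_config and PRETRAINING_DEFAULTS are hypothetical names, and the default values are placeholders that should be copied from the hyperparameters of the pre-trained checkpoint (molbert_100epochs) rather than trusted as-is.

```python
from argparse import Namespace
from typing import Any, Dict, Optional

# HYPOTHETICAL placeholder values: check them against the hyperparameters
# actually used to pre-train the checkpoint before relying on them.
PRETRAINING_DEFAULTS: Dict[str, Any] = {
    "named_descriptor_set": "all",
    "is_same_smiles": False,
}


def patch_config(
    config_dict: Dict[str, Any],
    defaults: Optional[Dict[str, Any]] = None,
) -> Namespace:
    """Return a Namespace in which keys missing from config_dict are
    filled from defaults; keys already present are never overwritten."""
    if defaults is None:
        defaults = PRETRAINING_DEFAULTS
    # Later entries win in dict unpacking, so config_dict overrides defaults.
    merged = {**defaults, **config_dict}
    return Namespace(**merged)
```

Values already present in the fine-tuned configuration are kept; only the attributes that get_config fails to find are added.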

pip install does not include utils/data directory

Description

The project does not include a MANIFEST.in file, so pip install . does not copy utils/data/elements.txt into the install directory.
This causes a FileNotFoundError: [Errno 2] File /opt/conda/envs/api/lib/python3.7/site-packages/molbert/utils/featurizer/../data/elements.txt does not exist: '/opt/conda/envs/api/lib/python3.7/site-packages/molbert/utils/featurizer/../data/elements.txt' error when using the MolBertFeaturizer.

Solution

Add a MANIFEST.in file with the following entry:

recursive-include molbert/utils/data *
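After adding MANIFEST.in and reinstalling, one way to confirm the fix is to check that the data file actually landed inside the installed package. The helper below is a hypothetical sketch that only inspects the file system. Note that for a setup.py-based setuptools project, files listed in MANIFEST.in are copied into the installed package only when include_package_data=True is passed to setup() (or the files are listed in package_data), so that setting may also be needed; whether molbert's setup.py already sets it is not stated in the issue.

```python
import os
from typing import Iterable, List

# Data files the featurizer expects inside the installed package.
# Only elements.txt is named in the issue; extend the tuple if needed.
EXPECTED_DATA_FILES = ("utils/data/elements.txt",)


def missing_data_files(
    package_dir: str, expected: Iterable[str] = EXPECTED_DATA_FILES
) -> List[str]:
    """Return the expected data files (relative, '/'-separated paths)
    that do not exist under package_dir."""
    return [
        rel
        for rel in expected
        if not os.path.isfile(os.path.join(package_dir, *rel.split("/")))
    ]
```

Against a real installation it could be called as missing_data_files(os.path.dirname(molbert.__file__)) after import molbert; an empty list means elements.txt was installed.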
