
openprotein's Introduction

OpenProtein

A PyTorch framework for tertiary protein structure prediction.


Getting started

To run this project, you will need pipenv: https://pipenv-fork.readthedocs.io/en/latest/install.html

After you have installed pipenv, git clone this repository, install the dependencies using pipenv install, and then run pipenv run python __main__.py in the terminal to start the sample experiment:

$ pipenv run python __main__.py
------------------------
--- OpenProtein v0.1 ---
------------------------
Live plot deactivated, see output folder for plot.
Starting pre-processing of raw data...
Preprocessed file for testing.txt already exists.
force_pre_processing_overwrite flag set to True, overwriting old file...
Processing raw data file testing.txt
Wrote output to 81 proteins to data/preprocessed/testing.txt.hdf5
Completed pre-processing.
2018-09-27 19:27:34: Train loss: -781787.696391812
2018-09-27 19:27:35: Loss time: 1.8300042152404785 Grad time: 0.5147676467895508
...

You can view a live dashboard of the model's performance by navigating to https://biolib.com/openprotein. You can customize this dashboard by forking https://github.com/biolib/openprotein-dashboard.

Developing a Predictive Model

See models.py for examples of how to create your own model.

To run pylint on every commit, run git config core.hooksPath git-hooks.

Using a Predictive Model

See prediction.py for examples of how to use pre-trained models.

Memory Usage

OpenProtein includes a preprocessing tool (preprocessing.py) that transforms the standard ProteinNet format into an hdf5 file and saves it in data/preprocessed/. This is done in a memory-efficient way, reading and writing line by line.
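
For illustration, here is a minimal sketch of that streaming pattern. It is not the actual preprocessing.py code; the record layout and encoding below are simplified assumptions:

```python
import h5py
import numpy as np

# Toy amino-acid encoding; the real tool stores several fields
# (e.g. primary, tertiary, mask) in its own format.
AA_INDEX = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}

def preprocess_line_by_line(raw_path, out_path, max_len=2000):
    """Stream records from a text file into an hdf5 file one at a time,
    keeping memory use constant regardless of input size."""
    with open(raw_path) as raw, h5py.File(out_path, "w") as out:
        # Resizable dataset: grows along axis 0 as records arrive.
        dset = out.create_dataset("primary", shape=(0, max_len),
                                  maxshape=(None, max_len), dtype="int32")
        for i, line in enumerate(raw):
            seq = line.strip()[:max_len]
            row = np.full(max_len, -1, dtype="int32")   # -1 marks padding
            row[:len(seq)] = [AA_INDEX.get(c, -1) for c in seq]
            dset.resize(i + 1, axis=0)                  # extend by one row
            dset[i] = row                               # write only this record
```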

The OpenProtein PyTorch data loader is memory-optimized too: when reading the hdf5 file, it loads only the samples needed for each minibatch into memory.
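
The general pattern looks roughly like the sketch below, assuming an hdf5 file with a "primary" dataset; this is an illustration, not the repo's actual loader:

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader

class H5ProteinDataset(Dataset):
    """Reads samples lazily: only the rows requested for the current
    minibatch are pulled from disk by h5py's slicing."""
    def __init__(self, path):
        self.path = path
        with h5py.File(path, "r") as f:
            self.length = f["primary"].shape[0]
        self._file = None  # opened lazily (and separately per worker)

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        return torch.from_numpy(self._file["primary"][idx])

loader = DataLoader(H5ProteinDataset("data/preprocessed/testing.txt.hdf5"),
                    batch_size=16)
```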

License

Please see the LICENSE file in the root directory.

openprotein's People

Contributors

jeppe-dev, jeppehallgren


openprotein's Issues

Padding before and after masking

Hi,

Why is padding needed both before and after masking in preprocessing.py? It seems to me that padding after masking would be enough.

Best
Christoph

Python 3.9.10 pipenv install errors: 461: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates

Python 3.9.10 on RHEL 9.0: is there a fix for this? It looks like a Python version lower than 3.8 is needed?

~/.local/bin/pipenv install
Installing dependencies from Pipfile.lock (66950d)...
[pipenv.exceptions.InstallError]: Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
[pipenv.exceptions.InstallError]: Ignoring importlib-metadata: markers 'python_version < "3.8"' don't match your environment
[pipenv.exceptions.InstallError]: Ignoring typed-ast: markers 'implementation_name == "cpython" and python_version < "3.8"' don't match your environment
[pipenv.exceptions.InstallError]: Collecting astroid==2.4.2 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 1))
[pipenv.exceptions.InstallError]: Downloading astroid-2.4.2-py3-none-any.whl (213 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 214.0/214.0 kB 49.9 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Collecting attrs==19.3.0 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 2))
[pipenv.exceptions.InstallError]: Downloading attrs-19.3.0-py2.py3-none-any.whl (39 kB)
[pipenv.exceptions.InstallError]: Collecting biopython==1.68 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 3))
[pipenv.exceptions.InstallError]: Downloading biopython-1.68.tar.gz (14.4 MB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.4/14.4 MB 117.3 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Preparing metadata (setup.py): started
[pipenv.exceptions.InstallError]: Preparing metadata (setup.py): finished with status 'done'
[pipenv.exceptions.InstallError]: Collecting certifi==2020.4.5.2 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 4))
[pipenv.exceptions.InstallError]: Downloading certifi-2020.4.5.2-py2.py3-none-any.whl (157 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 157.7/157.7 kB 248.8 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Collecting chardet==3.0.4 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 5))
[pipenv.exceptions.InstallError]: Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.4/133.4 kB 249.5 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Collecting click==7.1.2 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 6))
[pipenv.exceptions.InstallError]: Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 82.8/82.8 kB 278.1 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Collecting flask==1.1.1 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 7))
[pipenv.exceptions.InstallError]: Downloading Flask-1.1.1-py2.py3-none-any.whl (94 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94.5/94.5 kB 267.0 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Collecting flask-cors==3.0.8 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 8))
[pipenv.exceptions.InstallError]: Downloading Flask_Cors-3.0.8-py2.py3-none-any.whl (14 kB)
[pipenv.exceptions.InstallError]: Collecting future==0.18.2 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 9))
[pipenv.exceptions.InstallError]: Downloading future-0.18.2.tar.gz (829 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 829.2/829.2 kB 129.5 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Preparing metadata (setup.py): started
[pipenv.exceptions.InstallError]: Preparing metadata (setup.py): finished with status 'done'
[pipenv.exceptions.InstallError]: Collecting h5py==2.10.0 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 10))
[pipenv.exceptions.InstallError]: Downloading h5py-2.10.0.tar.gz (301 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 301.1/301.1 kB 296.3 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Preparing metadata (setup.py): started
[pipenv.exceptions.InstallError]: Preparing metadata (setup.py): finished with status 'done'
[pipenv.exceptions.InstallError]: Collecting idna==2.8 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 11))
[pipenv.exceptions.InstallError]: Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.6/58.6 kB 23.6 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Collecting isort==4.3.21 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 13))
[pipenv.exceptions.InstallError]: Downloading isort-4.3.21-py2.py3-none-any.whl (42 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.3/42.3 kB 120.3 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Collecting itsdangerous==1.1.0 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 14))
[pipenv.exceptions.InstallError]: Downloading itsdangerous-1.1.0-py2.py3-none-any.whl (16 kB)
[pipenv.exceptions.InstallError]: Collecting jinja2==2.11.2 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 15))
[pipenv.exceptions.InstallError]: Downloading Jinja2-2.11.2-py2.py3-none-any.whl (125 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.8/125.8 kB 301.1 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Collecting lazy-object-proxy==1.4.3 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 16))
[pipenv.exceptions.InstallError]: Downloading lazy-object-proxy-1.4.3.tar.gz (34 kB)
[pipenv.exceptions.InstallError]: Installing build dependencies: started
[pipenv.exceptions.InstallError]: Installing build dependencies: finished with status 'done'
[pipenv.exceptions.InstallError]: Getting requirements to build wheel: started
[pipenv.exceptions.InstallError]: Getting requirements to build wheel: finished with status 'done'
[pipenv.exceptions.InstallError]: Preparing metadata (pyproject.toml): started
[pipenv.exceptions.InstallError]: Preparing metadata (pyproject.toml): finished with status 'done'
[pipenv.exceptions.InstallError]: Collecting markupsafe==1.1.1 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 17))
[pipenv.exceptions.InstallError]: Downloading MarkupSafe-1.1.1.tar.gz (19 kB)
[pipenv.exceptions.InstallError]: Preparing metadata (setup.py): started
[pipenv.exceptions.InstallError]: Preparing metadata (setup.py): finished with status 'done'
[pipenv.exceptions.InstallError]: Collecting mccabe==0.6.1 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 18))
[pipenv.exceptions.InstallError]: Downloading mccabe-0.6.1-py2.py3-none-any.whl (8.6 kB)
[pipenv.exceptions.InstallError]: Collecting more-itertools==8.3.0 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 19))
[pipenv.exceptions.InstallError]: Downloading more_itertools-8.3.0-py3-none-any.whl (44 kB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.7/44.7 kB 210.6 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Collecting numpy==1.18.5 (from -r /local/pipenv-pq6e4pl6-requirements/pipenv-rl4pj7lz-hashed-reqs.txt (line 20))
[pipenv.exceptions.InstallError]: Downloading numpy-1.18.5.zip (5.4 MB)
[pipenv.exceptions.InstallError]: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.4/5.4 MB 121.2 MB/s eta 0:00:00
[pipenv.exceptions.InstallError]: Installing build dependencies: started
[pipenv.exceptions.InstallError]: Installing build dependencies: finished with status 'done'
[pipenv.exceptions.InstallError]: Getting requirements to build wheel: started
[pipenv.exceptions.InstallError]: Getting requirements to build wheel: finished with status 'done'
[pipenv.exceptions.InstallError]: Preparing metadata (pyproject.toml): started
[pipenv.exceptions.InstallError]: Preparing metadata (pyproject.toml): finished with status 'error'
[pipenv.exceptions.InstallError]: error: subprocess-exited-with-error
[pipenv.exceptions.InstallError]:
[pipenv.exceptions.InstallError]: × Preparing metadata (pyproject.toml) did not run successfully.
[pipenv.exceptions.InstallError]: │ exit code: 1
[pipenv.exceptions.InstallError]: ╰─> [54 lines of output]
[pipenv.exceptions.InstallError]: Running from numpy source directory.
[pipenv.exceptions.InstallError]: :461: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
[pipenv.exceptions.InstallError]: /local/pip-install-lw4evu5t/numpy_9a03571d21dc4a8981a9247675648988/tools/cythonize.py:75: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
[pipenv.exceptions.InstallError]: required_version = LooseVersion('0.29.14')
[pipenv.exceptions.InstallError]: /local/pip-install-lw4evu5t/numpy_9a03571d21dc4a8981a9247675648988/tools/cythonize.py:77: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
[pipenv.exceptions.InstallError]: if LooseVersion(cython_version) < required_version:
[pipenv.exceptions.InstallError]:
[pipenv.exceptions.InstallError]: Error compiling Cython file:
[pipenv.exceptions.InstallError]: ------------------------------------------------------------
[pipenv.exceptions.InstallError]: ...
[pipenv.exceptions.InstallError]: for i in range(1, RK_STATE_LEN):
[pipenv.exceptions.InstallError]: self.rng_state.key[i] = val[i]
[pipenv.exceptions.InstallError]: self.rng_state.pos = i
[pipenv.exceptions.InstallError]:
[pipenv.exceptions.InstallError]: self._bitgen.state = &self.rng_state
[pipenv.exceptions.InstallError]: self._bitgen.next_uint64 = &mt19937_uint64
[pipenv.exceptions.InstallError]: ^
[pipenv.exceptions.InstallError]: ------------------------------------------------------------
[pipenv.exceptions.InstallError]:
[pipenv.exceptions.InstallError]: _mt19937.pyx:138:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to type 'uint64_t (void *) except? -1 nogil'.
[pipenv.exceptions.InstallError]: Processing numpy/random/_bounded_integers.pxd.in
[pipenv.exceptions.InstallError]: Processing numpy/random/_mt19937.pyx
[pipenv.exceptions.InstallError]: Traceback (most recent call last):
[pipenv.exceptions.InstallError]: File "/local/pip-install-lw4evu5t/numpy_9a03571d21dc4a8981a9247675648988/tools/cythonize.py", line 238, in
[pipenv.exceptions.InstallError]: main()
[pipenv.exceptions.InstallError]: File "/local/pip-install-lw4evu5t/numpy_9a03571d21dc4a8981a9247675648988/tools/cythonize.py", line 234, in main
[pipenv.exceptions.InstallError]: find_process_files(root_dir)
[pipenv.exceptions.InstallError]: File "/local/pip-install-lw4evu5t/numpy_9a03571d21dc4a8981a9247675648988/tools/cythonize.py", line 225, in find_process_files
[pipenv.exceptions.InstallError]: process(root_dir, fromfile, tofile, function, hash_db)
[pipenv.exceptions.InstallError]: File "/local/pip-install-lw4evu5t/numpy_9a03571d21dc4a8981a9247675648988/tools/cythonize.py", line 191, in process
[pipenv.exceptions.InstallError]: processor_function(fromfile, tofile)
[pipenv.exceptions.InstallError]: File "/local/pip-install-lw4evu5t/numpy_9a03571d21dc4a8981a9247675648988/tools/cythonize.py", line 80, in process_pyx
[pipenv.exceptions.InstallError]: subprocess.check_call(
[pipenv.exceptions.InstallError]: File "/cm/shared/apps/anaconda3-2022.10/lib/python3.9/subprocess.py", line 373, in check_call
[pipenv.exceptions.InstallError]: raise CalledProcessError(retcode, cmd)
[pipenv.exceptions.InstallError]: subprocess.CalledProcessError: Command '['/burg/home/rk3199/.local/share/virtualenvs/openprotein-PnNMcUVz/bin/python', '-m', 'cython', '-3', '--fast-fail', '-o', '_mt19937.c', '_mt19937.pyx']' returned non-zero exit status 1.
[pipenv.exceptions.InstallError]: Cythonizing sources
[pipenv.exceptions.InstallError]: Traceback (most recent call last):
[pipenv.exceptions.InstallError]: File "/burg/home/rk3199/.local/lib/python3.9/site-packages/pipenv/patched/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
[pipenv.exceptions.InstallError]: main()
[pipenv.exceptions.InstallError]: File "/burg/home/rk3199/.local/lib/python3.9/site-packages/pipenv/patched/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
[pipenv.exceptions.InstallError]: json_out['return_val'] = hook(**hook_input['kwargs'])
[pipenv.exceptions.InstallError]: File "/burg/home/rk3199/.local/lib/python3.9/site-packages/pipenv/patched/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
[pipenv.exceptions.InstallError]: return hook(metadata_directory, config_settings)
[pipenv.exceptions.InstallError]: File "/local/pip-build-env-7ik93v8v/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 396, in prepare_metadata_for_build_wheel
[pipenv.exceptions.InstallError]: self.run_setup()
[pipenv.exceptions.InstallError]: File "/local/pip-build-env-7ik93v8v/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 507, in run_setup
[pipenv.exceptions.InstallError]: super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
[pipenv.exceptions.InstallError]: File "/local/pip-build-env-7ik93v8v/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 341, in run_setup
[pipenv.exceptions.InstallError]: exec(code, locals())
[pipenv.exceptions.InstallError]: File "", line 488, in
[pipenv.exceptions.InstallError]: File "", line 469, in setup_package
[pipenv.exceptions.InstallError]: File "", line 275, in generate_cython
[pipenv.exceptions.InstallError]: RuntimeError: Running cythonize failed!
[pipenv.exceptions.InstallError]: [end of output]
[pipenv.exceptions.InstallError]:
[pipenv.exceptions.InstallError]: note: This error originates from a subprocess, and is likely not a problem with pip.
[pipenv.exceptions.InstallError]: error: metadata-generation-failed
[pipenv.exceptions.InstallError]:
[pipenv.exceptions.InstallError]: × Encountered error while generating package metadata.
[pipenv.exceptions.InstallError]: ╰─> See above for output.
[pipenv.exceptions.InstallError]:
[pipenv.exceptions.InstallError]: note: This is an issue with the package mentioned above, not pip.
[pipenv.exceptions.InstallError]: hint: See above for details.
ERROR: Couldn't install package: {}
Package installation failed...
/cm/shared/apps/anaconda3-2022.10/lib/python3.9/subprocess.py:1052: ResourceWarning: subprocess 3699277 is still running
_warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name=4 encoding='utf-8'>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name=7 encoding='utf-8'>
ResourceWarning: Enable tracemalloc to get the object allocation traceback

soft_to_angle theory

Would it be possible to share a link to the research or reasoning behind the soft_to_angle module, for someone new to structural protein problems?

My current hunch is that you ran a mixture model on the Pfam database and found the average angle conformations of the different families. You then use a LogSoftmax activation so that each amino acid can choose which of the omega, psi and phi angles it wants from this table of options, and finally convert these values into angles using sin, cos and arctan?
Why does the mixture model have 500 clusters, how was the mixture_model table generated, and why is there a 90:10 pos/neg omega ratio that is then randomly shuffled in?

Again, I am a noob, so pointers to any papers or other grounded reasoning behind this approach would be really appreciated.
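
For concreteness, here is a hypothetical sketch of the mechanism the question describes: a soft assignment over K precomputed (omega, phi, psi) clusters, with circular averaging via sin/cos and atan2. This is one reading of the question, not necessarily what openprotein implements:

```python
import torch
import torch.nn as nn

class SoftToAngleSketch(nn.Module):
    """Hypothetical: each residue emits logits over K precomputed
    (omega, phi, psi) clusters; angles are recovered through sin/cos
    so the weighted average respects their circularity."""
    def __init__(self, mixture_means):                # mixture_means: [K, 3]
        super().__init__()
        self.register_buffer("sin_means", torch.sin(mixture_means))
        self.register_buffer("cos_means", torch.cos(mixture_means))

    def forward(self, logits):                        # logits: [seq_len, K]
        weights = torch.softmax(logits, dim=-1)
        sin_avg = weights @ self.sin_means            # [seq_len, 3]
        cos_avg = weights @ self.cos_means
        return torch.atan2(sin_avg, cos_avg)          # angles in (-pi, pi]
```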

Provide a description on how to use it with anaconda

Hi,

I struggled to get it running in a conda environment because it could not install from the Pipfile. What I did instead is this:

conda create -n openprotein

source activate openprotein

conda install pip

~/anaconda3/envs/openprotein/bin/pip install pipenv

~/anaconda3/envs/openprotein/bin/pipenv install

Perhaps provide a requirements file for conda environments?

Dataset construction from .cif files

Hello,
Thank you once again for OpenProtein! Can you please provide a way to use .cif (or .pdb) files for dataset construction?
Probably .cif to proteinnet record converter? Or other way.

Create conda env file

Following #13, we should add a conda-environment.yml that lists all necessary dependencies. Once this is complete, users can install them simply by running conda env create -f conda-environment.yml.
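
A hypothetical starting point for such a file; the package set and versions below are assumptions that would need to be reconciled with the Pipfile:

```yaml
# conda-environment.yml (sketch)
name: openprotein
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.7
  - pytorch
  - numpy
  - h5py
  - flask
  - biopython
  - pip
  - pip:
      - PeptideBuilder
```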

Running prediction.py

Hi,

Is there an example of running prediction.py with the model under output/models?
I tried running it via the command
python prediction.py --input_sequence='MKNLISFGVKPWWAARWETVEPEPEEPVYT' --model_path="/home/openprotein/output/models/2019-01-30_00_38_46-TRAIN-LR0_01-MB1.model"
but I get the error ModuleNotFoundError: No module named 'models' while executing result = unpickler.load() in /home/ubuntu/.local/share/virtualenvs/openprotein-_438uSzB/lib/python3.7/site-packages/torch/serialization.py.

Would appreciate any pointers.

Thank you!
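
(For context: torch.load unpickles the entire saved model object, so the models module the checkpoint was pickled against must be importable at load time, typically by running from the repository root. A sketch of that workaround, with a hypothetical repo path:)

```python
import sys
import torch

# Make the repository root (which contains models.py) importable
# before unpickling; the path below is hypothetical.
sys.path.insert(0, "/home/openprotein")
model = torch.load(
    "/home/openprotein/output/models/2019-01-30_00_38_46-TRAIN-LR0_01-MB1.model")
```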

Hi, I want to help the project.

I think this is a really interesting project and it will help me enhance my software development knowledge. Can you give me a heads-up on where I should start? I am not the best programmer, but I really want to help this grow.

datasetnft.org - NFTized ProteinNet CASP12 to construct datasets for openprotein

Hello,
https://datasetnft.org is a free-to-use web2 and web3 app/dapp for constructing datasets for OpenProtein based on ProteinNet CASP12.
You can filter data there and construct/export a dataset in the browser (the txt button at the bottom of the search results exports the constructed dataset as a ready-to-train .txt file). You can also do it entirely the web3 way and query the blockchain (sequences, NFT id) for the output of each dataset-item NFT (no UI; use the CLI or any universal smart-contract UI with the provided ABI).
Thank you

what is the role of mixture_model_pfam_500.txt ?

Hi guys,
I am reading the code of openprotein and it's particularly good! But I don't understand the content of the file named mixture_model_pfam_500.txt. Can anyone explain it here? Thanks!

the webpage is always blank

Thanks for your work on this project and for sharing the code. I ran into some problems when running it.
I just did a git clone and ran the command:
python3 __main__.py
but the errors below appeared (I copied them straight from my terminal, so the formatting may look strange):

/usr/local/lib/python3.7/site-packages/Bio/PDB/Vector.py:42: BiopythonDeprecationWarning: The module Bio.PDB.Vector has been deprecated in favor of new module Bio.PDB.vectors to solve a name collision with the class Vector. For the class Vector, and vector functions like calc_angle, import from Bio.PDB instead.
"import from Bio.PDB instead.", BiopythonDeprecationWarning)

--- OpenProtein v0.1 ---

Starting pre-processing of raw data...
['data/raw/sample.txt']
Preprocessed file for sample.txt already exists.
Skipping pre-processing for this file...
Completed pre-processing.

  • Serving Flask app "dashboard" (lazy loading)
  • Environment: production
    WARNING: Do not use the development server in a production environment.
    Use a production WSGI server instead.
  • Debug mode: off
    Exception in thread Thread-1:
    Traceback (most recent call last):
    File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
    File "/Users/AlexYU/Desktop/openprotein/dashboard.py", line 37, in run
    app.run(debug=False, host='0.0.0.0')
    File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 943, in run
    run_simple(host, port, self, **options)
    File "/usr/local/lib/python3.7/site-packages/werkzeug/serving.py", line 990, in run_simple
    inner()
    File "/usr/local/lib/python3.7/site-packages/werkzeug/serving.py", line 943, in inner
    fd=fd,
    File "/usr/local/lib/python3.7/site-packages/werkzeug/serving.py", line 786, in make_server
    host, port, app, request_handler, passthrough_errors, ssl_context, fd=fd
    File "/usr/local/lib/python3.7/site-packages/werkzeug/serving.py", line 679, in init
    HTTPServer.init(self, server_address, handler)
    File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socketserver.py", line 452, in init
    self.server_bind()
    File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/server.py", line 137, in server_bind
    socketserver.TCPServer.server_bind(self)
    File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socketserver.py", line 466, in server_bind
    self.socket.bind(self.server_address)
    OSError: [Errno 48] Address already in use

2019-05-06 15:39:05: Embed time: 0.00033283233642578125
Traceback (most recent call last):
File "main.py", line 67, in
minimum_updates=args.minimum_updates)
File "/Users/AlexYU/Desktop/openprotein/training.py", line 38, in train_model
loss = model.compute_loss(training_minibatch)
File "/Users/AlexYU/Desktop/openprotein/openprotein.py", line 45, in compute_loss
emissions, backbone_atoms_padded, batch_sizes = self._get_network_emissions(original_aa_string)
File "/Users/AlexYU/Desktop/openprotein/models.py", line 49, in _get_network_emissions
(data, bi_lstm_batches), self.hidden_layer = self.bi_lstm(packed_input_sequences, self.hidden_layer)
ValueError: too many values to unpack (expected 2)

The webpage opened after the errors, but unfortunately it is blank (screenshot omitted).

By the way, my environment:

  • python3.7

  • pytorch 1.1

  • node 10.15

  • macOS Mojave

I just want to test the code on my computer. What should I do to fix these problems? Any help would be greatly appreciated.
Thanks again for the great work.

data not moved to gpu in preprocessing

Hi,
It seems that in utils.calculate_dihedral_angles_over_minibatch, the packed angles need to be moved to the GPU.
You can observe this issue by adding another test example and running the preprocessing with the example experiment_id.
You can have a look at my solution in my fork!

Error in drmsd computation

I think the drmsd computation in the function calc_avg_drmsd_over_minibatch is wrong.

https://github.com/OpenProtein/openprotein/blob/e4e2e0c8597f1f113b7074d0e6b223f8d019138e/util.py#L267

Here actual_coords_list[idx] is a tensor of size [seq_len, 9].
The 9 values are the x, y, z coordinates of C', C-alpha and N.
You want to convert it into a tensor of size [seq_len*3, 3].

But the current code does not convert the coordinates correctly.

>>> torch.tensor([[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18]]).transpose(0,1).contiguous().view(-1,3) 
tensor([[  1,  10,   2],
        [ 11,   3,  12],
        [  4,  13,   5],
        [ 14,   6,  15],
        [  7,  16,   8],
        [ 17,   9,  18]])

You can see the coordinates are mangled.

I think the correct code should be

actual_coords = actual_coords_list[idx].view(-1, 3)

>>> torch.tensor([[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18]]).view(-1,3) 
tensor([[  1,   2,   3],
        [  4,   5,   6],
        [  7,   8,   9],
        [ 10,  11,  12],
        [ 13,  14,  15],
        [ 16,  17,  18]])
                                                                                      

Keys in hdf5

Hi, nice work done here.
I wanted to ask: after preprocessing the raw data to the hdf5 file, there are primary, mask and tertiary keys. Does this mean the model training only looks at the amino acid sequence? According to AlQuraishi's paper, shouldn't the input be the amino acid sequence plus the PSSM?
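
(You can inspect the stored keys directly with h5py; the file path below is from the sample run earlier on this page:)

```python
import h5py

with h5py.File("data/preprocessed/testing.txt.hdf5", "r") as f:
    print(list(f.keys()))  # e.g. ['mask', 'primary', 'tertiary']
```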

How to use GUI dashboard

Hello there, your project is really cool and thank you.

My question is:
I am currently playing around with the training pipeline. I run it all in a Docker container, and as far as I can see, the default dashboard of OpenProtein is a fairly simple Flask app: it posts training logs as raw text to localhost:port/graph.
Yet you have the repo https://github.com/biolib/openprotein-dashboard and some nice pictures of a UI dashboard.
How do I launch it all together in Docker? Do I need to launch it via a few containers and docker-compose?
If you have exact instructions on how to do this, it would be really nice to have them too :)
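
(A hypothetical docker-compose.yml sketch of the two-container setup the question suggests; the service names, build contexts and ports are all assumptions:)

```yaml
version: "3"
services:
  openprotein:                       # training process with the Flask log endpoint
    build: .
    ports:
      - "5000:5000"
  dashboard:                         # UI cloned from biolib/openprotein-dashboard
    build: ./openprotein-dashboard
    ports:
      - "3000:3000"
    depends_on:
      - openprotein
```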

Continue model training after halt

Hello,
When model training halts for some reason, what is the way to continue training again rather than starting from scratch?
Thank you
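
(The README documents no resume flag, so here is a minimal checkpointing sketch in plain PyTorch; it is not part of the repo's training loop:)

```python
import torch

def save_checkpoint(model, optimizer, step, path="output/checkpoint.pt"):
    # Persist everything needed to resume: weights, optimizer state, progress.
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def load_checkpoint(model, optimizer, path="output/checkpoint.pt"):
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]  # continue the training loop from this step
```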

__main__.py upon cloning does not work

Thank you so much for making this package!

I am trying to get it working by cloning the repo and calling python __main__.py; however, I am running into a lot of issues with dependencies.

  1. I had to clone PeptideBuilder from github
  2. I had an error with h5py and had to install this and restart my terminal
  3. I am now having errors with:

For padding the packed sequences:
Traceback (most recent call last):
  File "__main__.py", line 165, in <module>
    train_model_path = train_model("TRAIN", training_file, validation_file, args.learning_rate, args.minibatch_size)
  File "__main__.py", line 97, in train_model
    loss = model.compute_loss(primary_sequence, tertiary_positions)
  File "C:\Users\fmsft\openprotein\openprotein.py", line 41, in compute_loss
    emissions, backbone_atoms_padded, batch_sizes = self._get_network_emissions(original_aa_string)
  File "C:\Users\fmsft\openprotein\models.py", line 46, in _get_network_emissions
    packed_input_sequences = self.embed(original_aa_string)
  File "C:\Users\fmsft\openprotein\openprotein.py", line 24, in embed
    torch.nn.utils.rnn.pack_sequence(original_aa_string))
  File "D:\Anaconda2\envs\py36\lib\site-packages\torch\nn\utils\rnn.py", line 277, in pack_sequence
    return pack_padded_sequence(pad_sequence(sequences), [v.size(0) for v in sequences])
  File "D:\Anaconda2\envs\py36\lib\site-packages\torch\nn\utils\rnn.py", line 148, in pack_padded_sequence
    return PackedSequence(torch._C._VariableFunctions._pack_padded_sequence(input, lengths, batch_first))
RuntimeError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0

And with starting the web server:
self.run()
  File "C:\Users\fmsft\openprotein\dashboard.py", line 45, in run
    call(["/bin/bash", "start_web_app.sh"])
  File "D:\Anaconda2\envs\py36\lib\subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "D:\Anaconda2\envs\py36\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "D:\Anaconda2\envs\py36\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Tertiary structure preprocessing

Hi,

thanks for putting this online, it is really nice to work with!

I have a question regarding the preprocessing step of the tertiary structure that you mentioned here.
The coordinates before and after converting don't seem to be the same. If I run preprocessing with some debug print statements, I get the following output:

Starting pre-processing of raw data...
['openprotein/data/raw/validation']
Preprocessed file for validation already exists.
force_pre_processing_overwrite flag set to True, overwriting old file...
Processing raw data file validation
2WXZ_2_C
	Dropping protein as number of sequences too high: 35243
3U88_2_M
	tertiary_masked.shape: torch.Size([56, 9])
	tertiary_masked[0,:]: tensor([-6941.7002,  5551.7002,  2541.6001, -7005.3999,  5538.0000,  2410.3000, -6965.1001,  5409.6001,  2334.6001])
	tertiary_masked[1,:]: tensor([-6835.1001,  5380.7002,  2325.8000, -6792.2998,  5252.7002,  2270.3000, -6812.8999,  5146.2002,  2374.8999])
	tertiary_masked[2,:]: tensor([-6875.3999,  5036.0000,  2335.2000, -6907.0000,  4930.5000,  2430.2000, -6864.2998,  4792.3999,  2382.1001])

	tertiary.shape: torch.Size([56, 9])
	tertiary[0,:]: tensor([ 0.0000,  0.0000,  0.0000,  0.7661,  1.2405,  0.0000, -0.1526,  2.4556, 0.0000])
	tertiary[1,:]: tensor([-1.1318,  2.4505, -0.8981, -2.1765,  3.4677, -0.8982, -3.2035,  3.2030, 0.1952])
	tertiary[2,:]: tensor([-3.5200,  4.2382,  0.9657, -4.4314,  4.1019,  2.0955, -5.5110,  5.1761, 2.0654])

Shouldn't tertiary_masked and tertiary give the same coordinates? Or am I on a completely wrong path?

Best
Christoph

Preprocessing.py

I downloaded CASP12 from ProteinNet and put the data files in data/raw. I then added these lines to the bottom of preprocessing.py:

use_gpu = False
process_raw_data(use_gpu, force_pre_processing_overwrite=True)

Running it gave me a read error: FileNotFoundError: [Errno 2] No such file or directory: 'data/raw/raw\\sample.txt'

So I changed this line (line 25 in the process_raw_data function):

filename = file_path.split('/')[-1]

to this:

filename = file_path.split('\\')[-1]

and it then worked.
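
(A portable alternative that avoids hard-coding either separator; on Windows, os.path.basename handles both:)

```python
import os

filename = os.path.basename(file_path)  # works with '/' and, on Windows, '\\'
```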

Failed to run "python __main__.py"

Hi,
thanks for making this package available.

Just cloned and ran "python __main__.py" and got the error message:
File "C:\Users\keasar\Work\meshiMC\openprotein_new\models.py", line 49, in _get_network_emissions
(data, bi_lstm_batches), self.hidden_layer = self.bi_lstm(packed_input_sequences, self.hidden_layer)

ValueError: too many values to unpack (expected 2)

The full output is below.

Thanks,
Chen

Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 7.5.0 -- An enhanced Interactive Python.

runfile('C:/Users/keasar/Work/meshiMC/openprotein_new/__main__.py', wdir='C:/Users/keasar/Work/meshiMC/openprotein_new')
C:\Users\keasar\Work\python\Anaconda\envs\pytorchenv\lib\site-packages\Bio\PDB\Vector.py:42: BiopythonDeprecationWarning: The module Bio.PDB.Vector has been deprecated in favor of new module Bio.PDB.vectors to solve a name collision with the class Vector. For the class Vector, and vector functions like calc_angle, import from Bio.PDB instead.
"import from Bio.PDB instead.", BiopythonDeprecationWarning)

--- OpenProtein v0.1 ---

Starting pre-processing of raw data...
['data/raw\sample.txt']
Processing raw data file sample.txt

  • Serving Flask app "dashboard" (lazy loading)
  • Environment: production
    WARNING: Do not use the development server in a production environment.
    Use a production WSGI server instead.
  • Debug mode: off
    Wrote output to 1 proteins to data/preprocessed/sample.txt.hdf5
    Completed pre-processing.
    2019-05-18 17:59:06: Embed time: 0.0
    Exception in thread Thread-8:
    Traceback (most recent call last):
    File "C:\Users\keasar\Work\python\Anaconda\envs\pytorchenv\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
    File "C:\Users\keasar\Work\meshiMC\openprotein_new\dashboard.py", line 45, in run
    call(["/bin/bash", "start_web_app.sh"])
    File "C:\Users\keasar\Work\python\Anaconda\envs\pytorchenv\lib\subprocess.py", line 323, in call
    with Popen(*popenargs, **kwargs) as p:
    File "C:\Users\keasar\Work\python\Anaconda\envs\pytorchenv\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 143, in init
    super(SubprocessPopen, self).init(*args, **kwargs)
    File "C:\Users\keasar\Work\python\Anaconda\envs\pytorchenv\lib\subprocess.py", line 775, in init
    restore_signals, start_new_session)
    File "C:\Users\keasar\Work\python\Anaconda\envs\pytorchenv\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
    FileNotFoundError: [WinError 2] The system cannot find the file specified

Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/keasar/Work/meshiMC/openprotein_new/__main__.py', wdir='C:/Users/keasar/Work/meshiMC/openprotein_new')

File "C:\Users\keasar\Work\python\Anaconda\envs\pytorchenv\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)

File "C:\Users\keasar\Work\python\Anaconda\envs\pytorchenv\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/keasar/Work/meshiMC/openprotein_new/main.py", line 67, in
minimum_updates=args.minimum_updates)

File "C:\Users\keasar\Work\meshiMC\openprotein_new\training.py", line 38, in train_model
loss = model.compute_loss(training_minibatch)

File "C:\Users\keasar\Work\meshiMC\openprotein_new\openprotein.py", line 45, in compute_loss
emissions, backbone_atoms_padded, batch_sizes = self._get_network_emissions(original_aa_string)

File "C:\Users\keasar\Work\meshiMC\openprotein_new\models.py", line 49, in _get_network_emissions
(data, bi_lstm_batches), self.hidden_layer = self.bi_lstm(packed_input_sequences, self.hidden_layer)

ValueError: too many values to unpack (expected 2)
