
pka-ani's Introduction

INSTALLATION

Prior to the installation of pKa-ANI, users should make sure they have installed conda.

To install pKa-ANI, navigate to the directory of the downloaded source and run:

conda env create -f pkaani_env.yaml

This will create a conda environment named pkaani and install all required packages. Once the environment is created, run:

conda activate pkaani 
python setup.py install

PREREQUISITES:

  • miniconda/anaconda

If pkaani_env.yaml is not used, users should make sure the following packages are installed.

  • python=3.8
  • numpy
  • scipy
  • pytorch
  • torchani=2.2.0
  • scikit-learn=1.0.2
  • ase
  • joblib
  • ambertools
  • setuptools=58.2.0

Other modules the package uses (all from the Python standard library): os, math, sys, io, csv, getopt, shutil, urllib.request, warnings

USAGE

pKa-ANI requires PDB files to contain hydrogen atoms added with the default ionization states of the residues ASP, GLU, LYS, TYR, and HIE.

For this reason, input PDB file(s) are prepared before the pKa calculation (the prepared structure is written to 'PDBID_pkaani.pdb').

Note that our models are trained to predict pKa values for apo proteins. Consequently, any residue that is not an amino acid is removed from the PDB file(s) during preparation.

Example command line usages:

  • If the PDB file does not exist locally, it is downloaded and prepared for pKa calculations.
pkaani -i 1BNZ
      
pkaani -i 1BNZ.pdb
  • Multiple files can be given as input:
pkaani -i 1BNZ,1E8L
  • If a specific directory is wanted:
pkaani -i path_to_file/1BNZ
      
pkaani -i path_to_file/1BNZ,path_to_file/1E8L

Arguments:

-h: Help

-i: Input files. Inputs can be given with or without the file extension (.pdb).
    If the PDB file is under a specific directory (or will be downloaded there),
    the path can be given as path_to_file/PDBFILE. Multiple PDB files can be
    given using "," as a separator (e.g. pkaani -i 1BNZ,1E8L).
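
The input handling described above could be sketched as follows. This is a hypothetical illustration (the function name parse_inputs and its details are not taken from the actual pkaani source): split the -i value on ",", strip an optional ".pdb" extension, and keep any directory prefix as-is.

```python
def parse_inputs(arg):
    """Hypothetical sketch of -i handling: split on ',' and strip an
    optional '.pdb' extension; any directory prefix is kept as-is."""
    files = []
    for token in arg.split(","):
        token = token.strip()
        if token.endswith(".pdb"):
            token = token[: -len(".pdb")]
        files.append(token)
    return files

print(parse_inputs("1BNZ.pdb"))                # ['1BNZ']
print(parse_inputs("path_to_file/1BNZ,1E8L"))  # ['path_to_file/1BNZ', '1E8L']
```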

CITATION

Gokcan, H.; Isayev, O. Prediction of Protein pKa with Representation Learning. Chem. Sci. 2022, 13 (8), 2462–2474. https://doi.org/10.1039/D1SC05610G.

LICENSING

Please read the LICENSE file.

pka-ani's People

Contributors

hgokcan, isayev, sastrys1


pka-ani's Issues

pKa-ANI produces bogus pKa predictions for non-titratable residues because chain ID is not accounted for

Say you have 3 separate chains of 4 residues each (1, 2, 3, 4), and the titratable residues collected across all chains form the flat list [3, 4, 1, 2, 3, 1, 2]. In such a case, the current code will try to compute a pKa prediction for residues 1 and 2 of chain 1, even though they are not titratable: it only checks whether the residue number appears in the flat titratable list and never checks the chain ID.

See the program's response to the test PDB files I have attached. The first gives pKa predictions for a number of extraneous residues (simply copying the previous titratable residue's descriptors and model). The second runs into a runtime error because it tries to find a prediction/model when none has been assigned yet (the "if titratable" check is triggered by a later residue with the same number on a different chain).

1brs.pdb.txt
6oge_clean.pdb.txt

I will shortly make a pull request that tweaks the storage of titratable residues to also include the chain, so that a pKa prediction happens only for an exact residue-and-chain match. You can compare the output logs of the new and old versions of the calculate_pkaani() function for the difference in the pKa prediction list.
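
The fix described in this issue can be sketched as follows. The names here are hypothetical and simplified (the actual pkaani internals differ); the point is that titratable residues are keyed by the pair (chain ID, residue number) instead of the residue number alone, so identical numbers on different chains no longer collide.

```python
# Illustrative residue set; real code would use the models' residue types.
TITRATABLE = {"ASP", "GLU", "LYS", "TYR", "HIS"}

def collect_titratable(residues):
    """residues: iterable of (chain_id, resnum, resname) tuples.
    Returns a set keyed by (chain_id, resnum), not resnum alone."""
    return {(chain, num) for chain, num, name in residues if name in TITRATABLE}

residues = [
    ("A", 1, "ALA"), ("A", 2, "GLY"), ("A", 3, "ASP"), ("A", 4, "GLU"),
    ("B", 1, "ASP"), ("B", 2, "GLY"), ("B", 3, "LYS"),
    ("C", 1, "TYR"), ("C", 2, "GLU"),
]
titratable = collect_titratable(residues)

# Residue 1 of chain A is ALA. A bare residue-number check would wrongly
# match because ("B", 1) and ("C", 1) are titratable; the chain-aware
# key does not.
assert ("A", 1) not in titratable
assert ("B", 1) in titratable
```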

mmcif support

Hello everyone. pKa-ANI does not support mmCIF files issued by the wwPDB. With the PDB format having been obsolete for quite some time now, it would be great if the package supported the wwPDB standard format. I wonder if you will be adding mmCIF read support any time soon.

Cheers!

Additional flags for pkaani to fine tune computation

Hi guys again,

I've noticed that pkaani modifies the structure (minimization, atom addition, ligand removal). I wonder if you could make the pkaani computation more configurable and flexible. What I'm thinking of is:

  • default - present behaviour
  • interpret structure as is (no modifications to the structure)
  • just run structure minimisation.
  • perhaps any other combination that makes sense for you?

Would you consider a PR with the modified behaviour?

missing scikit-learn dependency

During the first run, the script complained about missing sklearn module:

(pka-ani) user@machine /path/pKa-ANI $ pkaani -i XXX.pdb
Loading pKa-ANI Models and ANI-2x...       
Traceback (most recent call last):            
  File "/path/miniconda3/envs/pka-ani/bin/pkaani", line 33, in <module>
    sys.exit(load_entry_point('pkaani==0.1.0', 'console_scripts', 'pkaani')())              
  File "/path/miniconda3/envs/pka-ani/lib/python3.10/site-packages/pkaani-0.1.0-py3.10.egg/pkaani/run.py", line 131, in main
    pkadict=calculate_pka(pdbfiles,writefile=True)
  File "/path/miniconda3/envs/pka-ani/lib/python3.10/site-packages/pkaani-0.1.0-py3.10.egg/pkaani/pkaani.py", line 30, in calculate_pka
    asp_model=joblib.load(os.path.join(os.path.dirname(__file__),'models/ASP_ani2x_FINAL_MODEL_F100.joblib'))
  File "/path/miniconda3/envs/pka-ani/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 587, in load
    obj = _unpickle(fobj, filename, mmap_mode) 
  File "/path/miniconda3/envs/pka-ani/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()                                                                                                                                                              
  File "/path/miniconda3/envs/pka-ani/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)                    
  File "/path/miniconda3/envs/pka-ani/lib/python3.10/pickle.py", line 1538, in load_stack_global
    self.append(self.find_class(module, name))                                                                                                                                          
  File "/path/miniconda3/envs/pka-ani/lib/python3.10/pickle.py", line 1580, in find_class
    __import__(module, level=0)
  ModuleNotFoundError: No module named 'sklearn'      

After installing the missing module, it also warns about possible sklearn version conflicts.

Loading pKa-ANI Models and ANI-2x...
/path/miniconda3/envs/pka-ani/lib/python3.10/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 1.0.2 when using version 1.1.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/path/miniconda3/envs/pka-ani/lib/python3.10/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator RandomForestRegressor from version 1.0.2 when using version 1.1.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Downloading ANI model parameters ...
Finished Loading.

Maybe it would be more convenient to provide the full .yml file with pinned versions.
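
For reference, a pinned environment file built from the PREREQUISITES list above might look like this (a sketch; the pkaani_env.yaml actually shipped with the package may pin different versions or channels):

```yaml
name: pkaani
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy
  - scipy
  - pytorch
  - torchani=2.2.0
  - scikit-learn=1.0.2
  - ase
  - joblib
  - ambertools
  - setuptools=58.2.0
```

Pinning scikit-learn to 1.0.2 matters in particular, since the shipped models were pickled with that version.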

conda pytorch command not working

Following the instructions but got this...

~> conda install -c conda-forge torchani=2.2.0 cudatoolkit=11.2
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • cudatoolkit=11.2

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

update scikit-learn dependency

Hello developers, the version of scikit-learn required for pKa-ANI to run is rather old (1.0.2). The present version is 1.4.2, with 1.5.0 on the way. I wonder if there is any plan to update that dependency. If I run pKa-ANI with the latest scikit-learn I get the following exception:

/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeRegressor from version 1.0.2 when using version 1.4.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Traceback (most recent call last):
  File "/Users/lpravda/mambaforge/envs/fresh/bin/pkaani", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/pkaani/run.py", line 111, in main
    pkadict=calculate_pka(pdbfiles,writefile=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/pkaani/pkaani.py", line 30, in calculate_pka
    asp_model=joblib.load(os.path.join(os.path.dirname(__file__),'models/ASP_ani2x_FINAL_MODEL_F100.joblib'))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
          ^^^^^^^^^^^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/joblib/numpy_pickle.py", line 402, in load_build
    Unpickler.load_build(self)
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/pickle.py", line 1718, in load_build
    setstate(state)
  File "sklearn/tree/_tree.pyx", line 865, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1571, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
- got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]

Thank you!
