
AlphaPulldown

Python 3.10 · GPL-3.0 license

🥳 AlphaPulldown has entered the era of version 2.0 (beta version 2.0.0b2)

We have brought some exciting and useful features to AlphaPulldown and updated its computing environment.

AlphaPulldown is a Python package that streamlines protein-protein interaction screens and high-throughput modelling of higher-order oligomers using AlphaFold-Multimer:

  • provides a convenient command line interface to screen a bait protein against many candidates, calculate all-versus-all pairwise comparisons, test alternative homo-oligomeric states, and model various parts of a larger complex
  • separates the CPU stages (MSA and template feature generation) from GPU stages (the actual modeling)
  • allows modeling fragments of proteins without recalculating MSAs, while keeping the original full-length residue numbering in the models
  • summarizes the results in a CSV table with AlphaFold scores, pDockQ and mpDockQ, PI-score, and various physical parameters of the interface
  • provides a Jupyter notebook for an interactive analysis of PAE plots and models
  • 🆕 refactored code and removed redundancy
  • 🆕 added a new way of integrating experimental models into the AlphaFold pipeline using custom multimeric databases
  • 🆕 integrates cross-link mass spec data with AlphaFold predictions via AlphaLink2 models

Pre-installation

Check that you have downloaded the necessary parameters and databases (e.g. BFD, MGnify, etc.) as instructed in AlphaFold's documentation. You should have a directory like below:

alphafold_database/                             # Total: ~ 2.2 TB (download: 438 GB)
   bfd/                                   # ~ 1.7 TB (download: 271.6 GB)
       # 6 files.
   mgnify/                                # ~ 64 GB (download: 32.9 GB)
       mgy_clusters_2018_12.fa
   params/                                # ~ 3.5 GB (download: 3.5 GB)
       # 5 CASP14 models,
       # 5 pTM models,
       # 5 AlphaFold-Multimer models,
       # LICENSE,
       # = 16 files.
   pdb70/                                 # ~ 56 GB (download: 19.5 GB)
       # 9 files.
   pdb_mmcif/                             # ~ 206 GB (download: 46 GB)
       mmcif_files/
           # About 180,000 .cif files.
       obsolete.dat
   pdb_seqres/                            # ~ 0.2 GB (download: 0.2 GB)
       pdb_seqres.txt
   small_bfd/                             # ~ 17 GB (download: 9.6 GB)
       bfd-first_non_consensus_sequences.fasta
   uniclust30/                            # ~ 86 GB (download: 24.9 GB)
       uniclust30_2018_08/
           # 13 files.
   uniprot/                               # ~ 98.3 GB (download: 49 GB)
       uniprot.fasta
   uniref90/                              # ~ 58 GB (download: 29.7 GB)
       uniref90.fasta

Create Anaconda environment

First, install Anaconda and create the AlphaPulldown environment, gathering the necessary dependencies:

conda create -n AlphaPulldown -c omnia -c bioconda -c conda-forge python==3.10 openmm==8.0 pdbfixer==1.9 kalign2 cctbx-base pytest importlib_metadata hhsuite

Optionally, if you do not yet have it on your system, install HMMER from Anaconda:

source activate AlphaPulldown
conda install -c bioconda hmmer

This usually works, but on some compute systems users may prefer to use other versions or optimized builds of already installed HMMER and HH-suite.

Installation using pip

Activate the AlphaPulldown environment and install AlphaPulldown

source activate AlphaPulldown

python3 -m pip install alphapulldown==1.0.4
pip install jax==0.4.23 jaxlib==0.4.23+cuda11.cudnn86 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

For the latest beta version: if you'd like to use the AlphaPulldown 2 beta, run pip install alphapulldown==2.0.0b2 instead. Please be aware that the beta version is actively being updated and fixed, and might not be 100% stable.

For older versions of AlphaFold: If you haven't updated your databases according to the requirements of AlphaFold 2.3.0, you can still use AlphaPulldown with your older version of AlphaFold database. Please follow the installation instructions on the dedicated branch

How to develop

Follow the instructions at Developing guidelines


Manuals

AlphaPulldown supports four different modes of massive predictions:

  • pulldown - to screen a list of "bait" proteins against a list or lists of other proteins
  • all_vs_all - to model all pairs of a protein list
  • homo-oligomer - to test alternative oligomeric states
  • custom - to model any combination of proteins and their fragments, such as a pre-defined list of pairs or fragments of a complex

AlphaPulldown will return models of all interactions, summarize results in a score table, and will provide a Jupyter notebook for an interactive analysis, including PAE plots and 3D displays of models colored by chain and pLDDT score.
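The resulting score table can then be filtered programmatically. A minimal sketch (the column names below are illustrative, not necessarily the exact headers AlphaPulldown writes in your version):

```python
import csv
from io import StringIO

# Hypothetical excerpt of the summary table; real runs would read the
# CSV file AlphaPulldown writes to the output directory instead.
csv_text = """jobs,iptm_ptm,mpDockQ/pDockQ
baitA_and_cand1,0.82,0.41
baitA_and_cand2,0.31,0.05
"""

rows = list(csv.DictReader(StringIO(csv_text)))
# keep only confidently predicted interactions
hits = [r["jobs"] for r in rows if float(r["iptm_ptm"]) > 0.5]
print(hits)  # → ['baitA_and_cand1']
```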

Examples

Example 1 is a case where pulldown mode is used. Manual: example_1

Example 2 is a case where custom and homo-oligomer modes are used. Manual: example_2

Example 3 demonstrates the usage of multimeric templates for guiding AlphaFold predictions. Manual: example_3

The all_vs_all mode can be viewed as a special case of the pulldown mode; instructions for this mode are therefore included as an appendix in both manuals mentioned above.

Citations

If you use this package, please cite:

@article{AlphaPulldown,
  author  = {Dingquan Yu and Grzegorz Chojnowski and Maria Rosenthal and Jan Kosinski},
  journal = {Bioinformatics},
  title   = {AlphaPulldown---a python package for protein--protein interaction screens using AlphaFold-Multimer},
  year    = {2023},
  volume  = {39},
  number  = {1},
  doi     = {10.1093/bioinformatics/btac749}
}


Contributors

dennissv, dimamolod, dingquanyu, gchojnowski, jkosinski, kashyapchhatbar, kttn8769, maurerv, quantixed, swanss


Issues

CUDA out of memory error when running multimer

Hi,

I'm running run_multimer_jobs.py in pulldown mode with one bait protein and 100 candidate proteins. The bait is 300 amino acids and candidates are each 150 amino acids. I've been getting RuntimeError: INTERNAL: Failed to load in-memory CUBIN: CUDA_ERROR_OUT_OF_MEMORY: out of memory every time I run the script after about 6-12 bait-candidate pairs. I can manually delete the completed runs and resume, but it's time-consuming to have to watch the script. Is there a memory leak somewhere that can be remedied?

I have an RTX 3090 with 24GB of GPU RAM.
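A workaround that sometimes helps with JAX/XLA GPU memory behaviour is setting the environment variables that the official AlphaFold Docker setup uses; whether they cure this particular leak is untested here:

```python
import os

# These must be set BEFORE jax/tensorflow are imported anywhere.
os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"            # allow spilling to host RAM
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "4.0"   # may exceed 1.0 with unified memory
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"  # allocate on demand, not up front

# ... only now import jax / run the prediction script
```

The same variables can of course be exported in the shell before launching run_multimer_jobs.py.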

Unable to initialize backend 'tpu_driver'

Hi,

I was trying to use one protein against 200 protein features in the second step and got this error when running it on one GPU. Could I have some suggestions about this error?

Thanks!

Ning

I0915 16:06:40.291960 46912496410304 run_multimer_jobs.py:158] done creating multimer AcrIF1_and_CAADJO010000003.1_2775
I0915 16:06:40.625929 46912496410304 run_multimer_jobs.py:158] done creating multimer AcrIF1_and_CAADJO010000003.1_278
I0915 16:06:40.790300 46912496410304 run_multimer_jobs.py:158] done creating multimer AcrIF1_and_CAADJO010000003.1_2851
I0915 16:06:40.859936 46912496410304 run_multimer_jobs.py:158] done creating multimer AcrIF1_and_CAADJO010000003.1_2994
I0915 16:06:41.130373 46912496410304 xla_bridge.py:264] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0915 16:06:41.459139 46912496410304 xla_bridge.py:264] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
I0915 16:06:46.201552 46912496410304 run_multimer_jobs.py:236] now running prediction on AcrIF1_and_AAQW01000001.1_2315
I0915 16:06:46.201746 46912496410304 predict_structure.py:40] Checking for /data/duann2/virus_proj/alphapd/pd_parallel/pd_step2/output/models/AcrIF1_and_AAQW01000001.1_2315/ranking_debug.json
I0915 16:06:46.202050 46912496410304 predict_structure.py:48] Running model model_1_multimer_v2_pred_0 on AcrIF1_and_AAQW01000001.1_2315
I0915 16:06:46.202407 46912496410304 model.py:166] Running predict with shape(feat) = {'aatype': (610,), 'residue_index': (610,), 'seq_length': (), 'msa': (2055, 610), 'num_alignments': (), 'template_aatype': (4, 610), 'template_all_atom_mask': (4, 610, 37), 'template_all_atom_positions': (4, 610, 37, 3), 'asym_id': (610,), 'sym_id': (610,), 'entity_id': (610,), 'deletion_matrix': (2055, 610), 'deletion_mean': (610,), 'all_atom_mask': (610, 37), 'all_atom_positions': (610, 37, 3), 'assembly_num_chains': (), 'entity_mask': (610,), 'num_templates': (), 'cluster_bias_mask': (2055,), 'bert_mask': (2055, 610), 'seq_mask': (610,), 'msa_mask': (2055, 610)}
2022-09-15 16:06:46.814915: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Running ptxas --version returned 32512
2022-09-15 16:06:46.931611: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:460] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Fatal Python error: Aborted

Thread 0x00002aaaaaaf1ec0 (most recent call first):
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/dispatch.py", line 648 in backend_compile
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/profiler.py", line 206 in wrapper
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/dispatch.py", line 703 in compile_or_get_cached
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/dispatch.py", line 735 in from_xla_computation
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/dispatch.py", line 640 in compile
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/dispatch.py", line 198 in _xla_callable_uncached
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/dispatch.py", line 116 in xla_primitive_callable
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/util.py", line 212 in cached
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/util.py", line 219 in wrapper
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/dispatch.py", line 97 in apply_primitive
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/core.py", line 678 in process_primitive
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/core.py", line 328 in bind_with_trace
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/core.py", line 325 in bind
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 444 in shift_right_logical
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/prng.py", line 272 in threefry_seed
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/prng.py", line 232 in seed_with_impl
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/jax/_src/random.py", line 125 in PRNGKey
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/alphafold/model/model.py", line 167 in predict
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/alphapulldown/predict_structure.py", line 58 in predict
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/bin/run_multimer_jobs.py", line 249 in predict_individual_jobs
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/bin/run_multimer_jobs.py", line 275 in predict_multimers
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/bin/run_multimer_jobs.py", line 324 in main
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/absl/app.py", line 258 in _run_main
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/lib/python3.7/site-packages/absl/app.py", line 312 in run
File "/data/duann2/deeplearning/conda/envs/AlphaPulldown/bin/run_multimer_jobs.py", line 328 in <module>
/var/spool/slurm/slurmd/job47738660/slurm_script: line 16: 40140 Aborted run_multimer_jobs.py --mode=pulldown --num_cycle=3 --num_predictions_per_model=1 --output_path=/data/duann2/virus_proj/alphapd/pd_parallel/pd_step2/output/models --data_dir=/data/duann2/virus_proj/alphapd/db_output/ --protein_lists=baits.txt,candidates.txt --monomer_objects_dir=/data/duann2/virus_proj/alphapd/pd_parallel/pd_output/

run_get_good_pae.sh crashes in homo-oligomeric mode when monomers are present

Command:

singularity exec --no-home --bind /scratch/kosinski/Giardia/interactome/Group1/homooligomers:/mnt     
/g/kosinski/kosinski/devel/AlphaPulldown/alpha-analysis.sif run_get_good_pae.sh --output_dir=/mnt --cutoff=5

Error:

Traceback (most recent call last):
  File "/app/programme_notebook/get_good_inter_pae.py", line 136, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/programme_notebook/get_good_inter_pae.py", line 110, in main
    iptm_ptm_score = json.load(open(os.path.join(result_subdir,"ranking_debug.json"),'rb'))['iptm+ptm'][best_model]
KeyError: 'iptm+ptm'
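A defensive way to read ranking_debug.json is to also accept the key monomer runs use ('plddts' rather than 'iptm+ptm'). This is a sketch of a possible fix, not the shipped get_good_inter_pae.py code:

```python
def best_score(ranking: dict):
    """Return (metric_name, score) of the top-ranked model from a parsed
    ranking_debug.json, tolerating monomer runs that store 'plddts'
    instead of 'iptm+ptm'."""
    best_model = ranking["order"][0]
    for key in ("iptm+ptm", "plddts"):  # multimer key first, then monomer
        if key in ranking:
            return key, ranking[key][best_model]
    raise KeyError("no known score key in ranking_debug.json")

# Minimal hand-made examples of the two layouts:
multimer = {"order": ["model_1"], "iptm+ptm": {"model_1": 0.82}}
monomer = {"order": ["model_2"], "plddts": {"model_2": 91.3}}
print(best_score(multimer))  # → ('iptm+ptm', 0.82)
print(best_score(monomer))   # → ('plddts', 91.3)
```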

The special character warning is obsolete?

You write:
Please be aware that everything after > will be taken as the description of the protein and make sure do NOT include any special symbol, such as |, after >
But haven't you implemented a function that deals with that? Perhaps it should now say:
Please be aware that everything after > will be taken as the description of the protein and any special symbol, such as | will be replaced with underscores in the resulting files.
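The described behaviour could be sketched as follows (the exact character set AlphaPulldown replaces is an assumption here, as is the helper name):

```python
import re

def sanitize_description(header: str) -> str:
    """Replace characters that are unsafe in file names with underscores.
    A sketch of the behaviour discussed above, not the shipped code."""
    return re.sub(r"[|;:&<>()\s]", "_", header.strip())

print(sanitize_description("sp|P12345|MYPROT_HUMAN"))  # → sp_P12345_MYPROT_HUMAN
```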

Adding templates for local mmseqs2 mode here?

In principle, you could save msa_lines as a3m or sto after this line

) = unserialize_msa(a3m_lines, self.sequence)

and then run local hmmer like in the original AlphaFold:
https://github.com/deepmind/alphafold/blob/5cb2f8c480aa8314c02a93c6fbfc3f48f0ce8af0/alphafold/data/pipeline.py#L179

Or just use ColabFold hhsearch using mk_template?:
https://github.com/sokrypton/ColabFold/blob/8771fa10ce233e02efe0191ea5fb83ce3e1ca5f8/colabfold/batch.py#L149
just using the full PDB70 database from AlphaFold?

Installing the project as a package in developer mode

Hi,

I'd like to test out some changes and additional arguments to the scripts in this repo. I made a fork of the repo, made some changes to the code, and then tried to install the repo as a package so I could verify that the changes worked. I tried making a very simple setup.py file and then installing it as package into my conda environment with pip install -e .. I tried testing the package by running the unaltered code, but I got an odd error (will attach for reference at the bottom of this message), which I suspect is due to an alphafold version mismatch.

What is the easiest way for me to install my forked repo in the same way as the official AlphaPulldown package? Is there a setup.py or other file that I could use to ensure that it's installed in the same way?

In case it's informative, here's the error

 File "/home/gridsan/sswanson/miniconda3/envs/AlphaPulldown/bin/run_multimer_jobs.py", line 7, in <module>
 exec(compile(f.read(), __file__, 'exec'))
 File "/home/gridsan/sswanson/local_code_mirror/AlphaPulldown/alphapulldown/run_multimer_jobs.py", line 333, in <module>
 app.run(main)
 File "/home/gridsan/sswanson/miniconda3/envs/AlphaPulldown/lib/python3.7/site-packages/absl/app.py", line 308, in run
 _run_main(main, args)
 File "/home/gridsan/sswanson/miniconda3/envs/AlphaPulldown/lib/python3.7/site-packages/absl/app.py", line 254, in _run_main
 sys.exit(main(argv))
 File "/home/gridsan/sswanson/local_code_mirror/AlphaPulldown/alphapulldown/run_multimer_jobs.py", line 329, in main
 predict_multimers(multimers)
 File "/home/gridsan/sswanson/local_code_mirror/AlphaPulldown/alphapulldown/run_multimer_jobs.py", line 279, in predict_multimers
 random_seed=random_seed,
 File "/home/gridsan/sswanson/local_code_mirror/AlphaPulldown/alphapulldown/run_multimer_jobs.py", line 252, in predict_individual_jobs
 seqs=multimer_object.input_seqs,
 File "/home/gridsan/sswanson/local_code_mirror/AlphaPulldown/alphapulldown/predict_structure.py", line 62, in predict
 prediction_result.update({"seqs": seqs})
AttributeError: 'tuple' object has no attribute 'update'

data_dir really not needed?

Have you tested that data_dir is not needed? Isn't data_dir used to locate the hhsearch database for template search?

`max_template_date` and `data_dir` are not needed here. This part of AlphaPulldown is built upon ColabFold, in which the maximum template date is hardcoded.

`--model_preset` flag is not used

Hi,

Is the model preset flag used when running alphafold multimer? I would like to be able to change the model weights, but as far as I can tell the argument is ignored. It seems like the multimer model is used whenever predicting the structure of a MultimericObject and otherwise the monomer_ptm model is used. Would it be possible to allow the other model configurations? I'm particularly interested in using monomer_ptm weights to predict complexes.

foss + gompic doesn't work

These two modules are not compatible in my tests:

module load HMMER/3.1b2-foss-2016b
module load HH-suite/3.3.0-gompic-2020b

Have you tested these, and do they work for you? I replaced gompic with foss and it runs, but with some warnings.

Move the database part to the main manual

The part starting from "Check if you have downloaded necessary parameters and databases" is the same in all examples; move it to the main manual, into the Install database part that I have just added.

A question about the seq_index argument in job array mode

Hi,

Is it possible to use job arrays, but assign more than one prediction per job? I have ~400 interacting pairs and I would like to be able to request an array of 10 nodes to each predict complexes for 40 of them. If this doesn't already exist and you don't mind, I could implement the feature myself and make a pull request.

Best,
Sebastian
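The requested chunking could look like this (a sketch of the proposed feature, not existing AlphaPulldown code; task_slice is a hypothetical helper mapping a SLURM array task to the --seq_index values it should process):

```python
def task_slice(task_id: int, n_tasks: int, n_jobs: int):
    """1-based job indices (as used by --seq_index) handled by one
    array task, with jobs divided as evenly as possible."""
    per_task = -(-n_jobs // n_tasks)  # ceiling division
    start = (task_id - 1) * per_task + 1
    return list(range(start, min(start + per_task, n_jobs + 1)))

# e.g. 400 pairs across a 10-node array: task 1 handles indices 1..40
print(task_slice(1, 10, 400)[:3], "...", task_slice(1, 10, 400)[-1])
```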

Create msa_output_dir if does not exist in make_features

Can you create this folder:
https://github.com/henrywotton/AlphaPulldown/blob/37c2b1c2b25ded268e37f6ab11e418fd7ecbb7cf/alphapulldown/objects.py#L134
if it does not exist, after this line? Otherwise the pipeline does not work properly when I set use_existing_msas=True but the folder doesn't exist yet (e.g. when re-running an array where some jobs partially crashed). With use_existing_msas set but the MSAs absent, AlphaFold would still run and generate the missing MSAs; here it cannot, because the parent folder does not exist.
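The requested fix amounts to something like the following (a sketch; ensure_msa_dir is a hypothetical name):

```python
from pathlib import Path

def ensure_msa_dir(msa_output_dir) -> Path:
    """Create the per-protein MSA folder if it is missing, so that
    re-runs with use_existing_msas=True don't crash when the folder
    was never made. Not the shipped code."""
    path = Path(msa_output_dir)
    path.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
    return path
```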

The analysis step crashes with No module named 'jax'

[email protected]:/g/kosinski/kosinski/devel/AlphaPulldown$ singularity exec --no-home --bind /scratch/kosinski/testAlphaPulldown_models:/mnt \
>     /g/kosinski/kosinski/devel/AlphaPulldown/alpha-analysis.sif run_get_good_pae.sh --output_dir=/mnt --cutoff=5 --create_notebook=True
I0708 21:38:20.643415 139846045984576 get_good_inter_pae.py:135] now processing O43432_and_P09132
Traceback (most recent call last):
  File "/app/programme_notebook/get_good_inter_pae.py", line 171, in <module>
    app.run(main)
  File "/opt/conda/envs/programme_notebook/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/envs/programme_notebook/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/programme_notebook/get_good_inter_pae.py", line 141, in main
    seqs = pickle.load(open(result_path,'rb'))['seqs']
ModuleNotFoundError: No module named 'jax'

Improve loading run_alphafold.py

Replace
https://github.com/alphapulldown-devs/AlphaPulldown/blob/f077980592547320e04e7cfff9076dd5f9ae5dbd/alphapulldown/create_individual_features.py#L48

with something like:

try:
    run_af = load_module(PATH_TO_RUN_ALPHAFOLD, "run_alphafold")
except FileNotFoundError:
    # fall back to the parent directory, as in the normal AlphaFold repo
    PATH_TO_RUN_ALPHAFOLD = os.path.join(
        os.path.dirname(os.path.dirname(alphafold.__file__)), "run_alphafold.py"
    )
    run_af = load_module(PATH_TO_RUN_ALPHAFOLD, "run_alphafold")

to let hard-core users like me overwrite alphafold with the original, possibly modified, repo.

Do not save output in the notebook?

When I use the notebook and execute cells, the output (PAE plots, 3D renderings) is saved in the notebook automatically. If the notebook has many entries, it becomes unusable at some point. Is it possible in Jupyter, e.g. by adding some directive at the top of the notebook, to prevent auto-saving of output?

PAE plots crashing

It worked for a couple of jobs but for most it crashes with:

Traceback (most recent call last):
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/bin/run_multimer_jobs.py", line 329, in <module>
    app.run(main)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/bin/run_multimer_jobs.py", line 325, in main
    predict_multimers(multimers)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/bin/run_multimer_jobs.py", line 289, in predict_multimers
    random_seed=random_seed,
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/bin/run_multimer_jobs.py", line 250, in predict_individual_jobs
    create_and_save_pae_plots(multimer_object, output_path)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/alphapulldown/utils.py", line 182, in create_and_save_pae_plots
    multimer_object.input_seqs, order, output_dir, multimer_object.description
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/alphapulldown/plot_pae.py", line 35, in plot_pae
    fig, ax1 = plt.subplots(1, 1)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/matplotlib/cbook/deprecation.py", line 451, in wrapper
    return func(*args, **kwargs)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/matplotlib/pyplot.py", line 1287, in subplots
    fig = figure(**fig_kw)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/matplotlib/pyplot.py", line 693, in figure
    **kwargs)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/matplotlib/pyplot.py", line 315, in new_figure_manager
    return _backend_mod.new_figure_manager(*args, **kwargs)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 3494, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/site-packages/matplotlib/backends/_backend_tk.py", line 885, in new_figure_manager_given_figure
    window = tk.Tk(className="matplotlib")
  File "/g/kosinski/kosinski/software/envs/TestAlphaPulldown/lib/python3.7/tkinter/__init__.py", line 2020, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display ":0"

Maybe you need to set matplotlib backend explicitly?
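Yes: forcing a non-interactive backend before matplotlib is imported avoids the Tk/display requirement on headless nodes. One way is via the environment (equivalently, call matplotlib.use("Agg") before the first pyplot import):

```python
import os

# Must be set before matplotlib is imported anywhere in the process;
# "Agg" renders to files without needing an X display.
os.environ["MPLBACKEND"] = "Agg"

# ... matplotlib / pyplot imports and plotting happen after this point
```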

How to speed up the first step for model prediction?

Hi,

Can I allocate more CPUs to the first step of prediction? I can't find the argument to assign more CPUs. I have over 30,000 proteins that need to be processed, and it still takes days even though I have split the work into 200 sub-jobs. Can I have some suggestions?

Thanks,

Ning

Error when running SLURM array job — TypeError: all_seq_msa_features() got an unexpected keyword argument 'msa_output_dir'

Hello — when running through the first step of the example 1 md notebook inside a SLURM array, I'm running into the following issue:

Traceback (most recent call last):
  File "/scratch/user/AlphaPulldown1/bin/create_individual_features.py", line 247, in <module>
    app.run(main)
  File "/scratch/user/AlphaPulldown1/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/scratch/user/AlphaPulldown1/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/scratch/user/AlphaPulldown1/bin/create_individual_features.py", line 240, in main
    create_and_save_monomer_objects(curr_monomer, pipeline, flags_dict)
  File "/scratch/user/AlphaPulldown1/bin/create_individual_features.py", line 204, in create_and_save_monomer_objects
    save_msa=FLAGS.save_msa_files,
  File "/scratch/user/AlphaPulldown1/lib/python3.7/site-packages/alphapulldown/objects.py", line 135, in make_features
    save_msa=False,use_precomuted_msa=False)
  File "/scratch/user/AlphaPulldown1/lib/python3.7/site-packages/alphapulldown/objects.py", line 116, in execute_pipeline
    use_precomuted_msa=use_precomuted_msa,
TypeError: all_seq_msa_features() got an unexpected keyword argument 'msa_output_dir'

The following command that I'm using inside my bash script is as follows:

create_individual_features.py --fasta_paths=/scratch/user/AlphaPulldownTest/baits.fasta,\
/scratch/user/AlphaPulldownTest/sequences_shorter.fasta --data_dir=/vast/user/public/alphafold --save_msa_files=False \
--output_dir=/scratch/user/AlphaPulldownMSAOut --use_precomputed_msas=False --max_template_date=2050-01-01 --skip_existing=False \
--seq_index=$SLURM_ARRAY_TASK_ID

I've tried a couple of configurations for the settings, but without success. I'm a little unsure what is meant by "msa_output_dir" in the error message. Many thanks beforehand!

jackhmmer fails to run

Hi, I'm trying to run pulldown for the first time and I am unable to get the example to work. It seems like an issue with alphafold's jackhmmer script but I'm not sure how to debug. Thanks!

(AlphaPulldown) kduong@glycine:~/AlphaPulldown$ python3 alphapulldown/create_individual_features.py --fasta_paths=example_data/example_1_sequences_shorter.fasta --data_dir=/home/kduong/af2_databases/ --save_msa_files=False --output_dir=/home/kduong/pulldown_input_output/output/ --use_precomputed_msas=False --max_template_date=2050-01-01
2022-08-24 15:53:43.371134: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
I0824 15:53:45.712832 140448664971072 templates.py:857] Using precomputed obsolete pdbs /home/kduong/af2_databases/pdb_mmcif/obsolete.dat.
I0824 15:53:45.718152 140448664971072 objects.py:112] You have chosen not to save msa output files
Traceback (most recent call last):
  File "alphapulldown/create_individual_features.py", line 247, in <module>
    app.run(main)
  File "/home/kduong/miniconda3/envs/AlphaPulldown/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/kduong/miniconda3/envs/AlphaPulldown/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "alphapulldown/create_individual_features.py", line 240, in main
    create_and_save_monomer_objects(curr_monomer, pipeline, flags_dict)
  File "alphapulldown/create_individual_features.py", line 204, in create_and_save_monomer_objects
    save_msa=FLAGS.save_msa_files,
  File "/home/kduong/miniconda3/envs/AlphaPulldown/lib/python3.7/site-packages/alphapulldown/objects.py", line 118, in make_features
    input_fasta_path=fasta_file, msa_output_dir=tmpdirname
  File "/home/kduong/miniconda3/envs/AlphaPulldown/lib/python3.7/site-packages/alphafold/data/pipeline.py", line 169, in process
    max_sto_sequences=self.uniref_max_hits)
  File "/home/kduong/miniconda3/envs/AlphaPulldown/lib/python3.7/site-packages/alphafold/data/pipeline.py", line 94, in run_msa_tool
    result = msa_runner.query(input_fasta_path, max_sto_sequences)[0]  # pytype: disable=wrong-arg-count
  File "/home/kduong/miniconda3/envs/AlphaPulldown/lib/python3.7/site-packages/alphafold/data/tools/jackhmmer.py", line 172, in query
    input_fasta_path, self.database_path, max_sequences)
  File "/home/kduong/miniconda3/envs/AlphaPulldown/lib/python3.7/site-packages/alphafold/data/tools/jackhmmer.py", line 133, in _query_chunk
    logging.info('Launching subprocess "%s"', ' '.join(cmd))
TypeError: sequence item 0: expected str instance, NoneType found

pip version out of sync?

In the pip version of alphapulldown, `Path(msa_output_dir).mkdir(parents=True, exist_ok=True)` is not present in make_features ... (ba913a1).

Interpretation of output

Hi,

Thanks for your help! I successfully ran the pipeline in pulldown mode following example 1, with one bait sequence against 30,000 candidate sequences. There are multiple PDB files as output. I am wondering: are all these PDBs structures of the interaction between the bait and a candidate? Is no PDB only for the bait or a candidate alone? Do I need to run AlphaFold again to get the structure of each candidate?

Thanks,

Ning

modeling of higher-order oligomers question

I was wondering how to use the high-throughput modeling of higher-order oligomers. Is this part of the custom mode? It reads like it would be possible to circumvent the AF-Multimer size limitation by using AlphaPulldown; I'm just not sure if I misunderstood.
Thx!

Smaller example

I think the translation example should be smaller, so that people can test more easily. What about taking the top 10 by ipTM and the bottom 5 from the Google sheet list, plus 5 of those without any good PAE?

Give option to re-calculate features for a fragment

This is really for later and to discuss first:

Sometimes we do want to re-calculate MSA and templates for a chopped fragment, so create_features should have an option to take prot_A,start-end as input and create features that preserve numbering.
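The proposed input format could be parsed as follows (a sketch; parse_fragment is a hypothetical helper, not part of create_features today):

```python
def parse_fragment(spec: str):
    """Parse 'prot_A,start-end' into (name, start, end).
    A bare protein name means the whole sequence."""
    name, _, region = spec.partition(",")
    if not region:
        return name, None, None  # whole protein
    start, end = map(int, region.split("-"))
    return name, start, end

print(parse_fragment("prot_A,20-120"))  # → ('prot_A', 20, 120)
print(parse_fragment("prot_A"))         # → ('prot_A', None, None)
```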

--use_precomputed_msas=False is confusing here

In:

--use_precomputed_msas=False \

--use_precomputed_msas=False is rather confusing, because the script actually does use precomputed MSAs here. Remove this option from the manual then.

So what is the logic with --mmseqs2 now? Is this correct? Regardless of the --use_precomputed_msas option, it will check whether an a3m file exists; if it doesn't, it will run remote mmseqs2; if it does, it will take the local a3m.

So --use_precomputed_msas has no effect in this mode. Should it be somehow blocked then? Or, when --use_precomputed_msas=False, always run remote (requiring users to set --use_precomputed_msas=True if local a3ms are to be used)? The latter seems most logical to me.
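The latter option would amount to something like this (a sketch of the proposed, not current, behaviour):

```python
import os

def choose_msa_source(a3m_path, use_precomputed_msas: bool) -> str:
    """Proposed --mmseqs2 logic: reuse a local a3m only when the user
    explicitly asked for it; otherwise always query remote mmseqs2."""
    if use_precomputed_msas and os.path.isfile(a3m_path):
        return "local_a3m"
    return "remote_mmseqs2"
```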

Create requirements.txt

Hi,

nice application of AF2. For me, the conda installation did not work, so I would like to try another virtual environment. Could you please provide a requirements.txt file for installation?

Thank you very much!

Requesting feature: ability to skip MSA generation step

Hi,

I'm currently running the example and everything seems to be going smoothly. I'm interested in using this in pulldown mode, where I have a natural protein and a list of peptides which are designed or otherwise synthetic. As the peptides have no homolog there is little reason to spend time generating MSAs for each. Would it be possible to control for which proteins the HMM search is performed during feature generation? I'm not sure whether it would be easier to create dummy MSA files (which are empty) or to just modify the run_multimer_jobs.py to handle the case where a peptide has no MSA, but either would work fine for me.

Respectfully,
Sebastian

Add option to zip MSA files

Add a command-line option to zip the output MSAs in create_individual_features.py if --save_msa_files is True
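A minimal sketch of such an option (zip_msa_dir is a hypothetical helper using the standard library; the real flag name and placement are to be decided):

```python
import shutil

def zip_msa_dir(msa_dir) -> str:
    """Archive a per-protein MSA folder as <msa_dir>.zip and remove the
    uncompressed copy. Returns the path of the archive."""
    archive = shutil.make_archive(str(msa_dir), "zip", root_dir=msa_dir)
    shutil.rmtree(msa_dir)
    return archive
```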
