alphaflow's Introduction

AlphaFlow

AlphaFlow is a modified version of AlphaFold, fine-tuned with a flow matching objective, designed for generative modeling of protein conformational ensembles. In particular, AlphaFlow aims to model:

  • Experimental ensembles, i.e., potential conformational states as they would be deposited in the PDB
  • Molecular dynamics ensembles at physiological temperatures

We also provide a similarly fine-tuned version of ESMFold called ESMFlow. Technical details and thorough benchmarking results can be found in our paper, AlphaFold Meets Flow Matching for Generating Protein Ensembles, by Bowen Jing, Bonnie Berger, Tommi Jaakkola. This repository contains all code, instructions and model weights necessary to run the method. If you have any questions, feel free to open an issue or reach out at [email protected].

June 2024 update: We have trained a 12-layer version of AlphaFlow-MD+Templates (base and distilled) which runs 2.5x faster than the 48-layer version at a small loss in performance. We recommend considering this model if reference structures (PDB or AlphaFold) are available and runtime is a high priority.

Table of Contents

  1. Installation
  2. Model weights
  3. Running inference
  4. Evaluation scripts
  5. Training
  6. Ensembles
  7. License
  8. Citation

Installation

In an environment with Python 3.9 (for example, conda create -n alphaflow python=3.9), run:

pip install numpy==1.21.2 pandas==1.5.3
pip install torch==1.12.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install biopython==1.79 dm-tree==0.1.6 modelcif==0.7 ml-collections==0.1.0 scipy==1.7.1 absl-py einops
pip install pytorch_lightning==2.0.4 fair-esm mdtraj==1.9.9 wandb
pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@103d037'

The OpenFold installation requires CUDA 11. If the system has the wrong version, you can install CUDA 11 in the Conda environment:

conda install nvidia/label/cuda-11.8.0::cuda
conda install nvidia/label/cuda-11.8.0::cuda-cudart-dev
conda install nvidia/label/cuda-11.8.0::libcusparse-dev
conda install nvidia/label/cuda-11.8.0::libcusolver-dev
conda install nvidia/label/cuda-11.8.0::libcublas-dev
ln -s $CONDA_PREFIX/lib/libcudart_static.a $CONDA_PREFIX/lib/libcudart.a

Then install OpenFold:

CUDA_HOME=$CONDA_PREFIX pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@103d037'

Model weights

We provide several versions of AlphaFlow (and similarly named versions of ESMFlow).

  • AlphaFlow-PDB—trained on PDB structures to model experimental ensembles from X-ray crystallography or cryo-EM under different conditions
  • AlphaFlow-MD—trained on all-atom, explicit solvent MD trajectories at 300K
  • AlphaFlow-MD+Templates—trained to additionally take a PDB structure as input, and models the corresponding MD ensemble at 300K

For all models, the distilled version runs significantly faster at the cost of some loss of accuracy (benchmarked in the paper).

For AlphaFlow-MD+Templates, the 12l versions have 12 instead of 48 Evoformer layers and run 2.5x faster at a small loss in performance.

AlphaFlow models

Model                   Version        Weights
AlphaFlow-PDB           base           https://alphaflow.s3.amazonaws.com/params/alphaflow_pdb_base_202402.pt
AlphaFlow-PDB           distilled      https://alphaflow.s3.amazonaws.com/params/alphaflow_pdb_distilled_202402.pt
AlphaFlow-MD            base           https://alphaflow.s3.amazonaws.com/params/alphaflow_md_base_202402.pt
AlphaFlow-MD            distilled      https://alphaflow.s3.amazonaws.com/params/alphaflow_md_distilled_202402.pt
AlphaFlow-MD+Templates  base           https://alphaflow.s3.amazonaws.com/params/alphaflow_md_templates_base_202402.pt
AlphaFlow-MD+Templates  distilled      https://alphaflow.s3.amazonaws.com/params/alphaflow_md_templates_distilled_202402.pt
AlphaFlow-MD+Templates  12l-base       https://alphaflow.s3.amazonaws.com/params/alphaflow_12l_md_templates_base_202406.pt
AlphaFlow-MD+Templates  12l-distilled  https://alphaflow.s3.amazonaws.com/params/alphaflow_12l_md_templates_distilled_202406.pt

ESMFlow models

Model                 Version    Weights
ESMFlow-PDB           base       https://alphaflow.s3.amazonaws.com/params/esmflow_pdb_base_202402.pt
ESMFlow-PDB           distilled  https://alphaflow.s3.amazonaws.com/params/esmflow_pdb_distilled_202402.pt
ESMFlow-MD            base       https://alphaflow.s3.amazonaws.com/params/esmflow_md_base_202402.pt
ESMFlow-MD            distilled  https://alphaflow.s3.amazonaws.com/params/esmflow_md_distilled_202402.pt
ESMFlow-MD+Templates  base       https://alphaflow.s3.amazonaws.com/params/esmflow_md_templates_base_202402.pt
ESMFlow-MD+Templates  distilled  https://alphaflow.s3.amazonaws.com/params/esmflow_md_templates_distilled_202402.pt

Training checkpoints (from which fine-tuning can be resumed) are available upon request; please reach out if you'd like to collaborate!
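
The weight URLs listed above follow a regular naming pattern, so a small helper can build them instead of copying each link by hand. The sketch below is illustrative only: the function name weights_url and its arguments are hypothetical and not part of the repository, and the pattern is inferred purely from the listed links.

```python
BASE_URL = "https://alphaflow.s3.amazonaws.com/params"


def weights_url(family, dataset, version, layers12=False):
    """Build a download URL following the naming pattern of the listed links.

    family:   "alphaflow" or "esmflow"
    dataset:  "pdb", "md", or "md_templates"
    version:  "base" or "distilled"
    layers12: True for the 12-layer AlphaFlow-MD+Templates weights
    """
    if family not in ("alphaflow", "esmflow"):
        raise ValueError(f"unknown family: {family}")
    if dataset not in ("pdb", "md", "md_templates"):
        raise ValueError(f"unknown dataset: {dataset}")
    if version not in ("base", "distilled"):
        raise ValueError(f"unknown version: {version}")
    if layers12 and (family, dataset) != ("alphaflow", "md_templates"):
        raise ValueError("12-layer weights are only listed for AlphaFlow-MD+Templates")

    prefix = f"{family}_12l" if layers12 else family
    stamp = "202406" if layers12 else "202402"  # release stamps seen in the listed links
    return f"{BASE_URL}/{prefix}_{dataset}_{version}_{stamp}.pt"


if __name__ == "__main__":
    print(weights_url("alphaflow", "md_templates", "distilled", layers12=True))
```

The helper only reproduces links that appear in the lists above; it refuses combinations that are not listed.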

Running inference

Preparing input files

  1. Prepare an input CSV with a name and seqres entry for each row. See splits/atlas_test.csv for examples.
  2. If running an AlphaFlow model, prepare an MSA directory and place the alignments in .a3m format at the following paths: {alignment_dir}/{name}/a3m/{name}.a3m. If you don't have the MSAs, there are two ways to generate them:
    1. Query the ColabFold server with python -m scripts.mmseqs_query --split [PATH] --outdir [DIR].
    2. Download UniRef30 and ColabDB according to https://github.com/sokrypton/ColabFold/blob/main/setup_databases.sh and run python -m scripts.mmseqs_search_helper --split [PATH] --db_dir [DIR] --outdir [DIR].
  3. If running an MD+Templates model, place the template PDB files into a templates directory with filenames matching the names in the input CSV. The PDB files should include only a single chain with no residue gaps.
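
The input layout described in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the helper names are hypothetical, and it assumes that the name and seqres columns are sufficient (splits/atlas_test.csv remains the authoritative example).

```python
import csv
from pathlib import Path


def write_input_csv(path, entries):
    """Write one row per target with the `name` and `seqres` columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "seqres"])
        writer.writeheader()
        for name, seqres in entries:
            writer.writerow({"name": name, "seqres": seqres})


def expected_msa_path(alignment_dir, name):
    """Path where the alignment for `name` is expected: {alignment_dir}/{name}/a3m/{name}.a3m."""
    return Path(alignment_dir) / name / "a3m" / f"{name}.a3m"


def missing_msas(csv_path, alignment_dir):
    """Return the names in the CSV that have no .a3m file at the expected path."""
    with open(csv_path, newline="") as handle:
        names = [row["name"] for row in csv.DictReader(handle)]
    return [n for n in names if not expected_msa_path(alignment_dir, n).is_file()]


if __name__ == "__main__":
    # Placeholder target name and sequence, for illustration only.
    write_input_csv("example_input.csv", [("example_A", "GSEPEPE")])
    print(missing_msas("example_input.csv", "alignment_dir"))
```

Running missing_msas before inference gives an early list of targets whose alignments are not where an AlphaFlow run would look for them.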

Running the model

The basic command for running inference with AlphaFlow is:

python predict.py --mode alphafold --input_csv [PATH] --msa_dir [DIR] --weights [PATH] --samples [N] --outpdb [DIR]

If running the PDB model, we recommend appending --self_cond --resample for improved performance.

The basic command for running inference with ESMFlow is

python predict.py --mode esmfold --input_csv [PATH] --weights [PATH] --samples [N] --outpdb [DIR]

Additional command line arguments for either model:

  • Use the --pdb_id argument to select (one or more) rows in the CSV. If no argument is specified, inference is run on all rows.
  • If running the MD model with templates, append --templates_dir [DIR].
  • If running any distilled model, append the arguments --noisy_first --no_diffusion.
  • To truncate the inference process for increased precision and reduced diversity, append (for example) --tmax 0.2 --steps 2. The default inference settings correspond to --tmax 1.0 --steps 10. See Appendix B.1 in the paper for more details.
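
The flag rules above can be collected into one helper that assembles the command line. The sketch below is only an illustration: build_predict_command and its keyword names are hypothetical, and it uses only flags that appear in this README. Whether a distilled PDB model should receive both flag pairs is an assumption drawn from reading the two rules together.

```python
def build_predict_command(mode, input_csv, weights, outpdb, samples,
                          msa_dir=None, templates_dir=None,
                          pdb_model=False, distilled=False,
                          tmax=None, steps=None, pdb_ids=()):
    """Assemble a predict.py argument list from the rules listed in the README."""
    if mode not in ("alphafold", "esmfold"):
        raise ValueError(f"unknown mode: {mode}")

    cmd = ["python", "predict.py", "--mode", mode, "--input_csv", input_csv]
    if mode == "alphafold":
        if msa_dir is None:
            raise ValueError("AlphaFlow inference needs an MSA directory")
        cmd += ["--msa_dir", msa_dir]
    cmd += ["--weights", weights, "--samples", str(samples), "--outpdb", outpdb]

    if pdb_model:                      # recommended for the PDB model
        cmd += ["--self_cond", "--resample"]
    if distilled:                      # required for any distilled model
        cmd += ["--noisy_first", "--no_diffusion"]
    if templates_dir is not None:      # MD model with templates
        cmd += ["--templates_dir", templates_dir]
    if tmax is not None:               # truncated inference, e.g. 0.2
        cmd += ["--tmax", str(tmax)]
    if steps is not None:
        cmd += ["--steps", str(steps)]
    if pdb_ids:                        # restrict to selected CSV rows
        cmd += ["--pdb_id", *pdb_ids]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_predict_command(
        "esmfold", "input.csv", "weights.pt", "out", 10, distilled=True)))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with paths that contain spaces.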

Evaluation scripts

Our ensemble evaluations may be reproduced via the following steps:

  1. Download the ATLAS dataset by running bash scripts/download_atlas.sh from the desired root directory.
  2. Prepare the ensemble directory with a PDB file for each ATLAS target, each with 250 structures (see zipped AlphaFlow ensembles below for examples). Some results are not directly comparable for evaluations with a different number of structures.
  3. Run python -m scripts.analyze_ensembles --atlas_dir [DIR] --pdb_dir [DIR] --num_workers [N]. This will produce an analysis file named out.pkl in the pdb_dir.
  4. Run python -m scripts.print_analysis [PATH] [PATH] ... with an arbitrary number of paths to out.pkl files. A formatted comparison table will be printed.
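
Because results depend on the number of structures per target, it may help to confirm that each ensemble file holds 250 structures before running the analysis. The sketch below is a hypothetical pre-check, not part of the repository's scripts. It assumes each ensemble is a multi-model PDB file in which every structure begins with a MODEL record and that files carry a .pdb extension.

```python
from pathlib import Path


def count_models(pdb_path):
    """Count MODEL records in a multi-model PDB file (assumed ensemble format)."""
    with open(pdb_path) as handle:
        return sum(1 for line in handle if line[:6].strip() == "MODEL")


def find_mismatched_ensembles(pdb_dir, expected=250):
    """Return {filename: model_count} for files whose count differs from `expected`."""
    mismatched = {}
    for path in sorted(Path(pdb_dir).glob("*.pdb")):
        n = count_models(path)
        if n != expected:
            mismatched[path.name] = n
    return mismatched


if __name__ == "__main__":
    # "ensembles" is a placeholder directory name.
    for name, n in find_mismatched_ensembles("ensembles").items():
        print(f"{name}: {n} structures")
```

An empty result means every file in the directory matches the expected count under the stated assumption.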

Training

Downloading datasets

To download and preprocess the PDB,

  1. Run aws s3 sync --no-sign-request s3://pdbsnapshots/20230102/pub/pdb/data/structures/divided/mmCIF pdb_mmcif from the desired directory.
  2. Run find pdb_mmcif -name '*.gz' | xargs gunzip to extract the mmCIF files.
  3. From the repository root, run python -m scripts.unpack_mmcif --mmcif_dir [DIR] --outdir [DIR] --num_workers [N]. This will preprocess all chains into NPZ files and create a pdb_mmcif.csv index.
  4. Download OpenProteinSet with aws s3 sync --no-sign-request s3://openfold/ openfold from the desired directory.
  5. Run python -m scripts.add_msa_info --openfold_dir [DIR] to produce a pdb_mmcif_msa.csv index with OpenProteinSet MSA lookup.
  6. Run python -m scripts.cluster_chains to produce a pdb_clusters file at 40% sequence similarity (MMseqs2 installation required).
  7. Create MSAs for the PDB validation split (splits/cameo2022.csv) according to the instructions in the previous section.

To download and preprocess the ATLAS MD trajectory dataset,

  1. Run bash scripts/download_atlas.sh from the desired directory.
  2. From the repository root, run python -m scripts.prep_atlas --atlas_dir [DIR] --outdir [DIR] --num_workers [N]. This will preprocess the ATLAS trajectories into NPZ files.
  3. Create MSAs for all entries in splits/atlas.csv according to the instructions in the previous section.

Running training

Before running training, download the pretrained AlphaFold and ESMFold weights into the repository root via

wget https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar
tar -xvf alphafold_params_2022-12-06.tar params_model_1.npz
wget https://dl.fbaipublicfiles.com/fair-esm/models/esmfold_3B_v1.pt

The basic command for training AlphaFlow is

python train.py --lr 5e-4 --noise_prob 0.8 --accumulate_grad 8 --train_epoch_len 80000 --train_cutoff 2018-05-01 --filter_chains \
    --train_data_dir [DIR] \
    --train_msa_dir [DIR] \
    --mmcif_dir [DIR] \
    --val_msa_dir [DIR] \
    --run_name [NAME] [--wandb]

where the PDB NPZ directory, the OpenProteinSet directory, the PDB mmCIF directory, and the validation MSA directory are specified. This training run produces the AlphaFlow-PDB base version. All other models are built off this checkpoint.

To continue training on ATLAS, run

python train.py --normal_validate --sample_train_confs --sample_val_confs --num_val_confs 100 --pdb_chains splits/atlas_train.csv --val_csv splits/atlas_val.csv --self_cond_prob 0.0 --noise_prob 0.9 --val_freq 10 --ckpt_freq 10 \
    --train_data_dir [DIR] \
    --train_msa_dir [DIR] \
    --ckpt [PATH] \
    --run_name [NAME] [--wandb]

where the ATLAS MSA and NPZ directories and AlphaFlow-PDB checkpoints are specified.

To instead train on ATLAS with templates, run with the additional arguments --first_as_template --extra_input --lr 1e-4 --restore_weights_only --extra_input_prob 1.0.

Distillation: to distill a model, append --distillation and supply the --ckpt [PATH] of the model to be distilled. For PDB training, we remove --accumulate_grad 8 and recommend distilling with a shorter --train_epoch_len 16000. Note that --self_cond_prob and --noise_prob will be ignored and can be omitted.

ESMFlow: run the same commands with --mode esmfold and --train_cutoff 2020-05-01.

Ensembles

We provide the ensembles sampled from the model which were used for the analyses and results reported in the paper.

AlphaFlow ensembles

Model                   Version        Samples
AlphaFlow-PDB           base           https://alphaflow.s3.amazonaws.com/samples/alphaflow_pdb_base_202402.zip
AlphaFlow-PDB           distilled      https://alphaflow.s3.amazonaws.com/samples/alphaflow_pdb_distilled_202402.zip
AlphaFlow-MD            base           https://alphaflow.s3.amazonaws.com/samples/alphaflow_md_base_202402.zip
AlphaFlow-MD            distilled      https://alphaflow.s3.amazonaws.com/samples/alphaflow_md_distilled_202402.zip
AlphaFlow-MD+Templates  base           https://alphaflow.s3.amazonaws.com/samples/alphaflow_md_templates_base_202402.zip
AlphaFlow-MD+Templates  distilled      https://alphaflow.s3.amazonaws.com/samples/alphaflow_md_templates_distilled_202402.zip
AlphaFlow-MD+Templates  12l-base       https://alphaflow.s3.amazonaws.com/samples/alphaflow_12l_md_templates_base_202406.zip
AlphaFlow-MD+Templates  12l-distilled  https://alphaflow.s3.amazonaws.com/samples/alphaflow_12l_md_templates_distilled_202406.zip

ESMFlow ensembles

Model                 Version    Samples
ESMFlow-PDB           base       https://alphaflow.s3.amazonaws.com/samples/esmflow_pdb_base_202402.zip
ESMFlow-PDB           distilled  https://alphaflow.s3.amazonaws.com/samples/esmflow_pdb_distilled_202402.zip
ESMFlow-MD            base       https://alphaflow.s3.amazonaws.com/samples/esmflow_md_base_202402.zip
ESMFlow-MD            distilled  https://alphaflow.s3.amazonaws.com/samples/esmflow_md_distilled_202402.zip
ESMFlow-MD+Templates  base       https://alphaflow.s3.amazonaws.com/samples/esmflow_md_templates_base_202402.zip
ESMFlow-MD+Templates  distilled  https://alphaflow.s3.amazonaws.com/samples/esmflow_md_templates_distilled_202402.zip

License

MIT. Other licenses may apply to third-party source code noted in file headers.

Citation

@inproceedings{jing2024alphafold,
  title={AlphaFold Meets Flow Matching for Generating Protein Ensembles},
  author={Jing, Bowen and Berger, Bonnie and Jaakkola, Tommi},
  year={2024},
  booktitle={Forty-first International Conference on Machine Learning}
}

alphaflow's People

Contributors

bjing2016, eltociear, eunos-1128, jyaacoub, maxbates, y1zhou

alphaflow's Issues

Alphaflow not working.

I installed AlphaFlow as described on the GitHub page. I used a .csv file whose format looks exactly like the ATLAS example. I get the following error:
python predict.py --mode esmfold --input_csv /data/AI_tools/alphaflow/test/protein.csv --weights /data/AI_tools/alphaflow/weights/esmflow_md_base_202402.pt --samples 10 --outpdb /data/AI_tools/alphaflow/test/output
2024-08-29 12:27:56,173 [ip-10-10-32-206:1533012] [INFO] Loading the model
2024-08-29 12:29:34,063 [ip-10-10-32-206:1533012] [INFO] Model has been loaded
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [03:16<00:00, 19.68s/it]

Traceback (most recent call last):
  File "/data/AI_tools/alphaflow/predict.py", line 178, in <module>
    main()
  File "/data/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/data/AI_tools/alphaflow/predict.py", line 153, in main
    chains = pd.concat(chains)
  File "/data/anaconda3/envs/alphaflow/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/data/anaconda3/envs/alphaflow/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 368, in concat
    op = _Concatenator(
  File "/data/anaconda3/envs/alphaflow/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 425, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

These are the packages I have in the environment:

# packages in environment at /data/anaconda3/envs/alphaflow:
#
# Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 2.1.0 pyhd8ed1ab_0 conda-forge
aiohappyeyeballs 2.4.0 pypi_0 pypi
aiohttp 3.10.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
astor 0.8.1 pyh9f0ad1d_0 conda-forge
async-timeout 4.0.3 pypi_0 pypi
attrs 24.2.0 pypi_0 pypi
biopandas 0.4.0 pypi_0 pypi
biopython 1.79 py39hb9d737c_3 conda-forge
blosc 1.21.5 hc2324a3_1 conda-forge
brotli-python 1.1.0 py39h3d6467e_1 conda-forge
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.33.1 heb4867d_0 conda-forge
ca-certificates 2024.7.4 hbcca054_0 conda-forge
certifi 2024.7.4 pyhd8ed1ab_0 conda-forge
cffi 1.17.0 py39h49a4b6b_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
contextlib2 21.6.0 pypi_0 pypi
cuda 11.3.0 h3b286be_0 nvidia
cuda-command-line-tools 11.3.0 h3b286be_0 nvidia
cuda-compiler 11.3.0 h3b286be_0 nvidia
cuda-cudart 11.3.58 hc1aae59_0 nvidia
cuda-cuobjdump 11.3.58 hc78e225_0 nvidia
cuda-cupti 11.3.58 h9a3dd33_0 nvidia
cuda-cuxxfilt 11.3.58 he670d9e_0 nvidia
cuda-gdb 11.3.58 h531059a_0 nvidia
cuda-libraries 11.3.0 h3b286be_0 nvidia
cuda-libraries-dev 11.3.0 h3b286be_0 nvidia
cuda-memcheck 11.3.58 h8711ecb_0 nvidia
cuda-nvcc 11.3.58 h2467b9f_0 nvidia
cuda-nvdisasm 11.3.58 hd2ea46e_0 nvidia
cuda-nvml-dev 12.6.37 2 nvidia
cuda-nvprof 11.3.58 h860cd9e_0 nvidia
cuda-nvprune 11.3.58 hb917323_0 nvidia
cuda-nvrtc 11.3.58 he300756_0 nvidia
cuda-nvtx 11.3.58 h3fa534a_0 nvidia
cuda-nvvp 11.3.58 hd16380c_0 nvidia
cuda-runtime 11.3.0 h3b286be_0 nvidia
cuda-samples 11.6.101 h8efea70_0 nvidia
cuda-sanitizer-api 11.3.58 h58da6c8_0 nvidia
cuda-thrust 11.3.58 h7b74f08_0 nvidia
cuda-toolkit 11.3.0 h3b286be_0 nvidia
cuda-tools 11.3.0 h3b286be_0 nvidia
cuda-version 12.6 3 nvidia
cuda-visual-tools 11.3.0 h3b286be_0 nvidia
dm-tree 0.1.6 py39h1832856_1 conda-forge
docker-pycreds 0.4.0 py_0 conda-forge
einops 0.8.0 pyhd8ed1ab_0 conda-forge
fair-esm 2.0.0 pypi_0 pypi
filelock 3.15.4 pypi_0 pypi
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.6.1 pypi_0 pypi
gitdb 4.0.11 pyhd8ed1ab_0 conda-forge
gitpython 3.1.43 pyhd8ed1ab_0 conda-forge
h2 4.1.0 pyhd8ed1ab_0 conda-forge
hdf5 1.14.0 nompi_hb72d44e_103 conda-forge
hpack 4.0.0 pyh9f0ad1d_0 conda-forge
hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge
idna 3.8 pyhd8ed1ab_0 conda-forge
ihm 1.3 py39hcd6043d_0 conda-forge
jinja2 3.1.4 pypi_0 pypi
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.3 h659f571_0 conda-forge
ld_impl_linux-64 2.40 hf3520f5_7 conda-forge
libabseil 20240116.2 cxx17_he02047a_1 conda-forge
libaec 1.1.3 h59595ed_0 conda-forge
libblas 3.9.0 20_linux64_openblas conda-forge
libcblas 3.9.0 20_linux64_openblas conda-forge
libcublas 11.4.2.10064 h8a72295_0 nvidia
libcufft 10.4.2.58 h58ccd86_0 nvidia
libcurand 10.2.4.58 h99380db_0 nvidia
libcurl 8.8.0 hca28451_1 conda-forge
libcusolver 11.1.1.58 hec68242_0 nvidia
libcusparse 11.5.0.58 hf5aa513_0 nvidia
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc 14.1.0 h77fa898_1 conda-forge
libgcc-ng 14.1.0 h69a702a_1 conda-forge
libgfortran 14.1.0 h69a702a_1 conda-forge
libgfortran-ng 14.1.0 h69a702a_1 conda-forge
libgfortran5 14.1.0 hc5f4f2c_1 conda-forge
libgomp 14.1.0 h77fa898_1 conda-forge
liblapack 3.9.0 20_linux64_openblas conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnpp 11.3.3.44 h8df316f_0 nvidia
libnsl 2.0.1 hd590300_0 conda-forge
libnvjpeg 11.4.1.58 h3d06750_0 nvidia
libopenblas 0.3.25 pthreads_h413a1c8_0 conda-forge
libprotobuf 4.25.3 h08a7969_0 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx 14.1.0 hc0a3c3a_1 conda-forge
libstdcxx-ng 14.1.0 h4852527_1 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libzlib 1.2.13 h4ab18f5_6 conda-forge
lightning-utilities 0.11.6 pypi_0 pypi
lz4-c 1.9.4 hcb278e6_0 conda-forge
lzo 2.10 hd590300_1001 conda-forge
markupsafe 2.1.5 pypi_0 pypi
mdtraj 1.9.4 py39h1dbbcdb_2 conda-forge
ml-collections 0.1.0 pypi_0 pypi
modelcif 0.7 pyhd8ed1ab_0 conda-forge
mpmath 1.3.0 pypi_0 pypi
msgpack-python 1.0.8 py39h95fdab5_0 conda-forge
multidict 6.0.5 pypi_0 pypi
ncurses 6.5 he02047a_1 conda-forge
networkx 3.2.1 pypi_0 pypi
nomkl 1.0 h5ca1d4c_0 conda-forge
numexpr 2.10.0 py39he85e4be_100 conda-forge
numpy 1.21.2 py39hdbf815f_0 conda-forge
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.20.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.6.20 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
openfold 1.0.1 pypi_0 pypi
openssl 3.3.1 hb9d3cd8_3 conda-forge
packaging 24.1 pyhd8ed1ab_0 conda-forge
pandas 1.5.3 py39h2ad29b5_1 conda-forge
pip 24.2 pyhd8ed1ab_0 conda-forge
platformdirs 4.2.2 pyhd8ed1ab_0 conda-forge
protobuf 4.25.3 py39h1be52a0_0 conda-forge
psutil 6.0.0 py39hd3abc70_0 conda-forge
pycparser 2.22 pyhd8ed1ab_0 conda-forge
pyparsing 3.1.4 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pytables 3.7.0 py39hf8baa48_4 conda-forge
python 3.9.19 h0755675_0_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python_abi 3.9 5_cp39 conda-forge
pytorch-lightning 2.0.4 pypi_0 pypi
pytz 2024.1 pyhd8ed1ab_0 conda-forge
pyyaml 6.0.2 py39hcd6043d_0 conda-forge
readline 8.2 h8228510_1 conda-forge
requests 2.32.3 pyhd8ed1ab_0 conda-forge
scipy 1.7.1 py39hee8e79c_0 conda-forge
sentry-sdk 2.13.0 pyhd8ed1ab_0 conda-forge
setproctitle 1.3.3 py39hd1e30aa_0 conda-forge
setuptools 72.2.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
smmap 5.0.0 pyhd8ed1ab_0 conda-forge
snappy 1.2.1 ha2e4443_0 conda-forge
sympy 1.13.2 pypi_0 pypi
tk 8.6.13 noxft_h4845f30_101 conda-forge
torch 1.12.1+cu113 pypi_0 pypi
torchmetrics 1.4.1 pypi_0 pypi
tqdm 4.66.5 pypi_0 pypi
triton 3.0.0 pypi_0 pypi
typing_extensions 4.12.2 pyha770c72_0 conda-forge
tzdata 2024a h8827d51_1 conda-forge
urllib3 2.2.2 pyhd8ed1ab_1 conda-forge
wandb 0.16.6 pyhd8ed1ab_1 conda-forge
wheel 0.44.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.9.4 pypi_0 pypi
zlib 1.2.13 h4ab18f5_6 conda-forge
zstandard 0.23.0 py39h623c9ba_0 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge

Request to update Dockerfile

Hi! I tried running with the Dockerfile, but there are incompatibilities with the OpenFold installation. Is there an updated version of the Dockerfile? Thank you so much!

CA filtering in the evaluation code

Hi @bjing2016,

Thanks for open-sourcing this awesome work!

I am running through the evaluation code in scripts/analyze_ensembles.py and found that this line might not correctly filter CA atoms for the sampled ensemble aftraj.

out['ca_mask'] = ca_mask = [a.index for a in traj_aa.top.atoms if a.name == 'CA']

# For example: name = 6y2x_A. aftraj is the ensemble generated by AlphaFlow-MD.
>>> [a for a in traj.top.atoms]
[GLY1-CA,
 SER2-CA,
 GLU3-CA,
 PRO4-CA,
 GLU5-CA,
 PRO6-CA,
 GLU7-CA,
...]

>>> [a for a in aftraj.top.atoms]
[GLY1-CA,
 SER2-CA,
 GLU3-CA,
 PRO4-C,
 GLU5-CA,
 PRO6-C,
 GLU7-CA,
...]

This is likely due to different atom ordering in the topology files for certain amino acids. The two trajectories would need to be filtered separately to guarantee a CA-only ensemble:

af_ca_mask = [a.index for a in aftraj_aa.top.atoms if a.name == 'CA']
aftraj_fixed = aftraj_aa.atom_slice(af_ca_mask, False)

Diffusion scheduling code making abnormal protein output

I believe your code has some discrepancies when compared to the pseudocode in your article.


Algorithm 1 TRAINING
Input: Training examples of structures, sequences, and MSAs {(S_i, A_i, M_i)}
for all (S_i, A_i, M_i) do
    Extract x_1 ← BetaCarbons(S_i)
    Sample x_0 ∼ HarmonicPrior(length(A_i))
    Align x_0 ← RMSDAlign(x_0, x_1)
    Sample t ∼ Uniform[0, 1]
    Interpolate x_t ← t · x_1 + (1 − t) · x_0
    Predict Ŝ_i ← AlphaFold(A_i, M_i, x_t, t)
    Optimize loss L = FAPE^2(Ŝ_i, S_i)

Does this pseudocode correspond to your code in wrapper.py ModelWrapper.distillation_training_step?


for t, s in zip(schedule[:-1], schedule[1:]):
    output = self.teacher(batch, prev_outputs=prev_outputs)
    pseudo_beta = pseudo_beta_fn(batch['aatype'], output['final_atom_positions'], None)
    noisy = rmsdalign(pseudo_beta, noisy)
    noisy = (s / t) * noisy + (1 - s / t) * pseudo_beta

The same holds in ModelWrapper.inference.

The atoms in the PDB output seem to be clustered together very densely, which makes it an abnormal protein structure.
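
One possible reading, offered as an assumption and not as the authors' answer: the loop variable t in the quoted update may count down from noise (t = 1) to data (t = 0), the reverse of the paper's convention. Under that assumption the update is the same straight-line interpolant as in Algorithm 1, which the toy check below illustrates with plain Python lists in place of tensors. All names and values in it are hypothetical.

```python
def lerp(a, b, w):
    """Return w * a + (1 - w) * b elementwise."""
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]


x0 = [0.0, 4.0, -2.0]  # toy "noise" coordinates
x1 = [1.0, 1.0, 3.0]   # toy "clean" coordinates


def noisy_at(level):
    # Assumed convention: level = 1 is pure noise, level = 0 is clean data.
    return lerp(x0, x1, level)


def step(noisy, prediction, t, s):
    # Same form as the quoted update: (s / t) * noisy + (1 - s / t) * prediction.
    return lerp(noisy, prediction, s / t)


t, s = 0.8, 0.6
stepped = step(noisy_at(t), x1, t, s)   # perfect prediction: the model returns x1
expected = noisy_at(s)                  # point on the interpolant at level s
max_error = max(abs(a - b) for a, b in zip(stepped, expected))
print(max_error)
```

If the prediction equals the clean structure, one step from level t lands on the interpolant at level s, so the update itself looks consistent with the paper under the reversed time convention. This check says nothing about why the output atoms are densely clustered.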

Is biopandas needed?

When I try to run python predict.py --mode esmfold --input_csv /data/AI_tools/alphaflow/test/GPRC_B.csv --weights /data/AI_tools/alphaflow/weights/esmflow_md_base_202402.pt --samples 10 --outpdb /data/AI_tools/alphaflow/test/output,
I get an error saying biopandas is not found. However, neither the Dockerfile nor the conda environment specifies that biopandas needs to be installed.

Generating Abnormal PDB results

Thanks for open-sourcing this amazing work, AlphaFlow.
After I deployed this project on my computer, I found that the generated PDB results seem abnormal, whether using AlphaFlow or ESMFlow.
Has anyone else run into this problem?
P41440_3
Q15758_1

The size of tensor a (184) must match the size of tensor b (183) at non-singleton dimension 1

I have pasted the error below. I am attempting to use the AlphaFlow MD + Template model and I am using this model + sequence:

https://alphafold.ebi.ac.uk/entry/A0A2P6NC61

predict.py 133
main()

_contextlib.py 115 decorate_context
return func(*args, **kwargs)

predict.py 119 main
prots = model.inference(batch, as_protein=True, noisy_first=args.noisy_first,

wrapper.py 374 inference
output = self.model(batch, prev_outputs=prev_outputs)

module.py 1532 _wrapped_call_impl
return self._call_impl(*args, **kwargs)

module.py 1541 _call_impl
return forward_call(*args, **kwargs)

alphafold.py 240 forward
extra_pseudo_beta = pseudo_beta_fn(batch['aatype'], batch['extra_all_atom_positions'], None)

feats.py 38 pseudo_beta_fn
pseudo_beta = torch.where(

RuntimeError:
The size of tensor a (184) must match the size of tensor b (183) at non-singleton dimension 1
[!!] 2024-05-15 16:55:47,353 Command 'source activate AlphaFlow; python alphaflow/predict.py --mode alphafold --input_csv alphaflow_input.csv --msa_dir AlphaFlow_MSA_Results --weights alphaflow/alphaflow_md_templates_base_202402.pt --samples 10 --outpdb upload/ --templates_dir alphaflow_template' returned non-zero exit status 1. (main.py:252)
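
The thread does not establish the cause of this mismatch. One thing that may be worth checking, given the README's requirement that template PDB files contain a single chain with no residue gaps, is whether the template has the same number of residues as the seqres entry. The sketch below is a hypothetical pre-check and not part of the repository. It reads standard fixed-column PDB ATOM records, counts CA atoms in the first model, and does not handle alternate locations.

```python
def ca_residue_ids(pdb_lines):
    """Return (chain, resSeq, iCode) for each CA ATOM record in the first model."""
    ids = []
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only inspect the first model
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            ids.append((line[21], int(line[22:26]), line[26:27]))
    return ids


def check_template(pdb_lines, seqres):
    """List problems that would violate the single-chain, gap-free requirement."""
    ids = ca_residue_ids(pdb_lines)
    problems = []
    if len({chain for chain, _, _ in ids}) > 1:
        problems.append("more than one chain")
    if len(ids) != len(seqres):
        problems.append(f"{len(ids)} residues in template vs {len(seqres)} in seqres")
    numbers = [number for _, number, _ in ids]
    if any(b - a > 1 for a, b in zip(numbers, numbers[1:])):
        problems.append("gap in residue numbering")
    return problems


if __name__ == "__main__":
    # "template.pdb" and the sequence are placeholders.
    with open("template.pdb") as handle:
        print(check_template(handle.readlines(), "GSEPEPE"))
```

An empty list means the template passes these three checks; it does not guarantee that the residue identities match the sequence.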

Appropriate `model_config` arguments for initial training upon predicting with long sequence

Hi,

I have some long sequences that can't be predicted using the AlphaFlow default settings.

To deal with the long ones, I changed the arguments used for prediction as below.

config = model_config(
    'initial_training',
    train=False,
    low_prec=False,
    long_sequence_inference=True,
)

Is 'initial_training' still the right preset for long-sequence prediction? I'm afraid this change reduces precision. I want conformer ensembles with a wide range of conformations and high precision.

There seems to be no description of best practices or settings for predicting proteins with long sequences.

Dockerfile or CUDA 12

Hi,

Thanks for the wonderful work. I am planning on doing some conformation sampling using this work, but unfortunately it seems like the hard requirement of CUDA 11.6 is an issue. I've tried different installers on CUDA 12 and can't get it to work but unfortunately the machines I have access to are all CUDA 12.

It seems like OpenFold has a branch pl_upgrades that can support CUDA 12+ - would it be possible to do a port forward on this branch or provide a Dockerfile?

Thanks.

PDB as input

Hi @bjing2016 ! Extremely useful work!
I have some PDBs with jumps in sequences (i.e. excluded IDRs).
I wonder if PDB as input is possible?
If not, would appreciate if it becomes available in the future.

Availability of MD Ensemble Evaluation Scripts

Hi all! Thanks for your work.

I'm reaching out to inquire about the availability of MD ensemble evaluation scripts, particularly for metrics beyond RMSD and RMSF. While these two metrics are relatively straightforward to generate, I've found challenges reproducing others like Root Mean W2-Dist and PCAs.
Could you provide guidance or scripts to assist with calculating these metrics? Many thanks for your help.

Best,
Shaoning

Command options' help messages in predict.py missing

Hi,

Thank you for your impressive work.

There are no help messages for prediction options.

parser.add_argument('--input_csv', type=str, default='splits/transporters_only.csv')
parser.add_argument('--templates_dir', type=str, default='./data')
parser.add_argument('--msa_dir', type=str, default='./alignment_dir')
parser.add_argument('--mode', choices=['alphafold', 'esmfold'], default='alphafold')
parser.add_argument('--samples', type=int, default=10)
parser.add_argument('--steps', type=int, default=10)
parser.add_argument('--outpdb', type=str, default='./outpdb/default')
parser.add_argument('--weights', type=str, default=None)
parser.add_argument('--ckpt', type=str, default=None)
parser.add_argument('--original_weights', action='store_true')
parser.add_argument('--pdb_id', nargs='*', default=[])
parser.add_argument('--subsample', type=int, default=None)
parser.add_argument('--resample', action='store_true')
parser.add_argument('--tmax', type=float, default=1.0)
parser.add_argument('--templates', action='store_true')
parser.add_argument('--no_diffusion', action='store_true', default=False)
parser.add_argument('--self_cond', action='store_true', default=False)
parser.add_argument('--noisy_first', action='store_true', default=False)
parser.add_argument('--runtime_json', type=str, default=None)
parser.add_argument('--no_overwrite', action='store_true', default=False)

Could you add these messages or describe what each option does in the README?

I could figure out some of them after reading the paper and the README, but some are still not very clear.
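
As a starting point, help strings for the options that the README does describe could look like the sketch below. This is an illustration, not the authors' wording: the help texts are paraphrased from the README, the function build_parser is hypothetical, and options the README does not document (--ckpt, --original_weights, --subsample, --templates, --runtime_json, --no_overwrite) are left out because their meaning is not stated anywhere in this page.

```python
import argparse


def build_parser():
    """Argument parser with help text paraphrased from the README (assumed meanings)."""
    parser = argparse.ArgumentParser(description="AlphaFlow / ESMFlow inference (sketch)")
    parser.add_argument('--input_csv', type=str, default='splits/transporters_only.csv',
                        help='CSV with a name and seqres entry for each row')
    parser.add_argument('--msa_dir', type=str, default='./alignment_dir',
                        help='directory with alignments at {msa_dir}/{name}/a3m/{name}.a3m (AlphaFlow only)')
    parser.add_argument('--templates_dir', type=str, default='./data',
                        help='directory of template PDB files named after CSV rows (MD+Templates models)')
    parser.add_argument('--mode', choices=['alphafold', 'esmfold'], default='alphafold',
                        help='alphafold runs AlphaFlow, esmfold runs ESMFlow')
    parser.add_argument('--samples', type=int, default=10,
                        help='number of samples to generate')
    parser.add_argument('--steps', type=int, default=10,
                        help='number of inference steps; lower together with --tmax to truncate inference')
    parser.add_argument('--tmax', type=float, default=1.0,
                        help='truncation of the inference process; e.g. 0.2 for more precision, less diversity')
    parser.add_argument('--outpdb', type=str, default='./outpdb/default',
                        help='output directory for PDB files')
    parser.add_argument('--weights', type=str, default=None,
                        help='path to the model weights file')
    parser.add_argument('--pdb_id', nargs='*', default=[],
                        help='names of CSV rows to run; all rows if omitted')
    parser.add_argument('--self_cond', action='store_true', default=False,
                        help='recommended together with --resample for the PDB model')
    parser.add_argument('--resample', action='store_true',
                        help='recommended together with --self_cond for the PDB model')
    parser.add_argument('--noisy_first', action='store_true', default=False,
                        help='append together with --no_diffusion for distilled models')
    parser.add_argument('--no_diffusion', action='store_true', default=False,
                        help='append together with --noisy_first for distilled models')
    return parser


if __name__ == "__main__":
    build_parser().print_help()
```

Defaults are copied from the argument list quoted above, so behaviour without --help is unchanged for the options shown.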

TypeError: __init__() missing 2 required positional arguments: 'opm_first' and 'fuse_projection_weights'

Hello, I put the weights, CSV, and a3m files in folders (in particular, the a3m at /cluster/home/xxx/alphaflow/splits/6DS0_A/a3m/6DS0_A.a3m) and ran the following command:

python predict.py --mode alphafold --input_csv /cluster/home/xxx/alphaflow/splits/6DS0_test.csv --msa_dir /cluster/home/xxx/alphaflow/splits --weights /cluster/home/xxx/alphaflow/splits/alphaflow_pdb_base_202402.pt --samples 200 --outpdb /cluster/home/xxx/alphaflow/splits/output --self_cond --resample

Then I get this error:

2024-02-22 13:45:48,506 [node83:61063] [INFO] Loading the model
Traceback (most recent call last):
  File "/cluster/home/xxx/alphaflow/predict.py", line 132, in <module>
    main()
  File "/cluster/home/xxx/.conda/envs/alphaflow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/cluster/home/xxx/alphaflow/predict.py", line 78, in main
    model = model_class(**ckpt['hyper_parameters'], training=False)
  File "/cluster/home/xxx/alphaflow/alphaflow/model/wrapper.py", line 496, in __init__
    self.model = AlphaFold(config,
  File "/cluster/home/xxx/alphaflow/alphaflow/model/alphafold.py", line 73, in __init__
    self.extra_msa_stack = ExtraMSAStack(
TypeError: __init__() missing 2 required positional arguments: 'opm_first' and 'fuse_projection_weights'

Could you offer me some help to solve it? Thanks.

Colab notebook

Thank you for this fantastic repository! Would it be possible to provide a Google Colab demo for running the selected model? It would be extremely helpful for quick tests.

Thank you!

Example Input Files Not Working

Great work on the latest version of the paper and thanks for putting this repo out.
I was trying to test the basic inference you outlined using either the ESMFlow or AlphaFlow models and weights and ran into problems at every corner. I'll detail my specific issues below but repos always get increased usage when authors provide at least one full example input line for inference, so if you provide that I'm sure it would help many people checking out your code. Thanks!

Trying ESMFlow Model

mkdir output
mkdir weights
python predict.py --mode esmfold --input_csv splits/atlas_test.csv --weights weights/esmflow_md_distilled_202402.pt --samples 5 --outpdb output/

Output

2024-02-26 12:54:34,511 [---] [INFO] Loading the model
2024-02-26 12:55:16,878 [---] [INFO] Model has been loaded
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:25<00:00, 5.08s/it]
Traceback (most recent call last):
  File "/---/alphaflow/predict.py", line 132, in <module>
    main()
  File "/---/miniconda3/envs/AlphaFlow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/---/alphaflow/predict.py", line 126, in main
    f.write(protein.prots_to_pdb(result))
  File "/---/alphaflow/alphaflow/utils/protein.py", line 163, in prots_to_pdb
    prot = to_pdb(prot)
  File "/---/miniconda3/envs/AlphaFlow/lib/python3.9/site-packages/openfold/np/protein.py", line 341, in to_pdb
    chain_index = prot.chain_index.astype(np.int32)
AttributeError: 'NoneType' object has no attribute 'astype'

Tried with esmflow_pdb_base_202402.pt weights as well...same result.

Trying AlphaFlow Model
Preparing the MSA

python -m scripts.mmseqs_query --split splits/atlas_test.csv --outdir output
COMPLETE: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 450/450 [elapsed: 00:02 remaining: 00:00]

SUCCESS!

Running Inference

python predict.py --mode alphafold --input_csv splits/atlas_test.csv --msa_dir output/ --weights weights/alphaflow_pdb_distilled_202402.pt --samples 5 --outpdb output/
2024-02-26 13:17:56,383 [---] [INFO] Loading the model
Traceback (most recent call last):
  File "/---/alphaflow/predict.py", line 132, in <module>
    main()
  File "/---/miniconda3/envs/AlphaFlow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/---/alphaflow/predict.py", line 78, in main
    model = model_class(**ckpt['hyper_parameters'], training=False)
  File "/---/alphaflow/alphaflow/model/wrapper.py", line 496, in __init__
    self.model = AlphaFold(config,
  File "/---/alphaflow/alphaflow/model/alphafold.py", line 73, in __init__
    self.extra_msa_stack = ExtraMSAStack(
TypeError: __init__() missing 2 required positional arguments: 'opm_first' and 'fuse_projection_weights'

Thanks again for your assistance. Looking forward to trying out this great work.

Issues with CUDA 12 and/or G++17

Hi,

I'm trying to install AlphaFlow on a machine with A30 GPUs and CUDA 12.1. Even though I found a compatible PyTorch version, I get the following error after running the command: pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@103d037':

"In file included from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/extension.h:5,
from openfold/utils/kernel/csrc/softmax_cuda_kernel.cu:18:
/home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++17 or later compatible compiler is required to use PyTorch.
4 | #error C++17 or later compatible compiler is required to use PyTorch.
| ^~~~~
In file included from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/util/string_view.h:4,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/util/StringUtil.h:6,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/util/Exception.h:5,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/core/Device.h:5,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:11,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/ATen/core/Tensor.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/ATen/Tensor.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/extension.h:5,
from openfold/utils/kernel/csrc/softmax_cuda_kernel.cu:18:
/home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/c10/util/C++17.h:27:2: error: #error You need C++17 to compile PyTorch
27 | #error You need C++17 to compile PyTorch
| ^~~~~
In file included from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:4,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/all.h:9,
from /home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/torch/extension.h:5,
from openfold/utils/kernel/csrc/softmax_cuda_kernel.cu:18:
/home/raraya/.conda/envs/alpha_flow/lib/python3.9/site-packages/torch/include/ATen/ATen.h:4:2: error: #error C++17 or later compatible compiler is required to use ATen.
4 | #error C++17 or later compatible compiler is required to use ATen.
| ^~~~~
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1"

My gcc version is 11.3

PPI ensembles

Any chance to adopt this for protein-protein interfaces?

Training with PDBs

Could you provide a version that can be trained using PDBs?
In addition, the installation instructions should pin the mdtraj version; otherwise a newer mdtraj may pull in different versions of numpy and scipy.

alphaflow not working -- KeyError: 'ptm'

I have been running alphaflow in the same environment for the last few months and it was working fine. This weekend I was no longer able to run it. I have been getting this error:

[INFO] Loading the model
[INFO] Model has been loaded
0%| | 0/50 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "/GIT/alphaflow/predict.py", line 175, in <module>
    main()
  File "/alphaflow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/GIT/alphaflow/predict.py", line 129, in main
    dat = {col: dat[col] for col in cols}
  File "/GIT/alphaflow/predict.py", line 129, in <dictcomp>
    dat = {col: dat[col] for col in cols}
KeyError: 'ptm'

Another issue I found while running alphaflow is that lines 88 and 89 of templates.py use np.object. A warning is displayed because this alias no longer exists in newer versions of numpy, and the warning recommends changing those lines from np.object to object.
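
As a minimal sketch of the substitution the warning recommends, the snippet below rewrites `np.object` to the builtin `object` in a string of source code while leaving the still-valid scalar type `np.object_` untouched. The helper name `replace_np_object` is hypothetical, and applying it to an installed package file is a local workaround that assumes you are comfortable editing files under site-packages.

```python
import re

# Match the removed alias `np.object` but not the scalar type `np.object_`:
# the trailing \b fails before "_" because "_" is a word character.
_NP_OBJECT = re.compile(r"\bnp\.object\b")


def replace_np_object(source: str) -> str:
    """Return `source` with every `np.object` replaced by the builtin `object`."""
    return _NP_OBJECT.sub("object", source)


line = '"template_domain_names": np.object,'
print(replace_np_object(line))
```

The word-boundary check is the design point here: a plain string replace would also corrupt `np.object_`.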

Multimer

Hi! Great work!
Is multimer supported as in ESMFold?
I was trying to use a separation token ":" as in ESM but it doesn't seem to work.

predict.py does not run AttributeError: module 'numpy' has no attribute 'object'.

I built the Dockerfile from commit 2c27c69. When running predict.py, the following stack trace is generated:

root@3f4467776483:/opt/alphaflow# /opt/conda/bin/python -V
Python 3.9.7

root@3f4467776483:/opt/alphaflow# /opt/conda/bin/python /opt/alphaflow/predict.py 

/opt/conda/lib/python3.9/site-packages/openfold-1.0.1-py3.9-linux-x86_64.egg/openfold/data/templates.py:88: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar.
  "template_domain_names": np.object,
Traceback (most recent call last):
  File "/opt/alphaflow/predict.py", line 29, in <module>
    from alphaflow.data.data_modules import collate_fn
  File "/opt/alphaflow/alphaflow/data/data_modules.py", line 32, in <module>
    from alphaflow.data import data_pipeline, feature_pipeline
  File "/opt/alphaflow/alphaflow/data/data_pipeline.py", line 22, in <module>
    from openfold.data import templates, parsers, mmcif_parsing
  File "/opt/conda/lib/python3.9/site-packages/openfold-1.0.1-py3.9-linux-x86_64.egg/openfold/data/templates.py", line 88, in <module>
    "template_domain_names": np.object,
  File "/opt/conda/lib/python3.9/site-packages/numpy/__init__.py", line 324, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'object'.
`np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. 
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I note that when the container is built, numpy 1.21.2 is installed first and later uninstalled when mdtraj is built, leaving 1.26 instead.

 Stored in directory: /root/.cache/pip/wheels/4f/17/89/f855cce8e6394e9029e1b972cb623c8813b706d3d1ca81832f
Successfully built mdtraj
Installing collected packages: fair-esm, typing-extensions, pyparsing, numpy, astunparse, scipy, mdtraj, pytorch_lightning
  Attempting uninstall: typing-extensions
    Found existing installation: typing-extensions 3.10.0.2
    Uninstalling typing-extensions-3.10.0.2:
      Successfully uninstalled typing-extensions-3.10.0.2
  Attempting uninstall: numpy
    Found existing installation: numpy 1.21.2
    Uninstalling numpy-1.21.2:
      Successfully uninstalled numpy-1.21.2
  Attempting uninstall: scipy
    Found existing installation: scipy 1.7.3
    Uninstalling scipy-1.7.3:
      Successfully uninstalled scipy-1.7.3
  Attempting uninstall: pytorch_lightning
    Found existing installation: pytorch-lightning 1.5.10
    Uninstalling pytorch-lightning-1.5.10:
      Successfully uninstalled pytorch-lightning-1.5.10
Successfully installed astunparse-1.6.3 fair-esm-2.0.0 mdtraj-1.10.0 numpy-1.26.4 pyparsing-3.1.2 pytorch_lightning-2.0.4 scipy-1.13.1 typing-extensions-4.12.2

I am going to try pinning numpy 1.21.2 throughout the pip installations; please advise if there is a better or different route.
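
To confirm whether a later install step has silently replaced a pinned package, a small check like the following may help. It is a sketch using only the standard library's `importlib.metadata`. The pins are copied from the installation instructions, and the function name `find_pin_violations` is hypothetical.

```python
from importlib import metadata


def find_pin_violations(pins, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that is not satisfied."""
    violations = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            violations[package] = (expected, installed)
    return violations


# Pins copied from the installation instructions; adjust to your environment.
PINS = {"numpy": "1.21.2", "pandas": "1.5.3", "scipy": "1.7.1", "mdtraj": "1.9.9"}

if __name__ == "__main__":
    for package, (expected, installed) in find_pin_violations(PINS).items():
        print(f"{package}: expected {expected}, found {installed}")
```

Running this as the last step of a container build would surface a numpy upgrade like the one in the log above before predict.py is ever imported.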

When training alphaflow from scratch: AttributeError: 'AlphaFoldWrapper' object has no attribute 'extra_msa_stack'

Hi Bowen, I am trying to run alphaflow's train.py, but I get the error below:
AttributeError: 'AlphaFoldWrapper' object has no attribute 'extra_msa_stack'
I am using:
python train.py --lr 5e-4 --noise_prob 0.8 --accumulate_grad 8 --train_epoch_len 80000 --train_cutoff 2018-05-01 --filter_chains --train_data_dir ../unpack_mmcif_out --train_msa_dir ../openfold/pdb --mmcif_dir ../pdb_mmcif --val_msa_dir ../openfold/alignment_db --run_name alphaflow_train
Do I need to add extra_msa_stack to the AlphaFoldWrapper class, or should I drop this line?

scripts.unpack_mmcif.py reference to outdated betafold?

I tried to preprocess pdbs into NPZ files using "scripts/unpack_mmcif.py"
e.g.:
$ python -m scripts.unpack_mmcif.py --mmcif_dir ../testpdb/data_dir/ --outdir ../testpdb/outesmflow/

but it tries to load betafold:
"from betafold.data.data_pipeline import DataPipeline"

and fails:
"ModuleNotFoundError: No module named 'betafold'"

I tried replacing "betafold.data.data_pipeline" with "alphaflow.data.data_pipeline", but I ran into other issues that make me believe that "unpack_mmcif.py" is an outdated file.

Can you confirm this?

Installation fails on Cuda12

Hi,
I was wondering if there is a new wheel for installation with CUDA 12? I tried this on a Debian 12 system with CUDA 12, but the setup fails to build the wheel. I also tried with CUDA 11.3, where it fails because Debian 12 ships g++ version 12 and the build requires a g++ version <= 10.
Thanks
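
When debugging compiler mismatches like this, it can help to check the g++ major version programmatically before starting a long build. The sketch below parses the first line of `g++ --version` output. The upper bound of 10 is taken from the report above and has not been independently verified, and the sample string is an illustrative assumption about what Debian 12 prints.

```python
import re


def parse_major_version(version_output: str):
    """Extract the g++ major version from the first line of `g++ --version` output."""
    first_line = version_output.splitlines()[0] if version_output else ""
    # The compiler version is normally the last dotted triple on the first line.
    matches = re.findall(r"(\d+)\.\d+\.\d+", first_line)
    return int(matches[-1]) if matches else None


def gxx_is_supported(version_output: str, max_major: int) -> bool:
    """True if the reported g++ major version is at most `max_major`."""
    major = parse_major_version(version_output)
    return major is not None and major <= max_major


# Illustrative sample; in practice, pass the captured output of `g++ --version`.
sample = "g++ (Debian 12.2.0-14) 12.2.0\nCopyright (C) 2022 Free Software Foundation, Inc."
print(parse_major_version(sample), gxx_is_supported(sample, max_major=10))
```

Taking the last dotted triple avoids being misled by distribution package versions that appear in parentheses earlier on the line.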

TypeError: __init__() missing 1 required positional argument: 'no_column_attention'

Hi,
I've been trying to run some experiments using your model and scripts and am running into a problem. The error arises when running the following command from a testing directory inside the main project directory:

python3 ../predict.py \
    --weights ../model_weights/alphaflow_pdb_distilled_202402.pt \
    --mode alphafold \
    --input_csv ghsr.csv \
    --msa_dir ./msas \
    --samples 10 \
    --outpdb ./out \
    --noisy_first \
    --no_diffusion

I get the error:

2024-02-27 08:10:02,292 [iwe547170:52263] [INFO] Loading the model
Traceback (most recent call last):
  File "/media/data/software/alphaflow/workdir/../predict.py", line 138, in <module>
    main()
  File "/home/iwe34/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/media/data/software/alphaflow/workdir/../predict.py", line 84, in main
    model = model_class(**ckpt['hyper_parameters'], training=False)
  File "/media/data/software/alphaflow/alphaflow/model/wrapper.py", line 496, in __init__
    self.model = AlphaFold(config,
  File "/media/data/software/alphaflow/alphaflow/model/alphafold.py", line 77, in __init__
    self.evoformer = EvoformerStack(
TypeError: __init__() missing 1 required positional argument: 'no_column_attention'

To figure out why this happened, I modified the alphaflow/config.py file by manually adding the flag no_column_attention: False, and later realized that this config is only used when predict.py is called with the additional flag --original_weights=True.

However, inspecting the loaded config ckpt from the lines

if args.weights:
    ckpt = torch.load(args.weights, map_location='cpu')
    model = model_class(**ckpt['hyper_parameters'], training=False)

showed that the hyperparameters stored with the model weights alphaflow_pdb_distilled_202402.pt don't contain the no_column_attention field. Everything worked fine when I used the original weights params_model_1.npz (apart from a CUDA error, which is a separate problem).

Simple question: What am I doing wrong? Why can I provide new model weights when these could never be used because the EvoformerStack in openfold requires this argument?
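
One way to experiment with a checkpoint whose stored hyperparameters lack a field is to merge defaults into the loaded dictionary before constructing the model. The sketch below shows only the merge step, on plain dictionaries. The nesting and the `evoformer_stack` / `no_blocks` keys are hypothetical placeholders, since the real layout of `ckpt['hyper_parameters']` is not shown in this report, and whether such a merge resolves the error is untested.

```python
import copy


def fill_missing(config: dict, defaults: dict) -> dict:
    """Return a copy of `config` with keys from `defaults` added where absent.

    Existing values in `config` always win; nested dicts are merged recursively.
    """
    merged = copy.deepcopy(config)
    for key, default in defaults.items():
        if key not in merged:
            merged[key] = copy.deepcopy(default)
        elif isinstance(merged[key], dict) and isinstance(default, dict):
            merged[key] = fill_missing(merged[key], default)
    return merged


# Hypothetical layout: the real checkpoint's nesting and key names may differ.
saved = {"evoformer_stack": {"no_blocks": 48}}
defaults = {"evoformer_stack": {"no_blocks": 4, "no_column_attention": False}}
print(fill_missing(saved, defaults))
```

Because stored values take precedence, the merge only adds fields the checkpoint never had and does not alter the architecture the weights were trained with.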

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

2024-09-02 15:42:54,465 [] [INFO] Loading the model
2024-09-02 15:42:55,222 [] [INFO] Model has been loaded
  0%|                                                                         | 0/10 [00:00<?, ?it/s]INPZ torch.Size([1, 488, 488, 128])
MASK torch.Size([1, 488, 488])
  0%|                                                                         | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/dionessa/Documents/cryptochrome_codes/alphaflow/predict.py", line 183, in <module>
    main()
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/dionessa/Documents/cryptochrome_codes/alphaflow/predict.py", line 129, in main
    prots, outputs = model.inference(batch, as_protein=False, noisy_first=args.noisy_first,
  File "/Users/dionessa/Documents/cryptochrome_codes/alphaflow/alphaflow/model/wrapper.py", line 363, in inference
    output = self.model(batch)
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/dionessa/Documents/cryptochrome_codes/alphaflow/alphaflow/model/alphafold.py", line 223, in forward
    inp_z = self._get_input_pair_embeddings(
  File "/Users/dionessa/Documents/cryptochrome_codes/alphaflow/alphaflow/model/alphafold.py", line 130, in _get_input_pair_embeddings
    inp_z = self.input_pair_stack(inp_z, mask, chunk_size=None)
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/dionessa/Documents/cryptochrome_codes/alphaflow/alphaflow/model/input_stack.py", line 276, in forward
    t, = checkpoint_blocks(
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/openfold/utils/checkpointing.py", line 85, in checkpoint_blocks
    return exec(blocks, args)
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/openfold/utils/checkpointing.py", line 72, in exec
    a = wrap(block(*a))
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/dionessa/Documents/cryptochrome_codes/alphaflow/alphaflow/model/input_stack.py", line 122, in forward
    self.tri_att_end(
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/openfold/model/triangular_attention.py", line 114, in forward
    x = self.layer_norm(x)
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/openfold/model/primitives.py", line 211, in forward
    out = nn.functional.layer_norm(
  File "/opt/anaconda3/envs/alphaflow/lib/python3.9/site-packages/torch/nn/functional.py", line 2503, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
