Comments (7)
The advanced notebook is under active development. I would avoid trying to deploy it locally (unless you are willing to track daily bug fixes and implement them yourself). For a more stable setup, see the alphafold2_mmseqs2 notebook.
from colabfold.
The old K40 GPUs (12 GB RAM) we have locally ran all but one CASP FM target (a 900–1000 aa one) without issues using the official pipeline, so AF2 doesn't necessarily need very new GPUs.
You might still want to poke at the Python code in the Colab, as it will be a lot easier to supply your own MSAs to than the official pipeline. Ideally we want to make the Colabs runnable on the command line as well, but we haven't started working on that yet.
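To make "supply your own MSAs" concrete, here is a minimal sketch of reading a precomputed alignment in a3m/FASTA format so it could be handed to the notebook in place of the remote search results. The file path and the function name are illustrative assumptions, not the notebook's actual API.

```python
from pathlib import Path

def read_a3m(path):
    """Parse an a3m/FASTA-style alignment into parallel lists of
    headers and (gapped) sequences. Assumed helper, not ColabFold API."""
    headers, seqs = [], []
    for line in Path(path).read_text().splitlines():
        if line.startswith(">"):
            headers.append(line[1:].strip())  # record name without '>'
            seqs.append("")
        elif line and headers:
            seqs[-1] += line.strip()          # sequences may wrap over lines
    return headers, seqs
```

The first record is conventionally the query itself, with hits below it.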
> Ideally we want to make the Colabs also runnable on the command line, but haven't started working on that yet.
This is also mentioned in #20. It would be great to have either a local command-line interface or a local notebook version, so that we can run inputs with >1000 amino acids and predict complexes (dimers/trimers) of them.
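For context on how the advanced notebook can model complexes at all with the monomer model: it concatenates the chains and inserts a large jump in `residue_index` between them so the model treats them as separate polypeptides. A sketch of that indexing trick, with the gap size of 200 as an assumed (commonly used) value:

```python
import numpy as np

def chain_residue_index(chain_lengths, gap=200):
    """Build a residue_index array for concatenated chains, inserting a
    `gap`-sized jump between chains (an assumed value, not a fixed constant)."""
    idx, offset = [], 0
    for length in chain_lengths:
        idx.extend(range(offset, offset + length))
        offset += length + gap  # jump so the model sees a chain break
    return np.array(idx, dtype=np.int32)
```

For a homotrimer of a >1000 aa protein, the total input length still triples, which is why the sequence-length limit below becomes the bottleneck.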
I'm not that familiar with all the steps involved in the code; I mostly use it as an end-to-end tool. I tried to localize the AlphaFold2_advanced notebook. After solving several package issues, I am now stuck at `No module named 'colabfold'`. I also see that the `database_path` entries all point to googleapis, which will work fine on Colab but less smoothly locally, I guess. I have a local version of AlphaFold2 running fine. Any hints on how to localize the AlphaFold2_advanced notebook would be much appreciated. Thanks.
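On the googleapis point: the model parameters can be fetched once and cached locally, after which the notebook's `data_dir` can point at the local copy. A hedged sketch, where the helper name is an assumption and the URL is the parameter tarball AlphaFold's own install instructions reference:

```python
import os
import urllib.request

ALPHAFOLD_PARAMS_URL = (
    "https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar"
)

def ensure_params(params_dir, url=ALPHAFOLD_PARAMS_URL):
    """Download the parameter tarball once; reuse any local copy afterwards.
    Hypothetical helper, not part of the notebook."""
    os.makedirs(params_dir, exist_ok=True)
    tar_path = os.path.join(params_dir, os.path.basename(url))
    if not os.path.exists(tar_path):  # only hit the network when no local copy exists
        urllib.request.urlretrieve(url, tar_path)
    return tar_path
```

The sequence databases behind the MMseqs2 search are a separate, much larger problem (see the next comment).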
We now have an internal version that runs on a cluster. The main issue remains that the MMseqs2 API runs on a single server and will probably not scale to multiple research groups submitting jobs.
We are still preparing databases, scripts, etc. so that people can deploy their own server. However, to use MMseqs2 the way we use it for ColabFold, all databases must be fully in RAM (currently 535 GB of RAM, plus some RAM for each worker process).
We can change the local ColabFold version to work with MMseqs2's usual batch mode, where the memory requirements are not as high.
If you want to run a few thousand sequences, please contact me directly (email, Twitter, etc.); I can give you access to the local version. We still need to figure out how to scale the API better, though.
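For reference, MMseqs2's usual batch mode is a three-step pipeline (`createdb` → `search` → `result2msa` are real MMseqs2 modules, but the exact pipeline and parameters ColabFold uses are not specified here, so treat this as a sketch). Building the commands as argument lists keeps them ready for `subprocess.run`:

```python
def mmseqs_batch_cmds(query_fasta, target_db, out_prefix, tmp="tmp"):
    """Sketch of an MMseqs2 batch-mode pipeline: index the query, search an
    on-disk target database, and convert the hits into an MSA. Database
    paths and the output prefix are placeholders."""
    return [
        ["mmseqs", "createdb", query_fasta, f"{out_prefix}_qdb"],
        ["mmseqs", "search", f"{out_prefix}_qdb", target_db, f"{out_prefix}_res", tmp],
        ["mmseqs", "result2msa", f"{out_prefix}_qdb", target_db,
         f"{out_prefix}_res", f"{out_prefix}_msa"],
    ]
```

Because the database stays on disk in this mode, it avoids the 535 GB in-RAM requirement of the server setup, at the cost of per-query speed.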
Thanks! I have a local version of AlphaFold2 installed with Docker on a server. (I ran into some problems during installation, then tried to install the non-Docker version on a cluster as well, but later dropped that since the server install worked out fine after changing the CUDA version.) I have 4 NVIDIA RTX A6000 GPUs and 1.0 TB RAM on that server, but I still have not gotten AlphaFold2_advanced.ipynb to run through. I would like to predict homotrimers of a protein with more than 1000 aa (more details in issue #93 in AlphaFold2's repo). With trimer settings, the total length is more than 3000 aa.
I get the errors below if I run them on Colab.

```
Exception: Input sequence is too long: 3867 amino acids, while the maximum is 2500. Please use the full AlphaFold system for long sequences.
Exception: Input sequence is too long: 3078 amino acids, while the maximum is 2500. Please use the full AlphaFold system for long sequences.
```
I'm trying to run the notebooks locally on the server now. The previously mentioned `No module named 'colabfold'` error occurred because I was launching the notebook from within AlphaFold2's folder, which bypassed the line `if not os.path.isdir("alphafold"):` in the notebook. I moved the notebook to another folder and, after several pip and conda installs for the missing packages, got past that. I didn't change the `database_path` entries, so I suppose it is still using googleapis. I changed the maximum length to `MAX_SEQUENCE_LENGTH = 5000` (this is the only line I changed, to fix the aforementioned length error).
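To see why launching from inside the AlphaFold2 checkout bypassed the setup: the notebook's install cell is guarded by the quoted `os.path.isdir` check, so an existing `alphafold/` directory makes the clone/install step (which also provides `colabfold`) silently skip. A minimal mirror of that guard:

```python
import os

def setup_would_run(workdir="."):
    """Mirror of the notebook's guard: the clone/install step runs only
    when no alphafold/ directory already exists in the working directory.
    The function name is illustrative; the notebook inlines this check."""
    return not os.path.isdir(os.path.join(workdir, "alphafold"))
```

Running the notebook from a clean directory, as described above, makes the guard pass and the dependencies install.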
Now the `#@title Search against genetic databases` cell runs fine and plots the sequence-coverage figure. However, the `#@title run alphafold` cell gives the error below.
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_40057/1661749859.py in <module>
    151 cfg.data.eval.num_ensemble = num_ensemble
    152
--> 153 params = data.get_model_haiku_params(name,'./alphafold/data')
    154 model_runner = model.RunModel(cfg, params, is_training=is_training)
    155 COMPILED = compiled

~/.conda/envs/AF/lib/python3.9/site-packages/alphafold/model/data.py in get_model_haiku_params(model_name, data_dir)
     37     params = np.load(io.BytesIO(f.read()), allow_pickle=False)
     38
---> 39   return utils.flat_params_to_haiku(params)

~/.conda/envs/AF/lib/python3.9/site-packages/alphafold/model/utils.py in flat_params_to_haiku(params)
     77     if scope not in hk_params:
     78       hk_params[scope] = {}
---> 79     hk_params[scope][name] = jnp.array(array)
     80
     81   return hk_params

~/.conda/envs/AF/lib/python3.9/site-packages/jax/_src/numpy/lax_numpy.py in array(object, dtype, copy, order, ndmin)
   3085   _inferred_dtype = object.dtype and dtypes.canonicalize_dtype(object.dtype)
   3086   lax._check_user_dtype_supported(_inferred_dtype, "array")
-> 3087   out = _device_put_raw(object, weak_type=weak_type)
   3088   if dtype: assert _dtype(out) == dtype
   3089   elif isinstance(object, (DeviceArray, core.Tracer)):

~/.conda/envs/AF/lib/python3.9/site-packages/jax/_src/lax/lax.py in _device_put_raw(x, weak_type)
   1607   else:
   1608     aval = raise_to_shaped(core.get_aval(x), weak_type=weak_type)
-> 1609     return xla.array_result_handler(None, aval)(*xla.device_put(x))
   1610
   1611 def zeros_like_shaped_array(aval):

~/.conda/envs/AF/lib/python3.9/site-packages/jax/interpreters/xla.py in device_put(x, device)
    156   x = canonicalize_dtype(x)
    157   try:
--> 158     return device_put_handlers[type(x)](x, device)
    159   except KeyError as err:
    160     raise TypeError(f"No device_put handler for type: {type(x)}") from err

~/.conda/envs/AF/lib/python3.9/site-packages/jax/interpreters/xla.py in _device_put_array(x, device)
    164   if x.dtype is dtypes.float0:
    165     x = np.zeros(x.shape, dtype=np.dtype(bool))
--> 166   return (backend.buffer_from_pyval(x, device),)
    167
    168 def _device_put_scalar(x, device):

RuntimeError: Resource exhausted: Out of memory while trying to allocate 2097152 bytes.
```
However, all 4 GPUs and the RAM are available, as shown below.

```
~$ nvidia-smi
Mon Aug 23 14:04:14 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    Off  | 00000000:18:00.0 Off |                  Off |
| 30%   24C    P8     6W / 300W |    460MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000    Off  | 00000000:3B:00.0 Off |                  Off |
| 30%   28C    P8    14W / 300W |    550MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000    Off  | 00000000:86:00.0 Off |                  Off |
| 30%   26C    P8     7W / 300W |    456MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    Off  | 00000000:AF:00.0 Off |                  Off |
| 30%   24C    P8    17W / 300W |    452MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           1.0T        7.3G        808G         40M        191G        994G
Swap:          7.5G          0B        7.5G
```
Any help would be much appreciated. Thanks!
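One workaround worth trying for GPU OOM on long inputs, hedged because whether it resolves this particular failure is not guaranteed: AlphaFold's own Docker runner sets JAX/XLA environment variables that enable unified CPU/GPU memory, letting the allocator spill to host RAM (plentiful here at 1.0 TB). The values below are the commonly used ones.

```python
import os

# Must be set before jax is imported for the first time, or they have no effect.
os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"           # allow spilling GPU allocations to host RAM
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "4.0"  # budget up to 4x the physical GPU memory
```

In a notebook, these would go in the first cell (or in the shell before launching Jupyter), ahead of any `import jax` or AlphaFold model import.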