@yzhang-github-pub,
I was able to create a working environment with the conda installation I submitted in #2.
My numpy version is 1.25.
With that environment I can run the example in PeSTo/ /apply_model.ipynb, which I converted into the Python script below. I tested it by running it interactively from the PeSTo/ install folder on the example in PeSTo/examples/issue_19_04_2023/2CUA_A.pdb, and compared the output with the website https://pesto.epfl.ch/. Note that for the results to match, you have to select only "chainA" on the website (if you select both chainA and chainB, the PPI probabilities change).
```python
import os
import sys
import h5py
import json
import numpy as np
import torch as pt
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from glob import glob
# imported BEFORE save_path is added to sys.path, so these resolve to PeSTo/src
# (the script is run from the PeSTo/ install folder)
from src.dataset import StructuresDataset, collate_batch_features, select_by_sid, select_by_interface_types
from src.data_encoding import encode_structure, encode_features, extract_topology, categ_to_resnames, resname_to_categ
from src.structure import data_to_structure, encode_bfactor, concatenate_chains, split_by_chain
from src.structure_io import save_pdb, read_pdb
from src.scoring import bc_scoring, bc_score_names

# data parameters
data_path = "examples/issue_19_04_2023"

# model parameters
save_path = "model/save/i_v4_1_2021-09-07_11-21"  # 91

# select saved model
model_filepath = os.path.join(save_path, 'model_ckpt.pt')

# add module to path
if save_path not in sys.path:
    sys.path.insert(0, save_path)

# load functions
from config import config_model, config_data
from data_handler import Dataset
from model import Model

# define device
device = pt.device("cpu")

# create model
model = Model(config_model)

# reload model
model.load_state_dict(pt.load(model_filepath, map_location=pt.device("cpu")))

# set model to inference
model = model.eval().to(device)

# find pdb files and ignore already predicted ones
pdb_filepaths = glob(os.path.join(data_path, "*.pdb"), recursive=True)
pdb_filepaths = [fp for fp in pdb_filepaths if "_i" not in fp]

# create dataset loader with preprocessing
dataset = StructuresDataset(pdb_filepaths, with_preprocessing=True)

# debug print
print(len(dataset))

# run model on all subunits
with pt.no_grad():
    for subunits, filepath in tqdm(dataset):
        # concatenate all chains together
        structure = concatenate_chains(subunits)

        # encode structure and features
        X, M = encode_structure(structure)
        #q = pt.cat(encode_features(structure), dim=1)
        q = encode_features(structure)[0]

        # extract topology
        ids_topk, _, _, _, _ = extract_topology(X, 64)

        # pack data and setup sink (IMPORTANT)
        X, ids_topk, q, M = collate_batch_features([[X, ids_topk, q, M]])

        # run model
        z = model(X.to(device), ids_topk.to(device), q.to(device), M.float().to(device))

        # for all predictions
        for i in range(z.shape[1]):
            # prediction
            p = pt.sigmoid(z[:,i])

            # encode result
            structure = encode_bfactor(structure, p.cpu().numpy())

            # save results
            output_filepath = filepath[:-4]+'_i{}.pdb'.format(i)
            save_pdb(split_by_chain(structure), output_filepath)
```
from pesto.
Thanks @rubenalv!
I compared my converted Python script with yours. The difference was that I reorganized the import statements so that all modules imported from the git repo were in one block:
```python
...
# select saved model
model_filepath = os.path.join(save_path, 'model_ckpt.pt')
# add module to path
if save_path not in sys.path:
    sys.path.insert(0, save_path)
from src.dataset import StructuresDataset, collate_batch_features, select_by_sid, select_by_interface_types
from src.data_encoding import encode_structure, encode_features, extract_topology, categ_to_resnames, resname_to_categ
from src.structure import data_to_structure, encode_bfactor, concatenate_chains, split_by_chain
from src.structure_io import save_pdb, read_pdb
from src.scoring import bc_scoring, bc_score_names
from config import config_model, config_data
#from data_handler import Dataset # not used
from model import Model
...
```
It seems that moving "sys.path.insert(0, save_path)" above the five "from src.* import ..." lines caused the error. I still don't understand why, but the order seems to matter.
Thanks again.
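The ordering effect can be reproduced without PeSTo. The sketch below is illustrative only: `make_src_package`, `load_structure_io` and the `VERSION` string are hypothetical names, and the two throwaway `src` packages stand in for PeSTo/src and the `src` folder inside the saved-model directory. It shows that Python imports whichever `src` comes first on `sys.path` at the moment of the first import, and that prepending a path afterwards changes nothing because the package is already cached in `sys.modules`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_src_package(root: Path, label: str) -> None:
    """Create root/src/structure_io.py whose VERSION string identifies the copy."""
    pkg = root / "src"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "structure_io.py").write_text(f"VERSION = {label!r}\n")


def load_structure_io(repo_root: Path, save_path: Path, insert_first: bool) -> str:
    """Import src.structure_io with save_path prepended before or after the import."""
    # Drop any cached 'src' modules so every call resolves the import afresh.
    for name in [m for m in sys.modules if m == "src" or m.startswith("src.")]:
        del sys.modules[name]
    importlib.invalidate_caches()
    original_path = list(sys.path)
    try:
        sys.path.insert(0, str(repo_root))  # like running from the PeSTo/ folder
        if insert_first:
            sys.path.insert(0, str(save_path))  # prepended BEFORE the import
            import src.structure_io as sio  # first match on sys.path wins
        else:
            import src.structure_io as sio  # resolved from repo_root
            sys.path.insert(0, str(save_path))  # too late: src is already cached
        return sio.VERSION
    finally:
        sys.path[:] = original_path  # undo the sys.path changes


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp, "PeSTo")
    save = repo / "model" / "save" / "run"
    make_src_package(repo, "top-level src")
    make_src_package(save, "save_path src")
    print(load_structure_io(repo, save, insert_first=False))  # top-level src
    print(load_structure_io(repo, save, insert_first=True))   # save_path src
```

In the real script, printing `src.structure_io.__file__` right after the imports shows which of the two copies was actually loaded.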
@yzhang-github-pub, the sys.path.insert(0, save_path) line allows the modules within /main/model/save/i_v4_1_2021-09-07_11-21 to be loaded. Because it puts save_path at the front of sys.path, any "from src..." import that runs after it picks up the src package inside that folder rather than /main/src. In the script I wrote I use the modules in /main/model instead, so they may be slightly different. I mention this because I had an issue with /main/src/structure_io.py and main/model/save/i_v4_1_2021-09-07_11-21/src/structure_io.py not being exactly the same when compared with Linux diff. With one I could get a file identical to 2CUA_A_i0.pdb, and with the other .py I could not (again compared initially with diff), simply because of a difference in which columns each .py file wrote to the PDB output.
Glad you solved your issue!
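A plain diff flags any byte difference, including columns that don't carry the prediction. As a minimal sketch, assuming standard fixed-column PDB records and that the predictions end up in the B-factor field (as the name encode_bfactor suggests), the hypothetical helpers `bfactors` and `compare_pdb` below compare two output files both byte-for-byte and on the B-factor column (columns 61-66) alone.

```python
import difflib
from typing import Dict, List


def bfactors(pdb_text: str) -> List[float]:
    """B-factor (fixed columns 61-66) of every ATOM/HETATM record, in file order."""
    return [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


def compare_pdb(text_a: str, text_b: str, tol: float = 1e-2) -> Dict[str, object]:
    """Compare two PDB outputs byte-for-byte and by their B-factor column only."""
    ba, bb = bfactors(text_a), bfactors(text_b)
    same_bfactors = len(ba) == len(bb) and all(
        abs(x - y) <= tol for x, y in zip(ba, bb)
    )
    diff = list(
        difflib.unified_diff(
            text_a.splitlines(), text_b.splitlines(), "a.pdb", "b.pdb", lineterm=""
        )
    )
    return {
        "identical_text": text_a == text_b,  # what a plain diff checks
        "same_bfactors": same_bfactors,      # only the prediction-carrying column
        "diff": diff,                        # unified diff lines, like `diff -u`
    }
```

For example, `compare_pdb(Path("2CUA_A_i0.pdb").read_text(), Path("other_run/2CUA_A_i0.pdb").read_text())` with `from pathlib import Path`; the second path is a placeholder for wherever the other version wrote its output.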
The chain naming in the PDB format is not ideal, especially for other molecules (HETATM) that are associated with the closest chain. There can also be multiple copies of the same chain name (symmetric structures). So I'm appending a tag to the chain name to differentiate between them. I represent the chain name per atom using an array of strings. However, NumPy unicode strings have a fixed length (you cannot append to a string directly), and the generic numpy.object type was removed in recent versions of NumPy.
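The fixed-length limitation is easy to reproduce. In this small sketch (toy chain names, not PeSTo data), NumPy infers a one-character unicode dtype, so an in-place edit is silently truncated, while np.char.add returns a new, wider array instead of growing the original.

```python
import numpy as np

# Per-atom chain names (toy values). NumPy infers the fixed-width dtype <U1.
chains = np.array(["A", "A", "B"])
print(chains.dtype)  # <U1

# Writing a longer string into the array silently keeps only the first character.
chains[0] = "A:1"
print(chains[0])  # A

# np.char.add does not grow the array in place; it returns a new, wider array.
tagged = np.char.add(chains, ":0")
print(tagged.dtype)  # <U3
print(tagged)        # ['A:0' 'A:0' 'B:0']
```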
I updated the code with a quick fix in commit d00b0d6.
For now, it uses a buffer size of 10 characters for the chain name. The solution is not optimal, but it should fix this issue in all cases.
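A buffer fix along those lines might look like the sketch below. It is not the code from commit d00b0d6: `tag_chain_names`, `BUFFER` and the ":<tag>" suffix format are hypothetical. Reserving `<U10` up front lets tagged names be written in place, and the example call shows the trade-off: a tagged name longer than 10 characters is still cut off.

```python
import numpy as np

BUFFER = 10  # characters reserved per chain name (the buffer size mentioned above)


def tag_chain_names(chain_names, tags):
    """Hypothetical helper: append ':<tag>' to each chain name inside a <U10 buffer."""
    # Allocate the wider dtype up front (np.array copies, so the input is untouched).
    names = np.array(chain_names, dtype=f"<U{BUFFER}")
    suffixes = np.array([f":{t}" for t in tags])
    # In-place assignment into the fixed buffer: anything past 10 characters is cut.
    names[:] = np.char.add(names, suffixes)
    return names


print(tag_chain_names(["A", "B", "LONGCHAIN"], [0, 1, 2]))  # ['A:0' 'B:1' 'LONGCHAIN:']
```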
The numpy.object version of the code works with NumPy 1.23 (the alias was removed in NumPy 1.24).
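For comparison, the portable replacement for numpy.object is the built-in `object`, which is all the alias ever pointed to. A minimal sketch with toy values: strings stored in an object array are ordinary Python objects, so they can grow without a preallocated buffer.

```python
import numpy as np

# np.object was only an alias of the built-in `object` (deprecated in NumPy 1.20,
# removed in 1.24); passing the builtin works on old and new releases alike.
chains = np.array(["A", "A", "B"], dtype=object)

# Each element is an ordinary Python str, so it can grow without a fixed buffer.
chains[0] = chains[0] + ":1"

# Arithmetic on an object array calls Python's `+` element by element.
tagged = chains + ":0"
print(list(tagged))  # ['A:1:0', 'A:0', 'B:0']
```

The cost is that object arrays hold pointers to Python objects, so they give up NumPy's compact storage and fast vectorized string handling, which is one reason a fixed-size buffer can be preferable.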