SelfPAD:

Author: Talip Ucar ([email protected])

The official implementation of Improving Antibody Humanness Prediction using Patent Data

Model

[Figures: Pre-training | Fine-tuning]

Environment

We used Python 3.7 for our experiments. The environment can be set up by following three steps:

pip install pipenv             # To install pipenv if you don't have it already
pipenv install --skip-lock     # To install required packages. 
pipenv shell                   # To activate virtual env

If the second step fails, you can install the packages listed in the Pipfile individually with pip, i.e. "pip install package_name".

Configuration

There are two types of configuration files:

1. pad.yaml         # Defines parameters and options for pre-training
2. humanness.yaml   # Defines parameters and options for fine-tuning

Training and Evaluation

You can train and evaluate the model by using:

python selfpad_pretrain.py        # For pre-training
python selfpad_finetune.py        # For fine-tuning it for humanness
python selfpad_eval.py -ev test    # To compute humanness scores for a custom dataset, in this case test.csv. The CSV file should have "VH", "VL" and/or "Label" columns
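
Before running the evaluation script on a custom file, it can help to check the CSV header. The snippet below is a minimal sketch and not part of the repository: `check_columns` is a hypothetical helper, and it assumes one reading of the column requirement, namely that "VH" and "VL" are both needed and "Label" is optional. The example rows are short illustrative fragments, not real antibody sequences.

```python
import csv
import io

REQUIRED_COLUMNS = {"VH", "VL"}   # assumption: both chains are required
OPTIONAL_COLUMNS = {"Label"}      # assumption: only needed to compute metrics


def check_columns(csv_text):
    """Return (ok, missing, has_label) for the header row of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    missing = sorted(REQUIRED_COLUMNS - header)
    has_label = bool(OPTIONAL_COLUMNS & header)
    return (not missing, missing, has_label)


# Placeholder data: truncated dummy fragments, for illustration only.
example = "VH,VL,Label\nEVQLVESGGG,DIQMTQSPSS,1\nQVQLQQSGAE,DIVMTQSPLS,0\n"
ok, missing, has_label = check_columns(example)
print(ok, missing, has_label)
```

For a file on disk, the same check could be applied to the text returned by `open(path).read()`.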

Structure of the repo

- selfpad_pretrain.py
- selfpad_finetune.py
- selfpad_eval.py

- src
    |-selfpad.py
    |-selfpad_humanness.py

- config
    |-pad.yaml
    |-humanness.yaml
    
- utils_common
    |-arguments.py
    |-utils.py
    |-tokenizer.py
    ...
    
- utils_pretrain
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- utils_finetune
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- data
    |-test.csv
    ...
    
- results
    |-pretraining
    |-humanness
    ...

Results

Results at the end of training are saved under the ./results directory. The results directory is structured as follows:

- results
    |-task e.g. humanness, or pretraining
            |-evaluation
                |-clusters (for plotting t-SNE and PCA plots of embeddings)
            |-training
                |-model
                |-plots
                |-loss

You can save evaluation results under the "evaluation" folder.
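
The layout above can be reproduced with a few lines of Python, for example to prepare the folders before copying in an existing model. This is a minimal sketch and not code from the repository: `make_results_tree` is a hypothetical helper, and the training scripts may well create these folders themselves.

```python
from pathlib import Path
import tempfile


def make_results_tree(root, task):
    """Create results/<task>/ with the evaluation and training subfolders."""
    base = Path(root) / "results" / task
    subdirs = [
        base / "evaluation" / "clusters",
        base / "training" / "model",
        base / "training" / "plots",
        base / "training" / "loss",
    ]
    for d in subdirs:
        # parents=True builds intermediate folders; exist_ok avoids errors on reruns
        d.mkdir(parents=True, exist_ok=True)
    return subdirs


# Demonstrate in a throwaway directory so nothing is written to the repo.
with tempfile.TemporaryDirectory() as tmp:
    for d in make_results_tree(tmp, "humanness"):
        print(d.relative_to(tmp))
```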

Experiment tracking

You can turn on Weights & Biases (W&B) logging in the config file.

Citing the paper

@article{ucar2024SelfPAD,
  title={Improving Antibody Humanness Prediction using Patent Data},
  author={Ucar, Talip and 
          Ramon, Aubin and 
          Oglic, Dino and 
          Croasdale-Wood, Rebecca and 
          Diethe, Tom and 
          Sormanni, Pietro},
  journal={arXiv preprint arXiv:2110.04361},
  year={2024}
}

Citing this repo

If you use the SelfPAD framework in your own work, please cite it using the following:

@Misc{talip_ucar_2024_SelfPAD,
  author =   {Talip Ucar},
  title =    {{Improving Antibody Humanness Prediction using Patent Data}},
  howpublished = {\url{https://github.com/AstraZeneca/SelfPAD}},
  month        = January,
  year = {since 2024}
}

SelfPAD_eval.py is failing

Hello, thanks for the good tool. I had some issues installing the dependencies, but got it working with Docker. Here is a Dockerfile.txt (extension added so the upload is allowed) in case someone finds it useful for building the tool in a container environment like Docker.

However, when I run selfpad_eval.py to evaluate humanness, I get this error:

Building the models for training and evaluation in SubTab framework...
Traceback (most recent call last):
  File "/opt/SelfPAD/selfpad_eval.py", line 250, in <module>
    main()
  File "/opt/SelfPAD/selfpad_eval.py", line 224, in main
    f1, recall, prec, auc, acc, pr_auc = eval(data_loader, config=config)
  File "/opt/SelfPAD/selfpad_eval.py", line 45, in eval
    model = PADFintune(config)
  File "/opt/SelfPAD/src/selfpad_humanness.py", line 47, in __init__
    self.set_autoencoder()
  File "/opt/SelfPAD/src/selfpad_humanness.py", line 53, in set_autoencoder
    self.transformer = PADFT(
  File "/opt/SelfPAD/utils_finetune/model_utils.py", line 49, in __init__
    self.transformer = SelfPAD.load_from_checkpoint(
  File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/utilities/model_helpers.py", line 100, in wrapper
    return self.method(cls, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/core/module.py", line 1561, in load_from_checkpoint
    loaded = _load_from_checkpoint(
  File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 61, in _load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=map_location)
  File "/usr/local/lib/python3.9/site-packages/lightning_fabric/utilities/cloud_io.py", line 55, in _load
    with fs.open(path_or_url, "rb") as f:
  File "/usr/local/lib/python3.9/site-packages/fsspec/spec.py", line 1293, in open
    f = self._open(
  File "/usr/local/lib/python3.9/site-packages/fsspec/implementations/local.py", line 184, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/fsspec/implementations/local.py", line 306, in __init__
    self._open()
  File "/usr/local/lib/python3.9/site-packages/fsspec/implementations/local.py", line 311, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/opt/SelfPAD/results/pretraining/training/model/pretrained_model.ckpt'

The input file I used for testing is attached below, and this is the command I ran:
python3 /opt/SelfPAD/selfpad_eval.py --evaluate therapeutic_ABs.selfpad_input

Can you help me figure out what the issue is?

therapeutic_ABs.selfpad_input.csv
Dockerfile.txt
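
The last line of the traceback reports that the pre-trained checkpoint file does not exist at the path the fine-tuning model tries to load. A pre-flight check along these lines can confirm that before launching the evaluation. It is a minimal sketch: `checkpoint_path` and `preflight` are hypothetical helpers, the path is copied from the traceback, and whether the file is meant to come from running selfpad_pretrain.py or from a separate download is an assumption the repository does not confirm here.

```python
from pathlib import Path


def checkpoint_path(repo_root):
    """Build the checkpoint location reported in the traceback."""
    return (Path(repo_root) / "results" / "pretraining" / "training"
            / "model" / "pretrained_model.ckpt")


def preflight(repo_root):
    """Return a short message saying whether the checkpoint is present."""
    ckpt = checkpoint_path(repo_root)
    if ckpt.is_file():
        return f"found: {ckpt}"
    return f"missing: {ckpt}"


# /opt/SelfPAD is the install location used in the Docker image above.
print(preflight("/opt/SelfPAD"))
```

If the message starts with "missing", the evaluation will fail with the same FileNotFoundError until a checkpoint is placed at that path.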
