SelfPAD:

Author: Talip Ucar

The official implementation of "Improving Antibody Humanness Prediction using Patent Data".

Table of Contents:

  1. Model
  2. Environment
  3. Configuration
  4. Training and Evaluation
  5. Structure of the repo
  6. Results
  7. Experiment tracking
  8. Citing the paper
  9. Citing this repo

Model

[Figures: SelfPAD pre-training (left) and SelfPAD fine-tuning (right)]

Environment

We used Python 3.7 for our experiments. The environment can be set up in three steps:

pip install pipenv             # To install pipenv if you don't have it already
pipenv install --skip-lock     # To install required packages. 
pipenv shell                   # To activate virtual env

If the second step fails, you can install the packages listed in the Pipfile individually with pip, i.e. "pip install package_name".

Configuration

There are two types of configuration files:

1. pad.yaml         # Defines parameters and options for pre-training
2. humanness.yaml   # Defines parameters and options for fine-tuning

Training and Evaluation

You can train and evaluate the model by using:

python selfpad_pretrain.py        # For pre-training
python selfpad_finetune.py        # For fine-tuning it for humanness
python selfpad_eval.py -ev test   # To compute humanness scores for a custom dataset, in this case test.csv. The CSV file should have "VH", "VL" and/or "Label" columns
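Before running the evaluation on a custom dataset, it can help to confirm that the CSV header carries the expected columns. The sketch below is not part of the repo. It reads "and/or" as meaning that "VH" and "VL" are needed and "Label" is optional, which is an assumption, and the helper names missing_columns and has_labels are hypothetical.

```python
import csv
import io

SEQUENCE_COLUMNS = ("VH", "VL")  # column names taken from the README
LABEL_COLUMN = "Label"           # assumed to be optional


def missing_columns(csv_text):
    """Return the sequence columns absent from the CSV header, in order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in SEQUENCE_COLUMNS if name not in header]


def has_labels(csv_text):
    """Report whether the CSV header includes the label column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return LABEL_COLUMN in (reader.fieldnames or [])


# Placeholder row: these strings are not real antibody sequences.
example = "VH,VL,Label\nPLACEHOLDER_HEAVY,PLACEHOLDER_LIGHT,1\n"
print(missing_columns(example))
print(has_labels(example))
```

To check a file on disk, pass its contents, e.g. missing_columns(open("data/test.csv").read()).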

Structure of the repo

- selfpad_pretrain.py
- selfpad_finetune.py
- selfpad_eval.py

- src
    |-selfpad.py
    |-selfpad_humanness.py

- config
    |-pad.yaml
    |-humanness.yaml
    
- utils_common
    |-arguments.py
    |-utils.py
    |-tokenizer.py
    ...
    
- utils_pretrain
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- utils_finetune
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- data
    |-test.csv
    ...
    
- results
    |-pretraining
    |-humanness
    ...
    

Results

Results at the end of training are saved under the ./results directory. The results directory is structured as follows:

- results
    |-task e.g. humanness, or pretraining
            |-evaluation
                |-clusters (for plotting t-SNE and PCA plots of embeddings)
            |-training
                |-model
                |-plots
                |-loss

You can save the results of evaluations under the "evaluation" folder.
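For orientation, the layout above can be reproduced with a few lines of pathlib. This is only an illustration of the tree. The training scripts presumably create these folders themselves, and make_results_tree is a hypothetical helper, not a function from the repo.

```python
from pathlib import Path
import tempfile

# Sub-folders copied from the tree above, relative to results/<task>.
SUBDIRS = (
    "evaluation/clusters",
    "training/model",
    "training/plots",
    "training/loss",
)


def make_results_tree(root, task):
    """Create results/<task>/... under root and return the task directory."""
    task_dir = Path(root) / "results" / task
    for sub in SUBDIRS:
        (task_dir / sub).mkdir(parents=True, exist_ok=True)
    return task_dir


# Build the tree in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = make_results_tree(tmp, "humanness")
    created = sorted(
        p.relative_to(task_dir).as_posix()
        for p in task_dir.rglob("*")
        if p.is_dir()
    )
    print(created)
```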

Experiment tracking

You can turn on Weights & Biases (W&B) logging in the config file.

Citing the paper

@article{ucar2024SelfPAD,
  title={Improving Antibody Humanness Prediction using Patent Data},
  author={Ucar, Talip and 
          Ramon, Aubin and 
          Oglic, Dino and 
          Croasdale-Wood, Rebecca and 
          Diethe, Tom and 
          Sormanni, Pietro},
  journal={arXiv preprint arXiv:2110.04361},
  year={2024}
}

Citing this repo

If you use the SelfPAD framework in your own work, please cite it using the following:

@Misc{talip_ucar_2024_SelfPAD,
  author =   {Talip Ucar},
  title =    {{Improving Antibody Humanness Prediction using Patent Data}},
  howpublished = {\url{https://github.com/AstraZeneca/SelfPAD}},
  month        = January,
  year = {since 2024}
}

selfpad's Issues

SelfPAD_eval.py is failing

Hello, thanks for the good tool. I had some issues with installing dependencies, but made it work with Docker. Here is a Dockerfile.txt (extension added so upload is allowed) if someone finds it useful to build a tool in a container environment like Docker.

However, when I run selfpad_eval.py to evaluate humanness, I get this error:

Building the models for training and evaluation in SubTab framework...
Traceback (most recent call last):
  File "/opt/SelfPAD/selfpad_eval.py", line 250, in <module>
    main()
  File "/opt/SelfPAD/selfpad_eval.py", line 224, in main
    f1, recall, prec, auc, acc, pr_auc = eval(data_loader, config=config)
  File "/opt/SelfPAD/selfpad_eval.py", line 45, in eval
    model = PADFintune(config)
  File "/opt/SelfPAD/src/selfpad_humanness.py", line 47, in __init__
    self.set_autoencoder()
  File "/opt/SelfPAD/src/selfpad_humanness.py", line 53, in set_autoencoder
    self.transformer = PADFT(
  File "/opt/SelfPAD/utils_finetune/model_utils.py", line 49, in __init__
    self.transformer = SelfPAD.load_from_checkpoint(
  File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/utilities/model_helpers.py", line 100, in wrapper
    return self.method(cls, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/core/module.py", line 1561, in load_from_checkpoint
    loaded = _load_from_checkpoint(
  File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 61, in _load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=map_location)
  File "/usr/local/lib/python3.9/site-packages/lightning_fabric/utilities/cloud_io.py", line 55, in _load
    with fs.open(path_or_url, "rb") as f:
  File "/usr/local/lib/python3.9/site-packages/fsspec/spec.py", line 1293, in open
    f = self._open(
  File "/usr/local/lib/python3.9/site-packages/fsspec/implementations/local.py", line 184, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/fsspec/implementations/local.py", line 306, in __init__
    self._open()
  File "/usr/local/lib/python3.9/site-packages/fsspec/implementations/local.py", line 311, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/opt/SelfPAD/results/pretraining/training/model/pretrained_model.ckpt'

The input file I used for testing is attached below. This is the command line I used:
python3 /opt/SelfPAD/selfpad_eval.py --evaluate therapeutic_ABs.selfpad_input

Can you help me figure out what the issue is?

therapeutic_ABs.selfpad_input.csv
Dockerfile.txt
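The traceback ends in a FileNotFoundError for results/pretraining/training/model/pretrained_model.ckpt, which suggests that the evaluation script loads a pre-trained checkpoint from that location and that the file is not present in the container. Whether the checkpoint has to be produced by running selfpad_pretrain.py first or obtained some other way is not confirmed here. A minimal pre-flight check, with checkpoint_status as a hypothetical helper and the path copied from the error message, might look like this:

```python
from pathlib import Path

# Relative path taken from the FileNotFoundError in the traceback above.
CHECKPOINT = Path("results/pretraining/training/model/pretrained_model.ckpt")


def checkpoint_status(repo_root):
    """Return a short message saying whether the pre-trained checkpoint exists."""
    path = Path(repo_root) / CHECKPOINT
    if path.is_file():
        return f"found: {path}"
    return f"missing: {path}"


# /opt/SelfPAD is the install location shown in the traceback.
print(checkpoint_status("/opt/SelfPAD"))
```

If the check reports the file as missing, the evaluation and fine-tuning scripts will likely fail in the same way until a checkpoint exists at that path.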
