

BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs





This is the official repository implementing BioBLP, presented in "BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs".

BioBLP is a framework that allows encoding a diverse set of multimodal data that can appear in biomedical knowledge graphs. It is based on the idea of learning embeddings for each modality separately, and then combining them into a single multimodal embedding space. The framework is modular, and allows for easy integration of new modalities.

Usage

1. Install the requirements

We recommend using Anaconda to manage the dependencies. The following command will create and activate a new conda environment with all the required dependencies.

conda env create -f environment.yml && conda activate bioblp

2. Download the data

The data can be downloaded from here as a tar.gz file. This corresponds to our version of BioKG that has been decoupled from the benchmarks (see the paper for more details), and it also includes the necessary attribute data for proteins, molecules, and diseases. The file should be placed inside the data folder and decompressed:

tar xzf biokgb.tar.gz
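After decompressing, it can help to confirm that the files referenced by the training command in the next step are where they are expected. The following is a minimal sketch, not part of the repository: the helper name `find_missing_files` is hypothetical, and it assumes the archive unpacks into `data/biokgb/` and that the script is run from the repository root.

```python
from pathlib import Path

# Paths copied from the training command in this README,
# relative to the repository root (an assumption of this sketch).
EXPECTED_FILES = [
    "data/biokgb/graph/biokg.links-train.csv",
    "data/biokgb/graph/biokg.links-valid.csv",
    "data/biokgb/graph/biokg.links-test.csv",
    "data/biokgb/properties/biokg_meshid_to_descr_name.tsv",
]


def find_missing_files(root=".", expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under `root`."""
    root_path = Path(root)
    return [rel for rel in expected if not (root_path / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected data files were found.")
```

If any path is reported as missing, check that the archive was decompressed inside the data folder rather than next to it.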

3. Training link prediction models

Use the bioblp.train module to train a link prediction model. For example, to train a BioBLP-D model (which encodes disease descriptions) using the RotatE scoring function, use:

python -m bioblp.train \
    --train_triples=data/biokgb/graph/biokg.links-train.csv \
    --valid_triples=data/biokgb/graph/biokg.links-valid.csv \
    --test_triples=data/biokgb/graph/biokg.links-test.csv \
    --text_data=data/biokgb/properties/biokg_meshid_to_descr_name.tsv \
    --model=rotate --dimension=256 --loss_fn=crossentropy --optimizer=adam \
    --learning_rate=2e-5 --warmup_fraction=0.05 --num_epochs=100 \
    --batch_size=1024 --eval_batch_size=64 --num_negatives=512 --in_batch_negatives=True

On an NVIDIA A100 40G GPU, the above command takes about 9 hours to run.

We use Weights and Biases to log experiments. Logging is disabled by default; to enable it, add --log_wandb=True to the command above.
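For readers unfamiliar with the RotatE scoring function selected by --model=rotate, the sketch below illustrates the general idea: each relation acts as a rotation of the head embedding in the complex plane, and a triple scores higher when the rotated head lands close to the tail. This is an illustration in plain Python, not the BioBLP implementation. The function name `rotate_score` is hypothetical, the distance is the sum of component-wise moduli, and the margin term used in some implementations is omitted.

```python
import cmath
import math


def rotate_score(head, phases, tail):
    """Negative distance between the rotated head and the tail.

    head, tail: lists of complex numbers (entity embeddings).
    phases: list of angles in radians. Each relation component is
    exp(i * phase), so it has unit modulus and acts as a rotation.
    """
    distance = 0.0
    for h, phase, t in zip(head, phases, tail):
        rotated = h * cmath.exp(1j * phase)  # rotate h by `phase`
        distance += abs(rotated - t)         # modulus of the difference
    return -distance


# Toy values chosen by hand, not trained embeddings.
head = [1 + 0j, 0 + 1j]
quarter_turn = [math.pi / 2, math.pi / 2]

matching_tail = [0 + 1j, -1 + 0j]   # head rotated by 90 degrees
unrelated_tail = [1 + 0j, 0 + 1j]   # identical to the unrotated head

good = rotate_score(head, quarter_turn, matching_tail)
bad = rotate_score(head, quarter_turn, unrelated_tail)
print(good > bad)
```

A score near zero means the relation rotates the head almost exactly onto the tail; more negative scores indicate a worse fit.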

More examples will be added soon.

4. Benchmark tasks

  • Pre-generate the input dataset, with flags indicating whether each link is known or novel.
  • Run bioblp.benchmarking.preprocess.py to prepare the benchmark (BM) dataset for ML: shuffling, splitting, etc.
  • Use bioblp.benchmarking.featurize.py to featurize a list of entity pairs into vectors composed from the individual entity vectors.

Custom usage:

python -m bioblp.benchmarking.featurize \
    -i data/benchmarks/processed/dpi_benchmark_p2n-1-10.tsv \
    -o data/features \
    -t kgem \
    -f models/1baon0eg/ \
    -j concatenate
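Conceptually, featurizing a pair by concatenation means looking up the vector of each entity and joining the two into one longer vector. The sketch below shows that idea on toy data. It assumes this is what -j concatenate selects; the function name `concatenate_pair_features` and the entity identifiers are hypothetical, and the real module reads embeddings from a trained model rather than a dictionary.

```python
def concatenate_pair_features(pairs, embeddings):
    """Build one feature vector per (entity_a, entity_b) pair.

    pairs: list of (entity_a, entity_b) identifier tuples.
    embeddings: dict mapping an identifier to its vector (list of floats).
    """
    features = []
    for entity_a, entity_b in pairs:
        # The pair vector is the first entity's vector followed by the second's.
        features.append(list(embeddings[entity_a]) + list(embeddings[entity_b]))
    return features


# Toy embeddings with made-up identifiers, not real BioKG entities.
toy_embeddings = {
    "drug_1": [0.1, 0.2],
    "protein_1": [0.3, 0.4],
    "protein_2": [0.5, 0.6],
}
toy_pairs = [("drug_1", "protein_1"), ("drug_1", "protein_2")]

features = concatenate_pair_features(toy_pairs, toy_embeddings)
print(len(features), len(features[0]))
```

With concatenation, the feature dimension is the sum of the two entity dimensions, and the order of the entities in the pair matters.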

Contributors

dfdazac, dimitrisalivas, pmitra01, thompijnenburg
