BioBERT-PyTorch

This repository provides the PyTorch implementation of BioBERT. You can easily use BioBERT with transformers. This project is supported by the members of DMIS-Lab @ Korea University including Jinhyuk Lee, Wonjin Yoon, Minbyul Jeong, Mujeen Sung, and Gangwoo Kim.

Installation

# Install huggingface transformers
pip install transformers==3.0.0

# Download all datasets including NER/RE/QA
./download.sh

Note that transformers requires torch, so you should install it as well (see the download instructions). If the download script does not work, you can manually download the datasets here and unzip them in the current directory (tar -xzvf datasets.tar.gz).
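If tar is not available on your system, the manual unpacking step can also be done from Python. The following is a minimal sketch, assuming the archive has already been downloaded as datasets.tar.gz; the helper name extract_datasets is hypothetical and is not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_datasets(archive="datasets.tar.gz", dest="."):
    """Unpack the dataset archive into dest and return its top-level entries.

    Rough equivalent of `tar -xzvf datasets.tar.gz`. Only use it on an
    archive you trust, since extraction writes whatever paths it contains.
    """
    archive_path = Path(archive)
    if not archive_path.is_file():
        raise FileNotFoundError(f"{archive_path} not found; download it first")

    top_level = set()
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = Path(member.name).parts
            if parts:  # skip a bare "." entry
                top_level.add(parts[0])
        tar.extractall(path=dest)
    return sorted(top_level)
```

Calling extract_datasets() in the repository root unpacks the archive in place and returns the names of the top-level entries it contained, so you can check them against the paths used by the example scripts.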

Models

We provide the following versions of BioBERT in PyTorch (click here to see all). To use BioBERT in transformers, set --model_name_or_path to one of them (see example below).

  • dmis-lab/biobert-base-cased-v1.1: BioBERT-Base v1.1 (+ PubMed 1M)
  • dmis-lab/biobert-large-cased-v1.1: BioBERT-Large v1.1 (+ PubMed 1M)
  • dmis-lab/biobert-base-cased-v1.1-mnli: BioBERT-Base v1.1 pre-trained on MNLI
  • dmis-lab/biobert-base-cased-v1.1-squad: BioBERT-Base v1.1 pre-trained on SQuAD

For other versions of BioBERT or for TensorFlow, please see the README in the original BioBERT repository. You can convert any version of BioBERT into PyTorch with this.
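Outside the example scripts, the same model names can be passed directly to the transformers Auto classes. The sketch below assumes transformers and torch are installed and that the checkpoints can be downloaded; load_biobert and embed are hypothetical helper names used only for illustration.

```python
# Model identifiers copied from the list above.
BIOBERT_MODELS = (
    "dmis-lab/biobert-base-cased-v1.1",
    "dmis-lab/biobert-large-cased-v1.1",
    "dmis-lab/biobert-base-cased-v1.1-mnli",
    "dmis-lab/biobert-base-cased-v1.1-squad",
)
DEFAULT_MODEL = BIOBERT_MODELS[0]


def load_biobert(model_name=DEFAULT_MODEL):
    """Download (or load from the local cache) a tokenizer and encoder."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def embed(text, tokenizer, model):
    """Return last-layer hidden states with shape (1, num_tokens, hidden_size)."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Index 0 is the last hidden state for both tuple and ModelOutput returns.
    return outputs[0]
```

A typical call would be tokenizer, model = load_biobert() followed by embed("BRCA1 mutations increase cancer risk.", tokenizer, model). Indexing the output with [0] rather than by attribute name is a deliberate choice so the sketch does not depend on which return type the installed transformers version uses.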

Example

For instance, to train BioBERT on the NER dataset (NCBI-disease), run:

# Pre-process NER datasets
cd named-entity-recognition
./preprocess.sh

# Choose dataset and run
export DATA_DIR=../datasets/NER
export ENTITY=NCBI-disease
python run_ner.py \
    --data_dir ${DATA_DIR}/${ENTITY} \
    --labels ${DATA_DIR}/${ENTITY}/labels.txt \
    --model_name_or_path dmis-lab/biobert-base-cased-v1.1 \
    --output_dir output/${ENTITY} \
    --max_seq_length 128 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32 \
    --save_steps 1000 \
    --seed 1 \
    --do_train \
    --do_eval \
    --do_predict \
    --overwrite_output_dir
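If you prefer to launch the run from Python, for example to loop over several seeds, the command above can be assembled as an argument list. This is only a sketch: build_ner_command is a hypothetical helper, it uses exactly the flags shown above, and it assumes you are inside named-entity-recognition and have already run ./preprocess.sh.

```python
import subprocess
import sys


def build_ner_command(entity="NCBI-disease",
                      data_dir="../datasets/NER",
                      model="dmis-lab/biobert-base-cased-v1.1",
                      seed=1):
    """Return the run_ner.py invocation from the shell example as a list."""
    entity_dir = f"{data_dir}/{entity}"
    return [
        sys.executable, "run_ner.py",
        "--data_dir", entity_dir,
        "--labels", f"{entity_dir}/labels.txt",
        "--model_name_or_path", model,
        "--output_dir", f"output/{entity}",
        "--max_seq_length", "128",
        "--num_train_epochs", "3",
        "--per_device_train_batch_size", "32",
        "--save_steps", "1000",
        "--seed", str(seed),
        "--do_train",
        "--do_eval",
        "--do_predict",
        "--overwrite_output_dir",
    ]


def run_ner(**kwargs):
    """Run one training job; raises CalledProcessError if the script fails."""
    subprocess.run(build_ner_command(**kwargs), check=True)
```

Calling run_ner(seed=1) reproduces the shell command. Note that every run writes to the same output directory with --overwrite_output_dir, so copy the results elsewhere before starting another seed.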

Please see each directory for different examples. Currently, we provide

Most examples are modified from examples in Hugging Face transformers.

Citation

@article{10.1093/bioinformatics/btz682,
    author = {Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and Kim, Sunkyu and So, Chan Ho and Kang, Jaewoo},
    title = "{BioBERT: a pre-trained biomedical language representation model for biomedical text mining}",
    journal = {Bioinformatics},
    year = {2019},
    month = {09},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btz682},
    url = {https://doi.org/10.1093/bioinformatics/btz682},
}

License and Disclaimer

Please see the LICENSE file for details. Downloading data indicates your acceptance of our disclaimer.

Contact

For help or issues using BioBERT-PyTorch, please create an issue.
